We all have those projects. You know, the ones where you think, "Okay, this should take me 15 minutes, tops," and sure enough it's 3 days later and you've had major epiphanies, an idea for a new project that will easily take years and doesn't relate much to the original project, and you haven't really even started on the original one yet.
Yeah, this post is about one of those projects.
The initial project
I coach a teen Bible quiz team, and a big part of our questions is based on unique words in the given book of study.
This year, for the book of John, the list of unique words was in alphabetical order rather than chronological order. Months ago I took a day and painstakingly hand-typed each unique word and its reference into a spreadsheet, then sorted alphabetically on the reference (not the word) to get them in order. Then I went through and fixed any that still weren't quite right. (For instance, John 2:13 may have 3 unique words. After the sort, they would be in the right spot overall, between John 2:12 and John 2:14, but still listed alphabetically by word within the verse.)
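Part of why the spreadsheet sort needed hand-fixing is that an alphabetical sort treats the numbers in a reference as text, so "John 10:1" lands before "John 2:3". Here is a minimal sketch of a numeric sort key, assuming references look like "Book chapter:verse". The names `reference_key` and `sort_by_reference` and the sample rows are hypothetical, made up for illustration; this is not the spreadsheet process I actually used.

```python
import re

# Hypothetical sample rows: (reference, word). The words are placeholders,
# not entries from the real quiz list.
rows = [
    ("John 10:1", "wordC"),
    ("John 2:13", "wordB"),
    ("John 2:3", "wordA"),
]

# Assumed reference format: "<book> <chapter>:<verse>", e.g. "1 Peter 2:3".
REF_PATTERN = re.compile(r"^(?P<book>.+?)\s+(?P<chapter>\d+):(?P<verse>\d+)$")


def reference_key(reference):
    """Turn 'John 2:13' into ('John', 2, 13) so the numbers sort as numbers."""
    match = REF_PATTERN.match(reference.strip())
    if match is None:
        raise ValueError(f"Unrecognized reference: {reference!r}")
    return (match["book"], int(match["chapter"]), int(match["verse"]))


def sort_by_reference(rows):
    # sorted() is stable, so words that share a verse keep their incoming order.
    return sorted(rows, key=lambda row: reference_key(row[0]))


ordered = sort_by_reference(rows)
for reference, word in ordered:
    print(reference, word)
```

Note that this only orders chapters and verses within a book; different books still sort alphabetically by name, and words inside the same verse still need to be put in order some other way.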
A few weeks back I decided I didn't want to go through that again. Besides, it should only take me 15 minutes to write a script that goes through next year's books, grabs all the unique words, spits them out into a spreadsheet, and even splits them by chapter onto different sheets. This was after some preliminary searching into manipulating spreadsheets with Python.
The problems
Problem 1
I had to face what I had been ignoring: how do I search the given books?
My first choice was web scraping, something I'm somewhat familiar with.
I faced two problems.
One, I wasn't just dealing with a single book this year; we are using Hebrews, 1 & 2 Peter, and Jude. Creating a one-pass algorithm that goes through each page, figures out what is what, and ignores section titles was going to be a nightmare.
Two, I had to deal with text encodings. A hill I could climb, but not one I wanted to.
Problem 2
It is at this point I remembered XML files exist. Sure enough, I could find an XML file for the NIV translation.
But soon I realized I had a problem here as well. Naturally, the download I found was for the 1984 translation of the NIV, not the 2011 NIV translation, which TBQ uses.
Solution?
I realized the XML file was still my best bet, and started going through Hebrews and comparing it to the 2011 text, making whatever changes were needed to bring the XML file up to date.
But this is when I had an epiphany.
Epiphany
I was learning a lot about Hebrews, the NIV translation committee, and words in general. I was having fun, even by the end of Hebrews 1, making all these changes. I should find a way to track all of these, something visual. I guessed I'd have to learn Python tools for data science.
And then I realized GitHub is perfect for this very thing. Luckily I was only a chapter in, because I started over. I wanted GitHub to track every single change I made and show a visual for it.
I created a script that went through the entire NIV XML file (now back in its original 1984 state) and split it by book. Then I modified the script to go into each book and split it by chapter. My file structure looks like this:
.
├── New Testament
│   ├── John
│   ├── Luke
│   ├── Mark
│   └── Matthew
│       ├── Matthew 1.xml
│       ├── Matthew 2.xml
│       ├── Matthew 3.xml
│       ├── Matthew 4.xml
│       └── Matthew.xml
├── NIV-1984.xml
├── NIV.xml
├── Old Testament
│   ├── Deuteronomy
│   ├── Exodus
│   ├── Genesis
│   │   ├── Genesis 1.xml
│   │   ├── Genesis 2.xml
│   │   ├── Genesis 3.xml
│   │   ├── Genesis 4.xml
│   │   └── Genesis.xml
│   ├── Leviticus
│   └── Numbers
└── worker.py
I've shortened it for obvious reasons, but that's the gist. The script creates the folders and files as needed.
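For anyone curious what a split like this can look like, here is a rough sketch using Python's built-in `xml.etree.ElementTree`. It is not my actual worker.py. The XML layout is an assumption: I'm pretending each book is a `<b n="Matthew">` element containing `<c n="1">` chapter elements, and your file may well use different tag and attribute names. The function name `split_bible` is also made up, and this version skips the Old Testament / New Testament folder level to stay short.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_bible(source_path, output_dir):
    """Write one XML file per book and one per chapter; return the paths written.

    Assumed (hypothetical) layout: <bible><b n="Matthew"><c n="1">...</c></b></bible>
    """
    root = ET.parse(str(source_path)).getroot()
    written = []
    for book in root.iter("b"):            # assumed tag name for a book
        name = book.get("n")               # assumed attribute holding the book name
        book_dir = Path(output_dir) / name
        book_dir.mkdir(parents=True, exist_ok=True)

        # Whole book, e.g. "Matthew/Matthew.xml"
        book_file = book_dir / f"{name}.xml"
        ET.ElementTree(book).write(
            str(book_file), encoding="utf-8", xml_declaration=True
        )
        written.append(book_file)

        # One file per chapter, e.g. "Matthew/Matthew 1.xml"
        for chapter in book.iter("c"):     # assumed tag name for a chapter
            chapter_file = book_dir / f"{name} {chapter.get('n')}.xml"
            ET.ElementTree(chapter).write(
                str(chapter_file), encoding="utf-8", xml_declaration=True
            )
            written.append(chapter_file)
    return written
```

Calling `split_bible("NIV.xml", ".")` would then create a folder per book next to the source file. Keeping both the whole-book file and the per-chapter files means a commit can touch just the one chapter being edited.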
Then I went back in and went chapter by chapter through Hebrews, doing a commit for each chapter. Along the way I also made notes in a notebook as to what stood out to me, about the text, about why different changes were made, etc.
Okay, so what's my point?
I'm continuing to use GitHub for this purpose, although for the time being I've shifted to the NASB, given that they have a new translation coming out in the next year.
I'll be blogging, on my own blog, about the differences I find and what I notice. I'll be using GitHub's graphical diffs between commits to see, quickly, what has changed. I also plan on learning Python data science tools to produce graphs, charts, and other things that dive into the different translations of the Bible.
I don't plan on stopping with the NASB and NIV. I also want to look at the differences between entirely separate translations, not just updated editions of the same one. That work will have to rely less on GitHub, because I'm not going through and hand-typing all those changes.
I also have really enjoyed the experience overall, as someone who wants to be always studying God's Word. I've noticed similarities between Hebrews and Peter's letters, I've noticed different connections in Hebrews to itself, all because I've been forced to read it and reread it at a much slower pace.
I don't yet have a good spot to get updates on this project. For obvious legal reasons I can't make the GitHub repo public.
About the 15 minute thing
I was partially right, by the way. After updating Hebrews, 1 & 2 Peter, and Jude, it did only take me 15 minutes to write the Python script that found all the unique words and put them in a spreadsheet in chronological order.
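In case it helps, here is a minimal sketch of what that kind of script can look like. It is not my actual script. It assumes a "unique word" means a word that appears exactly once across all the material being studied, and it assumes the verses have already been pulled out of the XML as (reference, text) pairs in reading order. It also writes a plain CSV file with the standard library rather than a real spreadsheet with one sheet per chapter. The names `tokenize`, `find_unique_words`, and `write_csv` are made up, and the sample sentences are placeholders, not verse text.

```python
import csv
import re
from collections import Counter

# Letters with optional straight apostrophes inside, e.g. "don't".
# Curly apostrophes and hyphenated words are not handled in this sketch.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def tokenize(text):
    """Lowercase words in the order they appear in the text."""
    return [word.lower() for word in WORD_PATTERN.findall(text)]


def find_unique_words(verses):
    """verses: iterable of (reference, text) pairs in reading order.

    Returns (reference, word) pairs for words that occur exactly once overall,
    in the order they appear in the text.
    """
    verses = list(verses)
    counts = Counter(word for _, text in verses for word in tokenize(text))
    unique = []
    for reference, text in verses:
        for word in tokenize(text):
            if counts[word] == 1:
                unique.append((reference, word))
    return unique


def write_csv(rows, path):
    """Write the (reference, word) rows to a CSV file a spreadsheet can open."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Reference", "Word"])
        writer.writerows(rows)


# Placeholder sentences, not actual verse text.
sample = [
    ("Book 1:1", "The quick fox ran"),
    ("Book 1:2", "The slow fox slept soundly"),
]
unique_rows = find_unique_words(sample)
```

Because the second pass walks the verses in reading order, the output is already in chronological order, including within a verse, so there is nothing left to sort by hand.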