In this post I’ll talk about my experience building a command-line interface application with Ruby: generating a gem directory, scraping the data with Nokogiri, and putting together the user interface.
Draw Your Own Map
After several weeks of gradually levelling up my Ruby-wielding skills through Flatiron School, this was my first solo project: a command-line interface application. In the exercises leading up to it, there had been tests to pass or fail, telling me when I was indeed headed in the right direction. But this time my only guides were my vision of the end result and these project requirements:
- Provide a CLI.
- CLI must provide access to data from a webpage.
- The data provided must go at least one level deep, generally by showing the user a list of available data and then being able to drill down into a specific item.
- Use good object-oriented design patterns.
Finding North
Before I could write a line of code, I needed to pick a website to scrape. This would dictate the kind of data my application would provide access to, i.e., its purpose. I also needed to choose something I could scrape without too much difficulty.
Thankfully, I had a website in my back pocket that I suspected would work just fine. The Hack Design website provides lessons about design in various categories. Its pages lent themselves to the one-level-deep model that was required, and they were rendered as static HTML, putting them just in the range of difficulty I was hoping for. Being able to access these lessons from the minimal environment of the command line struck me as a cool idea. So, after a cursory assessment of whether I’d be able to scrape the content I needed, I decided to go for it.
Set Up
I knew at the beginning that I wanted to be able to wrap my application in the self-contained, distributable format of a gem. Bundler made it easy to get started with a scaffold directory.
The first time you run the bundle gem command, you have the option of including a CODE_OF_CONDUCT.md and LICENSE.txt. I chose the license. I updated my .gemspec file with details about my gem: a short summary of what my gem would do and a slightly more detailed description of the same. As I nudged its functionality forward, I added a short list of other gems it would depend on to function.
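Here’s a minimal sketch of the kind of .gemspec I mean; the values are placeholders, not my gem’s actual metadata:

# hack_design.gemspec (abridged) -- illustrative values only
Gem::Specification.new do |spec|
  spec.name        = "hack_design"
  spec.version     = "0.1.0"
  spec.authors     = ["Your Name"]  # placeholder
  spec.summary     = "Browse Hack Design lessons from the command line"
  spec.description = "A CLI that scrapes hackdesign.org and displays lesson content in the terminal."

  # runtime dependencies added as the gem grew
  spec.add_dependency "nokogiri"
  spec.add_dependency "colorize"
end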
A View of the Destination
I started by writing code for a user interface that would mimic the intended behaviour of my application. The user would be greeted by a welcome message and options to view categories, view lessons, view a random lesson, or quit. I didn’t have any categories or lessons yet, so I threw in some fillers to start. This code lives in lib/hack_design/cli.rb. I would call this code in an executable file called bin/hack-design to run my program.
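An early version of that menu looked something like this sketch (filler data included; the exact prompts and method names here are illustrative):

# lib/hack_design/cli.rb -- an early sketch of the menu loop
module HackDesign
  class CLI
    def call
      puts "Welcome to Hack Design!"
      loop do
        puts "Type 'categories', 'lessons', 'random', or 'exit':"
        case gets.strip.downcase
        when "categories" then puts "1. Placeholder category"
        when "lessons"    then puts "1. Placeholder lesson"
        when "random"     then puts "A random placeholder lesson"
        when "exit"       then break
        else puts "Sorry, I didn't understand that."
        end
      end
    end
  end
end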
I didn’t have any tests, but I knew what I expected my code to do. Debugging was a matter of trial-and-error-ing my way to success. This technique would carry me through the development of my app.
Fits and Starts
I intended to make the data two layers deep: providing content (including exercises) from within a lesson, and lessons from within a category. Taking my cue from the UI I had built, I created classes to model a Category, a Lesson, an Exercise, and a Scraper. I used ./bin/console to test-drive these classes by looking for expected behaviour. However, as I set out to teach my Scraper how to find and gather categories, life threw me a few curveballs, which effectively stalled my progress for about two weeks.
When I returned, I was anxious about the time I had lost. Closer investigation of the data from my Scraper soon revealed that scraping each category would prove more difficult than I had first imagined. So, I thought about it. First and foremost, I wanted to provide users with lesson content. Was listing categories really that essential to the purpose of the application? I decided not, and began to simplify.
I threw out the Category and Exercise classes. And I was left with what I needed: a Lesson class whose objects would organize data about each lesson, a Scraper to gather this data from the website, and a CLI to manage the user interface. I refactored the UI to reflect the changes. It would list lessons to choose from, not categories. And the data was now just one layer deep.
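The surviving Lesson class looked something like this sketch (the attribute names are assumptions based on the data described later in this post):

# lib/hack_design/lesson.rb -- a sketch; attribute names are assumptions
module HackDesign
  class Lesson
    attr_accessor :title, :url, :instructor, :content

    @@all = []

    def initialize(title, url)
      @title = title
      @url   = url
      @@all << self   # track every Lesson so the CLI can list them
    end

    def self.all
      @@all
    end
  end
end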
Keep Your Data Sources Close
With my UI up and running, it was time to supply it with real data. I knew it would take a few tries to get the search queries right. From my previous attempt at scraping, I also knew that too many requests would very likely get me blocked. I decided to solve this problem by copying the pages I’d be scraping to a fixtures folder. This would allow me to keep my requests to the live site to a minimum.
The way Hack Design is set up, all 51 lessons are listed on a lessons homepage. From that page, each lesson links to its own page. Copying down the source code of the homepage was as simple as using cURL to get the HTML and shovelling it into a new file:
curl http://hackdesign.org/lessons/ >> fixtures/site/lessons.html
The other 51 pages, however, would pose a greater challenge. No way was I going to navigate to each page and copy down hundreds of lines of HTML into 51 individual files. I wrote a little Bash script to do it instead:
#!/bin/bash
# get-lesson.sh: fetch each Hack Design lesson page into the current directory
typeset -i i END   # declare i and END as integers
let END=50 i=0     # lesson pages are numbered 0 through 50

echo "Script starting now…"
while ((i<=END)); do
  curl https://hackdesign.org/lessons/$i -O   # -O saves each page under its remote name
  let i++
done
echo "Done"
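With the pages saved, I could point my experiments at the local copies instead of the live site; something like this sketch, using the Nokogiri parser introduced below (the development-time toggle itself is hypothetical):

require 'nokogiri'

# parse the saved fixture instead of requesting the live page
doc = Nokogiri::HTML(File.read("fixtures/site/lessons.html"))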
Scraping Forward with Nokogiri
Before long, I had all the files I needed to get to scraping. For this task, I used an HTML parser called Nokogiri, which can search documents using CSS selectors. I used those selectors to zero in on the HTML elements containing the data I wanted. In keeping with the setup of the Hack Design website, the Scraper has two methods, sketched in code below:
- ::scrape_lessons_page scrapes the lessons homepage. It creates a hash for each lesson, adds the lesson title and URL to that hash, then adds that hash to an array of all the lessons.
- ::scrape_lesson scrapes a lesson's content page. It adds the instructor name and other content for a particular lesson to a hash and returns the result.
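A sketch of what those two methods might look like (the CSS selectors here are stand-ins, not the selectors the real pages require):

# lib/hack_design/scraper.rb -- a sketch; the selectors are hypothetical
require 'nokogiri'
require 'open-uri'

module HackDesign
  class Scraper
    # returns an array of hashes, one per lesson on the lessons homepage
    def self.scrape_lessons_page(url)
      doc = Nokogiri::HTML(URI.open(url))
      doc.css(".lesson a").map do |link|   # hypothetical selector
        { title: link.text.strip, url: link["href"] }
      end
    end

    # returns a hash of content scraped from a single lesson's page
    def self.scrape_lesson(url)
      doc = Nokogiri::HTML(URI.open(url))
      {
        instructor: doc.css(".instructor").text.strip,   # hypothetical selector
        content:    doc.css(".exercise").map(&:text)     # hypothetical selector
      }
    end
  end
end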
Homecoming: Return to the CLI
The CLI is where the Scraper and Lesson classes join forces to produce Lesson objects that model individual lessons filled with content. Here, hashes act as the glue between the Scraper and Lesson classes, carrying data from the Scraper to the Lesson and allowing them to work together while remaining independent and focused in purpose.
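In code, that hand-off looks roughly like this (a sketch that assumes the class shapes from my earlier sketches):

# build Lesson objects from the Scraper's hashes, then fill in their details
lessons = HackDesign::Scraper.scrape_lessons_page("https://hackdesign.org/lessons/")
lessons.each { |hash| HackDesign::Lesson.new(hash[:title], hash[:url]) }

HackDesign::Lesson.all.each do |lesson|
  details           = HackDesign::Scraper.scrape_lesson(lesson.url)
  lesson.instructor = details[:instructor]
  lesson.content    = details[:content]
end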
A title and a URL are all I need to create a Lesson object. I created 51 Lesson objects by iterating over the array of hashes from ::scrape_lessons_page. Then I passed the URL of each Lesson object to ::scrape_lesson, which returned a hash of data I could add to my ready-made Lesson objects. This process went smoothly until… it didn’t.
One of the pages was breaking my Scraper. Using Pry and some temporary tweaks to my code, I was able to track down the page that was causing trouble. It turns out lesson 41 has a list within one of its exercises. My Scraper, however, was identifying exercises as list items. On discovering this list within a list, it would classify the nested list item as another exercise and then crash when it didn't find the typical contents of an exercise inside. It needed a way to differentiate between an exercise and its content. To do this, I made the CSS selectors a little more specific. I had fun debugging this one.
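The fix, in spirit (these selectors are illustrative, not the actual ones from my code), was to swap a broad selector for one that matches only direct children:

doc.css(".exercises li")    # too broad: also matches a list nested inside an exercise
doc.css(".exercises > li")  # direct children only, so nested lists are skipped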
51 completed Lesson objects made filling out the user interface wonderfully straightforward. I could just loop through all the Lesson objects and display the information they contained. I added a few methods to enable navigation from one lesson’s content to another without returning to the list, and added a bit of color with the colorize gem. And then, I was done. I had built a Ruby command-line application!
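The colorize gem patches String with color methods, so the listing code could be as simple as this sketch:

require 'colorize'

# list every lesson title in color (assumes the Lesson sketch from earlier)
HackDesign::Lesson.all.each_with_index do |lesson, i|
  puts "#{i + 1}. #{lesson.title}".cyan
end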
To wrap up, I packaged, installed, and tested my gem. Bundler makes packaging and releasing your gem as easy as it does getting started. I chose not to release my gem, but I have included instructions on how to package and install it locally. Check out my code here.
Fond Farewell
Completing this project was an exercise in confronting fear of failure. Every step was absolutely worth it. I am proud of this simpler, final product and really excited about what’s next.
Thanks for taking the time to read this post. See you in the next one!
Cover Art: Mayumi Matsumoto