Programming is a superpower. With just a little imagination, programmers can do things that would be expensive or inaccessible to others. I had used it to build more or less useful things, and I had a chance to use it again recently when my family and I decided to get a boat. As this was our first boat, we had no experience buying one and didn't know the market. To make up for this, I turned to software.
The idea
Even though I had rented a boat a few times, I had no idea how to buy one. I decided that the best way to learn would be to look at many boats to see what's available and at what price.
Finding different dealerships and checking their inventory was one option. I dismissed it almost immediately because it was time-consuming and tedious work, the inventory was limited, and the prices seemed inflated. Craigslist seemed like a much better option. People and dealerships post their ads daily, so there is a continuous supply of diverse inventory. The downside was that many posts were low-quality or uninteresting from my perspective.
So, I came up with this idea: build an application that pulls Craigslist posts, filters out ones that are clearly uninteresting, and allows curating the remaining ones. Curated posts shouldn't show up again, even if reposted or updated. Downloaded posts should be accessible even if the original post on Craigslist was deleted. With an application like this, I could see similar boats, learn about different models and equipment, and compare prices.
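The "shouldn't show up again, even if reposted" requirement implies matching posts by content rather than by post id, since Craigslist assigns a new id on every repost. A minimal sketch of such a content fingerprint (the field names and normalization rules here are my assumptions for illustration, not the actual implementation):

```typescript
// Hypothetical sketch: identify reposts by content, not by post id.
// A fingerprint built from the normalized title and price lets a
// curated post stay hidden even after it is reposted under a new id.

interface Post {
  id: string; // Craigslist post id (changes on repost)
  title: string;
  price: number;
}

function fingerprint(post: Post): string {
  // Lowercase, strip punctuation, and collapse whitespace so minor
  // title edits still map to the same fingerprint.
  const normalizedTitle = post.title
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, "")
    .replace(/\s+/g, " ")
    .trim();
  return `${normalizedTitle}|${post.price}`;
}
```

Curation decisions can then be keyed by this fingerprint instead of the volatile post id.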
Technical overview
Once I had a general idea of what to build, I started thinking about the implementation. I needed to pull posts on a regular cadence - e.g., daily. This could be easily implemented by using cron. It required a server to run, but fortunately, I already had a Raspberry Pi that powers a few small tasks around my house, like allowing me to open my garage door from my phone. The same Raspberry Pi could host my new application.
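For illustration, the nightly pull could be wired up with a crontab entry along these lines (the paths and log location are assumptions, not the actual setup):

```
# Run the scraper nightly at 2 a.m.; paths are illustrative.
0 2 * * * /usr/bin/node /home/pi/boat-scraper/pull-posts.js >> /home/pi/boat-scraper/pull.log 2>&1
```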
I needed a database to store posts and curation results. At first, I considered MySQL or PostgreSQL, but given how simple my requirements were, I realized that SQLite would be more than enough.
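A schema along these lines would cover those requirements; the table and column names here are my illustration, not the actual schema:

```sql
-- Hypothetical schema sketch: one table for downloaded posts and one
-- for curation decisions, so a post survives deletion on Craigslist.
CREATE TABLE IF NOT EXISTS posts (
  id           TEXT PRIMARY KEY,  -- Craigslist post id
  title        TEXT NOT NULL,
  price        INTEGER,
  url          TEXT NOT NULL,
  image_url    TEXT,
  first_seen   TEXT NOT NULL,     -- ISO-8601 timestamp
  last_seen    TEXT NOT NULL,
  repost_count INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE IF NOT EXISTS curated (
  fingerprint TEXT PRIMARY KEY,   -- content-based key so reposts stay hidden
  curated_at  TEXT NOT NULL
);
```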
I decided to run my application in a docker container to make it easy to manage. I separated the data from the application by storing the SQLite DB file on a separate Docker volume. This way, I could easily maintain historical data when updating the application: I would simply spin a container with the new version and mount the volume with the DB file.
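The volume setup could look roughly like this (the image and volume names are assumptions for illustration):

```
# Create a named volume once; it survives container replacement.
docker volume create boat-data

# Run the app with the SQLite file kept on the volume.
docker run -d --name boat-scraper -v boat-data:/data boat-scraper:latest

# Upgrading: replace the container, keep the volume (and the DB) intact.
docker rm -f boat-scraper
docker run -d --name boat-scraper -v boat-data:/data boat-scraper:new-version
```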
Here is a diagram depicting the architecture of my solution:
I implemented both the script that pulls the posts and the web application in TypeScript and ran them with Node.js.
Nitty-gritty details
Pulling the posts
Pulling the posts turned out to be more difficult than I anticipated. I knew Craigslist had killed its API long ago, but I thought the posts could still be fetched with a simple HTTP request. At first, I tried using the python-craigslist wrapper, but I couldn't make it work. After some investigation, it turned out that getting the post gallery with an HTTP request wouldn't work: the gallery worked by downloading a few JavaScript files that fetched additional information to dynamically build the DOM. As I didn't want to give up on my idea, I figured I could use Puppeteer (a headless Chrome browser) to download the post gallery (individual posts could still be downloaded with fetch). This got the job done but required writing more code than I had planned for. The Puppeteer-based solution was also slow (especially on an older Raspberry Pi). It didn't matter too much, though - the script that used it was executed by cron every night at 2 a.m. and ran in the background for at most a couple of minutes.
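For the individual posts that plain fetch can handle, the parsing step could be a small helper like this hypothetical one. The patterns it matches are my assumptions about the page markup, not the structure Craigslist actually uses:

```typescript
// Hypothetical parsing helper for a fetched Craigslist post page.
// The patterns below are assumptions about the markup, kept simple
// for illustration.

function parsePost(html: string): { title: string | null; price: number | null } {
  // The posting title lives in the page <title> tag.
  const titleMatch = html.match(/<title>([^<]*)<\/title>/i);
  // The asking price appears as e.g. "$12,500".
  const priceMatch = html.match(/\$([\d,]+)/);
  return {
    title: titleMatch ? titleMatch[1].trim() : null,
    price: priceMatch ? Number(priceMatch[1].replace(/,/g, "")) : null,
  };
}
```

In the real code this would run on the HTML returned by fetch; here it is shown as a pure function so the parsing logic is easy to test in isolation.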
While Puppeteer just worked on my MacBook, making it work on a Raspberry Pi took some effort. It is a longer story, so I wrote a separate post about it.
I thought more people might be interested in accessing Craigslist posts programmatically, so I made my solution open-source and published it as an npm package.
Note: I checked recently, and it is now possible to get the Craigslist post gallery again with a simple HTTP request. From that perspective, using Puppeteer is no longer needed. The good news is that the document structure didn't change, and my npm package continues to work.
Post curation
Web application
I needed to build an application to present results and allow manual curation of posts. I used the Express framework to do this. Even though it was the first time I had ever used it, the online tutorials and resources made it easy. To say that the user interface was simple would be an understatement - it was bare. Nevertheless, it had all the functionality I needed. Here is how it looks:
- No HTTPS, as there is no need for it. The application only runs on my local network.
- The application is running on my Raspberry Pi.
- The number of reposts and updates.
- Asking price.
- Price range.
- The picture of the boat. It let me quickly tell whether this was the kind of boat I wanted. The image is also a link to the Craigslist post.
- The curation button. Clicking it hides the post and adds it to the list of uninteresting posts.
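Rendering one such post card on the server could look roughly like this. Every name here (fields, route, labels) is an assumption for illustration, and the Express wiring around it is omitted:

```typescript
// Hypothetical server-side rendering of one post card, matching the UI
// described above: repost count, price, a picture linking to the
// original post, and a curation button.

interface PostRow {
  id: string;
  url: string;
  imageUrl: string;
  price: number;
  reposts: number;
}

function renderPostCard(post: PostRow): string {
  return [
    `<div class="post">`,
    `  <a href="${post.url}"><img src="${post.imageUrl}" alt="boat"></a>`,
    `  <span>$${post.price} (reposted ${post.reposts}x)</span>`,
    `  <form method="POST" action="/curate/${post.id}">`,
    `    <button type="submit">Not interested</button>`,
    `  </form>`,
    `</div>`,
  ].join("\n");
}
```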
With this setup, I believed it was only a matter of time before I spotted the right opportunity. And sure enough, after a few months, I found a boat I loved and eventually bought.
Learnings
In my day job, I work on backend systems that process large volumes of data. I spend a lot of time on system design, code mostly in C++, and ensure our services run smoothly. This side project allowed me to learn about technologies I had rarely, if ever, used. I shared what I learned along the way. Here is a list:
- the first prototype of using Puppeteer to read Craigslist posts
- the craigslist-automation npm package and its code
- the journey to run Puppeteer in a Docker container on Raspberry Pi
- my attempt to avoid running the application inside a Docker container as the root user
Finally, the code that ties everything together: https://github.com/moozzyk/boat-scraper
💙 If you liked this article...
I publish a weekly newsletter for software engineers who want to grow their careers. I share mistakes I’ve made and lessons I’ve learned over the past 20 years as a software engineer.
Sign up here to get my articles delivered to your inbox.
Top comments (13)
I thought that JavaScript helped you make money to buy it :) That sounds more interesting. Anyway, good job; what you did should help others with enough money to find boats.
C++ paid for it :D
I was also waiting for the "And then I added XYZ as a paid feature and made $10000000000000!" 😂
This is so dope! Looks like a great boat too.
Really cool that you made this free and open source! I live near a lake and Craigslist is popping in my area... hmmm, should I buy a boat? 🤔
Thanks!
Nice inspirational story for developing.
But I think it was a whole tech stack that helped with getting the boat.
I enjoyed reading. Thanks
Thanks!
they had us in the first half, not gonna lie.
Lol really interesting story of how you purchased your boat.
Could you share the code so that others can also purchase their dream boat?
I made the repo public: github.com/moozzyk/boat-scraper
Nice story, nice depth of details :)
Thanks!
I wonder if you could have just run the application in the background on your PC whenever it was on, instead of dealing with a Raspberry Pi.
In theory this could work, but I don't think it would be reliable in practice. I would be afraid of conflicts with my dev environment. It would be annoying to constantly think about this application running in the background - what if I didn't open my laptop for a few days and got no new posts? How should I configure pulling the posts so that it is simple and works reliably? How would I make sure that my daily activities don't break the app? I don't have any of these problems with a dedicated server. I don't mind dealing with the Raspberry Pi too much except for the fact it is a bit sluggish. I am sure I learned more than I would have if I ran it on a mini PC home server (which is probably the best option).