Cris Crawford

Data Engineering Saga

I'm taking the DataTalksClub's Data Engineering Zoomcamp for the third time. The first time, I got through the whole course, but I had trouble loading the data for the final project. The second time, I quit early on. I had a job and the job was a priority.

What have I learned this time around? It feels like backsliding. Last night I spent three hours asking ChatGPT why I was getting so many errors trying to run Postgres. I didn't tell it that I was supposed to run Postgres in a Docker container, because I had forgotten that myself: I quit the instructional video halfway through and picked it up the next day from where I left off. I finally posted my question on the Slack channel for the course. Thanks to Michael Shoemaker for pointing out my error so quickly and for encouraging me. I hope I can give something back when I finally get this.

I got past that roadblock. Next I had to load a .csv file into the database. I used a Jupyter notebook (for Python) with a pandas DataFrame and copied what the instructor did in the video. Success. Then I had to query the database using SQL. I don't generally use SQL, so I asked ChatGPT how to query the database to answer the homework questions. Was this unfair? I don't think so. Otherwise I would have had to look up the queries in a book. I think it seems unfair because ChatGPT gives an answer right away, and it's specific. I think I'm slightly ahead of the game because I'm familiar with the data (New York Taxi Trip data) and because somewhere in the back of my mind, I remember what I can do with SQL.
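For anyone following along, the load-then-query step can be sketched roughly as below. This is not the course's code: it swaps in Python's built-in csv and sqlite3 modules for pandas and Postgres so it runs without any setup, and the table name, column names, and sample rows are all made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a taxi-trip CSV file; the column
# names are hypothetical, not the real dataset's schema.
SAMPLE_CSV = """pickup_date,passenger_count,trip_distance
2021-01-01,1,2.5
2021-01-01,2,7.0
2021-01-02,1,1.2
"""


def load_csv(conn, csv_text):
    """Create a trips table and insert every CSV row into it."""
    conn.execute(
        "CREATE TABLE trips ("
        "pickup_date TEXT, passenger_count INTEGER, trip_distance REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["pickup_date"], int(r["passenger_count"]), float(r["trip_distance"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def trips_per_day(conn):
    """Count trips and find the longest trip for each pickup date."""
    query = (
        "SELECT pickup_date, COUNT(*), MAX(trip_distance) "
        "FROM trips GROUP BY pickup_date ORDER BY pickup_date"
    )
    return conn.execute(query).fetchall()


# An in-memory SQLite database stands in for the Postgres container.
conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, SAMPLE_CSV)
result = trips_per_day(conn)
print(loaded, result)
```

The same shape carries over to the real setup: read the file, write the rows into a table, then answer each homework question with a SELECT.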

I'm posting this for a few reasons. 1. The instructors want me to. I get points for posting this and for mentioning the course. 2. It will keep me accountable. I would like to finish the course this time around. 3. There is a slight chance it will help someone. I have my doubts.

The course is at https://github.com/DataTalksClub/data-engineering-zoomcamp. It started last week, but you can join anytime. It's one of the best courses I've taken.
