Here's my learning-in-public post about Google's Looker Studio. This was the final video from the dbt presentation for the DataTalks.Club Data Engineering Zoomcamp. My work follows what the instructor showed us in the video almost exactly. I didn't really get it the first time around, so here is an updated version.
I started by going to Google Data Studio, now called Looker Studio. I created a new data source by clicking the "create" button on the left and selecting "data source". I selected BigQuery as the connector, then chose the project "data-engineering-2024", the dataset "prod", and the table "fact-trips", and hit "connect".
This brought up a list of columns and their default aggregations. I changed most of the ones set to "sum" to "none", leaving passenger count as it was. We could also add fields here if we already know what we want.
Then I clicked on the "create report" button on the top right. I dismissed the popup window and deleted the table that it started with.
Next I added a Time Series chart from the "Add a chart" menu. Then I added the service_type variable to the Breakdown Dimension field in the right-hand menu. There were some out-of-range dates in the data, so I added a date range control from the "Add a control" tab and set it to January 2019 through December 2020. (I had to click the month and then the year of the date range start and end to get the years menu.)
I added the text "Number of trips per day and service type" to the top of the chart by selecting the text icon "A" from the top menu.
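To make it clearer what this chart is computing, here is a minimal Python sketch of the same idea: count trips per day and service type, and ignore anything outside the date range. It is only an illustration, not what Looker Studio runs internally. The sample rows and the `pickup_datetime` column name are assumptions for the example; only `service_type` comes from the steps above.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical sample rows; the real table has far more rows and columns.
# "pickup_datetime" is an assumed column name for this sketch.
trips = [
    {"pickup_datetime": datetime(2019, 1, 1, 8, 30), "service_type": "Green"},
    {"pickup_datetime": datetime(2019, 1, 1, 9, 15), "service_type": "Yellow"},
    {"pickup_datetime": datetime(2019, 1, 1, 17, 45), "service_type": "Yellow"},
    {"pickup_datetime": datetime(2020, 12, 31, 23, 10), "service_type": "Green"},
    {"pickup_datetime": datetime(2009, 1, 1, 0, 5), "service_type": "Yellow"},  # out of range
    {"pickup_datetime": datetime(2021, 3, 4, 12, 0), "service_type": "Green"},  # out of range
]

# Same window as the date range control: January 2019 through December 2020.
RANGE_START = date(2019, 1, 1)
RANGE_END = date(2020, 12, 31)


def trips_per_day(rows, start, end):
    """Count trips per (day, service_type), skipping rows outside [start, end]."""
    counts = Counter()
    for row in rows:
        day = row["pickup_datetime"].date()
        if start <= day <= end:
            counts[(day, row["service_type"])] += 1
    return counts


daily = trips_per_day(trips, RANGE_START, RANGE_END)
for (day, service), n in sorted(daily.items()):
    print(day, service, n)
```

Each `(day, service_type)` pair corresponds to one point on one of the two lines in the time series, and the two out-of-range rows simply drop out.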
Next I added a scorecard chart from the "Add a chart" menu. This represented the total number of trips. I could change this from total number to abbreviated number using the style menu. I labeled it "Total trips recorded".
I also added a pie chart. This showed the overall percentage of green and yellow trips. It was already broken down by service type, because the previous chart used that dimension to split the data into two. I labeled it "Service type distribution".
Then I added a heatmap table. I went into the left-hand menu and removed the date variables from the dimensions. I left the date range dimension blank and added the pickup_zone as a dimension. This sorted the zones so that the one with the most pickups was on top, and so on down the chart, with the colors fading towards the bottom. I labeled it "Trips per pickup zone".
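The heatmap table boils down to a count per zone, sorted from most to fewest pickups. Here is a small Python sketch of that, with made-up sample pickups. The `shade` value (each count divided by the largest count) is my own assumption to illustrate the fading colors; I don't know the exact scale Looker Studio uses.

```python
from collections import Counter

# Hypothetical pickup_zone values, one per trip.
pickups = [
    "East Harlem North",
    "Upper East Side South",
    "East Harlem North",
    "Astoria",
    "East Harlem North",
    "Astoria",
]


def rank_zones(zones):
    """Return (zone, trips, shade) tuples sorted by trips, descending.

    shade is trips divided by the largest count, so the top zone gets 1.0.
    This scaling is an illustrative assumption, not Looker Studio's formula.
    """
    counts = Counter(zones).most_common()
    top = counts[0][1] if counts else 1
    return [(zone, n, n / top) for zone, n in counts]


ranked = rank_zones(pickups)
for zone, n, shade in ranked:
    print(f"{zone:25} {n:3} {shade:.2f}")
```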
The final chart was a stacked bar chart. The instructor wanted it arranged by month, but we didn't have months in the data, so we created a month variable. We could have done this right from the start, before we started building the report, but it can also be done here. The instructor pulled up a menu, but I was able to select a new data type from the dimension menu. The bottom of the dimension menu has the option "Add field". We also added pickup_year to drill down, and we sorted by month, ascending. For some reason my chart shows month numbers rather than month names, but I think for now I'm done. Here is a picture of the final result:
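A side note on the month field: one possible reason for seeing numbers is that the derived field holds the month number rather than the month name. That is a guess, not something I confirmed in Looker Studio. The Python sketch below shows the difference, along with the year/month/service type counts a stacked bar chart like this is built from. The sample rows and the `pickup_datetime` column name are assumptions for the example.

```python
from collections import Counter
from datetime import datetime

# Hard-coded so the output does not depend on the machine's locale.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical sample rows; "pickup_datetime" is an assumed column name.
trips = [
    {"pickup_datetime": datetime(2019, 1, 15, 10, 0), "service_type": "Green"},
    {"pickup_datetime": datetime(2019, 1, 20, 11, 0), "service_type": "Yellow"},
    {"pickup_datetime": datetime(2019, 2, 3, 9, 0), "service_type": "Yellow"},
    {"pickup_datetime": datetime(2020, 1, 10, 8, 0), "service_type": "Yellow"},
    {"pickup_datetime": datetime(2019, 2, 11, 7, 0), "service_type": "Yellow"},
]


def month_label(month_number):
    """Turn a month number (1-12) into its English name."""
    return MONTH_NAMES[month_number - 1]


def monthly_counts(rows):
    """Count trips per (year, month number, service_type)."""
    counts = Counter()
    for row in rows:
        ts = row["pickup_datetime"]
        counts[(ts.year, ts.month, row["service_type"])] += 1
    return counts


counts = monthly_counts(trips)
# Sorting on the numeric month keeps the bars in calendar order;
# sorting on the name would put April before January.
for (year, month, service), n in sorted(counts.items()):
    print(year, month, month_label(month), service, n)
```

Keeping the numeric month for sorting and using the name only as a label is what makes "sort by month, ascending" come out in calendar order.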