Are you aspiring to become an Analytics Engineer and you do not know how to get started? You can start with Udemy Analytics Engineering Bootcamp like I did too. Click to view course.
Here is a brief documentation of what I learnt. (You need to take the course to have a complete understanding of what data modelling, methodology and technologies is all about)
The methodology used is Kimball's Warehouse Methodology and what is this all about?
Kimball's Warehouse Methodology
Created by Ralph Kimball
Defines data warehouse as a copy of transaction data specifically structured for query and analysis.
Starts with identifying key business process and requirements (Bottom up)
Focus of this approach is to enable business intelligence fast.
Data marts are created first instead of enterprise data warehouse.
Dimensional model - STAR SCHEMA design (deformalized)
The model design is built first on fact and dimension tables.
Process of Kimball Methodology Start Schema
- Facts
- Dimensions
- Follows dimensional modeling technique.
- Enterprise bus matrix is used to document & show how schema is designed.
- Data marts are built with star schema being core element of dimensional model.
- Multiple start schema can exist in same model.
- Conformed data dimensions (shared attributes) are used to create dimensional data warehouse.
The dataset used is the Northwind Dataset from MySQL and here is the ERD Diagram of the dataset.
Data modelling with BigQuery, dbt and GitHub for version control
First, we need to understand the business requirements.
What is it that we are trying to do?
Northwind traders are export import company who trades around the world for specialty foods.
Their existing architecture is mostly a mix of on-premises & legacy systems, including their MySQL database which is primary operational database.
They are struggling to keep up with reporting requirements and causing database to slow down impacting their day-to-day business.
Northwind traders wants to modernize their data & reporting solutions & move away from on-prem.
Why are we doing it?
Northwind traders wants to modernize their existing infrastructure for.
- Better Scalability
- Reduced load on operational system
- Improved reporting speed
- Improved data security
How are we going to achieve it?
Northwind traders will migrate their existing database system to google cloud platform (GCP). BigQuery was selected to run OLAP.
Dimensional Data warehouse will be built on BigQuery to support with reporting requirements.
Next, we define the Business Process
Requirements
Sales Overview
Overall sales reports to understand better our customers what is being sold, what sells the most, where and what sales are the least, the goal is to have a general overview of how the business is going.
Sales Agent Tracking
Track sales & performance of each seller agent to adjust commissions, reward high achievers and empower low achievers.
Product Inventory
Understand the current inventory levels, how to improve stock management, what supplies to we have how much is being purchased. This will allow to understand stock management and potentially broker better deals with suppliers.
Customer Reporting
Allow customers to understand their purchase orders, how much and when they are buying, empowering them to make data-driven decisions and utilize the data to join to their sales data.
To get started we need to set up google account — Click here to set up an account.
Select an existing project if you have one or create a new project if you don’t.
Your project dashboard should look similar to this.
Now we set up a GitHub repository for our project.
You can create a new GitHub account if you don’t have one or login into your existing account — Click here to visit GitHub page.
This is what my GitHub profile looks like
Create a new repository and give it any name of your choice mine would be — analytics-engineering-bootcamp.
keep the project as private for now later on you can make it public, leave every other setting and just click create repository by scrolling down the page.
Click here to learn how to install Linux on windows — Click here.
If you are a MacBook user, click here to install Linux.
Alternatively, you can take the bootcamp course to learn how to install Linux on your preferred OS. - Click here to take the bootcamp course.
My preferred Terminal as a windows user is windows terminal on Microsoft store.
My Linux terminal is opened.
# Create a new folder on your local machine with Command Line optional
# You can use linux to create the folder too
# I am using anlytics-engineering-bootcamp as my folder name
mkdir C:\path\to\your\designated\file\location\anlytics-engineering-bootcamp
# Open the Folder with Ubuntu
cd /mnt/c/path/to/your/folder/analytics-engineering-bootcamp
Copy everything from the (…or create a new repository on the command line) and paste into your terminal.
To input the GitHub password, you will need a token.
Here is how to get your GitHub Token
Click on the far-right icon of your GitHub profile and click settings.
Scroll down to developer’s settings.
Click on any of the token's access but I am using Token classic.
Scroll down and select the necessary settings you might need Choose the repo so you can read and write into your repository.
Your Folder like mine analytics-engineering-bootcamp should look like this containing the ReadMe file in your local machine when opened.
My next blog post we will be working on
- Uploading our dataset into BigQuery
- Bus Matrix for our business process and data dictionary
- Setting up dbt
- Building our dbt models
- Visualizing and creating our report with PowerBI
Top comments (0)