Setting up a data engineering environment
When diving into the data engineering world, it is important for the set up to be ready.
To set up a data engineering environment it is advisable to work on Linux Environment.
To start with we started by using linux as a host machine.
Then we installed Virtual machine VMware. Then on the Vmware we installed the downloaded Windows operating system.
Now in order to make communication between host(Ubuntu) and virtual machine(Windows) we need to make configurations and networking on Vmware.You need to enable VNC connections as shown below.
Installation of MSSQL Server.
In this guide we walk you through the MSSQL Server installation in Windows on VMware.
The first step is to download SQL SERVER 2012. In this case it's the one we are using currently.
You can download it from here or download a later version. Then follow the instructions to install it,
then launch the SSMS(Sql Server Management Studio).The UI should be something like this:
Installation of Postgres Database on Ubuntu 20.04.
We will now install Postgres in our Ubuntu. To install this follow the installation steps here.
For Postgres configuration steps:
To check for Postgres version
psql --version
It is advisable to check if the Postgres is running using this command before you proceed
sudo systemctl status postgresql
Connection To PostgreSQL running.
sudo -u postgres psql
Installation of AdventureWorks2019 and AdventureWorksDW2019
First to start using the databases we start with installing Adventureworks2019 and DW.
To install follow the steps found here. Once installed we need to configure them in MSSQL Server on a virtual machine using these steps here.
After installation and configuration open SSMS and this should be the UI in SSMS:
3. Introduction toTalend OpenStudio
It is important to start with low code ETL tools. In this article I will show how to install and get started with Talend. We will use Talend later to Migrate Data from MSSQL Server to Postgres.
To get started we download Talend open Studio for linux here.After downloading you will receive multiple files,run the linux file which has a .sh file and launch talend. This is how it should open:
Migrating Data from SQL Server to Postgres Using Talend ETL
At the Talend Studio we create a job then create tBDInput for SQL Server and tBDOutput for Postgres. At this point, you can create a connection between the two databases.
Right Click on tBDInput Trigger the On subjob OK and drag it to tBDOutput to make the connection complete.
Ensure to enter correctly the database credentials i.e username, password and database name ,table name ,which hosts the table or database to migrate.
After all the configuring and connecting both databases it's time now to start moving our data from Sql server to postgres. We write an Sql query to migrate the data.To start with we start with SQL server by using the following command which gets data from users table:
“select userid, user_name, user_email,convert(varchar, getdate(), 0)run_date_time from users"
This is used for migration and logging using the get_date function.
Now we now have to write an SQL query to insert data into postgres.In the UI we create a table and drop it if it exists then perform the action of insertion to custdatademo table as shown;
Now it's time where we have been waiting for so long and it's time to see our output in Postgres. Let's dive in fast and confirm our hard work.
As you can see we have migrated the data successfully. Thank you for being with us in this long journey.
This is a complete Extract, Transform and Load(ETL) using Talend.
Nicholas Data Eng Tips: Nicholas
Thanks: Christopher
Respects: Neville
Top comments (0)