DEV Community

Allan Ramirez

My First Steps in the World of Data Engineering

Data is the new oil, they say...

Table Of Contents

  • Introduction
  • Data Engineer, What is it?
  • Pipelines in Focus
  • ETL vs ELT
  • Data Lake vs Data Warehouse
  • Reflection
  • What's Next?

Introduction

For my Capstone, my Fellowship gave me the opportunity to explore a branch that is not normally part of our curriculum: Data Engineering. Since I am on an everlasting journey of self-knowledge and figuring out who I am as a person, I knew I had to take this chance to explore my options, and I was up for the task!


Data Engineer, What is it?

Okay, so what exactly is a data engineer? In a nutshell, from my understanding, we are the architects and builders of the data world. We design, construct, and maintain the pipelines that transform raw data into valuable insights. Think of us like the plumbers of the digital age, but instead of pipes and water, we work with data streams and information flow.

So far, I've learned that data engineering isn't just about technical skills. It's about understanding the bigger picture of how data is used to drive decisions and solve problems. We're the bridge between the raw information and the people who need it, and that's pretty cool.


Pipelines in Focus

One of the most fundamental concepts I've encountered is the data pipeline. It's like an assembly line, but for data. Raw data comes in, gets cleaned, transformed, and then loaded into a place where it can be analyzed.
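To make the assembly-line idea concrete for myself, here is a minimal sketch of those stages in Python. Everything in it is an assumption for illustration: the sample CSV, the function names (`extract`, `clean`, `transform`, `load`), and the `payments` table are all made up, and an in-memory SQLite database stands in for "a place where it can be analyzed". A real pipeline would likely use dedicated tools, but the stages are the same.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: stray whitespace, a missing amount, inconsistent casing.
RAW_CSV = """name,amount
 alice ,10.5
BOB,
carol,7
"""

def extract(text):
    """Read raw rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Trim whitespace and drop rows that have no name or no amount."""
    cleaned = []
    for row in rows:
        name = (row["name"] or "").strip()
        amount = (row["amount"] or "").strip()
        if name and amount:
            cleaned.append({"name": name, "amount": amount})
    return cleaned

def transform(rows):
    """Normalize name casing and convert amounts from text to numbers."""
    return [(r["name"].title(), float(r["amount"])) for r in rows]

def load(rows, conn):
    """Write the finished rows to a table that is ready for analysis."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()

def run_pipeline(text, conn):
    """Run the stages in order, like stations on an assembly line."""
    load(transform(clean(extract(text))), conn)

conn = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, conn)
result = conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()
print(result)
```

The row for BOB has no amount, so the cleaning stage drops it before anything is loaded. Keeping each stage as its own small function makes it easy to test or swap one out without touching the others.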

There are two main types of pipelines:

  • ETL (Extract, Transform, Load): This is the traditional approach where data is extracted from its source, transformed into a usable format, and then loaded into a data warehouse.

  • ELT (Extract, Load, Transform): This is a newer approach where data is extracted and loaded into a data lake first, and then transformed as needed.

Understanding how pipelines work and the differences between ETL and ELT is essential for any aspiring data engineer. It's the foundation for building efficient and reliable systems that can handle large volumes of data.
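The difference between the two orderings is easier to see side by side. This is only a sketch under my own assumptions: the `RAW_ORDERS` sample data, the table names, and the helper functions are hypothetical, and a single in-memory SQLite database plays the role of both the warehouse and the lake, which real systems would keep separate.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
RAW_ORDERS = [("a1", " 19.99 "), ("a2", "5"), ("a3", "not-a-number")]

def parse_price(text):
    """Return the price as a float, or None if it cannot be parsed."""
    try:
        return float(text.strip())
    except ValueError:
        return None

def run_etl(conn):
    """ETL: transform in application code, then load only the finished rows."""
    rows = [(order_id, parse_price(price)) for order_id, price in RAW_ORDERS]
    rows = [row for row in rows if row[1] is not None]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, price REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", rows)

def run_elt(conn):
    """ELT: load the raw rows untouched, then transform inside the store."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, price TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", RAW_ORDERS)
    # Crude validity check for this sketch: keep prices that start with a digit.
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id, CAST(TRIM(price) AS REAL) AS price
        FROM raw_orders
        WHERE TRIM(price) GLOB '[0-9]*'
        """
    )

conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_rows = conn.execute("SELECT order_id, price FROM orders_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT order_id, price FROM orders_elt ORDER BY order_id").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Both paths end with the same two clean orders. The difference is that the ELT path still has all three raw rows sitting in `raw_orders`, including the bad one, so the transformation can be rewritten and rerun later without going back to the source.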


Data Lake vs Data Warehouse

Now, let's talk about the difference between a data lake and a data warehouse. It's like comparing a vast, wild lake to a well-organized warehouse.

  • Data Lake: This is where you dump all your raw, unprocessed data, regardless of its format or structure. It's a treasure trove of potential insights waiting to be discovered. Think of it as a giant sandbox where you can experiment and play with data before deciding what to do with it.

  • Data Warehouse: This is a more structured environment where data is cleaned, organized, and optimized for analysis. It's like a carefully curated library where you can easily find the information you need.

Both data lakes and data warehouses have their advantages. Data lakes are great for storing massive amounts of diverse data, while data warehouses are better for quick and easy analysis. The choice depends on your specific needs and use cases.
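Here is a toy version of that contrast, as I currently understand it. It is a sketch and not how production systems are built: a temporary folder of JSON files stands in for the lake, an in-memory SQLite table stands in for the warehouse, and the `EVENTS` data, function names, and `user_actions` table are all hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical events with inconsistent shapes, as raw data often has.
EVENTS = [
    {"user": "ana", "action": "login"},
    {"user": "ben", "action": "purchase", "total": 42.0},
    {"device": "sensor-7", "reading": [1, 2, 3]},
]

def write_to_lake(events, lake_dir):
    """Lake: store every record as-is, one JSON file per event, no schema."""
    for i, event in enumerate(events):
        (Path(lake_dir) / f"event_{i}.json").write_text(json.dumps(event))

def load_warehouse(lake_dir, conn):
    """Warehouse: keep only records that fit a fixed, query-friendly schema."""
    conn.execute(
        "CREATE TABLE user_actions (user_name TEXT NOT NULL, action TEXT NOT NULL)"
    )
    for path in sorted(Path(lake_dir).glob("*.json")):
        event = json.loads(path.read_text())
        if "user" in event and "action" in event:
            conn.execute(
                "INSERT INTO user_actions VALUES (?, ?)",
                (event["user"], event["action"]),
            )

with tempfile.TemporaryDirectory() as lake_dir:
    write_to_lake(EVENTS, lake_dir)
    lake_file_count = len(list(Path(lake_dir).glob("*.json")))
    conn = sqlite3.connect(":memory:")
    load_warehouse(lake_dir, conn)

actions = conn.execute(
    "SELECT user_name, action FROM user_actions ORDER BY user_name"
).fetchall()
```

The lake accepts all three events, even the sensor reading that looks nothing like the others. The warehouse table only takes the two that match its columns, which is what makes it quick to query afterwards.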

Reflection

So far, that's what I learned this week!

I have definitely learned a lot about the power that data has and how it is managed! I cannot wait to continue down this path, and I will make sure to keep you guys updated! Thank you so much for reading!

Stay learning!
