Path to become a junior+ data engineer?

#help #distributedsystems #spark #hadoop

Henlo!

I'm an I.T. student and I'd like to work as a data engineer but I'm like a fish lost in an ocean of big data tools.

First off, I've got a strong Web background, mainly doing back-end stuff such as building and deploying micro-services around the internet. But what I like most is working with data, Big Data.

But I don't know where to start. Today I'm quite confident with Apache Beam, SQL/NoSQL, message queues, Cloud solutions... but I feel like it's nothing compared to the great diversity of Big Data tools.

Should I go for Open-Source stuff such as Kafka, Cassandra, HDFS, etc., or should I focus on the Cloud side (Cloud Dataflow, AWS EMR, Pub/Sub, Kinesis...)?

I'd appreciate any help ;)

Top comments (1)

Dmitry • Oct 13 '19

Try to set up your first Hadoop cluster (powered by Azure/AWS), then use a clustered database (Hive or another) for your regular tasks. That way you'll get the basics of big data tools.

