Henlo!
I'm an I.T. student and I'd like to work as a data engineer, but I feel like a fish lost in an ocean of big data tools.
First off, I've got a strong Web background, mainly on the back end: building and deploying micro-services. What I enjoy most, though, is working with data, especially Big Data.
But I don't know where to start. I'm now fairly comfortable with Apache Beam, SQL/NoSQL databases, message queues and cloud solutions, but that feels like nothing compared to the huge range of Big Data tools out there.
Should I go for open-source tools such as Kafka, Cassandra, HDFS, etc., or should I focus on the managed cloud side (Cloud Dataflow, AWS EMR, Pub/Sub, Kinesis...)?
I'd appreciate any help ;)
Top comments (1)
Try setting up your first Hadoop cluster (hosted on Azure or AWS), then use a database that runs on the cluster (Hive or similar) for your regular tasks. That will give you the basics of big data tools.