Hi everyone,
I am a little bit obsessed with data engineering, and lately I have been working on several open source projects on the topic. Here is a list of the repositories and the technologies used in each one. If you decide to go deeper into this fun world, these repositories could serve as a guide.
❤ means "I like this one"
❤ Tracking your Uber Rides and Uber Eats expenses through a data engineering process
Technologies and skills:
Python, Docker, Apache Airflow, AWS Redshift, Power BI, Data modelling, Task scheduling, ETL and ELT processes, Data warehousing, Cloud
❤ Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
Technologies and skills:
Python, Docker, Big Data, Cloud, BigQuery, Workflow Engines, Google Cloud Platform (GCP), Task scheduler, Dataproc cluster, Google Cloud Storage (GCS), Redis, DAG, Parallel Processing, Apache Spark
❤ Building Big Data Pipelines in the Cloud with AWS EMR
Technologies and skills:
Python, PySpark, AWS EMR, Task Scheduling, IaC, EC2 Instances, Apache Spark, Cloud
❤ Building a Lossless Data Compression and Data Decompression Pipeline
Technologies and skills:
Python, Data compression, BZIP2, Parallel programming
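To give a feel for what "lossless" and "parallel" mean for this kind of pipeline, here is a minimal sketch using only the Python standard library. It is not the code from the repository: the function names, the chunk size and the worker count are all assumptions made for illustration. The data is split into chunks, each chunk is compressed independently with BZIP2, and decompressing and joining the chunks gives back the original bytes exactly.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut the input into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_parallel(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> list:
    """Compress every chunk independently (hypothetical helper, not from the repo)."""
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order, so the blocks can be joined back later
        return list(pool.map(bz2.compress, chunks))


def decompress_parallel(blocks: list, workers: int = 4) -> bytes:
    """Decompress every block and concatenate the results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.decompress, blocks))


if __name__ == "__main__":
    original = b"data engineering " * 10000
    blocks = compress_parallel(original, chunk_size=4096, workers=2)
    restored = decompress_parallel(blocks, workers=2)
    print("lossless round trip:", restored == original)
    print("compressed size:", sum(len(b) for b in blocks), "of", len(original), "bytes")
```

Compressing chunks independently costs a bit of compression ratio compared with compressing the whole input at once, but it is what makes the work easy to spread across workers.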
Learn how to dockerize an Apache Spark Standalone Cluster
Technologies and skills:
Python, Jupyter Notebook, Apache Spark, Docker, docker-compose, Hive
❤ Dockerizing and Consuming an Apache Livy environment
Technologies and skills:
Python, Big Data, Docker, docker-compose, Apache Livy, Apache Spark, PostgreSQL, PySpark, Jupyter Notebook
❤ Design, Development and Deployment of a simple Data Pipeline
Technologies and skills:
Python, Data modelling, Docker, docker-compose, PostgreSQL, Data pipeline, FastAPI
Dockerizing a Python Script for Faster Web Scraping
Technologies and skills:
Python, Docker, SQLite, Dockerfile, Web scraping, Data pipeline, FastAPI
Understanding Similarity Measures for Text Analysis
Technologies and skills:
Python, Machine Learning, Similarity measures, Distance metrics, Text Analysis
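If you have never seen a similarity measure in code, here is a small sketch of two common ones, cosine similarity and Jaccard similarity, on plain word counts. It is only an illustration under simple assumptions (lowercasing and whitespace tokenization, helper names made up for this post) and not the code from the repository.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    """Very naive tokenizer: lowercase the text and split on whitespace."""
    return Counter(text.lower().split())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two word-count vectors."""
    va, vb = term_counts(a), term_counts(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words."""
    sa, sb = set(term_counts(a)), set(term_counts(b))
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


if __name__ == "__main__":
    print(cosine_similarity("data pipeline in the cloud", "big data pipeline"))
    print(jaccard_similarity("data pipeline", "data warehouse"))
```

Cosine similarity takes word frequency into account, while Jaccard only looks at which words appear, so the two can rank the same pair of texts differently.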
❤ Learn how to build a content-based Movie Recommender System
Technologies and skills:
Python, Machine Learning, TF-IDF, Cosine similarity, BM25, BERT, NLP, word2vec, Text Analysis, recsys
A Text Analysis of Speeches
Technologies and skills:
Python, Machine Learning, NLP, word2vec, Text Analysis, Sentiment Analysis, PCA, t-SNE, Word Embeddings, Text Preprocessing, Web scraping, Data Visualization, Mexico
❤ Dropout Students Prediction
Technologies and skills:
R, Genetic algorithm, Neural Networks, K-Means, Clustering, Machine Learning
Top comments (2)
thanks for sharing <3
amazing post! It helps me a lot