DEV Community

# dataengineering

Posts

Day 16: Delta Lake Explained - How Spark Finally Became Reliable for Production ETL
2 min read
Migrate the legacy Greenplum to Apache Cloudberry with cbcopy
7 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Dec 9th - Dec 15th, 2025)
7 min read
Building a dbt-UI I Wish Existed
3 min read
Unpacking the Google File System Paper: A Simple Breakdown
3 min read
Day 15: Running Spark in the Cloud - Dataproc vs Databricks
2 min read
How to Sync Data from an Oracle Table to Elasticsearch using Kafka Connect
5 min read
The Myth of Distributed Computing as a Silver Bullet for Big Data
10 min read
Rethinking Stream-Batch Unification: Real-Time Processing with Incremental Materialized Views in Apache Cloudberry
5 min read
Data Engineering Processes: From Raw Data to Cleaned, Processed, Analytics-Ready Data.
5 min read
Navigating the Future: Top Data Engineering Trends Shaping 2024 and Beyond
4 min read
Day 14: Building a Real Retail Analytics Pipeline Using Spark Window Functions
1 min read
Day 13: Window Functions in PySpark
2 min read
Why Idempotency Is So Important in Data Engineering
6 min read
REST API Calls for Data Engineers: A Practical Guide with Examples
3 min read
Is CsvPath an easy or hard language?
16 min read
Understanding Salesforce Data 360 Objects: The Core of the Unified Customer Profile
3 min read
Day 12: UDF vs Pandas UDF
2 min read
The Data Engineer's Descent Into Datetime Hell
5 min read
Day 11: Choosing the Right File Format in Spark
2 min read
Navigating the Future: Key Data Engineering Trends for 2024 and Beyond
6 min read
Day 7: Mastering Joins, Unions, and GroupBy in PySpark - The Core ETL Operations
2 min read
map
1 min read
Data Engineering in 30 Days - Day 2
2 min read
Why Frontend Teams Should Care About Data Modeling for Real-Time Dashboards
2 min read