DEV Community

# dataengineering

Posts

Understanding Salesforce Data 360 Objects: The Core of the Unified Customer Profile

Comments
3 min read
Day 12: UDF vs Pandas UDF

Comments
2 min read
Analyzing and Optimizing a Parquet ClickHouse Ingestion Pipeline

2
Comments
3 min read
Day 2: Data Engineering vs Data Science vs Data Analytics

Comments
2 min read
Day 11: Choosing the Right File Format in Spark

Comments
2 min read
Navigating the Future: Key Data Engineering Trends for 2024 and Beyond

Comments
6 min read
Day 7: Mastering Joins, Unions, and GroupBy in PySpark - The Core ETL Operations

Comments
2 min read
Top Open-Source Data Engineering Tools- Unravelling the Best in 2026

Comments
10 min read
map

Comments
1 min read
Data Engineering in 30 Days - Day 2

Comments
2 min read
Why Frontend Teams Should Care About Data Modeling for Real-Time Dashboards

Comments
2 min read
Refactoring a Mature Airflow Project: A Practical Guide to Scaling from Solo Development to Team Collaboration

Comments
4 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 24-Dec 8, 2025)

Comments
6 min read
How to Guarantee True Ordering in Complex Kafka Replays: Solving the Determinism Nightmare

Comments
4 min read
Day 9: Spark SQL Deep Dive - Temp Views, Query Execution & Optimization Tips for Data Engineers

Comments
2 min read
AWSChallenge - Week 2

Comments
4 min read
Day 10: Partitioning vs Bucketing - The Spark Optimization Guide Every Data Engineer Needs

Comments
2 min read
Deepening My Roots in the Data Ecosystem - Choosing Depth Over Breadth

Comments
2 min read
Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres

Comments
3 min read
DP-600 Fabric Analytics Engineer – Structured Study Notes

Comments
11 min read
The Boring Debug Checklist That Fixes Most “RAG Failures”

Comments
2 min read
Function Calling and Tool Use: Turning LLMs into Action-Taking Agents

Comments
18 min read
dremioframe & iceberg: Pythonic interfaces for Dremio and Apache Iceberg

Comments
8 min read
Lightweight big data processing technology

5
Comments
9 min read
SQL: Doing GROUP BY in CsvPath

Comments
5 min read