DEV Community

# dataengineering

Posts

Building a Medical-Grade Knowledge Graph: Mapping Drug Interactions with Neo4j and LlamaIndex 🩺💻

3 min read
Build a Local Lead Gen Machine: Scraping Google Maps with n8n (Reliably)

3 min read
Apache Data Lakehouse Weekly: December 30, 2025 – January 5, 2026

4 min read
RAG Isn’t a Modeling Problem. It’s a Data Engineering Problem.

6 min read
TDD for dbt: unit testing the way it should be

12 min read
Building a Modern Data Platform — Dagster - Dbt - Iceberg

3 min read
Rewriting My Apache Airflow PR: When Your First Solution Isn't the Right One

6 min read
Marmot: Data catalog without the complex infrastructure

1
Comments
3 min read
The 5 things we broke building our first major ML pipeline at Besttech (and how we fixed them).

3 min read
When models suggest deprecated Pandas APIs: a small mistake that cascades

3 min read
The Gaming Analytics Tech Stack: From Ingestion to Insights

4 min read
When code-gen suggests deprecated Pandas APIs: a case study in subtle breakage

3 min read
Making Data Workflows Work: AI-Driven Automation for Reliable Enterprise Pipelines

2 min read
Organizing Architecture Patterns for Triggering AWS Glue Python Shell with S3 Events

4
Comments
6 min read
CSV Processing Gotchas: Don’t Let Invalid Data Slip Through the Cracks!!!

1 min read
Streaming SQL Engine: Lightweight Cross-Data Source Integration for Resource-Constrained Environments.

10
Comments
1 min read
Stop Manually Tracing Azure Synapse Dependencies

1 min read
Building Pangolin: My Holiday Break, an AI IDE, and a Lakehouse Catalog for the Curious

6 min read
Part 8: Databricks Pipeline & Dashboard

2 min read
Part 4: Building the Bronze Layer with Auto Loader and Delta Lake

2 min read
End-to-End Real-Time Data Engineering on Databricks Using Spark Structured Streaming and Delta Lake

1 min read
Part 1: Creating Databricks Workspace and Enabling Unity Catalog

2 min read
Part 5: Building a ZIP Code Dimension Table

2 min read
Part 2: Project Architecture

2 min read
Building Bulletproof Data Pipelines: Orchestration, Testing, and Monitoring (Part 3 of 3)

8 min read