DEV Community

# dataengineering

Posts

Data Quality at Scale: Why Your Pipeline Needs More Than Green Checkmarks
8 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 18–24, 2025)
5 min read
While We're Measuring Developer Productivity, Won't Someone Think of the Data Engineers?
9 min read
Building a Real-Time Data Lake on AWS: S3, Glue, and Athena in Production
5 min read
Embeddings and Vector Similarity: How Machines Understand Meaning
19 min read
From Raw to Refined: Data Pipeline Architecture at Scale
12 min read
Agent Cost Optimization: A Data Engineer's Guide
13 min read
INTRODUCTION TO DBT(Data Build Tool)
2 min read
Breaking Into Gaming Analytics: From 1 Billion Mobile Users to 5B Daily Events
6 min read
What to use for data preparation in report, query or analysis business?
10 min read
Optimizing Data Processing on AWS with Data Compaction
7 min read
Taming the Data Beast: Build Pipelines That Bend, Not Break by Arvind Sundararajan
2 min read
Designing a Cost-Efficient Parallel Data Pipeline on AWS Using Lambda and SQS
6 min read
Understanding Kafka Architecture, Schema Registry, ksqlDB, PostgreSQL, Couchbase, and Microservices
3 min read
Introduction to the Confluent REST Proxy
4 min read
Why We Need Schema Registry in Kafka
17 min read
Azure Synapse Analytics
5 min read
Debugging Windows Race Conditions in Dagster
3 min read
6 Different Data Formats Commonly Used in Data Analytics
3 min read
The Offline Data Engineer: Building Resilient API Pipelines that Work on an Airplane
5 min read
Part 1: Snowflake's Autonomous Future
8 min read
Scaling Customer Analytics: Designing ML Pipelines for Millions of Users
7 min read
Apache Dev Mail Digest: Iceberg & Polaris (Nov 12–17, 2025)
4 min read
A Developer’s Guide to Apache Kafka: From Basics to Architecture in One Read
5 min read
Why Parquet Is Everywhere - And What Makes It Actually Fast?
3 min read