DEV Community

# bigdata

Posts

SeaTunnel-Powered Data Integration: How 58 Group Handles Over 500 Billion+ Data Points Daily

5
Comments
5 min read
The Heart of DolphinScheduler: In-Depth Analysis of the Quartz Scheduling Framework

8
Comments
3 min read
Big Data

Comments
1 min read
System Design 09 - Data Partitioning: Dividing to Conquer Big Data

Comments
2 min read
Simplifying Real-Time Data Ingestion with Apache NiFi

Comments
3 min read
Understanding Star Schema vs. Snowflake Schema

Comments
1 min read
Best Practices for Data Security in Big Data Projects

Comments
6 min read
Big Data Storage Trends and Insights

Comments
7 min read
5 Big Data Use Cases that Retailers Fail to Use for Actionable Insights

Comments
3 min read
From ETL and ELT to Reverse ETL

Comments
4 min read
Introduction to Big Data Analysis

8
Comments
13 min read
How Big Data is Powering the Internet of Things (IoT) Revolution - MasTech InfoTrellis

Comments
4 min read
Why Scala is the Best Choice for Big Data Applications: Advantages Over Java and Python

Comments
6 min read
Processing 20 Million Records in Less Than 5 Seconds with Apache Hive

10
Comments
8 min read
SeaTunnel Community Monthly Report For September

Comments
14 min read
Efficient Scraping of JavaScript Websites

Comments
3 min read
Tracking Data Over Time: Slowly Changing Dimensions (SCD)

Comments
6 min read
Big Data Challenges and Solutions: Navigating the Complex Landscape

Comments
7 min read
Five Steps to Scraping Multiple Images with Python

Comments
2 min read
Introduction to Big Data

6
Comments 2
2 min read
Why Apache Spark RDD is immutable?

Comments
3 min read
Reducing Delivery Times and Costs: How Machine Learning Optimizes Delivery Routes Efficiently

1
Comments 1
3 min read
Hands-on introduction to Apache Iceberg

7
Comments 2
8 min read
Embarking on the Big Query Quest: Exploring the Depths of its Inner Workings

Comments
5 min read
The Journey From a CSV File to Apache Hive Table

6
Comments
6 min read
How to Become an Apache SeaTunnel Committer?

1
Comments
4 min read
Building a Big Data Playground Sandbox for Learning

5
Comments
5 min read
Data Analysis: The Power of Big Data and Analytics in Decision Making 📊

Comments
3 min read
Cassandra vs. MongoDB: Choosing the Right NoSQL Database

Comments
3 min read
Which Data Synchronization Method is More Senior?

1
Comments
8 min read
Journey Through Spark SQL

Comments
11 min read
Connecting AI with Excel - Talk to Your Spreadsheets

1
Comments
6 min read
Scala vs. Java: The Superior Choice for Big Data and Machine Learning

1
Comments 1
11 min read
Understanding Data Schemas

Comments
5 min read
The Ultimate Guide to Data Analytics: Unlocking the Power of Data

Comments
3 min read
Data Showdown: OLAP vs. OLTP – The Battle of Real-Time and Analytics Titans

Comments
5 min read
Optimize ETL Processes with Apache Iceberg: A Game Changer

Comments
4 min read
The Must-Have Features of Modern Data Transformation Tools

Comments
6 min read
An End-to-End Guide to dbt (Data Build Tool) with a Use Case Example

2
Comments
4 min read
To Index Data is To Sort Data

8
Comments
5 min read
How to install Apache Kafka on Ubuntu with KRaft Mode (without Zookeeper): A Step-by-Step Guide

Comments
10 min read
Using ReAct Agents LLMs to Draw Insights from Tabular Data

5
Comments
7 min read
Data Driven Dreams: Building My Data Science Career

Comments
4 min read
A Beginner's Guide To Data Engineering Concepts, Tools, And Responsibilities.

Comments
1 min read
Optimizing Transformations in Pentaho: Case Study

Comments
3 min read
Loading data to Google Big Query using Dataproc workflow templates and cloud Schedule

2
Comments
12 min read
Data Visualisation Basics

9
Comments
7 min read
Demystifying Data Science: A Beginner’s Guide!

Comments
3 min read
Data Lakes vs. Data Warehouses: Choosing the Right Big Data Architecture

1
Comments
4 min read
How to Install Hadoop on Ubuntu: A Step-by-Step Guide

Comments
10 min read
🤔 Is It Possible to Achieve 100% Test Automation?

Comments
2 min read
Data ingestion – definition, types and best practices

Comments
8 min read
How to Handle Databases with Billions of Records

2
Comments
1 min read
Effective Strategies for Scaling Databases: Enhancing Performance for Growing Data Needs

4
Comments
5 min read
Databricks - Variant Type Analysis

Comments
7 min read
Working with Parquet files in Java using Carpet

1
Comments
6 min read
Optimizing ETL Processes for Efficient Data Loading in EDWs

Comments
4 min read
Patient-Centered Care and Data Integration in Population Health Management

Comments
4 min read
The Basics of Big Data: What You Need to Know

Comments
3 min read
Why Apache Doris is the Best Open Source Alternative to Rockset

3
Comments
3 min read