🌐 Get started: What is a MongoDB Operational Data Layer? (Part 2) 🌐

#database #data #dataengineering #mongodb

Operational Data Layer (ODL) data loading:
✅ Data must stay in sync with source systems
✅ Choose an appropriate data loading strategy
✅ Producer systems: consider the frequency and quantity of data changes
✅ Consuming systems: set clear requirements for data currency

Step 1: Batch extract and load:
📁 Initial batch load
📁 Copy application database data to Operational Data Layer
📁 One-time operation to load data from source systems

Step 2: Delta extract and load:
🔄 Starts immediately following initial batch load
🔄 Real-time synchronization
🔄 Incremental updates from source systems into the ODL
🔄 Use Change Data Capture (CDC)
🔄 Catch changes from source systems
🔍 Matching, merging, reconciling data
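
The two loading steps above can be sketched in a few lines. This is a minimal in-memory illustration, not a MongoDB or CDC tool API: plain dictionaries stand in for the source system and the ODL, and the event shape (`op`, `key`, `data`) and the function names are assumptions made up for this example.

```python
from typing import Any

Record = dict[str, Any]


def batch_load(source: dict[str, Record]) -> dict[str, Record]:
    """Step 1: one-time copy of every source record into the ODL."""
    return {key: dict(record) for key, record in source.items()}


def apply_change(odl: dict[str, Record], event: dict[str, Any]) -> None:
    """Step 2: apply one captured change event to the ODL."""
    op, key = event["op"], event["key"]
    if op == "delete":
        odl.pop(key, None)  # deleting a missing key is a no-op
    else:
        # "insert" and "update" are both merged into the existing record
        odl[key] = {**odl.get(key, {}), **event["data"]}


# Hypothetical source data and change events
source = {"c1": {"name": "Ana", "city": "Lisbon"}}
odl = batch_load(source)

events = [
    {"op": "update", "key": "c1", "data": {"city": "Porto"}},
    {"op": "insert", "key": "c2", "data": {"name": "Ben"}},
    {"op": "delete", "key": "c2"},
]
for event in events:
    apply_change(odl, event)
```

Treating inserts and updates as the same merge keeps replayed events harmless, which helps when the delta load starts right after the batch load and some changes arrive twice.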

Data flow and maturity model:
🏗️ Simple start
🌱 Grows in scope and strategic importance
🏆 Delivering increased benefits to business

Phase 1: Simple ODL, offloading reads:
💻 Serve only read operations
💰 Cut costs
🔒 High availability: takes over during source system downtime
⚡ Improve performance
📊 Handle long-running analytics queries
📈 Handle peaks in read traffic

Phase 2: Enriched ODL for new use cases:
🔍 Create single customer view
💳 Example: credit card transactions enriched by categorizing purchases
💰 Determine customer spend in each category (e.g., travel)
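
The credit card example above could look roughly like this. It is a sketch under assumptions: the merchant names, the `MERCHANT_CATEGORY` lookup, and the field names are all invented for illustration, and a real ODL would run this kind of enrichment inside the database or a pipeline rather than in a Python list.

```python
from collections import defaultdict

# Hypothetical merchant-to-category lookup used for enrichment
MERCHANT_CATEGORY = {"AirNova": "travel", "FreshMart": "groceries"}


def enrich(txn: dict) -> dict:
    """Return a copy of the transaction with a 'category' field added."""
    category = MERCHANT_CATEGORY.get(txn["merchant"], "other")
    return {**txn, "category": category}


def spend_by_category(txns: list[dict]) -> dict[str, float]:
    """Sum transaction amounts per category."""
    totals: dict[str, float] = defaultdict(float)
    for txn in txns:
        totals[txn["category"]] += txn["amount"]
    return dict(totals)


transactions = [
    {"customer": "c1", "merchant": "AirNova", "amount": 320.0},
    {"customer": "c1", "merchant": "FreshMart", "amount": 45.5},
    {"customer": "c1", "merchant": "AirNova", "amount": 80.0},
]
enriched = [enrich(t) for t in transactions]
totals = spend_by_category(enriched)
```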

Phase 3: Offloading reads and writes:
📥 Y-loading: write to both legacy and new systems in parallel
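
Y-loading is usually understood as sending each write to both the source system and the ODL. A minimal sketch of that idea, assuming two in-memory stores and a hypothetical `y_load` helper (a real setup would also need error handling for the case where only one of the two writes succeeds):

```python
def y_load(record_id: str, record: dict, legacy: dict, odl: dict) -> None:
    """Write the same record to the legacy store and the ODL."""
    legacy[record_id] = dict(record)  # separate copies, so the stores stay independent
    odl[record_id] = dict(record)


legacy_store: dict[str, dict] = {}
odl_store: dict[str, dict] = {}
y_load("order-1", {"status": "paid"}, legacy_store, odl_store)
```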

Phase 4: ODL first:
✍️ All writes are directed to Operational Data Layer (ODL)

Phase 5: System of Record:
🗃️ Operational Data Layer serves as the System of Record
🏳️ Source system can be decommissioned for cost savings
🏛️ Architectural simplicity

MongoDB for Operational Data Layer:
🤗 Ease: MongoDB's document model makes data easy to manage
🧩 Flexibility: integrate multiple source systems into a single ODL, without a predefined schema
⚡ Speed: better performance when accessing data
🎨 Versatility: the flexible document model satisfies a range of application requirements

Example:
📁 Embedding of arrays and sub-documents
📊 Modeling complex relationships and hierarchical data
🗄️ Ability to manipulate deeply nested data without rewriting the entire document
🗂️ Model flat, table-like structures, simple key-value pairs, text
🌍 Geospatial data
🕸️ Nodes and edges used in graph processing
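
To make the embedding and nested-update points concrete, here is a small sketch. The document is an ordinary Python dict, and `set_path` is a hypothetical helper that imitates dot-notation field paths (the style of path MongoDB accepts in update operators such as `$set`); it is not a MongoDB driver call.

```python
def set_path(doc: dict, path: str, value) -> None:
    """Set one nested field in place, addressed by a dotted path like 'a.0.b'."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        if isinstance(node, list):
            node = node[int(key)]  # numeric segment indexes into an array
        else:
            node = node.setdefault(key, {})  # create missing sub-documents
    if isinstance(node, list):
        node[int(leaf)] = value
    else:
        node[leaf] = value


# Hypothetical customer document with an embedded array of sub-documents
customer = {
    "name": "Ana",
    "addresses": [{"type": "home", "geo": {"city": "Lisbon"}}],
}
set_path(customer, "addresses.0.geo.city", "Porto")
```

Only the addressed field changes; the rest of the document is left untouched.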

Processing pipelines:
🔍 Lookups and range queries
📊 Data analytics
🔨 Transformations
🔍 Faceted search
🌎 Geospatial processing
🕵️ Graph traversals

Intelligently distributing the Operational Data Layer (ODL):

Availability:
💻 Multiple copies of data using replica sets
🔄 Failover and recovery is fully automated

Scalability:
🆙 Challenge: new source systems, growing data volume, new consuming systems, increasing workload
🗄️ Large data sets
⚡ High throughput requirements
🧩 Solution - sharding:
🤖 MongoDB provides low-cost horizontal scale-out
🔁 Automatically partitions and distributes data across multiple physical instances
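
The partitioning idea behind sharding can be illustrated with a toy hash router. This is a deliberate simplification and not MongoDB's actual sharding algorithm: `shard_for`, the shard count, and the sample keys are assumptions for this sketch only.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 3  # hypothetical number of physical instances
shards: list[dict[str, dict]] = [{} for _ in range(SHARD_COUNT)]

for customer_id in ("c1", "c2", "c3", "c4"):
    # Each record lands on exactly one shard, chosen by its key
    shards[shard_for(customer_id, SHARD_COUNT)][customer_id] = {"id": customer_id}
```

Because the hash is stable, the same key always routes to the same shard, so reads can go straight to the instance that holds the record.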

Workload isolation:
🔍 Operational Data Layer can safely serve disparate workloads
🔍 Run analytical queries on up-to-date data without impacting production applications

Data locality:
🌎 Allows precise control over where data is physically stored
🗺️ Control the geographic region to meet latency and governance requirements

Reference:

https://www.mongodb.com/resources/basics/implementing-an-operational-data-layer
Implementing an Operational Data Layer

https://www.mongodb.com/resources/solutions/use-cases/mainframe-modernization-reference-architecture
Mainframe Modernization Reference Architecture
