Kinesis Producers
A producer for Amazon Kinesis Data Streams is an application that feeds user data records into a Kinesis data stream (also called data ingestion). The Kinesis Producer Library (KPL) makes it easier to construct producer applications by allowing developers to achieve high write throughput to a Kinesis data stream.
There are several ways to stream data into Amazon Kinesis Data Streams:
- Kinesis SDK
- Kinesis Producer Library (KPL)
- Kinesis Agent
Other third-party libraries include:
Spark, Log4J Appenders, Flume, Kafka Connect, NiFi
Kinesis Producer SDK - PutRecord(s)
- PutRecord sends a single record; PutRecords sends many records in one API call.
- PutRecords uses batching, which increases throughput and results in fewer HTTP calls (see the sketch below).
- Also available in the AWS Mobile SDKs: Android, iOS, etc.
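To make the SDK path concrete, here is a minimal sketch using the AWS SDK for Java v1 (aws-java-sdk-kinesis); the stream name and record contents are placeholders, not values from this post:

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordsRequest;
import com.amazonaws.services.kinesis.model.PutRecordsRequestEntry;
import com.amazonaws.services.kinesis.model.PutRecordsResult;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PutRecordsExample {
    public static void main(String[] args) {
        // Client picks up credentials and region from the default provider chain
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // Batch several records into a single PutRecords call (fewer HTTP requests)
        List<PutRecordsRequestEntry> entries = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            entries.add(new PutRecordsRequestEntry()
                    .withPartitionKey("user-" + i) // partition key decides the target shard
                    .withData(ByteBuffer.wrap(("event-" + i).getBytes(StandardCharsets.UTF_8))));
        }

        PutRecordsResult result = kinesis.putRecords(new PutRecordsRequest()
                .withStreamName("my-stream")       // hypothetical stream name
                .withRecords(entries));

        // PutRecords is not atomic: individual entries can fail and must be retried
        System.out.println("Failed records: " + result.getFailedRecordCount());
    }
}
```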
Managed Amazon Web Services sources for Kinesis Data Streams:
- AWS IoT
- CloudWatch Logs
- Kinesis Data Analytics
Use cases for the Producer SDK: low throughput, tolerance for higher latency, a simple API, or producing from AWS Lambda.
Kinesis Producer Library (KPL)
- Easy to use and highly configurable C++/Java library
- Used for building high-performance, long-running producers
- Automated and configurable retry mechanism
- Synchronous or Asynchronous APIs (better performance for async)
- Submits metrics to CloudWatch for monitoring.
- Batching (both mechanisms turned on by default) → increases throughput and decreases cost:
- Collection: records are gathered and written to multiple shards in the same PutRecords API call.
- Aggregation: multiple user records are packed into a single Kinesis record, at the cost of increased latency.
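As a rough illustration of the asynchronous KPL API in Java, here is a minimal sketch; the region and stream name are assumptions you would replace with your own:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.ListenableFuture;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;

public class KplProducerExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        KinesisProducerConfiguration config = new KinesisProducerConfiguration()
                .setRegion("us-east-1");            // assumption: adjust to your region
        KinesisProducer producer = new KinesisProducer(config);

        // addUserRecord is asynchronous: the record is buffered, then collected/aggregated
        ListenableFuture<UserRecordResult> future = producer.addUserRecord(
                "my-stream",                        // hypothetical stream name
                "partition-key-1",
                ByteBuffer.wrap("hello from the KPL".getBytes(StandardCharsets.UTF_8)));

        producer.flushSync();                       // block until all buffered records are sent
        System.out.println("Written to shard: " + future.get().getShardId());

        producer.destroy();                         // release the native KPL process
    }
}
```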
Kinesis Producer Library (KPL) Batching
Batching efficiency can be tuned by introducing some delay with RecordMaxBufferedTime (default: 100 ms).
NOTE: When not to use the Kinesis Producer Library
- The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable)
- Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance
- Applications that cannot tolerate this additional delay may need to use the AWS SDK directly
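A short sketch of how that trade-off could be expressed through the KPL configuration; the 500 ms value is just an illustrative choice, not a recommendation:

```java
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

public class KplTuning {
    // Larger RecordMaxBufferedTime -> better packing and throughput, but each record may
    // wait up to that long inside the library before it is sent.
    static KinesisProducerConfiguration tunedConfig() {
        return new KinesisProducerConfiguration()
                .setRecordMaxBufferedTime(500)      // milliseconds; the KPL default is 100 ms
                .setAggregationEnabled(true);       // aggregation is on by default
    }
}
```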
Kinesis Agent
Monitors log files and sends them to Kinesis Data Streams
Java-based agent, built on top of the KPL
Installed in Linux-based server environments
Features:
- Reads from multiple directories and writes to multiple streams
- Routing feature based on directory/log file
- Pre-process data before sending to streams (single line, CSV to JSON, log to JSON)
- The agent handles file rotation, checkpointing, and retry upon failures
- Emits metrics to CloudWatch for monitoring
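For orientation, here is a minimal sketch of what the agent's configuration file (commonly /etc/aws-kinesis/agent.json) could look like; the file patterns and stream names are hypothetical:

```json
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/app1/*.log",
      "kinesisStream": "app1-stream"
    },
    {
      "filePattern": "/var/log/httpd/access_log*",
      "kinesisStream": "web-logs-stream",
      "dataProcessingOptions": [
        {
          "optionName": "LOGTOJSON",
          "logFormat": "COMMONAPACHELOG"
        }
      ]
    }
  ]
}
```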
AWS Kinesis API - Exceptions
- ProvisionedThroughputExceededException
- Happens when you send more data than a shard can accept (exceeding the per-shard MB/s or records-per-second write limit)
- Make sure you don't have a hot shard (for example, a poorly distributed partition key sends too much data to a single shard)
Solutions:
- Retries with exponential backoff (see the sketch below)
- Increase shards (scaling)
- Ensure your partition key is well distributed
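Putting the first solution into code, here is a hedged sketch of a retry-with-exponential-backoff helper around PutRecord (AWS SDK for Java v1); the attempt count and wait times are arbitrary illustrative values:

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.ProvisionedThroughputExceededException;
import com.amazonaws.services.kinesis.model.PutRecordRequest;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BackoffExample {
    // Hypothetical helper: retry a PutRecord call with exponential backoff
    static void putWithBackoff(AmazonKinesis kinesis, String stream, String key, String data)
            throws InterruptedException {
        long backoffMillis = 100;                                // initial wait
        for (int attempt = 0; attempt < 5; attempt++) {
            try {
                kinesis.putRecord(new PutRecordRequest()
                        .withStreamName(stream)
                        .withPartitionKey(key)
                        .withData(ByteBuffer.wrap(data.getBytes(StandardCharsets.UTF_8))));
                return;                                          // success
            } catch (ProvisionedThroughputExceededException e) {
                Thread.sleep(backoffMillis);                     // back off before retrying
                backoffMillis *= 2;                              // double the wait each time
            }
        }
        throw new RuntimeException("Still throttled after repeated retries");
    }
}
```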