DEV Community

chenquan
chenquan

Posted on

1

ArkFlow - High-performance Rust stream processing engine

ArkFlow

github: https://github.com/chenquan/arkflow

High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.

Features

  • High Performance: Built on Rust and Tokio async runtime, offering excellent performance and low latency
  • Multiple Data Sources: Support for Kafka, MQTT, HTTP, files, and other input/output sources
  • Powerful Processing Capabilities: Built-in SQL queries, JSON processing, Protobuf encoding/decoding, batch processing, and other processors
  • Extensible: Modular design, easy to extend with new input, output, and processor components

Installation

Building from Source

# Clone the repository
git clone https://github.com/chenquan/arkflow.git
cd arkflow

# Build the project
cargo build --release

# Run tests
cargo test
Enter fullscreen mode Exit fullscreen mode

Quick Start

  1. Create a configuration file config.yaml:
logging:
  level: info
streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1s
      batch_size: 10

    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT * FROM flow WHERE value >= 10"
        - type: "arrow_to_json"

    output:
      type: "stdout"
Enter fullscreen mode Exit fullscreen mode
  1. Run ArkFlow:
./target/release/arkflow --config config.yaml
Enter fullscreen mode Exit fullscreen mode

Configuration Guide

ArkFlow uses YAML format configuration files, supporting the following main configuration items:

Top-level Configuration

logging:
  level: info  # Log level: debug, info, warn, error

streams:       # Stream definition list
  - input:      # Input configuration
      # ...
    pipeline:   # Processing pipeline configuration
      # ...
    output:     # Output configuration
      # ...
Enter fullscreen mode Exit fullscreen mode

Input Components

ArkFlow supports multiple input sources:

  • Kafka: Read data from Kafka topics
  • MQTT: Subscribe to messages from MQTT topics
  • HTTP: Receive data via HTTP
  • File: Read data from files
  • Generator: Generate test data
  • SQL: Query data from databases

Example:

input:
  type: kafka
  brokers:
    - localhost:9092
  topics:
    - test-topic
  consumer_group: test-group
  client_id: arkflow
  start_from_latest: true
Enter fullscreen mode Exit fullscreen mode

Processors

ArkFlow provides multiple data processors:

  • JSON: JSON data processing and transformation
  • SQL: Process data using SQL queries
  • Protobuf: Protobuf encoding/decoding
  • Batch Processing: Process messages in batches

Example:

pipeline:
  thread_num: 4
  processors:
    - type: json_to_arrow
    - type: sql
      query: "SELECT * FROM flow WHERE value >= 10"
    - type: arrow_to_json
Enter fullscreen mode Exit fullscreen mode

Output Components

ArkFlow supports multiple output targets:

  • Kafka: Write data to Kafka topics
  • MQTT: Publish messages to MQTT topics
  • HTTP: Send data via HTTP
  • File: Write data to files
  • Standard Output: Output data to the console

Example:

output:
  type: kafka
  brokers:
    - localhost:9092
  topic: output-topic
  client_id: arkflow-producer
Enter fullscreen mode Exit fullscreen mode

Examples

Kafka to Kafka Data Processing

streams:
  - input:
      type: kafka
      brokers:
        - localhost:9092
      topics:
        - test-topic
      consumer_group: test-group

    pipeline:
      thread_num: 4
      processors:
        - type: json_to_arrow
        - type: sql
          query: "SELECT * FROM flow WHERE value > 100"
        - type: arrow_to_json

    output:
      type: kafka
      brokers:
        - localhost:9092
      topic: processed-topic
Enter fullscreen mode Exit fullscreen mode

Generate Test Data and Process

streams:
  - input:
      type: "generate"
      context: '{ "timestamp": 1625000000000, "value": 10, "sensor": "temp_1" }'
      interval: 1ms
      batch_size: 10000

    pipeline:
      thread_num: 4
      processors:
        - type: "json_to_arrow"
        - type: "sql"
          query: "SELECT count(*) FROM flow WHERE value >= 10 group by sensor"
        - type: "arrow_to_json"

    output:
      type: "stdout"
Enter fullscreen mode Exit fullscreen mode

License

ArkFlow is licensed under the Apache License 2.0.

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

Top comments (0)

A Workflow Copilot. Tailored to You.

Pieces.app image

Our desktop app, with its intelligent copilot, streamlines coding by generating snippets, extracting code from screenshots, and accelerating problem-solving.

Read the docs

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay