The term "stream processing" might sound intimidating to many people. We often hear statements like:
- "Stream processing is too difficult to learn and use!" π±
- "Stream processing is very expensive!" π±
- "I donβt see any business use cases for stream processing!" π±
However, we believe this isn't true. β
Streaming data is everywhere, generated from operational databases, messaging queues, IoT devices, and many other sources. People can leverage modern stream processing technology to easily address classic real-world problems, using SQL as the programming language.
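For a flavor of what this looks like, here is a minimal sketch in RisingWave SQL (not taken from the demos): it assumes a hypothetical Kafka topic named `website_visits` on a local broker, with made-up columns.

```sql
-- Hypothetical sketch: topic name, broker address, and columns are placeholders.
-- Ingest a Kafka topic as a streaming source.
CREATE SOURCE website_visits (
    user_id    INT,
    page       VARCHAR,
    visited_at TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'website_visits',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;

-- Maintain an always-up-to-date aggregation over the stream.
CREATE MATERIALIZED VIEW page_visit_counts AS
SELECT page, COUNT(*) AS visits
FROM website_visits
GROUP BY page;
```

A plain `SELECT * FROM page_visit_counts;` then returns results that RisingWave keeps fresh incrementally as new events arrive.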
In this repository, we provide a series of executable demos demonstrating how stream processing can be applied in practical scenarios:
- Install Kafka, PostgreSQL, and RisingWave, and run minimal toy examples on your device.
- Integrate RisingWave with other data platforms.
- Basic stream processing examples: learn the fundamentals of ingesting, processing, transforming, and offloading data from streaming systems.
  - Querying and processing event streaming data (Kafka users, you may start here!)
    - Directly query data stored in event streaming systems (e.g., Kafka, Redpanda).
    - Continuously ingest and analyze data from event streaming systems.
  - Bringing analytics closer to operational databases (Postgres users, you may start here!); a CDC sketch follows this list.
    - Offload event-driven queries (e.g., materialized views and triggers) from operational databases (e.g., MySQL, PostgreSQL).
    - Perform ETL continuously and incrementally.
- A collection of simple, self-contained demos showcasing how stream processing can be applied in specific industry use cases.
- A collection of comprehensive demos showcasing how to build a stream processing pipeline for real-world applications.
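To give a concrete feel for "bringing analytics closer to operational databases", here is a hedged sketch of offloading an aggregation from PostgreSQL into RisingWave via CDC. The hostname, credentials, database, and table schema below are placeholders, and the upstream PostgreSQL instance must have logical replication enabled; none of this is taken verbatim from the demos.

```sql
-- Hypothetical sketch: connection details and the orders schema are placeholders.
-- Connect to an upstream PostgreSQL database via CDC.
CREATE SOURCE pg_cdc WITH (
    connector = 'postgres-cdc',
    hostname = 'localhost',
    port = '5432',
    username = 'postgres',
    password = 'postgres',
    database.name = 'shop',
    schema.name = 'public'
);

-- Mirror one upstream table into RisingWave.
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    amount      NUMERIC,
    created_at  TIMESTAMP
) FROM pg_cdc TABLE 'public.orders';

-- Keep an aggregation incrementally up to date in RisingWave,
-- instead of running heavy queries or triggers inside the operational database.
CREATE MATERIALIZED VIEW revenue_per_customer AS
SELECT customer_id, SUM(amount) AS total_revenue, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
```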
RAG & Metrics Comparisons
- RisingWave RAG Demo
  - Build a Retrieval-Augmented Generation system using RisingWave. The pipeline stores documentation chunks and their embeddings, retrieves the most similar documents for a user query, and calls an LLM to generate grounded answers.
- Compare Metrics (RisingWave vs. Flink)
  - Run the same workloads on both systems, using the same message queues and queries, to observe and compare performance metrics side by side.
- Agent Demo
  - Use AI agents to analyze data ingested into RisingWave. This client app connects RisingWave's MCP with Anthropic's LLM to parse natural-language questions, discover relevant tables/schemas, call data tools, and iteratively return clean results (e.g., formatted tables).
- Data Engineering Agent Swarm
  - A multi-agent system for common data engineering tasks with RisingWave and Kafka integration. Includes a planner that delegates to specialized agents for database ops, stream processing, and pipeline orchestration; supports automatic schema inference and an interactive chat loop.
- RisingWave + Apache Iceberg: End-to-End Streaming Lakehouse Demos
  - Self-contained pipelines (Docker Compose + SQL) showing RisingWave writing to Apache Iceberg and querying with external engines; a minimal sink sketch follows this list.
  - streaming_iceberg_quickstart: Build your first streaming Iceberg table with RisingWave (self-hosted catalog) and query with Spark.
  - postgres_to_rw_iceberg_spark: PostgreSQL CDC → RisingWave → Iceberg → Spark using the Iceberg Table Engine and hosted catalog.
  - mongodb_to_rw_iceberg_spark: MongoDB change streams → RisingWave → Iceberg → Spark with JSON-typed projection.
  - mysql_to_rw_iceberg_spark: MySQL binlog CDC → RisingWave → Iceberg → Spark end-to-end.
  - risingwave_lakekeeper_iceberg_duckdb: RisingWave → Lakekeeper (REST) → Iceberg → DuckDB with upsert streaming.
  - risingwave_s3tables_iceberg_duckdb: Use the AWS S3 Tables catalog to stream from RisingWave to Iceberg and query with DuckDB (no local catalog containers).
  - logistics_multiway_streaming_join_iceberg: Seven-topic logistics streaming join in RisingWave writing to Iceberg (hosted catalog), real-time analysis, then query with Spark.
  - risingwave_lakekeeper_iceberg_clickhouse: RisingWave → Lakekeeper (REST) → Iceberg → ClickHouse with streaming writes and a shared REST catalog.
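To make the shape of these lakehouse pipelines concrete, here is a hedged sketch of the RisingWave side: streaming a materialized view into an Apache Iceberg table through a REST catalog (Lakekeeper in several demos). The catalog URI, bucket, credentials, and table names below are placeholders, and the exact sink options depend on the catalog and object store you choose; each demo ships a working Docker Compose + SQL configuration.

```sql
-- Hypothetical sketch: catalog URI, bucket, credentials, and names are placeholders.
-- Stream changes from a RisingWave table or materialized view into an Iceberg table.
CREATE SINK revenue_iceberg_sink FROM revenue_per_customer
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'customer_id',
    catalog.type = 'rest',
    catalog.uri = 'http://lakekeeper:8181/catalog',
    warehouse.path = 's3://demo-bucket/warehouse',
    s3.endpoint = 'http://minio:9000',
    s3.access.key = 'minioadmin',
    s3.secret.key = 'minioadmin',
    s3.region = 'us-east-1',
    database.name = 'demo_db',
    table.name = 'revenue_per_customer'
);
```

Once data lands in Iceberg, the same table can be queried by Spark, DuckDB, or ClickHouse, which is exactly what the individual demos walk through.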
We use RisingWave as the default stream processing system to run these demos. We also assume that you have Kafka and/or PostgreSQL installed and possess basic knowledge of how to use these systems. These demos have been verified on Ubuntu and macOS.
All you need is a laptop; no cluster is required.
Any comments are welcome. Happy streaming!
Join our Slack community to engage in discussions with thousands of stream processing enthusiasts!