The term "stream processing" might sound intimidating to many people. We often hear statements like:
- "Stream processing is too difficult to learn and use!" π±
- "Stream processing is very expensive!" π±
- "I donβt see any business use cases for stream processing!" π±
However, we believe this isn't true. β
Streaming data is everywhere, generated from operational databases, messaging queues, IoT devices, and many other sources. People can leverage modern stream processing technology to easily address classic real-world problems, using SQL as the programming language.
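For a flavor of what this looks like, here is a minimal sketch in RisingWave SQL (not taken from the demos): it assumes a hypothetical Kafka topic named `website_visits` on a local broker, with made-up columns.

```sql
-- Hypothetical sketch: topic name, broker address, and columns are placeholders.
-- Ingest a Kafka topic as a streaming source.
CREATE SOURCE website_visits (
    user_id    INT,
    page       VARCHAR,
    visited_at TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'website_visits',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;

-- Maintain an always-up-to-date aggregation over the stream.
CREATE MATERIALIZED VIEW page_visit_counts AS
SELECT page, COUNT(*) AS visits
FROM website_visits
GROUP BY page;
```

A plain `SELECT * FROM page_visit_counts;` then returns results that RisingWave keeps fresh incrementally as new events arrive.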
In this repository, we provide a series of executable demos demonstrating how stream processing can be applied in practical scenarios:
- Install Kafka, PostgreSQL, and RisingWave, and run minimal toy examples on your device.
- Integrate RisingWave with other data platforms.
- Basic stream processing examples: learn the fundamentals of ingesting, processing, transforming, and offloading data from streaming systems.
  - Querying and processing event streaming data (Kafka users, you may start here!)
    - Directly query data stored in event streaming systems (e.g., Kafka, Redpanda).
    - Continuously ingest and analyze data from event streaming systems.
  - Bringing analytics closer to operational databases (Postgres users, you may start here!); a CDC sketch follows this list.
    - Offload event-driven queries (e.g., materialized views and triggers) from operational databases (e.g., MySQL, PostgreSQL).
    - Perform ETL continuously and incrementally.
- A collection of simple, self-contained demos showcasing how stream processing can be applied in specific industry use cases.
- A collection of comprehensive demos showcasing how to build a stream processing pipeline for real-world applications.
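To give a concrete feel for "bringing analytics closer to operational databases", here is a hedged sketch of offloading an aggregation from PostgreSQL into RisingWave via CDC. The hostname, credentials, database, and table schema below are placeholders, and the upstream PostgreSQL instance must have logical replication enabled; none of this is taken verbatim from the demos.

```sql
-- Hypothetical sketch: connection details and the orders schema are placeholders.
-- Connect to an upstream PostgreSQL database via CDC.
CREATE SOURCE pg_cdc WITH (
    connector = 'postgres-cdc',
    hostname = 'localhost',
    port = '5432',
    username = 'postgres',
    password = 'postgres',
    database.name = 'shop',
    schema.name = 'public'
);

-- Mirror one upstream table into RisingWave.
CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    amount      NUMERIC,
    created_at  TIMESTAMP
) FROM pg_cdc TABLE 'public.orders';

-- Keep an aggregation incrementally up to date in RisingWave,
-- instead of running heavy queries or triggers inside the operational database.
CREATE MATERIALIZED VIEW revenue_per_customer AS
SELECT customer_id, SUM(amount) AS total_revenue, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
```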
RAG & Metrics Comparisons
- RisingWave RAG Demo
  - Build a Retrieval-Augmented Generation system using RisingWave. The pipeline stores documentation chunks and their embeddings, retrieves the most similar documents for a user query, and calls an LLM to generate grounded answers.
- Compare Metrics (RisingWave vs. Flink)
  - Run the same workloads on both systems, using the same message queues and queries, to observe and compare performance metrics side by side.
- Agent Demo
  - Use AI agents to analyze data ingested into RisingWave. This client app connects RisingWave's MCP with Anthropic's LLM to parse natural-language questions, discover relevant tables/schemas, call data tools, and iteratively return clean results (e.g., formatted tables).
- Data Engineering Agent Swarm
  - A multi-agent system for common data engineering tasks with RisingWave and Kafka integration. Includes a planner that delegates to specialized agents for database ops, stream processing, and pipeline orchestration; supports automatic schema inference and an interactive chat loop.
- RisingWave + Apache Iceberg: End-to-End Streaming Lakehouse Demos
  - Self-contained pipelines (Docker Compose + SQL) showing RisingWave writing to Apache Iceberg and querying with external engines; a minimal sink sketch follows this list.
  - streaming_iceberg_quickstart: Build your first streaming Iceberg table with RisingWave (self-hosted catalog) and query with Spark.
  - postgres_to_rw_iceberg_spark: PostgreSQL CDC → RisingWave → Iceberg → Spark using the Iceberg Table Engine and hosted catalog.
  - mongodb_to_rw_iceberg_spark: MongoDB change streams → RisingWave → Iceberg → Spark with JSON-typed projection.
  - mysql_to_rw_iceberg_spark: MySQL binlog CDC → RisingWave → Iceberg → Spark end-to-end.
  - risingwave_lakekeeper_iceberg_duckdb: RisingWave → Lakekeeper (REST) → Iceberg → DuckDB with upsert streaming.
  - risingwave_s3tables_iceberg_duckdb: Use the AWS S3 Tables catalog to stream from RisingWave to Iceberg and query with DuckDB (no local catalog containers).
  - logistics_multiway_streaming_join_iceberg: Seven-topic logistics streaming join in RisingWave writing to Iceberg (hosted catalog), real-time analysis, then query with Spark.
  - risingwave_lakekeeper_iceberg_clickhouse: RisingWave → Lakekeeper (REST) → Iceberg → ClickHouse with streaming writes and a shared REST catalog.
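To make the shape of these lakehouse pipelines concrete, here is a hedged sketch of the RisingWave side: streaming a materialized view into an Apache Iceberg table through a REST catalog (Lakekeeper in several demos). The catalog URI, bucket, credentials, and table names below are placeholders, and the exact sink options depend on the catalog and object store you choose; each demo ships a working Docker Compose + SQL configuration.

```sql
-- Hypothetical sketch: catalog URI, bucket, credentials, and names are placeholders.
-- Stream changes from a RisingWave table or materialized view into an Iceberg table.
CREATE SINK revenue_iceberg_sink FROM revenue_per_customer
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'customer_id',
    catalog.type = 'rest',
    catalog.uri = 'http://lakekeeper:8181/catalog',
    warehouse.path = 's3://demo-bucket/warehouse',
    s3.endpoint = 'http://minio:9000',
    s3.access.key = 'minioadmin',
    s3.secret.key = 'minioadmin',
    s3.region = 'us-east-1',
    database.name = 'demo_db',
    table.name = 'revenue_per_customer'
);
```

Once data lands in Iceberg, the same table can be queried by Spark, DuckDB, or ClickHouse, which is exactly what the individual demos walk through.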
We use RisingWave as the default stream processing system to run these demos. We also assume that you have Kafka and/or PostgreSQL installed and possess basic knowledge of how to use these systems. These demos have been verified on Ubuntu and macOS.
All you need is a laptop; no cluster is required.
Any comments are welcome. Happy streaming!
Join our Slack community to engage in discussions with thousands of stream processing enthusiasts!