Excited to share my latest Data Engineering project! I built a real-time data pipeline that processes live weather data at scale 🌦️. Here's how it works:
- Pulling data from the OpenWeather API (~3000 API calls/minute).
- A Kafka producer publishes the raw data to a topic.
- A Flink consumer processes and aggregates the weather data every minute, then pushes the results to another Kafka topic.
- A second consumer stores the processed data in Postgres (via Supabase).
- Grafana sits on top of Postgres to visualize real-time insights, refreshing every 5 minutes.
- Apache Airflow orchestrates the workflow, scheduling the Flink jobs at 5-minute intervals.

This project helped me explore:
- Building scalable streaming pipelines
- Real-time data aggregation
- Orchestration with Airflow
- Visualization of live data

Tech stack highlights: Kafka, Flink, Airflow, Postgres, Supabase, Grafana

🔗 GitHub Repository: I've open-sourced the entire project here: https://lnkd.in/g2cgcAmz

Would love to hear feedback and ideas from the community to make this even more production-ready 🙌

#DataEngineering #ApacheKafka #ApacheFlink #ApacheAirflow #RealTimeData #DataPipeline #Postgres #Grafana
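For anyone curious, the ingestion step can be sketched roughly like this, assuming the OpenWeather current-weather endpoint and the kafka-python client. The topic name, broker address, city list, and field selection here are my illustrative assumptions, not necessarily what the repo uses:

```python
import json
import time

OPENWEATHER_URL = "https://api.openweathermap.org/data/2.5/weather"


def to_event(raw: dict) -> dict:
    """Flatten the parts of an OpenWeather response the pipeline needs."""
    return {
        "city": raw["name"],
        "temp_c": raw["main"]["temp"],
        "humidity": raw["main"]["humidity"],
        "ts": raw["dt"],  # epoch seconds of the observation
    }


def run(api_key: str, cities: list) -> None:
    # Third-party imports kept local so the pure helper above stays importable.
    import requests                    # pip install requests
    from kafka import KafkaProducer    # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # placeholder broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    while True:
        for city in cities:
            resp = requests.get(
                OPENWEATHER_URL,
                params={"q": city, "appid": api_key, "units": "metric"},
                timeout=10,
            )
            resp.raise_for_status()
            producer.send("raw-weather", to_event(resp.json()))
        producer.flush()
        time.sleep(1)  # pace the loop; the real pipeline sustains ~3000 calls/min
```

The loop-plus-flush pattern keeps latency low while still batching sends inside each pass over the city list.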
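The per-minute aggregation the Flink job performs can be illustrated in plain Python: the real job would use Flink's one-minute tumbling windows keyed by city, so this is only the windowing logic, with hypothetical field names:

```python
from collections import defaultdict


def minute_bucket(ts: int) -> int:
    """Floor an epoch-seconds timestamp to the start of its minute."""
    return ts - ts % 60


def aggregate(events: list) -> list:
    """Average temperature per (city, minute) window, mimicking a
    1-minute tumbling window keyed by city."""
    sums = defaultdict(lambda: [0.0, 0])  # (city, minute) -> [sum, count]
    for e in events:
        key = (e["city"], minute_bucket(e["ts"]))
        sums[key][0] += e["temp_c"]
        sums[key][1] += 1
    return [
        {"city": city, "window_start": minute, "avg_temp_c": s / n}
        for (city, minute), (s, n) in sorted(sums.items())
    ]
```

In Flink proper this maps to `key_by` on the city field followed by a `TumblingEventTimeWindows.of(Time.minutes(1))` window and an averaging aggregate.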
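The sink consumer that lands aggregated records in Postgres could look like this, assuming kafka-python and psycopg2; the table name, columns, topic, and Supabase connection string are placeholders of mine:

```python
import json

# Hypothetical table: weather_agg(city text, window_start timestamptz, avg_temp_c double precision)
INSERT_SQL = (
    "INSERT INTO weather_agg (city, window_start, avg_temp_c) "
    "VALUES (%s, to_timestamp(%s), %s)"
)


def to_row(message_value: bytes) -> tuple:
    """Turn one aggregated Kafka message into INSERT parameters."""
    rec = json.loads(message_value)
    return (rec["city"], rec["window_start"], rec["avg_temp_c"])


def run() -> None:
    # Third-party imports kept local so to_row stays importable on its own.
    from kafka import KafkaConsumer  # pip install kafka-python
    import psycopg2                  # pip install psycopg2-binary

    consumer = KafkaConsumer("weather-agg", bootstrap_servers="localhost:9092")
    conn = psycopg2.connect("postgresql://user:pass@host:5432/postgres")  # Supabase DSN goes here
    with conn.cursor() as cur:
        for msg in consumer:
            cur.execute(INSERT_SQL, to_row(msg.value))
            conn.commit()  # commit per record; batch commits would cut round-trips
```

Parameterized queries keep the insert safe regardless of what city names come through the topic.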
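And the 5-minute orchestration is essentially a small Airflow DAG. This is an orchestration config sketch, not the repo's actual DAG; the DAG id, start date, and Flink submit command are placeholders:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG: re-submits the Flink job every 5 minutes.
with DAG(
    dag_id="weather_flink_job",
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(minutes=5),
    catchup=False,
) as dag:
    submit = BashOperator(
        task_id="submit_flink_job",
        bash_command="flink run -d /opt/jobs/weather_agg.jar",  # path is a placeholder
    )
```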
Fantastic project! Love how you’ve combined real-time ingestion, processing, and visualization into a full end-to-end pipeline. The integration of Flink with Kafka and Airflow for orchestration is particularly impressive.
Data Engineering Intern @MPL | B.E. Computer Science Student | Skilled in Python, SQL, and Data Pipelines
Here's a sneak peek of the Grafana dashboard I built on top of this pipeline, visualizing real-time weather patterns across the globe. The data refreshes every 5 minutes, powered by Postgres + Supabase + Grafana.