Simple stream processing pipeline
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
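As a rough picture of what such a pipeline does, the sketch below chunks a document, embeds each chunk, and upserts the vectors; the file path, model name, and the upsert_vectors() helper are assumptions for illustration, not this project's API.

```python
# A minimal chunk -> embed -> upsert sketch for a vector database pipeline.
# The file path, model name, and upsert_vectors() helper are illustrative assumptions.
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size character chunking.
    return [text[i:i + size] for i in range(0, len(text), size)]

def upsert_vectors(records):
    # Hypothetical helper: replace with your vector database client's upsert call.
    for record in records:
        print(record["id"], len(record["vector"]))

model = SentenceTransformer("all-MiniLM-L6-v2")
document = open("docs/handbook.txt", encoding="utf-8").read()  # assumed input file
chunks = chunk(document)
vectors = model.encode(chunks)  # one embedding per chunk

upsert_vectors(
    [{"id": f"handbook-{i}", "vector": v.tolist(), "text": c}
     for i, (c, v) in enumerate(zip(chunks, vectors))]
)
```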
High-performance TensorFlow data pipeline with state-of-the-art augmentations and low-level optimizations.
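For a sense of the technique, here is a minimal tf.data sketch with parallel mapping and prefetching; the file pattern, image size, and augmentation choices are illustrative assumptions, not this repository's code.

```python
# A minimal high-throughput tf.data input pipeline sketch.
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def load_and_augment(path):
    # Decode an image file and apply a couple of cheap augmentations.
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, (224, 224))
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image

dataset = (
    tf.data.Dataset.list_files("data/train/*.jpg")        # assumed file layout
    .map(load_and_augment, num_parallel_calls=AUTOTUNE)   # parallel decode/augment
    .batch(32)
    .prefetch(AUTOTUNE)                                    # overlap input with training
)
```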
kedro cli plugin for generating a static kedro viz site (html, css, js) that can be deployed on many serverless tools.
A modeling tool, similar to dbt, that uses SQLAlchemy Core behind a DataFrame-like interface.
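As a rough illustration of combining SQLAlchemy Core with a DataFrame interface (not this tool's actual API), a query built with Core can be handed directly to pandas:

```python
# A minimal sketch: build a query with SQLAlchemy Core, read it as a DataFrame.
# The table definition and SQLite URL are illustrative assumptions.
import pandas as pd
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite:///example.db")
metadata = MetaData()

orders = Table(
    "orders", metadata,
    Column("id", Integer, primary_key=True),
    Column("status", String),
)
metadata.create_all(engine)

# Seed one row so the query returns something.
with engine.begin() as conn:
    conn.execute(orders.insert(), [{"id": 1, "status": "shipped"}])

# Compose the query with Core and let pandas execute it.
query = select(orders).where(orders.c.status == "shipped")
df = pd.read_sql(query, engine)
print(df.head())
```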
Extract-transform-load (ETL) CLI tool for moving small-to-medium data volumes from sources (databases, CSV files, XLS files, Google Sheets) to targets (databases, CSV files, XLS files, Google Sheets) in any combination.
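A minimal sketch of the source-to-target idea using pandas and SQLAlchemy; the argument names and format detection below are assumptions, not this tool's CLI:

```python
# Copy a CSV/XLS source into a database table: extract, light transform, load.
import argparse

import pandas as pd
from sqlalchemy import create_engine

READERS = {"csv": pd.read_csv, "xls": pd.read_excel}

def main():
    parser = argparse.ArgumentParser(description="Copy a table between formats")
    parser.add_argument("source")        # e.g. input.csv or input.xlsx
    parser.add_argument("target_db")     # e.g. sqlite:///warehouse.db
    parser.add_argument("--table", default="staging")
    args = parser.parse_args()

    kind = "xls" if args.source.endswith((".xls", ".xlsx")) else "csv"
    df = READERS[kind](args.source)                        # extract
    df.columns = [c.strip().lower() for c in df.columns]   # light transform
    engine = create_engine(args.target_db)
    df.to_sql(args.table, engine, if_exists="replace", index=False)  # load

if __name__ == "__main__":
    main()
```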
Materials for the course "Introduction to Data Engineering": data pipelines.
Simple Airflow on Kubernetes (GKE)
dbt and ClickHouse test project with Dagster.
This is an ETL project: extracting data from an e-commerce transactional database on RDS, transforming it with an AWS Glue job, loading it into a Redshift data warehouse, and connecting it to Tableau for BI.
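A rough sketch of what such a Glue job can look like (it only runs inside the AWS Glue runtime); the database, table, connection, and bucket names are illustrative assumptions, not this project's configuration:

```python
# Read a catalogued RDS table, remap columns, and load into Redshift.
from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Source table that a Glue crawler has catalogued from the RDS database.
orders = glue_context.create_dynamic_frame.from_catalog(
    database="ecommerce_raw", table_name="orders"
)

# Light transform: rename and retype columns on the way into the warehouse.
mapped = ApplyMapping.apply(
    frame=orders,
    mappings=[
        ("order_id", "int", "order_id", "int"),
        ("order_total", "double", "revenue", "double"),
        ("created_at", "string", "created_at", "timestamp"),
    ],
)

# Load into Redshift through a pre-configured Glue connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "analytics.orders", "database": "dev"},
    redshift_tmp_dir="s3://example-glue-temp/",
)
```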
An ETL data pipeline that extracts data from a source and loads it into a destination, automated with mage.ai.
A project that demonstrates building a data pipeline by scraping data with the Twitter API and creating a Kinesis Firehose delivery stream to ingest it into Amazon S3.
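A minimal sketch of the ingestion side with boto3; the delivery stream name and the fetch_tweets() helper standing in for the Twitter API call are hypothetical:

```python
# Push JSON records into a Kinesis Firehose delivery stream that lands in S3.
import json

import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

def fetch_tweets():
    # Hypothetical placeholder for the Twitter API call; yields tweet dicts.
    yield {"id": 1, "text": "example tweet", "lang": "en"}

for tweet in fetch_tweets():
    firehose.put_record(
        DeliveryStreamName="twitter-to-s3",  # assumed stream name
        Record={"Data": (json.dumps(tweet) + "\n").encode("utf-8")},
    )
```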
A Dagster tutorial to get you started as an absolute beginner. It covers Dagster installation, assets, jobs, schedules, ops, and more. It is completely free on YouTube and requires no prerequisites.
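For orientation, here is a minimal sketch of the pieces the tutorial names (assets, a job, a schedule); all names in it are illustrative assumptions:

```python
# Two dependent assets, a job that materializes them, and a daily schedule.
from dagster import AssetSelection, Definitions, ScheduleDefinition, asset, define_asset_job

@asset
def raw_numbers():
    # An asset is a named piece of data Dagster knows how to materialize.
    return list(range(10))

@asset
def doubled_numbers(raw_numbers):
    # Downstream asset: depends on raw_numbers via the parameter name.
    return [n * 2 for n in raw_numbers]

daily_job = define_asset_job("daily_asset_job", selection=AssetSelection.all())
daily_schedule = ScheduleDefinition(job=daily_job, cron_schedule="0 6 * * *")

defs = Definitions(
    assets=[raw_numbers, doubled_numbers],
    jobs=[daily_job],
    schedules=[daily_schedule],
)
```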
An automated data pipeline using Apache Airflow that performs ETL on raw data with the Pandas library, stages the data in PostgreSQL, processes it in parallel on a distributed cluster with Spark, and loads the final, useful data into an Elasticsearch NoSQL warehouse.
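A minimal Airflow 2.x-style sketch of that flow; the connection string, file path, and task bodies are illustrative assumptions, with the Spark and Elasticsearch steps left as a placeholder:

```python
# Two-task DAG: pandas cleanup staged to PostgreSQL, then downstream processing.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from sqlalchemy import create_engine

def extract_and_stage():
    # Clean the raw file with pandas and stage it in PostgreSQL (assumed DSN).
    df = pd.read_csv("/data/raw/events.csv")
    df = df.dropna(subset=["user_id"])
    engine = create_engine("postgresql://etl:etl@postgres/warehouse")
    df.to_sql("staging_events", engine, if_exists="replace", index=False)

def process_and_load():
    # Placeholder: Spark would process the staged data in parallel and
    # write the result to Elasticsearch.
    pass

with DAG(
    dag_id="raw_to_elasticsearch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    stage = PythonOperator(task_id="extract_and_stage", python_callable=extract_and_stage)
    load = PythonOperator(task_id="process_and_load", python_callable=process_and_load)
    stage >> load
```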
A data pipeline that pulls public transport data daily from the opentransportdata.swiss portal. The pipeline has three tasks: pull the right data from opentransportdata.swiss, push it to S3 for storage, and transform and load the transformed data into a database. Hopefully this repository helps explain ETL / batch data pipelines.
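A minimal sketch of those three tasks (pull, push to S3, transform and load); the dataset URL, bucket, connection string, and table names are illustrative assumptions:

```python
# Pull a daily file, archive it in S3, then load a cleaned copy into Postgres.
import datetime
import io

import boto3
import pandas as pd
import requests
from sqlalchemy import create_engine

DATASET_URL = "https://opentransportdata.swiss/dataset/example.csv"  # assumed
BUCKET = "transport-raw"                                              # assumed

def pull() -> bytes:
    response = requests.get(DATASET_URL, timeout=60)
    response.raise_for_status()
    return response.content

def push_to_s3(payload: bytes) -> str:
    key = f"raw/{datetime.date.today():%Y-%m-%d}.csv"
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=payload)
    return key

def transform_and_load(payload: bytes) -> None:
    df = pd.read_csv(io.BytesIO(payload), sep=";")
    df.columns = [c.lower() for c in df.columns]
    engine = create_engine("postgresql://etl:etl@localhost/transport")
    df.to_sql("daily_departures", engine, if_exists="append", index=False)

if __name__ == "__main__":
    data = pull()
    push_to_s3(data)
    transform_and_load(data)
```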