
👋 Andrea Bozzo

Data Engineer Chronicles - A day in the life

Data Engineer | Software Developer | Analytics Architect
Hi, I'm Andrea, usually trying not to set the database on fire while building scalable data solutions. In my spare time I explore systems programming with Rust & Go. (Coffee consumption not to scale)

🌐 Landing Page • 📝 Blog • 💼 LinkedIn • 📧 Email



🛠️ Tech Stack

Languages
Rust Go Python JavaScript

Data Engineering & Databases
Apache Spark Apache Kafka DuckDB PostgreSQL MongoDB Redis

Analytics & BI
Power BI Databricks Apache Superset RisingWave

Cloud & DevOps
Docker Kubernetes AWS Azure GitHub Actions


🚀 Featured Project

Fast, lightweight data profiling library built in Rust with Python bindings.


A high-performance CLI tool and library designed for data engineers to profile datasets locally without sending data to external servers.

  • 🔥 Performance: Written in Rust using Apache Arrow for memory efficiency.
  • 🐍 Python Integration: Full Python bindings via PyO3 for seamless integration in notebooks and pipelines.
  • 🏭 Production Ready: Over 100k downloads across platforms, widely used in CI/CD pipelines for automated data quality checks.
  • 🔒 Privacy First: Zero telemetry, 100% local execution.
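The library's own API isn't shown here, so the following is only a rough illustration of what profiling a dataset locally involves: a minimal Python sketch using just the standard library, where the data never leaves the process. The function name `profile_csv` and the chosen statistics (count, nulls, distinct, min/max/mean) are assumptions for illustration; this is not the project's implementation, which the text describes as Rust on Apache Arrow.

```python
import csv
import io
from statistics import mean


def profile_csv(text: str) -> dict[str, dict]:
    """Compute simple per-column stats for CSV text, entirely in memory.

    Hypothetical helper for illustration only; not the library's API.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns: dict[str, list[str]] = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            # Short rows yield None for missing fields; normalize to "".
            columns[name].append(row.get(name) or "")

    report = {}
    for name, values in columns.items():
        present = [v for v in values if v.strip() != ""]
        stats = {
            "count": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Treat the column as numeric only if every non-empty value parses as a float.
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []
        if numbers:
            stats.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
        report[name] = stats
    return report


if __name__ == "__main__":
    sample = "id,price,city\n1,9.5,Rome\n2,,Milan\n3,10.5,Rome\n"
    for column, stats in profile_csv(sample).items():
        print(column, stats)
```

On the sample above, the `price` column reports one null and a mean of 10.0 over its two present values, while `city` gets no numeric stats. A production profiler would stream rows rather than buffer whole columns, which is where a columnar format like Arrow helps.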

🌟 Open Source Contributions

Contributing to the broader open source ecosystem beyond my own projects.

🤖 This section is automatically updated daily via GitHub Actions

  • pola-rs/polars ⭐ 36343 - 1 merged PR
    • Extremely fast Query Engine for DataFrames, written in Rust
  • risingwavelabs/risingwave ⭐ 8563 - 1 merged PR
    • Streaming data platform. Real-time stream processing, low-latency serving, and Iceberg table management.
  • datapizza-labs/datapizza-ai ⭐ 2024 - 3 merged PRs
    • Build reliable Gen AI solutions without overhead 🍕
  • supabase/etl ⭐ 2022 - 1 merged PR
    • Stream your Postgres data anywhere in real-time. Simple Rust building blocks for change data capture (CDC) pipelines.
  • mariocandela/beelzebub ⭐ 1707 - 1 merged PR
    • A secure low code honeypot framework, leveraging AI for System Virtualization.
  • lakekeeper/lakekeeper ⭐ 1063 - 1 merged PR
    • Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
  • italia-opensource/awesome-italia-opensource ⭐ 311 - 1 merged PR
    • Italian Open-Source is the first platform dedicated to the Italian open-source world
  • CortexFlow/CortexBrain ⭐ 67 - 3 merged PRs
    • CortexBrain is an ambitious open-source project created by CortexFlow, aiming to develop an intelligent, lightweight, and efficient service mesh architecture that seamlessly connects cloud and edge devices
  • piopy/fantacalcio-py ⭐ 41 - 4 merged PRs
    • A small tool to guide us through the auction while spending little
  • informagico/fantavibe ⭐ 3 - 1 merged PR
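The rendering step behind this auto-updated list can be sketched as follows, assuming the daily workflow has already fetched the author's pull requests as records with hypothetical `repo`, `stars`, and `merged` fields (the fetching itself is not shown). The sketch counts merged PRs per external repository, skips the author's own repos, sorts by star count as the list above does, and pluralizes the label.

```python
from collections import Counter


def render_contributions(prs: list[dict], own_login: str) -> list[str]:
    """Render one line per external repo with merged PRs, most-starred first.

    Hypothetical helper: the `repo`, `stars`, and `merged` keys are assumed,
    not taken from the actual workflow.
    """
    # Count merged PRs per repository, excluding the author's own repositories.
    counts = Counter(
        pr["repo"]
        for pr in prs
        if pr["merged"] and not pr["repo"].startswith(own_login + "/")
    )
    # Star count per repository (assumed identical across that repo's records).
    stars = {pr["repo"]: pr["stars"] for pr in prs}
    lines = []
    for repo in sorted(counts, key=lambda r: stars[r], reverse=True):
        n = counts[repo]
        label = "merged PR" if n == 1 else "merged PRs"
        lines.append(f"{repo} ⭐ {stars[repo]} - {n} {label}")
    return lines


if __name__ == "__main__":
    # Illustrative input only; a real run would use data fetched by the workflow.
    sample = [
        {"repo": "pola-rs/polars", "stars": 36343, "merged": True},
        {"repo": "piopy/fantacalcio-py", "stars": 41, "merged": True},
        {"repo": "piopy/fantacalcio-py", "stars": 41, "merged": True},
        {"repo": "AndreaBozzo/dataprof", "stars": 8, "merged": True},
    ]
    for line in render_contributions(sample, "AndreaBozzo"):
        print(line)
```

Keeping the rendering as a pure function over already-fetched records makes it easy to test without network access; only the fetch step needs API credentials in the workflow.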

📊 GitHub Stats

[Images: GitHub Stats, Top Languages, GitHub Streak, Contribution Graph]


💡 Currently

  • 🔭 Working on: Building high-performance data pipelines with Rust
  • 🌱 Learning: Advanced systems programming and distributed computing patterns
  • 👯 Looking to collaborate on: Data engineering projects, Python/Rust/Go libraries, open source tools
  • 💬 Ask me about: Data pipelines, ETL design, Rust best practices, system architecture
  • ⚡ Fun fact: I debug code faster after the third espresso ☕

🤝 Let's Connect

LinkedIn • Email • GitHub • 💎 Sponsor

Open to: Consulting on data engineering • Open source collaborations • Interesting data challenges • Python, Rust & Go projects


Pinned

  1. dataprof (Public)

    Fast, reliable data quality assessment for CSV, Parquet, and databases

    Rust · ⭐ 8 · 1 fork

  2. rust-ita/rust-docs-it (Public)

    Rust documentation translated into Italian

    Shell · ⭐ 2 · 1 fork

  3. Osservatorio (Public archive)

    Osservatorio - Open Data Processing Platform (WIP)

    Python · ⭐ 5 · 5 forks