ABC (@Ubunta) / X

ABC

5,786 posts

@Ubunta

Data & AI Infrastructure for Healthcare | DhanvantriAI | HotTechStack | ChatWithDatabase 🇩🇪Berlin & 🇮🇳Kolkata

Berlin, Germany

abhishekchoudhary.net

Joined August 2009

Pinned
ABC
@Ubunta
Oct 11, 2025
Using Postgres as a Data Warehouse
- Start with Postgres 18+: asynchronous I/O makes table scans 2-3x faster than Postgres 15
- One command runs everything: `docker-compose up`. If partitioning breaks on localhost, it'll break in prod, so test the real structure first
- Async
51K
ABC
@Ubunta
Nov 26, 2022
Lazydocker: a very useful terminal-UI application for managing Docker. It is a brilliant tool for simplifying Docker management.
ABC
@Ubunta
Mar 4, 2025
"Hello World" in modern Data Engineering
- Create a Dockerfile or set up a dev environment with Python, SQLAlchemy, DuckDB, Polars, and Daft installed.
- Read a CSV/Excel file and convert it to Parquet.
- Load the Parquet file into DuckDB.
- Connect to DuckDB using Polars / Daft.
- Make
36K
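The CSV → Parquet → DuckDB → Polars steps from the post above could be sketched roughly as follows. This is a minimal sketch, not the author's code: the file names, the `events` table, the sample columns, and both helper functions are hypothetical, and it assumes `polars` and `duckdb` are installed (the `.pl()` conversion may also need `pyarrow`).

```python
import csv


def write_sample_csv(path):
    """Write a tiny, made-up CSV so the sketch has an input; returns the data row count."""
    rows = [
        ("id", "city", "amount"),
        (1, "Berlin", 10.5),
        (2, "Kolkata", 7.25),
        (3, "Berlin", 4.0),
    ]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return len(rows) - 1  # exclude the header row


def csv_to_duckdb(csv_path, parquet_path, db_path, table="events"):
    """CSV -> Parquet -> DuckDB table, then query the table back as a Polars DataFrame."""
    # Third-party imports are kept local so the CSV helper works without them.
    import duckdb
    import polars as pl

    # Steps 1-2: read the CSV and convert it to Parquet.
    pl.read_csv(csv_path).write_parquet(parquet_path)

    # Step 3: load the Parquet file into a DuckDB table.
    # The path is interpolated directly, so it must not contain single quotes.
    con = duckdb.connect(str(db_path))
    con.execute(
        f"CREATE OR REPLACE TABLE {table} AS "
        f"SELECT * FROM read_parquet('{parquet_path}')"
    )

    # Step 4: run a query in DuckDB and hand the result to Polars.
    totals = con.sql(
        f"SELECT city, sum(amount) AS total FROM {table} GROUP BY city ORDER BY city"
    ).pl()
    con.close()
    return totals
```

Calling `write_sample_csv("sample.csv")` and then `csv_to_duckdb("sample.csv", "sample.parquet", "hello.duckdb")` would return the per-city totals as a Polars DataFrame. Excel input and Daft are left out to keep the sketch short.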
ABC
@Ubunta
Aug 9, 2025
Replying to @LundukeJournal
It's way more polite than many Stack Overflow comments
38K
ABC
@Ubunta
Nov 29, 2023
As a Senior Staff Data Engineer, my top five tasks over the past 2 years include:
1. Simplifying Kubernetes for Data Scientists/Engineers: Developed user-friendly libraries and containers, enabling Data Scientists to utilize Kubernetes effortlessly. Achieved a complete
54K
ABC
@Ubunta
Sep 23, 2022
People are debating Snowflake vs Databricks, and I am rebuilding my Data/ML stack on @duckdb, Apache Arrow, @IbisData and @flyteorg
ABC
@Ubunta
Aug 8, 2024
DrawDB is an excellent tool for database design and ER modeling. I found it very user-friendly, and it also allows you to upload existing schemas. 📌You can check it out here: (github.com/drawdb-io/draw…). I used the generated SQL for PostgreSQL!
21K
ABC
@Ubunta
Sep 24, 2025
The Current Shift in Data Engineering
- CSV, Excel, and JSON will outlive most tools: formats persist because they're human-friendly
- Postgres is still the first "data warehouse" most teams touch before scaling up
- "Data pipeline" will remain a vague term nobody fully agrees
17K
ABC
@Ubunta
Oct 8, 2025
Building a Data Engineering Pipeline for Production in 2025/2026
- Local first: docker-compose.yml with Postgres, Redis, DuckDB, Marimo, and Airflow
- One command runs your entire data stack: `docker-compose up`
- If it doesn't work on localhost, it won't work in prod
- Python
14K
ABC
@Ubunta
Aug 20, 2024
Data Engineering and Machine Learning are currently in one of their most exciting phases:
- Single-node data stacks, like @DataPolars and Apache Arrow, are now capable of handling 80% of data use cases, even with terabytes of data.
- @duckdb is rapidly gaining traction, with
28K
ABC
@Ubunta
Oct 12, 2023
Data Engineering offers good pay if you're skilled in several technologies
- Streaming engines: Flink & Kafka
- DWH: Spark, Snowflake, Trino, ClickHouse
- Distributed DB: HBase, CockroachDB, Yugabyte
- Infrastructure: ELK stack, Docker
+ Python & SQL
"Ability to explain ☝️ these"
30K
ABC
@Ubunta
Jan 19, 2023
Apache Arrow is on Fire 🔥🔥🔥
🙏 Data Fusion
🔥 @duckdb
⚡Polars Data
To me, @ApacheArrow is now the most important component in the data and ML community
25K
ABC
@Ubunta
Oct 19, 2025
Designing Postgres for Large Data Engineering Workloads, it works
- Postgres 18's async I/O made old queries feel new: sequential scans that crawled at 40s now finish in 12s, no tuning required
- Batch writes are non-negotiable: COPY and execute_values turned 40-second ingests
15K
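The two batch-write paths named in the post above, COPY and `execute_values`, might look roughly like this with psycopg2. This is a sketch under assumptions, not the author's code: the helper names, the `conn` argument (an open psycopg2 connection), and the table and column names passed in are all hypothetical. Table and column names are interpolated as-is, so they must come from trusted code.

```python
import csv
import io


def rows_to_copy_buffer(rows):
    """Serialize rows to an in-memory CSV buffer that COPY ... FROM STDIN can read."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


def copy_rows(conn, table, columns, rows):
    """Bulk-load rows with a single COPY statement."""
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    with conn.cursor() as cur:
        cur.copy_expert(sql, rows_to_copy_buffer(rows))
    conn.commit()


def insert_rows(conn, table, columns, rows, page_size=1000):
    """Insert rows as multi-row INSERT statements instead of one INSERT per row."""
    from psycopg2.extras import execute_values  # imported lazily; needs psycopg2

    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES %s"
    with conn.cursor() as cur:
        # execute_values expands %s into up to page_size row tuples per statement.
        execute_values(cur, sql, rows, page_size=page_size)
    conn.commit()
```

COPY is usually the faster of the two for plain appends, while `execute_values` keeps normal INSERT semantics, so clauses such as `ON CONFLICT` can be added to the statement.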
ABC
@Ubunta
Nov 9, 2025
How to Keep DuckDB in Sync with Postgres: the Easy Local CDC Way
- You have live data in Postgres and want it in DuckDB for analytics. This should take 10 minutes to set up, not 10 days
- No need to install Kafka, ZooKeeper, and Debezium like you're building super Data
16K
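The post above is cut off before it names a method, so the following is only one lightweight possibility and not necessarily what the author uses: high-water-mark polling on an `updated_at` column, a simpler stand-in for log-based CDC. Everything here is hypothetical (the function names, the `id` and `updated_at` fields), and plain Python lists and dicts stand in for the Postgres source and the DuckDB target so the logic can run on its own.

```python
from datetime import datetime


def fetch_changes(source_rows, watermark):
    """Return rows modified after the watermark, oldest first.

    Against Postgres this would be a query along the lines of
    SELECT * FROM t WHERE updated_at > %s ORDER BY updated_at.
    """
    changed = [row for row in source_rows if row["updated_at"] > watermark]
    return sorted(changed, key=lambda row: row["updated_at"])


def apply_changes(target, changes, watermark):
    """Upsert changed rows by primary key and return the advanced watermark.

    Against DuckDB this would be an upsert into a table with a primary key.
    """
    for row in changes:
        target[row["id"]] = row
        watermark = max(watermark, row["updated_at"])
    return watermark


def sync_once(source_rows, target, watermark):
    """One polling cycle: pull everything newer than the watermark, then upsert it."""
    return apply_changes(target, fetch_changes(source_rows, watermark), watermark)
```

Running `sync_once` on a timer and persisting the returned watermark between runs gives a basic incremental sync. The trade-off is that polling only sees rows that still exist, so deletes in the source are not captured, and it relies on every write updating `updated_at`.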