I am a Data Engineer & MLOps Architect focused on building robust, high-throughput data pipelines and production-grade machine learning systems. I specialize in resolving infrastructure bottlenecks, distributing work across parallel nodes, and keeping data consistent at scale.
My engineering philosophy:
- Performance & Scale: Working around API rate limits by distributing requests across a multi-node mesh.
- Data Integrity First: Writing idempotent ETL pipelines with rigorous deduplication and validation.
- Optimized ML Runtimes: Replacing slow SQL reads with compressed Parquet exports to speed up XGBoost training and Optuna hyperparameter tuning.
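As a minimal sketch of what "idempotent with deduplication and validation" can look like, the example below loads the same batch twice and ends up with the same table both times. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL (where the equivalent statement would be `INSERT ... ON CONFLICT DO NOTHING`). The `readings` table, its columns, and the validation rule are hypothetical and chosen only for illustration.

```python
import sqlite3


def load_readings(conn, rows):
    """Insert rows, skipping any whose (station_id, observed_at) key already exists.

    Returns the total number of rows in the table after the load.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS readings (
            station_id  TEXT NOT NULL,
            observed_at TEXT NOT NULL,
            aod         REAL NOT NULL,
            PRIMARY KEY (station_id, observed_at)
        )
        """
    )
    # Validation (hypothetical rule): drop rows with a missing or negative value.
    valid = [r for r in rows if r[2] is not None and r[2] >= 0]
    # The primary key plus OR IGNORE makes the load safe to re-run.
    conn.executemany(
        "INSERT OR IGNORE INTO readings (station_id, observed_at, aod) "
        "VALUES (?, ?, ?)",
        valid,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]


conn = sqlite3.connect(":memory:")
batch = [
    ("S1", "2024-01-01T00:00", 0.21),
    ("S1", "2024-01-01T00:00", 0.21),  # duplicate within the batch
    ("S2", "2024-01-01T00:00", None),  # fails validation
    ("S2", "2024-01-01T01:00", 0.35),
]
first = load_readings(conn, batch)
second = load_readings(conn, batch)  # re-running the same batch changes nothing
```

Pushing deduplication into a key constraint, rather than checking for existing rows in application code, keeps the load correct even if a job is retried or two runs overlap.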
Project: an end-to-end air quality prediction system spanning global satellite data ingestion, large PostgreSQL databases, and automated machine learning pipelines.
- Multi-VM Distributed Backfill: Engineered a 4-node parallel mesh architecture to process 1.8M+ rows of Aerosol Optical Depth (AOD) satellite data, bypassing strict API rate limits and reducing ETL time from 11 hours to under 3 hours.
- Advanced Feature Engineering: Built trigonometric wind encodings and multi-horizon lag features so that XGBoost models receive physically meaningful weather inputs.
- Production Infrastructure: Hosted on Azure Database for PostgreSQL (Flexible Server) with local-to-cloud synchronization workflows.
- Automated Workflows: Fully orchestrated via GitHub Actions cron schedules for daily data fetching, deduplication, and model syncing.
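The feature engineering bullet above can be illustrated with a small sketch. It assumes the trigonometric encoding is applied to wind direction in degrees (the source does not say which wind variable is encoded) and that lags are simple shifted copies of a series. The function names `encode_wind` and `add_lags` are hypothetical.

```python
import math


def encode_wind(direction_deg):
    """Map a direction in degrees to (sin, cos) so that 359 and 1 land close together."""
    theta = math.radians(direction_deg)
    return math.sin(theta), math.cos(theta)


def add_lags(values, horizons):
    """Return one lagged copy of `values` per horizon; missing history is None."""
    return {
        h: [values[i - h] if i >= h else None for i in range(len(values))]
        for h in horizons
    }


def distance(a, b):
    """Euclidean distance between two encoded directions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# 359 and 1 degrees are 358 apart numerically but nearly identical physically.
near = distance(encode_wind(359), encode_wind(1))
far = distance(encode_wind(359), encode_wind(180))

# Lag features at horizons of 1 and 2 steps for a short hypothetical series.
lags = add_lags([10, 12, 15, 11], horizons=(1, 2))
```

A tree model splitting on raw degrees sees a discontinuity at north; the sine/cosine pair removes it, which is the usual motivation for this encoding.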
I am currently deepening my expertise in:
- Data Engineering: Idempotent ETL pipelines, PostgreSQL query optimization, and Parquet data serialization.
- MLOps: Managing model registries, tuning hyperparameters with Optuna, and deploying XGBoost and other tree-based models to production.
- Distributed Systems: Distributing tasks across multiple servers, handling network interruptions gracefully, and synchronizing state.
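Two of the distributed systems ideas above, task distribution and tolerating network interruptions, can be sketched in a few lines. This is an illustrative assumption, not the project's actual scheduler: tasks are split round-robin across nodes, and a fetch is retried with exponential backoff when it raises `ConnectionError`. The names `assign_shards`, `with_retries`, and `flaky_fetch` are hypothetical.

```python
import time


def assign_shards(tasks, node_count):
    """Round-robin split: task i goes to node i % node_count."""
    shards = [[] for _ in range(node_count)]
    for i, task in enumerate(tasks):
        shards[i % node_count].append(task)
    return shards


def with_retries(fn, attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call fn(); on ConnectionError wait base_delay * 2**n, then try again.

    The last failure is re-raised so the caller can record the task as failed.
    """
    for n in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if n == attempts - 1:
                raise
            sleep(base_delay * 2 ** n)


# Eight daily tasks spread over four nodes, two per node.
shards = assign_shards([f"2024-01-{d:02d}" for d in range(1, 9)], node_count=4)

# A fetch that fails twice before succeeding, to exercise the retry path.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated drop")
    return "ok"


result = with_retries(flaky_fetch, attempts=3, base_delay=0.0)
```

Because the assignment depends only on task order and node count, every node can compute its own shard without coordination, and a restarted node picks up the same tasks it had before.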
I am actively seeking roles as a Data Engineer, MLOps Engineer, or Backend Developer where I can tackle complex data architecture problems and build production-ready ML infrastructure.
