divyanshailani/README.md


Python PostgreSQL XGBoost Docker Azure

About Me

I am a Data Engineer & MLOps Architect focused on building robust, high-throughput data pipelines and production-grade machine learning models. I specialize in resolving infrastructure bottlenecks, distributing work across parallel nodes, and preserving data integrity at scale.

My engineering philosophy:

  1. Performance & Scale: Bypassing API rate limits via multi-node mesh architectures.
  2. Data Integrity First: Writing idempotent ETL pipelines with rigorous deduplication and validation.
  3. Optimized ML Runtimes: Replacing slow SQL reads with compressed Parquet exports to speed up XGBoost training and Optuna hyperparameter tuning.
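
As a rough illustration of the second principle, here is a minimal sketch of an idempotent load step. It uses the standard-library `sqlite3` module in place of PostgreSQL so that it runs anywhere. The table name `readings`, its columns, the validation rule, and the sample values are all hypothetical and not taken from the actual pipeline.

```python
import sqlite3


def load_readings(conn, rows):
    """Insert rows and return the table's row count.

    Re-running with the same rows leaves the table unchanged, because the
    primary key (station_id, ts) rejects duplicates.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS readings (
            station_id TEXT NOT NULL,
            ts         TEXT NOT NULL,
            pm25       REAL,
            PRIMARY KEY (station_id, ts)
        )
        """
    )
    # Validation (assumed rule): drop rows with a missing or negative value.
    valid = [r for r in rows if r[2] is not None and r[2] >= 0]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO readings (station_id, ts, pm25) "
            "VALUES (?, ?, ?)",
            valid,
        )
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]


# Made-up sample batch: one exact duplicate and one invalid reading.
conn = sqlite3.connect(":memory:")
batch = [
    ("KNP-01", "2024-01-01T00:00", 88.0),
    ("KNP-01", "2024-01-01T01:00", 91.5),
    ("KNP-01", "2024-01-01T01:00", 91.5),
    ("KNP-02", "2024-01-01T00:00", -1.0),
]
first = load_readings(conn, batch)
second = load_readings(conn, batch)  # same batch again: no new rows
```

In PostgreSQL the equivalent statement would typically be `INSERT ... ON CONFLICT DO NOTHING` against a unique key, which gives the same re-run safety.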

🌍 Featured Project: Global AQ Intelligence Pipeline

An end-to-end Air Quality prediction system spanning global satellite data ingestion, massive PostgreSQL databases, and autonomous machine learning pipelines.

Explore the Repository

Engineering Highlights:

  • Multi-VM Distributed Backfill: Engineered a 4-node parallel mesh architecture to process 1.8M+ rows of Aerosol Optical Depth (AOD) satellite data, bypassing strict API rate limits and reducing ETL time from 11 hours to under 3 hours.
  • Advanced Feature Engineering: Built trigonometric wind encodings and multi-horizon lag features so that XGBoost models can capture physical weather behavior such as wind direction and temporal persistence.
  • Production Infrastructure: Hosted on Azure Database for PostgreSQL (Flexible Server) with robust local-to-cloud synchronization workflows.
  • Automated Workflows: Fully orchestrated via GitHub Actions cron schedules for daily data fetching, deduplication, and model syncing.
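
The feature engineering described above could look roughly like the following sketch. It is not the project's code: the function names `encode_wind` and `add_lags`, the sine/cosine component convention, and the lag horizons are assumptions chosen for illustration.

```python
import math


def encode_wind(direction_deg, speed):
    """Encode wind as two components so that 359° and 1° end up close.

    Raw degrees are circular: a model would otherwise see 359 and 1 as
    far apart. Returns (speed * sin(theta), speed * cos(theta)).
    """
    theta = math.radians(direction_deg)
    return speed * math.sin(theta), speed * math.cos(theta)


def add_lags(series, horizons):
    """Return one dict per time step holding the value h steps earlier.

    Steps without enough history get None for that horizon.
    """
    out = []
    for i, value in enumerate(series):
        row = {"value": value}
        for h in horizons:
            row[f"lag_{h}"] = series[i - h] if i >= h else None
        out.append(row)
    return out


# Hypothetical hourly readings with 1-step and 3-step lags.
features = add_lags([10, 20, 30, 40], horizons=[1, 3])
```

Encoding direction as a sine/cosine pair keeps neighbouring directions numerically close, and tree-based models can split on lagged columns directly.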

🛠️ Specialized Focus

I am currently deepening my expertise in:

  • Data Engineering: Idempotent ETL pipelines, PostgreSQL query optimization, and Parquet data serialization.
  • MLOps: Managing model registries, tuning hyperparameters with Optuna, and deploying XGBoost and other tree-based models to production.
  • Distributed Systems: Distributing tasks across multiple servers, handling network interruptions gracefully, and synchronizing state.
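
To make the distributed-systems item concrete, here is a small sketch of two common building blocks: splitting a work list across nodes, and retrying a call after a dropped connection. Both `shard` and `with_retry` are hypothetical helpers, and the round-robin split and exponential backoff are assumed choices rather than a description of the actual setup.

```python
import time


def shard(items, node_index, node_count):
    """Return the slice of work for one node, using a round-robin split.

    Every item lands on exactly one node, so nodes need no coordination.
    """
    return items[node_index::node_count]


def with_retry(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying on ConnectionError with exponential backoff.

    Waits base_delay, then 2x, then 4x between tries. The last failure is
    re-raised so the caller can record the task as unfinished.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Hypothetical work list split across 4 nodes; this is node 0's share.
tiles = [f"tile-{i}" for i in range(10)]
node_0_work = shard(tiles, node_index=0, node_count=4)
```

Passing `sleep` as a parameter lets the backoff schedule be checked without actually waiting.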

Open To

I am actively seeking roles as a Data Engineer, MLOps Engineer, or Backend Developer where I can tackle complex data architecture problems and build production-ready ML infrastructure.


Kanpur, India • Building globally relevant data systems

Pinned

  1. global-aq-intelligence-pipeline (Public)

    End-to-end air quality analysis pipeline: OpenAQ API → 5-phase cleaning → EDA → feature engineering → ML prediction

    Jupyter Notebook 1

  2. calamity-matrix-core (Public)

    5-source data pipeline, pgvector RAG engine, and temporal key rectification

    Python

  3. global-aq-intelligence-web (Public)

    TypeScript

  4. anagram-quest-qwen3-0.6b-grpo-mlx (Public)

    Fine-tuned Qwen3-0.6B anagram solver model card, eval notes, and release packaging for Anagram Quest.