Founding Member & Head of AI @ Mixar | AI-native fork of Blender
Building foundational ML models for the next-gen AI-native 3D editor.
I build tools to make 3D content creation more intuitive and efficient. Currently deep in the AI-native development paradigm: training ML models from scratch for 3D workflows like UV unwrapping, procedural texturing, 3D object generation, and scene reconstruction.
- Scene Reconstruction Pipeline: Single image → full 3D scene in Blender, using VGGT (geometry), SAM 3 (segmentation), SAM3D (per-object 3D reconstruction), MoGe (monocular geometry), and Claude VLM (semantic labeling)
- Seam Prediction Model: ML model for automated UV unwrapping
- Scene-AST: Structured scene representation for better 3D context retrieval with LLMs
ML & 3D
Engineering
- Building in AI: Shipping AI/ML features at Mixar to make 3D content creation more intuitive and efficient
- Scene Reconstruction R&D: Combining VGGT, SAM3D, and MoGe into a unified pipeline that turns a single photo into an editable Blender scene
- Reading papers: 3D computer vision, mesh generation, PBR texture generation, monocular depth/geometry estimation, LLM reasoning, and agentic workflows
- Exploring agents: How autonomous systems can change 3D content creation
- Twitter/X: @_satyam_ai
- Email: satyam@mixar.app
- Website: mixar.app
- IIT Graduate
- Believe in shipping fast over perfect
- Coffee-powered
Ship fast. Learn faster.
- Built a 3D-to-video pipeline on top of LTX-2 (22B open-weight video diffusion). Blender renders depth + cryptomatte + a first-frame style anchor; the runner stacks IC-LoRA adapters and generates per shot. Geometry and camera come from the 3D scene; material and light come from the prompt. ~80s per 5s clip on a single RTX 6000.
- Tuned the baseline empirically: mapped which LoRA stacks help vs. degrade the 22B distilled checkpoint, identified the 128-divisible resolution constraint (compound stride through VAE, IC-LoRA reference downscale, and patchifier), found the 121-frame single-clip coherence ceiling, and debugged attention-mask semantics that were wiping out background geometry.
- Built a self-scaling GPU orchestration layer: spin up any number of GPU VMs and they auto-register, advertise services, and start receiving routed workloads with zero frontend or backend deploys. Routes 6 GPU services across 6–22 GB VRAM budgets with 4 priority levels. Think Kubernetes-lite for GPU inference.
- Built a real-time GPU fleet dashboard from scratch with per-worker health tracking, job lifecycle metrics, and live queue stats across all 6 services.
- Implemented VRAM-aware job dispatching on a 2s loop with 3-attempt exponential backoff (5s to 120s), dead-letter queue for credit refunds, and timeout handling for stuck jobs.
- Optimized queue throughput by pipelining 8 Redis ops per stats query and batch-loading jobs via MGET. Fixed race conditions in GLB uploads and hardened reliability with duplicate completion prevention and automatic pruning of offline workers.
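The retry policy in the dispatching bullet above can be sketched roughly as follows. Only the 5s base delay, the 120s cap, and the 3-attempt limit come from the text; the growth factor of 4 and the `Job`, `Dispatcher`, and `retry_delay` names are hypothetical, chosen for illustration rather than taken from the actual orchestration code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BASE_DELAY_S = 5.0    # first retry delay (from the text)
MAX_DELAY_S = 120.0   # upper bound on any retry delay (from the text)
MAX_ATTEMPTS = 3      # attempts before giving up (from the text)
GROWTH = 4.0          # assumption: multiplier between consecutive retries


def retry_delay(attempt: int) -> float:
    """Delay in seconds before retry number `attempt` (1-based), capped."""
    return min(BASE_DELAY_S * GROWTH ** (attempt - 1), MAX_DELAY_S)


@dataclass
class Job:
    job_id: str
    attempts: int = 0


@dataclass
class Dispatcher:
    # Jobs that exhausted their attempts; a real system would refund credits here.
    dead_letters: List[str] = field(default_factory=list)

    def on_failure(self, job: Job) -> Optional[float]:
        """Record a failed attempt.

        Returns the delay before the next retry, or None if the job
        was moved to the dead-letter list.
        """
        job.attempts += 1
        if job.attempts >= MAX_ATTEMPTS:
            self.dead_letters.append(job.job_id)
            return None
        return retry_delay(job.attempts)
```

Capping with `min` keeps the schedule bounded no matter how many retries are configured, so raising `MAX_ATTEMPTS` later would not produce unbounded waits.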
- [Shipping] Built SeamNet, a GNN-based edge classifier for seam prediction on 3D meshes using PyG TransformerConv, reaching 96.4% train F1 on 72 meshes.
- [Shipping] Scaled training to 38k objects over 100k steps, reaching 0.600 val F1 and 0.618 test F1 across 776 unseen meshes.
- [Learning] Explored PyTorch Geometric's TransformerConv with learned edge projections for mesh-based classification tasks.
- [Refining] Eliminated NaN instabilities in training by adding edge projection normalization and switching to BCEWithLogitsLoss on raw logits.
- [Refining] Tuned generalization with regularization experiments (dropout, weight decay, connectivity loss) and added mixed precision support.
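Why computing the loss on raw logits helps with NaNs can be illustrated with a scalar sketch. This is not the SeamNet training code; it is the standard log-sum-exp form of binary cross-entropy that PyTorch documents for `BCEWithLogitsLoss`, written in plain Python, next to the naive sigmoid-then-log version. The function names are made up for this example.

```python
import math


def naive_bce(logit: float, target: float) -> float:
    """Sigmoid followed by log. Breaks down once the sigmoid saturates to 0 or 1."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def bce_with_logits(logit: float, target: float) -> float:
    """Stable form: max(x, 0) - x*y + log(1 + exp(-|x|)).

    exp() only ever sees a non-positive argument, so it cannot overflow,
    and log1p() never receives a value that makes it undefined.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))
```

For moderate logits the two functions agree. For a confidently wrong prediction such as `logit=1000.0, target=0.0`, the stable form returns a large finite loss, while the naive form ends up taking the log of zero.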
- [Shipping] Built a complete AI Scene Creation Pipeline (VGGT → SAM3 → VLM → Blender) that converts single images into full 3D blockout scenes.
- [Shipping] Integrated SAM3D for 3D object reconstruction and MoGe for monocular depth estimation into the scene pipeline.
- [Learning] Adopted SAM3D (PyTorch3D convention) and MoGe for geometry-aware 3D reconstruction from 2D images.
- [Refining] Fixed EXIF orientation handling, improved ground plane detection with a hybrid camera-extrinsics approach, and resolved multi-frame segmentation for non-overlapping inputs.
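For the EXIF orientation fix mentioned above, here is a small illustration of what the eight EXIF orientation values mean, applied to a row-major grid of pixels. It is a dependency-free sketch, not the pipeline's code; in practice a library call such as Pillow's `ImageOps.exif_transpose` covers this. The `apply_exif_orientation` name is hypothetical.

```python
def apply_exif_orientation(grid, orientation):
    """Return a new row-major grid with the EXIF orientation (1..8) applied.

    `grid` is a list of rows as stored in the file; the result is the
    upright image a viewer should display.
    """
    rows = [list(r) for r in grid]
    if orientation == 1:   # already upright
        return rows
    if orientation == 2:   # mirror left-right
        return [r[::-1] for r in rows]
    if orientation == 3:   # rotate 180 degrees
        return [r[::-1] for r in rows[::-1]]
    if orientation == 4:   # mirror top-bottom
        return rows[::-1]
    if orientation == 5:   # transpose (flip across the main diagonal)
        return [list(r) for r in zip(*rows)]
    if orientation == 6:   # rotate 90 degrees clockwise
        return [list(r) for r in zip(*rows[::-1])]
    if orientation == 7:   # transverse (flip across the anti-diagonal)
        return [list(r) for r in zip(*[r[::-1] for r in rows[::-1]])]
    if orientation == 8:   # rotate 90 degrees counter-clockwise
        return [list(r) for r in zip(*rows)][::-1]
    raise ValueError(f"unknown EXIF orientation: {orientation}")
```

Skipping this step means a portrait phone photo reaches the geometry and segmentation models sideways, which is one way the ground plane estimate can go wrong.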
- Shipped end-to-end scene reconstruction using SAM3D: single image to full 3D Blender scene with per-object mesh recovery, pose estimation, and semantic labeling via Claude VLM.
- Integrated MoGe monocular geometry estimation into the pipeline, enabling pointmap sharing across models and eliminating redundant depth computation.
- Adopted SAM3D's PyTorch3D coordinate conventions and voxel-based shape representation, wiring up a multi-model inference stack (MoGe + DiNO ViT-L + diffusion) on Blackwell GPU.
- Added depth map debug visualizations, streamlined the Makefile, and implemented `--frames` parameter parsing for selective multi-frame processing.
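One way a `--frames` option for selective multi-frame processing could be parsed is sketched below. The accepted syntax (comma-separated indices and inclusive ranges such as `0,2,5-7`) is an assumption made for this example, as are the `parse_frames` and `build_parser` names; the actual flag format is not described here.

```python
import argparse


def parse_frames(spec: str) -> list:
    """Parse a spec like "0,2,5-7" into a sorted list of unique frame indices.

    Assumed syntax: comma-separated non-negative integers or inclusive
    "start-end" ranges.
    """
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            lo, hi = int(start), int(end)
            if lo > hi:
                raise argparse.ArgumentTypeError(f"empty range: {part!r}")
            indices.update(range(lo, hi + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical scene pipeline CLI")
    # None means "process every frame"; otherwise only the listed indices.
    parser.add_argument("--frames", type=parse_frames, default=None)
    return parser
```

For example, `build_parser().parse_args(["--frames", "0,2,5-7"]).frames` yields `[0, 2, 5, 6, 7]`. Using the parser's `type=` hook means malformed input is reported as a normal usage error.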
- Released gh-profile-updater, a Claude Code plugin that generates impact-driven GitHub profile READMEs from git activity, with automated PR creation and velocity archiving.
- Built auto-clone support so the plugin works without a pre-existing local copy of the profile repo.
- Hardened plugin schema and marketplace metadata for reliable discovery and installation.



