Building machine learning systems that integrate vision, audio, and language, with a focus on real-world deployment and performance.
- Medical Imaging — multi-model chest X-ray classification + segmentation + explainability
- Multimodal RAG Assistant — QA system grounded in text, image, and audio
- Digital Scarecrow (YOLO) — real-time animal detection system
- Bird Classification — audio (spectrogram) + image-based pipeline
- College ERP System — backend with normalized schema and role-based workflows
- Multimodal ML systems
- Real-world robustness
- Backend + ML integration
- Low-latency systems
- LinkedIn: https://www.linkedin.com/in/saket-patayeet/
- Email: saket.patayeet23@vit.edu