A foundational computer vision project that performs real-time human pose estimation using Google's MediaPipe framework. This project served as my entry point into computer vision and laid the groundwork for more advanced applications in sports analytics, medical imaging, and crowd management systems.
Note: This was my first computer vision project (pre-2024) that sparked my journey into AI/ML development. It has since evolved into the foundation for multiple production-ready applications including sports assessment platforms and medical image processing systems.
- Real-Time Processing: Detects 33 human body landmarks at 30+ FPS
- Cross-Platform: Works on Windows, macOS, and Linux
- Lightweight: Optimized for CPU-only inference
- Extensible: Clean and simple architecture for building advanced pose-based applications
- Educational: Well-commented code perfect for learning computer vision concepts
| Component | Technology | Purpose |
|---|---|---|
| Pose Detection | MediaPipe Pose | 33-landmark human pose estimation |
| Computer Vision | OpenCV | Video capture and frame processing |
| Backend | Python 3.7+ | Core application logic |
| Visualization | Matplotlib | Landmark visualization and debugging |
Python 3.7 or higher
Webcam or video input device-
Clone the repository
git clone https://github.com/dipan313/PoseEstimation.git cd PoseEstimation -
Install dependencies
pip install -r requirements.txt
Or manually install:
pip install mediapipe opencv-python numpy matplotlib
-
Run the application
python pose_estimation.py
PoseEstimation/
βββ pose_estimation.py # Main application
βββ requirements.txt # Dependencies
βββ utils/
β βββ pose_utils.py # Pose processing utilities
β βββ visualization.py # Drawing and display functions
βββ examples/
β βββ basic_demo.py # Simple pose detection demo
β βββ angle_calculation.py # Joint angle measurements
βββ assets/
β βββ demo_images/ # Sample outputs
βββ README.md
This foundational project has been extended into several real-world applications:
- Khelo Sathi: Mobile sports assessment platform serving India's 792M smartphone users
- Real-time form analysis for cricket, football, and fitness activities
- Pose-based performance metrics and coaching feedback
- Medical Image Processing: SRGAN-enhanced pose estimation for low-resource healthcare
- Physical therapy progress tracking
- Postural analysis for rehabilitation
- Crowd Management: Panic detection and behavioral analysis in public spaces
- Anomaly detection in surveillance systems
- Real-time safety monitoring
- Motion capture for virtual environments
- Real-time avatar control
- Immersive fitness applications
- 33 Landmarks: Full-body keypoint detection including face, hands, and body
- Model Complexity: Configurable from 0 (lite) to 2 (heavy)
- Detection Confidence: Minimum 0.5 threshold for reliable tracking
- Tracking Confidence: 0.5 threshold for landmark consistency
- Latency: < 33ms per frame (30 FPS)
- Accuracy: 95%+ landmark detection on well-lit scenes
- Memory: < 100MB RAM usage
- CPU Usage: 15-25% on modern processors
This project marked the beginning of my computer vision journey:
2024 - Foundation
- β Real-time pose detection
- β Basic landmark visualization
- β OpenCV integration
2024-2025 - Advanced Applications
- β Mobile deployment with TensorFlow Lite
- β Integration with sports assessment algorithms
- β Medical image processing pipelines
- β Crowd analysis and safety systems
- β Production-ready mobile applications
- Mobile Optimization: TensorFlow Lite conversion for Android deployment
- Pose Classification: ML models for specific pose recognition (yoga, sports)
- Multi-Person Detection: Extend to multiple simultaneous poses
- 3D Pose Estimation: Depth-aware landmark detection
- Real-Time Analytics: Performance metrics and pose scoring
- Cloud Integration: AWS deployment for scalable processing
Contributions are welcome! This project serves as a learning resource for the computer vision community.
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit changes (
git commit -m 'Add amazing feature') - Push to branch (
git push origin feature/amazing-feature) - Open a Pull Request
This foundational project has contributed to:
- π Multiple Hackathon Wins: Sports and healthcare applications
- π± Production Apps: Serving thousands of users in sports assessment
- π Educational Resource: Helped fellow students learn computer vision
- π Open Source Community: Foundation for derivative projects
This project is licensed under the MIT License - see the LICENSE file for details.
- Google MediaPipe Team: For the incredible pose estimation framework
- OpenCV Community: For robust computer vision tools
- Python Ecosystem: For making AI/ML accessible to everyone
From First Steps to Production Systems π
This project represents the beginning of a journey that led to building production-ready computer vision applications serving thousands of users across sports, healthcare, and public safety domains.