This repository showcases my hands-on learning and implementation of Machine Learning models using real-world datasets.
It demonstrates my ability to build end-to-end ML pipelines, from data ingestion and preprocessing through model training, evaluation, and visualization, using Python and industry-standard libraries.
The focus is on learning by implementing real ML workflows rather than studying theory alone.
- Practical understanding of Machine Learning fundamentals
- Ability to work with real-world datasets
- Experience with data preprocessing and feature engineering
- Model training, evaluation, and interpretation
- Clean, readable, and modular ML code
This repository is intended as a portfolio for internships and entry-level roles in Data Science / Machine Learning.
- Python
- Pandas – data manipulation
- NumPy – numerical computation
- Matplotlib – data visualization
- scikit-learn – ML models & evaluation
- KaggleHub – dataset integration
- Linear Regression (for predicting continuous targets)
- More models will be added as learning progresses
Each project includes:
- Dataset loading (CSV / Kaggle)
- Feature–target separation
- Categorical data encoding (One-Hot Encoding)
- Train–test split
- Model training
- Performance evaluation
- Visualization of results
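The first few steps above can be sketched roughly as follows. This is a minimal illustration, not the code from any specific project in this repository: the toy DataFrame and its column names (`age`, `smoker`, `target`) are made up for the example, and the 80/20 split and `random_state=42` are assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for a CSV file or Kaggle download.
df = pd.DataFrame({
    "age": [19, 25, 33, 41, 48, 52, 60, 29, 37, 45],
    "smoker": ["yes", "no", "no", "yes", "no", "yes", "no", "no", "yes", "no"],
    "target": [16.9, 3.1, 4.4, 22.0, 8.2, 27.5, 12.6, 3.9, 20.1, 7.7],
})

# Feature-target separation.
X = df.drop(columns=["target"])
y = df["target"]

# One-hot encode the categorical column. drop_first=True removes one
# redundant dummy column; dtype=int keeps the encoded values numeric.
X_encoded = pd.get_dummies(X, columns=["smoker"], drop_first=True, dtype=int)

# Train-test split (assumed 80/20, with a fixed seed for reproducibility).
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.2, random_state=42
)

# Model training and prediction on the held-out rows.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

Splitting before fitting means the evaluation step only sees rows the model was not trained on.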
Problem Statement:
Predict medical insurance charges based on personal and lifestyle attributes.
Dataset: US Health Insurance Dataset (Kaggle)
Target Variable: charges (continuous numerical value)
- Loaded dataset using KaggleHub
- Performed data preprocessing and encoding of categorical variables
- Split dataset into training and testing sets
- Trained a Linear Regression model
- Evaluated model performance using regression metrics
- Visualized actual vs predicted values
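An actual-vs-predicted plot like the one in the last step could look something like this. It is a sketch under assumptions: the two arrays are made-up placeholder numbers, not results from this project, and the output filename `actual_vs_predicted.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values standing in for y_test and model.predict(X_test).
y_actual = np.array([4200.0, 8900.0, 13500.0, 21000.0, 36000.0])
y_predicted = np.array([5100.0, 8200.0, 12800.0, 24500.0, 31000.0])

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(y_actual, y_predicted, alpha=0.7, label="Predictions")

# Diagonal reference line: points on it are predicted exactly right.
low = min(y_actual.min(), y_predicted.min())
high = max(y_actual.max(), y_predicted.max())
ax.plot([low, high], [low, high], linestyle="--", color="gray",
        label="Perfect prediction")

ax.set_xlabel("Actual charges")
ax.set_ylabel("Predicted charges")
ax.set_title("Actual vs predicted insurance charges")
ax.legend()
fig.savefig("actual_vs_predicted.png", dpi=150)
```

Points above the diagonal are over-predictions and points below it are under-predictions, which makes systematic bias easy to spot.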
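- Mean Absolute Error (MAE) – average absolute difference between predicted and actual values
- Mean Squared Error (MSE) – average squared difference; penalizes large errors more heavily
- Root Mean Squared Error (RMSE) – square root of MSE, expressed in the same units as the target
- R² Score – proportion of the target's variance explained by the model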
These metrics help assess both accuracy and reliability of predictions.
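As a small worked example, the four metrics can be computed with scikit-learn as shown below. The two arrays are made-up values chosen so the arithmetic is easy to check by hand; they are not results from this project. RMSE is taken as the square root of MSE with NumPy.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values (errors: -1, 0, +1, +3).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 12.0])

mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 1 + 3) / 4 = 1.25
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 1 + 9) / 4 = 2.75
rmse = np.sqrt(mse)                        # sqrt(2.75), about 1.658
r2 = r2_score(y_true, y_pred)              # 1 - 11 / 20 = 0.45

print(f"MAE:  {mae:.3f}")
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R2:   {r2:.3f}")
```

The single error of 3 contributes 9 of the 11 units of squared error, which is why MSE and RMSE react more strongly to outliers than MAE does.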
- Clone the repository:
git clone https://github.com/your-username/your-repo-name.git
- Install dependencies:
pip install pandas numpy matplotlib scikit-learn kagglehub
- Run the model script or notebook:
python model_file.py
Aspiring Machine Learning Engineer with a strong interest in applied ML and data-driven problem solving.
- Kaggle for datasets
- scikit-learn documentation
- Open-source ML community