This project focuses on detecting and categorizing network intrusions using Machine Learning. Network security is crucial, and this system aims to identify anomalies effectively, ensuring the protection of sensitive data and infrastructure.
The dataset simulates a military network environment with a mix of normal and attack traffic. Each connection, represented as a sequence of TCP packets, is labeled either Normal or Anomalous, with anomalous connections tagged by their specific attack type. The dataset contains 41 features:
- 38 quantitative features
- 3 qualitative features
Dataset Link: KDD Cup 1999 Dataset
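The 38/3 split described above can be checked on a single record. The snippet below is a minimal sketch, not the project's loading code: `RECORD` is an illustrative line in the KDD Cup 1999 CSV layout (41 comma-separated features followed by a label), and `split_record` is a hypothetical helper that treats any field that does not parse as a number as qualitative. That type-sniffing rule is an assumption made for illustration.

```python
# Illustrative record in the KDD Cup 1999 CSV layout: 41 features + label.
RECORD = (
    "0,tcp,http,SF,181,5450,"
    "0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,"
    "8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,"
    "9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,"
    "normal."
)


def is_number(text):
    """Return True if the field parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def split_record(line):
    """Split one CSV line into qualitative fields, quantitative fields, and label.

    Hypothetical helper: fields are classified by whether they parse as numbers.
    """
    *features, label = line.strip().split(",")
    qualitative = [(i, v) for i, v in enumerate(features) if not is_number(v)]
    quantitative = [(i, float(v)) for i, v in enumerate(features) if is_number(v)]
    # Labels in the raw files carry a trailing period, e.g. "normal."
    return qualitative, quantitative, label.rstrip(".")


qualitative, quantitative, label = split_record(RECORD)
print(len(qualitative), "qualitative:", [v for _, v in qualitative])
print(len(quantitative), "quantitative features")
print("label:", label)
```

In this record the three qualitative fields are the protocol type, the service, and the connection flag; everything else is numeric.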
To enhance model performance, Mutual Information was used for feature selection, reducing the dataset to the most impactful variables. Exploratory Data Analysis (EDA) was conducted using Plotly to uncover patterns and insights within the data.
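To illustrate what a Mutual Information score measures, here is a dependency-free sketch for discrete features. It is not the project's implementation: the feature names, the toy data, and the `top_k_features` helper are hypothetical, and a real pipeline would more likely rely on a library routine such as scikit-learn's `mutual_info_classif`, which also handles continuous features.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def top_k_features(columns, labels, k):
    """Rank features by MI with the label and keep the k highest (hypothetical helper)."""
    scores = {name: mutual_information(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data: "flag" determines the label, "noise" is independent of it.
labels = ["normal", "normal", "attack", "attack"]
columns = {
    "flag": ["SF", "SF", "S0", "S0"],
    "noise": ["a", "b", "a", "b"],
}

selected, scores = top_k_features(columns, labels, k=1)
print(scores)
print(selected)
```

A feature that fully determines a balanced binary label scores 1 bit, while an independent feature scores 0, so ranking by this score keeps the informative columns.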
Several machine learning models were tested to achieve high accuracy in intrusion detection:
- Decision Trees
- Random Forests
- Gradient Boosting (XGBoost, LightGBM)
- Logistic Regression
- Support Vector Machines (SVM)
- Naïve Bayes
Hyperparameter tuning was performed using Optuna, and model evaluation was based on:
- Accuracy
- F1-score
- Precision & Recall
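The metrics above can be computed directly from the confusion counts. The following is a small sketch for the binary case, assuming "attack" is the positive class; the function name and the sample labels are made up for illustration and are not results from this project.

```python
def binary_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, precision, recall, and F1 for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 attacks and 5 normal connections.
y_true = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "normal"]
y_pred = ["attack", "attack", "normal", "attack", "normal", "normal", "normal", "normal"]

print(binary_metrics(y_true, y_pred))
```

Precision and recall matter here because attack traffic is often a minority class, so accuracy alone can look good even when many attacks are missed.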
A Voting Classifier ensemble approach delivered the best results, combining the strengths of multiple models to enhance detection accuracy.
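As a rough illustration of how a voting ensemble combines models, the sketch below applies hard (majority) voting to per-model predictions. Whether the project uses hard or soft voting is not stated, so hard voting is an assumption, and the model names and predictions are invented for the example.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote across models.

    predictions: list of per-model label lists, all of the same length.
    Returns one label per sample; ties go to the label seen first.
    """
    voted = []
    for sample_preds in zip(*predictions):
        # most_common(1) yields the label with the highest count for this sample
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted


# Invented predictions from three hypothetical models on four connections.
tree_preds = ["attack", "normal", "attack", "normal"]
forest_preds = ["attack", "attack", "normal", "normal"]
logreg_preds = ["normal", "attack", "attack", "normal"]

ensemble = hard_vote([tree_preds, forest_preds, logreg_preds])
print(ensemble)
```

Each model is wrong on a different sample in this toy case, and the majority corrects it, which is the intuition behind combining diverse models.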
To make the system accessible, a Streamlit-based UI was developed, allowing users to:
- Upload network data
- View real-time predictions
Deployment is in progress on Streamlit Cloud for public accessibility.
- Kaggle Notebook: Explore the code & insights
Ensure you have Python installed along with the following dependencies:
pip install -r requirements.txt
- Clone this repository:
git clone https://github.com/yourusername/NetworkAnamolyIntrusionDetection.git
cd NetworkAnamolyIntrusionDetection
- Install dependencies:
pip install -r requirements.txt
- Run the Streamlit UI:
streamlit run app.py
Let's build a more secure digital world together!