Vicky9890/Hate_Speech_Detection

🧠 Hate Speech Detection Application

A Streamlit-based web application that detects whether a given text contains hate speech, offensive language, or clean content (Not hate) using a Decision Tree Classifier trained on labeled tweet data.


🚀 Overview

This project demonstrates a simple yet effective Natural Language Processing (NLP) pipeline for classifying text as Hate Speech, Offensive Language, or Not Hate Speech. The app allows users to input any text, which is then cleaned, processed, and classified using a trained Decision Tree model.

It includes a visually appealing Streamlit interface with color-coded results for easy interpretation.
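As a rough illustration of how color-coded results could be produced, the sketch below maps each of the three labels to a banner color and builds an HTML snippet. The function name `render_result_html`, the `LABEL_COLORS` mapping, and the specific colors are assumptions made for this example, not the app's actual code.

```python
import html

# Hypothetical colour choices; the app's real palette is not documented here.
LABEL_COLORS = {
    "Hate Speech": "#c0392b",         # red
    "Offensive Language": "#e67e22",  # orange
    "Not Hate Speech": "#27ae60",     # green
}


def render_result_html(label: str) -> str:
    """Return an HTML snippet showing the predicted label on a coloured banner."""
    color = LABEL_COLORS.get(label, "#7f8c8d")  # grey fallback for unknown labels
    safe_label = html.escape(label)             # avoid injecting raw markup
    return (
        f'<div style="background-color:{color};color:white;'
        f'padding:12px;border-radius:8px;text-align:center;">'
        f"Prediction: {safe_label}</div>"
    )


if __name__ == "__main__":
    print(render_result_html("Offensive Language"))
```

Inside a Streamlit script, a snippet like this could be displayed with `st.markdown(render_result_html(label), unsafe_allow_html=True)`.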

🔗 Want to try it out? Visit the live app here:
👉 https://hatespeechdetection-vikash.streamlit.app


🧩 Features

  • 🧼 Text Preprocessing: URL removal, punctuation cleaning, stopword filtering, and stemming.
  • 💬 Real-Time Prediction: Classifies input text instantly when submitted.
  • 🎨 Custom UI Styling: Styled using HTML and CSS for a clean and modern look.
  • 📊 Model Training & Evaluation: Decision Tree Classifier trained using CountVectorizer features.
  • 💾 Pickle Integration: Uses preprocessed data and trained model stored as .pkl files.
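
For readers unfamiliar with pickle, the following minimal sketch shows the save/load round trip that .pkl files rely on. The helper names `save_pickle` and `load_pickle`, the file name `example.pkl`, and the stand-in dictionary are hypothetical; the project's own pickles may hold different object types.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path: Path) -> None:
    """Serialize any picklable Python object to disk."""
    with path.open("wb") as f:
        pickle.dump(obj, f)


def load_pickle(path: Path):
    """Load an object back from a pickle file."""
    with path.open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Stand-in object for illustration only; not the project's real data.
    stand_in = {"texts": ["example tweet text"], "labels": ["Not Hate Speech"]}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.pkl"
        save_pickle(stand_in, path)
        restored = load_pickle(path)
    print(restored == stand_in)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.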

βš™οΈ Installation & Setup

1. Clone the repository

git clone https://github.com/Vicky9890/Hate_Speech_Detection.git
cd Hate_Speech_Detection

2. Create a virtual environment (recommended)

python -m venv venv
source venv/bin/activate       # On Linux/Mac
venv\Scripts\activate        # On Windows

3. Install required dependencies

Create a requirements.txt file with the following:

streamlit
nltk
scikit-learn
pandas
numpy
matplotlib
seaborn

Then install them:

pip install -r requirements.txt

4. Download NLTK stopwords

python -m nltk.downloader stopwords

5. Run the app

streamlit run app.py

🧠 Model Details

  • Vectorizer: CountVectorizer()
  • Algorithm: DecisionTreeClassifier()
  • Training Data: Preprocessed tweets labeled as
    • Hate Speech
    • Offensive Language
    • Not Hate Speech

The model is trained on cleaned text data (clean_data.pkl) and label data (dataset.pkl).
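
The vectorizer and classifier named above can be combined roughly as follows. This is a minimal sketch, assuming scikit-learn is installed; the toy texts and labels are made up for illustration and do not come from the project's dataset, and `classify` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Made-up toy examples standing in for the real cleaned tweets.
texts = [
    "i despise those people",
    "you are such an idiot",
    "what a lovely sunny day",
    "those people should disappear",
    "shut up you fool",
    "see you at lunch tomorrow",
]
labels = [
    "Hate Speech",
    "Offensive Language",
    "Not Hate Speech",
    "Hate Speech",
    "Offensive Language",
    "Not Hate Speech",
]

# Turn each text into a bag-of-words count vector.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Fit a decision tree on the count features.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, labels)


def classify(text: str) -> str:
    """Vectorize one text with the fitted vocabulary and predict its label."""
    return str(model.predict(vectorizer.transform([text]))[0])


if __name__ == "__main__":
    print(classify("what a lovely sunny day"))
```

New text must go through the same fitted vectorizer (`transform`, not `fit_transform`) so that its columns line up with the features the tree was trained on.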


🧹 Text Cleaning Steps

The cleaning() function performs several preprocessing steps:

  1. Lowercasing
  2. Removing URLs, HTML tags, mentions, digits, and punctuation
  3. Removing stopwords
  4. Stemming words using SnowballStemmer
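
The steps above can be sketched as follows. The function name `cleaning` comes from this README, but the body, the regular expressions, and the parameters `stop_words` and `stem` are assumptions made for this example rather than the repository's implementation. To keep the sketch runnable without NLTK data, it defaults to a small stand-in stopword list and a no-op stemmer.

```python
import re
import string

# Small stand-in stopword list so the sketch runs without NLTK data.
FALLBACK_STOPWORDS = frozenset(
    {"a", "an", "and", "the", "is", "are", "to", "of", "in", "at", "this", "that"}
)

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def cleaning(text, stop_words=FALLBACK_STOPWORDS, stem=lambda word: word):
    """Lowercase, strip noise, drop stopwords, and stem the remaining words."""
    text = text.lower()                                 # 1. lowercasing
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # 2. URLs
    text = re.sub(r"<[^>]+>", " ", text)                #    HTML tags
    text = re.sub(r"@\w+", " ", text)                   #    mentions
    text = re.sub(r"\d+", " ", text)                    #    digits
    text = text.translate(_PUNCT_TABLE)                 #    punctuation
    words = [w for w in text.split() if w not in stop_words]  # 3. stopwords
    return " ".join(stem(w) for w in words)             # 4. stemming


if __name__ == "__main__":
    sample = "Check THIS out @user: <b>10</b> great deals at https://example.com!!!"
    print(cleaning(sample))  # prints "check out great deals"

    # With NLTK installed and the stopwords corpus downloaded, the real
    # stopword list and SnowballStemmer can be plugged in like this:
    #   from nltk.corpus import stopwords
    #   from nltk.stem import SnowballStemmer
    #   stemmer = SnowballStemmer("english")
    #   cleaning(sample, set(stopwords.words("english")), stemmer.stem)
```

Passing the stopword set and stemmer in as arguments keeps the regex steps easy to test on their own.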

🖥️ User Interface

🔹 Normal View: (screenshot)

🔹 Offensive: (screenshot)

🔹 Not hate sentence: (screenshot)

🔹 Hate sentence: (screenshot)

πŸ§‘β€πŸ’» Technologies Used

| Category | Tools/Libraries |
|---|---|
| Frontend UI | Streamlit, HTML, CSS |
| NLP | NLTK |
| ML | scikit-learn |
| Data Handling | pandas, numpy |
| Visualization | matplotlib, seaborn |

🏁 Future Improvements

  • 🔍 Integrate advanced models like Logistic Regression, Random Forest, or BERT.
  • 🗣️ Add multilingual support for hate speech detection.
  • 📊 Include data visualization dashboards for model insights.

About

I created this model to detect hate speech tweeted by users on social media platforms.
