A Streamlit-based web application that classifies a given text as hate speech, offensive language, or neither (Not Hate Speech) using a Decision Tree Classifier trained on labeled tweet data.
This project demonstrates a simple Natural Language Processing (NLP) pipeline for classifying text as Hate Speech, Offensive Language, or Not Hate Speech. Users enter any text, which is then cleaned, vectorized, and classified by a trained Decision Tree model.
It includes a visually appealing Streamlit interface with color-coded results for easy interpretation.
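As a rough illustration of color-coded results, the sketch below maps each predicted label to a background color and builds a small HTML snippet. The function name `result_html`, the `LABEL_COLORS` mapping, and the specific colors are assumptions made for this example; the app's actual styling may differ.

```python
from html import escape

# Hypothetical color scheme (assumption): red for hate speech,
# orange for offensive language, green for clean text.
LABEL_COLORS = {
    "Hate Speech": "#d32f2f",
    "Offensive Language": "#f57c00",
    "Not Hate Speech": "#388e3c",
}


def result_html(label: str) -> str:
    """Return an HTML box whose background color reflects the label."""
    # Unknown labels fall back to a neutral grey.
    color = LABEL_COLORS.get(label, "#616161")
    return (
        f'<div style="background-color:{color};color:white;'
        f'padding:0.75rem;border-radius:0.5rem;">'
        f"Prediction: {escape(label)}</div>"
    )


print(result_html("Not Hate Speech"))
```

Inside a Streamlit script, a snippet like this can be rendered with `st.markdown(result_html(label), unsafe_allow_html=True)`.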
Want to try it out? Visit the live app here:
https://hatespeechdetection-vikash.streamlit.app
- Text Preprocessing: URL removal, punctuation cleaning, stopword filtering, and stemming.
- Real-Time Prediction: Classifies input text instantly when submitted.
- Custom UI Styling: Styled using HTML and CSS for a clean and modern look.
- Model Training & Evaluation: Decision Tree Classifier trained using `CountVectorizer` features.
- Pickle Integration: Uses preprocessed data and trained model stored as `.pkl` files.
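The pickle round trip can be sketched as follows. The file name `clean_data.pkl` comes from this project, but the object stored in it here (a short list of cleaned strings) and the use of a temporary directory are assumptions for illustration only.

```python
import pickle
import tempfile
from pathlib import Path

# Assumption: the pickled object is a list of already-cleaned texts.
cleaned_texts = ["check out run time", "great day everyon"]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "clean_data.pkl"

    # Serialize the object to disk.
    with path.open("wb") as f:
        pickle.dump(cleaned_texts, f)

    # Load it back, as the app would do at start-up.
    with path.open("rb") as f:
        restored = pickle.load(f)

print(restored == cleaned_texts)
```

Fitted scikit-learn estimators can be saved and restored the same way. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.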
Clone the repository:
git clone https://github.com/yourusername/hate-speech-detection.git
cd hate-speech-detection
Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Linux/Mac
venv\Scripts\activate # On Windows
Create a requirements.txt file with the following:
streamlit
nltk
scikit-learn
pandas
numpy
matplotlib
seaborn
Then install them:
pip install -r requirements.txt
Download the NLTK stopwords:
python -m nltk.downloader stopwords
Run the app:
streamlit run app.py

- Vectorizer: `CountVectorizer()`
- Algorithm: `DecisionTreeClassifier()`
- Training Data: Preprocessed tweets labeled as
  - Hate Speech
  - Offensive Language
  - Not Hate Speech
The model is trained on cleaned text data (clean_data.pkl) and label data (dataset.pkl).
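A minimal sketch of the vectorize-then-classify step is shown below. It uses a handful of made-up sentences in place of the real tweet data, and the `predict` helper and `random_state=0` are assumptions for this example, not details taken from the project.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the labeled tweets.
texts = [
    "people like them do not deserve rights",
    "they should all be banned from our country",
    "you are a stupid idiot",
    "shut up you fool",
    "have a great day everyone",
    "the weather is lovely today",
]
labels = [
    "Hate Speech",
    "Hate Speech",
    "Offensive Language",
    "Offensive Language",
    "Not Hate Speech",
    "Not Hate Speech",
]

# Turn each text into a bag-of-words count vector.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Fit the decision tree on the count features.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, labels)


def predict(text: str) -> str:
    """Classify one text with the fitted vectorizer and model."""
    return model.predict(vectorizer.transform([text]))[0]


print(predict("you are a stupid idiot"))
```

New text must be transformed with the same fitted vectorizer used in training, which is why the vectorizer and the model are kept together.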
The cleaning() function performs several preprocessing steps:
- Lowercasing
- Removing URLs, HTML tags, mentions, digits, and punctuation
- Removing stopwords
- Stemming words using `SnowballStemmer`
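The steps above could be sketched roughly as follows. The regular expressions and the order of operations are assumptions, and the small `STOPWORDS` set is a stand-in so the example runs without downloading NLTK data; the project itself relies on NLTK's stopword list.

```python
import re
import string

from nltk.stem import SnowballStemmer

# Assumption: tiny stand-in for NLTK's English stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "this", "that"}

stemmer = SnowballStemmer("english")

# Translation table that replaces every punctuation character with a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def cleaning(text: str) -> str:
    """Lowercase, strip noise, drop stopwords, and stem the remaining words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = re.sub(r"@\w+", " ", text)                   # mentions
    text = re.sub(r"\d+", " ", text)                    # digits
    text = text.translate(PUNCT_TABLE)                  # punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(stemmer.stem(t) for t in tokens)


print(cleaning("Check THIS out @user: https://example.com <b>running</b> 123 times!!!"))
```

URLs are removed before punctuation so that the `://` and dots inside a link do not leave stray fragments behind.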
| Category | Tools/Libraries |
|---|---|
| Frontend UI | Streamlit, HTML, CSS |
| NLP | NLTK |
| ML | scikit-learn |
| Data Handling | pandas, numpy |
| Visualization | matplotlib, seaborn |
- Integrate advanced models like Logistic Regression, Random Forest, or BERT.
- Add multilingual support for hate speech detection.
- Include data visualization dashboards for model insights.