This project uses Multilingual BERT with LoRA fine-tuning and SHAP explainability to classify, interpret, and rewrite Turkish tweets that contain hate speech. It combines transformer models with parameter-efficient tuning to handle sensitive content in Turkish social media posts.
- Dataset: 16k Turkish tweets labeled as hateful (nefret) or not hateful (hiçbiri).
- Model: `bert-base-multilingual-cased` with PEFT/LoRA for efficient fine-tuning.
- Explainability: SHAP token-level explanations visualize which tokens contributed to hate predictions.
- Rewriting: Hateful content is paraphrased using LLaMA-3 to preserve tone but remove offensive phrases.
The dataset is publicly available on Kaggle:
🔗 Turkish Hate Speech Tweets Dataset
- EDA & Visualizations: Tweet length, word count, label distributions, word cloud.
- Training Setup: 4-bit quantized BERT + LoRA (PEFT).
- Metrics: Accuracy, F1-macro, precision, recall.
- Inference: Classifies tweets into "nefret" or "hiçbiri".
- Explanation: SHAP values highlight which tokens drive classification.
- Rewriting: LLaMA-based generation of respectful alternative tweets.
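The rewriting step sends a hateful tweet to LLaMA-3 with a paraphrase instruction. The exact prompt is not shown in this README, so the template below (including the function name `build_rewrite_prompt`) is an assumed sketch; the instruction asks, in Turkish, to rewrite the tweet while keeping its meaning and removing offensive language:

```python
def build_rewrite_prompt(tweet: str) -> str:
    """Build a hypothetical instruction prompt asking LLaMA-3 to paraphrase
    a hateful tweet into a respectful alternative (instruction in Turkish:
    'Rewrite the tweet below, preserving its meaning and tone but removing
    aggressive and hateful expressions')."""
    return (
        "Aşağıdaki tweet'i anlamını ve tonunu koruyarak, ancak saldırgan "
        "ve nefret içeren ifadeleri çıkararak yeniden yaz:\n\n"
        f"Tweet: {tweet}\nYeniden yazılmış tweet:"
    )
```

The returned string would then be passed to the LLaMA-3 generation call; the model's completion after "Yeniden yazılmış tweet:" is the respectful rewrite.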
```python
text = "Bu kadar kötü bir şey olabilir mi?"  # "Can something be this bad?"
result = classify_text(text)
print(result)
```
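`classify_text` itself is not defined in this README. A minimal sketch of what it might look like, with the model and tokenizer passed explicitly (the one-argument version above presumably binds them globally) and the label index order assumed:

```python
import torch

LABELS = ["hiçbiri", "nefret"]  # assumed order: index 0 = not hateful, 1 = hateful

def classify_text(text, model, tokenizer):
    """Return the predicted label for a single tweet using the fine-tuned
    classifier; tokenize, run a forward pass, and take the argmax logit."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[logits.argmax(dim=-1).item()]
```

Here `model` and `tokenizer` would come from the fine-tuned checkpoint, e.g. loaded with `AutoModelForSequenceClassification` and `AutoTokenizer`.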

