This repository contains an end-to-end machine learning project based on the Kaggle "Real or Not? NLP with Disaster Tweets" dataset. The goal is to classify whether a tweet is about a real disaster or not.
The project covers the complete pipeline: uploading the data to AWS RDS, ETL and cleaning, training and evaluating deep learning models (NNs and Transformers), and building an interactive dashboard in Tableau.
disaster-tweets-classification/
│
├── notebooks/    # All Python notebooks used for data and model pipelines
├── tableau/      # Tableau dashboard (.twbx) and screenshots
├── prep/         # Tableau Prep flow overview
├── data/         # (Placeholder) Local CSVs from Kaggle, not included in repo
└── README.md     # This file
The dataset was downloaded from Kaggle and includes:
- train.csv: 7,613 labeled tweets
- test.csv: 3,263 unlabeled tweets
CSV files were uploaded to an AWS RDS instance (PostgreSQL) using SQLAlchemy in a Jupyter Notebook.
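The upload step can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the helper names (`build_postgres_url`, `upload_csv`), the `psycopg2` driver, and the use of pandas to read the CSV are all assumptions that the text above does not confirm.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, database, port=5432):
    """Build a SQLAlchemy connection URL for a PostgreSQL instance.

    User and password are URL-escaped so characters such as '@' or '/'
    do not break the URL. The psycopg2 driver is an assumption.
    """
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def upload_csv(csv_path, table_name, url):
    """Load a CSV with pandas and write it to a table (hypothetical helper).

    Returns the number of rows written.
    """
    # Imported here so the URL helper above works without these packages.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    frame = pd.read_csv(csv_path)
    # Replace the table if it already exists; do not store the DataFrame index.
    frame.to_sql(table_name, engine, if_exists="replace", index=False)
    return len(frame)
```

A call might look like `upload_csv("data/train.csv", "train_raw", build_postgres_url(user, password, host, database))`, where the table name `train_raw` is a placeholder and the credentials and RDS endpoint come from your own configuration rather than from the notebook.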
ETL and cleaning steps were handled using:
- PostgreSQL (AWS RDS) for centralized storage
- SQL queries for cleaning and preparing data
- Tableau Prep for advanced joins, transformations, and pivoting of model output metrics
Three transformed tables were exported back to RDS from Tableau Prep and later used in Tableau Desktop.
We trained two families of models:
- Neural Networks (NNs): multiple configurations using Keras
- Transformers: fine-tuning DistilBERT using HuggingFace 🤗
Each model was evaluated using:
- Accuracy, Precision, Recall, F1-Score
- Confusion Matrix
- Classification Report
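To make the listed metrics concrete, here is a minimal plain-Python sketch of how they are derived from the confusion matrix of a binary classifier. The function names are hypothetical and this is not the project's evaluation code, which may well rely on a library instead. The `positive` argument selects the class being scored (1 = disaster, 0 = no disaster).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp), treating `positive` as the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tn, fp, fn, tp


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one class.

    Any ratio with a zero denominator is reported as 0.0.
    """
    tn, fp, fn, tp = confusion_counts(y_true, y_pred, positive)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, with `y_true = [1, 0, 1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 0, 1, 1, 0]` there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so all four metrics come out at 0.75.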
Results were saved and merged for visualization purposes.
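One way to merge per-model results into a single table is a "long" layout with one row per model and metric, which is easy to pivot later. The sketch below is an assumption about that layout, not the project's actual format: the column names, function names, and the model names and values in the usage note are illustrative placeholders.

```python
import csv
import io

FIELDS = ["model", "metric", "value"]


def to_long_rows(results):
    """Flatten {model: {metric: value}} into one row per (model, metric)."""
    rows = []
    for model_name, metrics in results.items():
        for metric_name, value in metrics.items():
            rows.append(
                {"model": model_name, "metric": metric_name, "value": value}
            )
    return rows


def rows_to_csv_text(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `rows_to_csv_text(to_long_rows({"model_a": {"f1": 0.5}, "model_b": {"f1": 0.5}}))` yields a header line followed by one line per model and metric. The resulting text can be written to a file or loaded into a database table.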
A full dashboard was created using Tableau Desktop and published to Tableau Public (https://public.tableau.com/app/profile/eros1782/vizzes). It includes:
- F1-Score by class (0 = no disaster, 1 = disaster)
- Confusion Matrix heatmap
- Accuracy and Loss by model
- Comparison of classification metrics using a color-coded table
All Tableau visualizations use the data extracted from the AWS RDS tables prepared in Tableau Prep.
- Full ETL pipeline on cloud infrastructure (AWS RDS)
- Deep Learning with both classic NNs and modern Transformers
- Interactive visual analytics in Tableau
- Modular, reproducible structure for deployment or reuse
- Clone the repo:
  git clone https://github.com/NirgalFromMars/disaster-tweets-classification.git
- (Optional) Set up a Python environment using requirements.txt (to be added)
- Open and explore the Jupyter Notebooks in the /notebooks folder
- Open dashboard.twbx with Tableau Desktop or view it online via Tableau Public
- Dataset: Kaggle NLP Disaster Tweets Competition
- Transformer models: HuggingFace 🤗
- Visualizations: Tableau Desktop + Tableau Prep
This project is open-sourced under the MIT License.