TCAS Dashboard

Data-driven analysis of TCAS Engineering Programs using Flask, Pandas, and Typhoon LLM

TCAS Dashboard is a web application for analyzing and comparing per-semester tuition costs of Computer Engineering programs across universities in Thailand. The project combines Data Engineering, NLP, and LLM-assisted information extraction into a single, end-to-end pipeline.

This repository is intentionally designed from a senior / production-oriented perspective, focusing on:

Clear data pipeline separation
Real-world Thai unstructured text challenges
Explicit comparison between rule-based (Regex) and LLM-based extraction

The raw data is collected from the MyTCAS API and processed using Typhoon AI (LLM) before being visualized through a clean and interactive Flask Dashboard.

Key Features

Automated Data Collection from MyTCAS API
Text Cleaning & Normalization (Regex vs LLM)
LLM-assisted Information Extraction using Typhoon AI
Interactive Dashboard with tables, charts, and cost rankings
Experimental-ready Architecture for extending to other engineering majors

Dashboard Preview

Example visualizations: tables and charts ranked by per-semester tuition cost

Regex vs LLM: Design Rationale

This repository intentionally implements two different approaches for extracting tuition cost information from Thai text.

Approach	Description	Limitations
Regex-based	Rule-based pattern matching	Brittle, hard to scale, sensitive to text variations
LLM-based (Typhoon)	Context-aware extraction using an LLM	Requires API usage and incurs cost

👉 Only the LLM-based results are used in the production dashboard, as they provide significantly better robustness and coverage for real-world data.

Project Structure

TCAS_dashboard/
│
├── app.py                 # Flask application for rendering the dashboard
├── scraping_typhoon.py    # Data pipeline: fetch → clean → extract using Typhoon LLM
│
├── data/                  # Cleaned datasets (.csv / .xlsx)
│   ├── regex_cleaned/
│   └── llm_cleaned/
│
├── scripts/               # Notebooks and scripts for scraping and preprocessing
│
├── templates/             # HTML templates and dashboard assets
│   ├── dashboard.html
│   └── *.jpeg
│
├── experimental/          # Experiments extending the same logic to other engineering fields
│                           # (e.g., Electrical, Civil Engineering)
│
├── scrap_regex.ipynb      # Baseline: Regex-only extraction (no LLM)
├── requirements.txt       # Python dependencies
├── .env.example           # Environment variable template
├── .gitignore
└── README.md

Installation

git clone https://github.com/xooooiz7/TCAS_dashboard.git
cd TCAS_dashboard

python3 -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt

Usage

1️⃣ Run the Dashboard

python app.py

Open your browser at: http://127.0.0.1:5000

2️⃣ Baseline: Regex-only Extraction

To understand the limitations of rule-based text extraction:

jupyter notebook scrap_regex.ipynb

3️⃣ LLM-based Extraction (Typhoon AI)

cp .env.example .env
# Add your API key
# TYPHOON_API_KEY=YOUR_TYPHOON_API_KEY

python scraping_typhoon.py

The cleaned output will be stored in the data/ directory and used by the dashboard.

Engineering Design Notes

Clear separation between data ingestion, processing, and visualization layers
Easily swappable data sources and extraction strategies
LLM usage is scoped only to tasks where rule-based methods do not scale
Repository structure is designed for extensibility, not just demo purposes

Future Work

Add filtering by university and region
Compare tuition costs with curriculum quality proxies
Cache LLM responses to reduce API costs
Production deployment (Docker + Gunicorn)

Disclaimer

This project is for educational and engineering experimentation purposes only. It is not an official system of TCAS or MyTCAS.

Built with ❤️ using Flask, Pandas, and Typhoon AI

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TCAS Dashboard

Key Features

Dashboard Preview

Regex vs LLM: Design Rationale

Project Structure

Installation

Usage

1️⃣ Run the Dashboard

2️⃣ Baseline: Regex-only Extraction

3️⃣ LLM-based Extraction (Typhoon AI)

Engineering Design Notes

Future Work

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
data		data
experimental		experimental
scripts		scripts
templates		templates
.DS_Store		.DS_Store
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

sitta07/LLM-Regex-DataPrep

Folders and files

Latest commit

History

Repository files navigation

TCAS Dashboard

Key Features

Dashboard Preview

Regex vs LLM: Design Rationale

Project Structure

Installation

Usage

1️⃣ Run the Dashboard

2️⃣ Baseline: Regex-only Extraction

3️⃣ LLM-based Extraction (Typhoon AI)

Engineering Design Notes

Future Work

Disclaimer

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages