Data-driven analysis of TCAS Engineering Programs using Flask, Pandas, and Typhoon LLM
TCAS Dashboard is a web application for analyzing and comparing per-semester tuition costs of Computer Engineering programs across universities in Thailand. The project combines Data Engineering, NLP, and LLM-assisted information extraction into a single, end-to-end pipeline.
This repository is intentionally designed from a senior-level, production-oriented perspective, with a focus on:
- Clear data pipeline separation
- Handling real-world, unstructured Thai text
- Explicit comparison between rule-based (Regex) and LLM-based extraction
The raw data is collected from the MyTCAS API and processed using Typhoon AI (LLM) before being visualized through a clean and interactive Flask Dashboard.
- Automated Data Collection from MyTCAS API
- Text Cleaning & Normalization (Regex vs LLM)
- LLM-assisted Information Extraction using Typhoon AI
- Interactive Dashboard with tables, charts, and cost rankings
- Experiment-ready architecture for extending the analysis to other engineering majors
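Thai tuition text often mixes Thai digits (๐ to ๙), thousands separators and irregular spacing, so a normalization pass before extraction makes later rules or prompts simpler. Below is a minimal sketch of such a pass. `normalize_thai_text` is a hypothetical helper and is not the cleaning code used in this repository.

```python
import re
import unicodedata

# Thai digits ๐-๙ (U+0E50..U+0E59) mapped to ASCII 0-9.
THAI_DIGITS = str.maketrans("๐๑๒๓๔๕๖๗๘๙", "0123456789")


def normalize_thai_text(text: str) -> str:
    """Hypothetical cleaning step applied before Regex or LLM extraction."""
    # Canonical Unicode form so visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    text = text.translate(THAI_DIGITS)
    # Remove thousands separators between digits, e.g. "25,000" -> "25000".
    text = re.sub(r"(?<=\d),(?=\d{3}(?!\d))", "", text)
    # Collapse any Unicode whitespace run (incl. non-breaking spaces) to one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Normalizing once, up front, means both the Regex and the LLM extractor see the same cleaned input. This keeps their comparison fair.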
Example visualizations: tables and charts ranked by per-semester tuition cost
This repository intentionally implements two different approaches for extracting tuition cost information from Thai text.
| Approach | Description | Limitations |
|---|---|---|
| Regex-based | Rule-based pattern matching | Brittle, hard to scale, sensitive to text variations |
| LLM-based (Typhoon) | Context-aware extraction using an LLM | Requires API usage and incurs cost |
Only the LLM-based results are used in the production dashboard, as they provide significantly better robustness and coverage for real-world data.
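Here is a hypothetical rule that shows why the Regex approach is brittle. It matches a number, then "บาท" (baht), then "ต่อ" or "/", and finally "ภาคเรียน" or "ภาคการศึกษา" (semester). This is a sketch for illustration and not the pattern used in scrap_regex.ipynb.

```python
import re
from typing import Optional

# Hypothetical rule: "<number> บาท" followed by "ต่อ" or "/" and then
# "ภาคเรียน" / "ภาคการศึกษา" (per semester). Real text varies far more.
TUITION_PATTERN = re.compile(
    r"(\d[\d,]*)\s*บาท\s*(?:ต่อ|/)\s*ภาค(?:เรียน|การศึกษา)"
)


def extract_tuition_regex(text: str) -> Optional[int]:
    """Return the per-semester fee in baht, or None if no rule matches."""
    match = TUITION_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))
```

The phrase "ค่าเล่าเรียน 25,000 บาท/ภาคเรียน" yields 25000. The equivalent phrasing "ภาคเรียนละ 25,000 บาท", where the semester phrase comes first, returns None. Every new wording needs another rule, which is the scaling problem listed in the table above.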
```
TCAS_dashboard/
│
├── app.py                  # Flask application for rendering the dashboard
├── scraping_typhoon.py     # Data pipeline: fetch → clean → extract using Typhoon LLM
│
├── data/                   # Cleaned datasets (.csv / .xlsx)
│   ├── regex_cleaned/
│   └── llm_cleaned/
│
├── scripts/                # Notebooks and scripts for scraping and preprocessing
│
├── templates/              # HTML templates and dashboard assets
│   ├── dashboard.html
│   └── *.jpeg
│
├── experimental/           # Experiments extending the same logic to other engineering fields
│                           # (e.g., Electrical, Civil Engineering)
│
├── scrap_regex.ipynb       # Baseline: Regex-only extraction (no LLM)
├── requirements.txt        # Python dependencies
├── .env.example            # Environment variable template
├── .gitignore
└── README.md
```
```bash
git clone https://github.com/xooooiz7/TCAS_dashboard.git
cd TCAS_dashboard
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

```bash
python app.py
```

Open your browser at: http://127.0.0.1:5000
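To understand the limitations of rule-based text extraction, run the Regex-only baseline:

```bash
jupyter notebook scrap_regex.ipynb
```

To run the LLM-based pipeline, first create a .env file and add your Typhoon API key:

```bash
cp .env.example .env
# Add your API key
# TYPHOON_API_KEY=YOUR_TYPHOON_API_KEY
```

Then run the pipeline:

```bash
python scraping_typhoon.py
```

The cleaned output will be stored in the data/ directory and used by the dashboard.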
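The LLM extraction step can be sketched as follows. The code builds a prompt that asks for structured JSON and then validates the reply rather than trusting it. This sketch assumes a hypothetical `complete(prompt) -> str` callable that wraps whatever Typhoon client `scraping_typhoon.py` actually uses. The prompt wording and the `tuition_per_semester` key are also assumptions.

```python
import json
from typing import Callable, Optional

FENCE = "`" * 3  # some models wrap JSON replies in a markdown code fence

# Hypothetical prompt; the real wording in scraping_typhoon.py may differ.
PROMPT_TEMPLATE = (
    "Extract the per-semester tuition fee in Thai baht from the text below. "
    'Reply with JSON only, e.g. {{"tuition_per_semester": 25000}}, '
    'or {{"tuition_per_semester": null}} if it is not stated.\n\n'
    "Text:\n{text}"
)


def parse_llm_reply(reply: str) -> Optional[int]:
    """Validate the model's reply instead of trusting it blindly."""
    cleaned = reply.strip()
    if cleaned.startswith(FENCE):
        cleaned = cleaned[len(FENCE):].removeprefix("json")
    if cleaned.endswith(FENCE):
        cleaned = cleaned[: -len(FENCE)]
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    value = data.get("tuition_per_semester") if isinstance(data, dict) else None
    # Reject missing, non-integer (bool is a subclass of int), or non-positive values.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        return None
    return value


def extract_tuition_llm(text: str, complete: Callable[[str], str]) -> Optional[int]:
    """`complete` is a hypothetical wrapper around the Typhoon API call."""
    return parse_llm_reply(complete(PROMPT_TEMPLATE.format(text=text)))
```

Because `complete` is injected, the parsing logic can be tested offline with a stub such as `lambda prompt: '{"tuition_per_semester": 30000}'`, with no API key needed.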
- Clear separation between data ingestion, processing, and visualization layers
- Easily swappable data sources and extraction strategies
- LLM usage is scoped only to tasks where rule-based methods do not scale
- Repository structure is designed for extensibility, not just demo purposes
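One way to keep extraction strategies swappable is to define a small interface and let the pipeline accept any implementation. This sketch assumes that each strategy maps raw text to an optional integer fee. `TuitionExtractor`, `FunctionExtractor`, `run_pipeline`, `coverage` and the `raw_text` field are hypothetical names, not code from this repository.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Protocol


class TuitionExtractor(Protocol):
    """Any object with a name and an extract(text) method qualifies."""

    name: str

    def extract(self, text: str) -> Optional[int]: ...


@dataclass
class FunctionExtractor:
    """Adapts a plain function (regex- or LLM-based) to the extractor interface."""

    name: str
    func: Callable[[str], Optional[int]]

    def extract(self, text: str) -> Optional[int]:
        return self.func(text)


def run_pipeline(records: Iterable[dict], extractor: TuitionExtractor) -> list[dict]:
    """Attach the extracted fee to copies of the records (inputs are not mutated)."""
    return [
        {
            **record,
            "tuition_per_semester": extractor.extract(record["raw_text"]),
            "method": extractor.name,
        }
        for record in records
    ]


def coverage(rows: list[dict]) -> float:
    """Share of records for which a fee was found."""
    if not rows:
        return 0.0
    return sum(row["tuition_per_semester"] is not None for row in rows) / len(rows)
```

With this structure, the Regex and LLM extractors can run over the same records, and their `coverage` values can be compared directly.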
- Add filtering by university and region
- Compare tuition costs with curriculum quality proxies
- Cache LLM responses to reduce API costs
- Production deployment (Docker + Gunicorn)
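Caching LLM responses could be as simple as keying each reply by a hash of the model name and prompt, then saving it to disk. Below is a minimal sketch. It assumes a hypothetical `call(prompt) -> str` function that performs the actual API request and uses a JSON file for storage. `LLMResponseCache` is an illustrative name only.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class LLMResponseCache:
    """Disk-backed cache keyed by a hash of (model, prompt). Hypothetical helper."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load previously cached replies, if the cache file already exists.
        self._data: dict[str, str] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call: Callable[[str], str]) -> str:
        key = self._key(model, prompt)
        if key not in self._data:
            # Only pay for an API call on a cache miss, then persist the reply.
            self._data[key] = call(prompt)
            self.path.write_text(
                json.dumps(self._data, ensure_ascii=False), encoding="utf-8"
            )
        return self._data[key]
```

Including the model name in the key keeps replies from different models separate. Because the key covers the full prompt, editing the prompt template also invalidates old entries, which is usually the desired behavior.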
This project is for educational and engineering experimentation purposes only. It is not an official system of TCAS or MyTCAS.
Built with ❤️ using Flask, Pandas, and Typhoon AI

