Seanaaa0

sean Seanaaa0

RL/AI research engineer

Pinned

Maze_RL Public

A custom POMDP maze environment for studying agent reasoning, uncertainty, partial observation, and world-model learning. It was used to produce structured trajectories for training GPT-based reasoning models.

Python 2
PlainGPT Public

A compact, fully self-contained Transformer framework for LoRA fine-tuning and text generation.

Python 1
AntWorld Public

AntWorld: a multi-ant grid simulation with local memory, pheromones, and 5×5 communication, used for world-model and reinforcement-learning experiments.

Python 1
QT-R1 Public

A STaR × S1 math pipeline on Qwen2.5-1.5B, using LoRA fine-tuning and a strict "Final:" answer format. Reaches roughly 20–30% accuracy on an OpenR1-Math split.

Python 1
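
The QT-R1 description mentions a strict "Final:" answer format and an accuracy figure. As a rough illustration of how such a format might be scored, here is a minimal sketch. It is not taken from the repository: the function names `extract_final` and `accuracy` are hypothetical, and it assumes the answer must appear on the last line as "Final: <answer>" and is compared by exact string match.

```python
import re
from typing import Optional, Sequence

# Assumed convention (not confirmed by the repo): the last line of a
# completion must read "Final: <answer>", case-sensitive.
FINAL_RE = re.compile(r"^Final:\s*(\S.*?)\s*$")


def extract_final(completion: str) -> Optional[str]:
    """Return the answer from a trailing 'Final:' line, or None if the format is violated."""
    lines = completion.strip().splitlines()
    if not lines:
        return None
    match = FINAL_RE.match(lines[-1])
    return match.group(1) if match else None


def accuracy(completions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of completions whose extracted answer exactly equals the reference."""
    if not references:
        return 0.0
    correct = sum(extract_final(c) == r for c, r in zip(completions, references))
    return correct / len(references)
```

Under these assumptions, a completion that puts anything after the "Final:" line, or omits it, counts as wrong even if the reasoning reached the right value. That strictness is a design choice: it keeps scoring unambiguous at the cost of penalizing formatting slips.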