Seanaaa0

Pinned

  1. Maze_RL (Public)

    A custom POMDP maze environment for studying agent reasoning, uncertainty, partial observation, and world-model learning. It produces structured trajectories for training GPT-based reasoning models.

    Python 2

  2. PlainGPT (Public)

    A compact, fully self-contained Transformer framework for LoRA fine-tuning and text generation.

    Python 1

  3. AntWorld (Public)

    AntWorld: a multi-ant grid simulation with local memory, pheromones, and 5×5 communication, intended for world-model and reinforcement-learning experiments.

    Python 1

  4. QT-R1 (Public)

    A STaR × S1 math pipeline on Qwen2.5-1.5B, fine-tuned with LoRA. Outputs must follow a strict "Final:" answer format; accuracy is roughly 20–30% on an OpenR1-Math split.

    Python 1
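
The QT-R1 description mentions a strict "Final:" answer format without spelling out the rule. As a rough illustration only, the sketch below shows one way such a check could work. It assumes the completion must end with exactly one line of the form "Final: <answer>". The names `extract_final` and `accuracy` and the exact-match scoring are hypothetical and not taken from the repository.

```python
import re

# Assumed rule (not from the repository): the last non-empty line must be
# "Final: <answer>", and no other line may start with "Final:".
FINAL_RE = re.compile(r"^Final:\s*(\S.*?)\s*$")


def extract_final(completion: str):
    """Return the answer string if the completion obeys the assumed format, else None."""
    lines = [ln for ln in completion.strip().splitlines() if ln.strip()]
    if not lines:
        return None
    # Strictness: reject completions with zero or several "Final:" lines.
    if sum(ln.startswith("Final:") for ln in lines) != 1:
        return None
    match = FINAL_RE.match(lines[-1])
    return match.group(1) if match else None


def accuracy(completions, references):
    """Exact-match accuracy; malformed completions count as wrong."""
    if not completions:
        return 0.0
    correct = sum(extract_final(c) == r for c, r in zip(completions, references))
    return correct / len(completions)
```

Under this assumed rule, a completion that reaches the right number but omits the "Final:" line is scored as incorrect, so a strict format of this kind can lower reported accuracy compared with lenient answer matching.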