Designing resilient toolkits and scalable RL environments for CAMEL terminal agents
git clone --recurse-submodules https://github.com/camel-ai/seta.git
cd seta
bash setup.sh

Three runtime options (choose one):
Option A: Local Docker (single machine, no extra setup)
# uses eval_default.yaml (env_type: docker)
python scripts/evaluation/eval.py --config scripts/evaluation/configs/eval_default.yaml

Option B: Remote Docker (multiple nodes via slot pool service)
# 1. start slot pool service first
bash seta_env/runtimes/slot_pool_service/start.sh --dataset seta-env-v2
# 2. uses eval_remote.yaml (env_type: remote_docker)
python scripts/evaluation/eval.py --config scripts/evaluation/configs/eval_remote.yaml

Option C: Env Service (remote CPU servers for agent execution; see env_service)
# 1. deploy env_service to CPU servers + start scheduler
GH_TOKEN=ghp_xxx HF_TOKEN=hf_xxx bash seta_env/services/start.sh --dataset seta-env-v2
# 2. run eval via AReaL launcher
python -m areal.launcher.local scripts/areal/eval_env_service.py \
--config scripts/areal/configs/config_eval_env_service_seta_v2.yaml

# start model server
python -m sglang.launch_server --model Qwen/Qwen3-8B --port 30000
# run eval (dataset auto-downloads on first use)
python scripts/evaluation/eval.py --config scripts/evaluation/configs/eval_default.yaml
# sweep across models and datasets
python scripts/evaluation/sweep_eval.py scripts/evaluation/configs/sweep.yaml
# results → outputs/eval/<experiment>/<trial>/summary.json, results.csv

# RL training
python -m areal.launcher.local \
scripts/areal/rl_train.py \
--config scripts/areal/configs/config_eval.yaml
# eval only (no gradient updates, single GPU)
python -m areal.launcher.local \
scripts/areal/eval.py \
--config scripts/areal/configs/config_eval.yaml \
allocation_mode=sglang:d1p1t1+eval
# results → outputs/areal/experiments/<experiment>/<trial>/

RL (GRPO) training on the seta_env env service with the miles framework, using disaggregated, session-server rollout with Daytona sandboxes. Two models are wired up end-to-end:
# 1. one-time: download + convert the model to a torch-dist checkpoint
python scripts/miles/run_glm47_flash_seta_session_server.py prepare # GLM-4.7-Flash
python scripts/miles/run_deepseek_v4_seta_session_server.py prepare # DeepSeek-V4-Flash-FP8
# 2. launch training (restarts env_service + submits the Ray job across the cluster)
bash scripts/miles/run_glm47_flash_seta_session_server.sh # GLM-4.7-Flash
bash scripts/miles/run_deepseek_v4_seta_session_server.sh # DeepSeek-V4-Flash-FP8
# results → /data/training_runs/<run>/ (checkpoints, trials, wandb, ray_job.log)

Requires an 8-node Ray cluster, DAYTONA_*, WANDB_API_KEY, and HF_TOKEN set in ~/.bashrc, and a task
dataset registered under DATASET_ROOT. Full setup, config layout, dataset format, and tuning are
documented in scripts/miles/README.md.
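Checking the credentials listed above before launching is cheaper than discovering a missing one after the Ray job has been submitted. The following is a minimal pre-flight sketch, assuming the variables are visible in the environment of the process that runs it. The helper name `missing_credentials` is hypothetical, and treating `DAYTONA_*` as "at least one non-empty variable with that prefix" is an assumption; consult scripts/miles/README.md for the exact Daytona variable names.

```python
import os
from typing import List, Mapping

# Exact variable names taken from the requirements above.
REQUIRED_VARS = ("WANDB_API_KEY", "HF_TOKEN")
# Assumption: any non-empty variable starting with this prefix satisfies "DAYTONA_*".
REQUIRED_PREFIXES = ("DAYTONA_",)


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required names/prefixes that are unset or empty in `env` (hypothetical helper)."""
    # Exact names: unset or empty string both count as missing.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # Prefixes: need at least one matching key with a non-empty value.
    for prefix in REQUIRED_PREFIXES:
        if not any(key.startswith(prefix) and value for key, value in env.items()):
            missing.append(prefix + "*")
    return missing
```

Calling `missing_credentials()` with no argument inspects the current process environment; an empty list means every requirement is met. Variables defined in ~/.bashrc are visible only if the launching shell (or one of its parents) sourced that file.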
- Configuration: what to change (model, dataset, runtime) and what to leave alone
- Dataset: download and register datasets
- Evaluation: run eval with local or remote Docker
- Slot Pool Service: distribute environments across remote nodes
- Env Service: remote TerminalEnvironment execution on CPU servers
- Results: what each evaluation records and what the fields mean
- Training: AReaL RL training
- Miles Training: miles RL training (GLM-4.7-Flash, DeepSeek-V4) via env_service + Daytona
- Experiments: log of training and evaluation runs
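The evaluation output layout given with the eval commands (outputs/eval/<experiment>/<trial>/ containing summary.json and results.csv) makes it straightforward to compare trials from a sweep in a script. The sketch below assumes only that directory layout: it loads every summary.json as-is and reads the sibling results.csv, if present, with `csv.DictReader`. It makes no assumption about the fields inside either file (those are described in the Results document), and the helper name `collect_eval_runs` is hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Dict, Tuple, Union


def collect_eval_runs(root: Union[str, Path] = "outputs/eval") -> Dict[Tuple[str, str], dict]:
    """Index eval trials under root/<experiment>/<trial>/ (hypothetical helper)."""
    root = Path(root)
    runs: Dict[Tuple[str, str], dict] = {}
    if not root.is_dir():
        return runs  # nothing evaluated yet
    for summary_path in sorted(root.glob("*/*/summary.json")):
        trial_dir = summary_path.parent
        key = (trial_dir.parent.name, trial_dir.name)  # (experiment, trial)
        with summary_path.open() as f:
            summary = json.load(f)
        rows = []
        csv_path = trial_dir / "results.csv"
        if csv_path.exists():
            # Each row becomes a dict keyed by the CSV header; values stay strings.
            with csv_path.open(newline="") as f:
                rows = list(csv.DictReader(f))
        runs[key] = {"summary": summary, "rows": rows}
    return runs
```

Run from the repository root after an eval or sweep, `collect_eval_runs()` returns one entry per trial, which can then be compared using whichever summary fields the Results document defines.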
The miles-based RL training pipeline (scripts/miles/, the seta_env session-server wiring, and the
Daytona environment integration) was built in collaboration with the RadixArk miles team. Thank
you for the miles framework and for the support throughout.
@misc{seta,
author = {Qijia Shen and Jay Rainton and Aznaur Aliev and Ahmed Awelkair and Boyuan Ma and Zhiqi (Julie) Huang and Yuzhen Mao and Wendong Fan and Philip Torr and Bernard Ghanem and Changran Hu and Urmish Thakker and Guohao Li},
title = {{SETA: Scaling Environments for Terminal Agents}},
year = {2026},
month = jan,
url = {https://github.com/camel-ai/seta},
note = {Blog: \url{https://eigent-ai.notion.site/SETA-Scaling-Environments-for-Terminal-Agents-2d2511c70ba280a9b7c0fe3e7f1b6ab8}}
}
