Train in a Weekend Serve to Thousands

Compose experiments across SFT, Reinforcement Learning, and more on your own production code.

Book a demo

Use Synth with:

uvx synth-ai claude

Full-Stack Reinforcement Learning

Scalable pipelines for long-horizon GRPO.

Outcome & rubric-based stepwise advantages

Bring your own judges or use ours

Serverless RL, shared models, and dev inference all in one place

Supervised Fine-Tuning

Spin up SFT jobs with curated datasets, auto-sharded compute, and full artifact tracking.

LoRA / QLoRA / full fine-tune in one click

Support for checkpointed online evaluation

DIY or use Managed Filtered Behavioral Cloning

Supported Models

Qwen 3

0.6B

1.7B

14B

32B

Default for demos; supports tool-calling.

Qwen 3 (Advanced)

4B-2507

30B-A3B

235B-A22B*

480B-A35B*

Enhanced variants with Instruct/Thinking modes and MoE support.

Qwen 3 Coder

30B-A3B

480B-A35B*

Specialized for code generation.

Pricing

Only pay for the GPU time you burn.

H100$4.35/hr

A100$2.75/hr

L40S$2.15/hr

A10$1.23/hr

It's Time to Train

View all docs Official GitHub library

Crafter SFT Loop

Collect traced rollouts, export JSONL, and launch a supervised job in minutes.

View doc ↗

Qwen Coder LoRA

Run the 30B adapter playbook with Synth configs, compute guidance, and tuning tips.

View doc ↗

Rejection Loop

Turn traced RL experience into curated JSONL, fine-tune, then evaluate the checkpoint.

View doc ↗

Math RL

Deploy a math task app, run smoke tests, and stream a full on-policy training run.

View doc ↗

Crafter On-Policy

Deploy a Crafter task app to Modal, verify health, and launch the production-style RL loop.

View doc ↗

Evaluation Playbook

Run hosted evals, pull trace stats, and turn results into the next fine-tuning dataset.

View doc ↗