A vision-capable Orchestrator reads your design. GLM 5.2 writes the code. A verify loop closes the gap.
caecus-skill is a collection of Agent Skills that turn a visual design (an image, a video, or an inspiration source) into working frontend code through a closed-loop generate-and-verify cycle.
Beta: v0.1.0-beta.6, installed from mihneaptu/caecus-skill. The current release is tracked in CHANGELOG.md.
Docs site: mihneaptu.github.io/caecus-skill. Guide, docs, and changelog in one place.
- Closed-loop verification. The Orchestrator renders the result and compares it to your original design, then sends GLM design feedback until they match. Open-loop generators skip this check.
- Hard role boundary. The Orchestrator sees and judges. GLM 5.2 writes every line of code. Neither crosses into the other's work.
- Fidelity to the source. You measure the result against the real design you handed in.
- More than images. Video and inspiration sources work too.
- Design in. Hand the Orchestrator an image, video, or inspiration source.
- Brief out. The Orchestrator writes a Design_Brief in plain language.
- Code out. Paste the brief into GLM 5.2. It returns frontend code.
- Render and compare. Run the code. The Orchestrator compares the rendered result to the original.
- Iterate. If it does not match, the Orchestrator gives design-only feedback; GLM 5.2 regenerates. The loop runs until it matches or the default limit of five rounds is reached.
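The five steps above can be sketched as a small loop. This is a minimal illustration, not part of caecus-skill: `run_loop`, `LoopResult`, and the two callables are hypothetical names, and the callables stand in for the manual steps (pasting text into GLM 5.2, and asking the Orchestrator to render and compare). It assumes the GLM session still holds the brief, so later rounds send only design feedback.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ROUNDS = 5  # default round limit described above


@dataclass
class LoopResult:
    matched: bool  # True if the render matched the original design
    rounds: int    # number of generate/compare rounds that ran
    code: str      # last code returned by the code author


def run_loop(
    brief: str,
    generate: Callable[[str], str],           # hypothetical: text in, frontend code out (GLM 5.2)
    compare: Callable[[str], Optional[str]],  # hypothetical: None on a match, else design-only feedback
    max_rounds: int = MAX_ROUNDS,
) -> LoopResult:
    message = brief  # round 1 sends the Design_Brief
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(message)
        feedback = compare(code)
        if feedback is None:
            return LoopResult(matched=True, rounds=round_no, code=code)
        # Later rounds send only the design feedback (assumes the session keeps the brief).
        message = feedback
    return LoopResult(matched=False, rounds=max_rounds, code=code)
```

In the real workflow you play both callables by hand; the sketch only makes the stopping rule explicit: stop on a match, or after the round limit.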
The rule: the Orchestrator talks about design. GLM 5.2 writes the code.
No API keys. No MCP. No CLI. You are the bridge between the two models.
The first four run on the Orchestrator. The last runs on the code-author side (GLM 5.2).
| Skill | Side | What it does |
|---|---|---|
| image-to-prompt | Orchestrator | Image / screenshot / mockup → Design_Brief |
| video-to-prompt | Orchestrator | Video → Design_Brief |
| inspiration-to-prompt | Orchestrator | Inspiration source → Design_Brief (inspiration, not a copy) |
| generate-and-verify | Orchestrator | Brief → code → render → screenshot → verify loop |
| glm-codegen | Code author | Teaches GLM 5.2 the brief format, fidelity rules, and iteration protocol |
- Start here: image-to-prompt + generate-and-verify. That is the whole loop: a still design in, verified code out.
- Recreating a screen recording or a motion-heavy UI? Add video-to-prompt.
- Building from a reference you admire rather than copying it 1:1? Add inspiration-to-prompt.
- On the GLM 5.2 side, always load glm-codegen. It teaches GLM the brief format and keeps its output faithful. It is loaded on the code-author side, not with -s (see the FAQ).
- A vision-capable Orchestrator with browser / screenshot tools. Recommended: GPT 5.5. The Orchestrator role is mostly agentic (drive a browser, render the code, screenshot it, run the feedback loop), and GPT 5.5 leads on computer use and autonomous task execution, which is that loop. Gemini 3.1 Pro is the strongest alternative: it leads on pure multimodal/vision, so it edges ahead on the visual-compare lens itself. Claude Opus 4.8 also works well and tops the general intelligence rankings. Whatever you pick must be able to view images and render/screenshot the result itself. It runs inside any agent host that loads Agent Skills (Claude Code, Cursor, Codex, and similar).
- Access to GLM 5.2 via chat.z.ai, OpenCode, ZCode, OpenRouter, or a GLM Coding Plan.
- Node.js 18+, so npx skills add runs.
npx skills add mihneaptu/caecus-skill -s image-to-prompt video-to-prompt inspiration-to-prompt generate-and-verify
Install only the ones you need:
npx skills add mihneaptu/caecus-skill -s image-to-prompt generate-and-verify
glm-codegen is loaded on the GLM 5.2 side, not through the -s flag. Use the bundled system prompt at skills/glm-codegen/references/glm-system-prompt.md: one paste includes the skill plus fidelity and anti-tell rules.
| Host | How to load glm-codegen |
|---|---|
| chat.z.ai | Paste skills/glm-codegen/references/glm-system-prompt.md as a system prompt once per session. |
| OpenCode / ZCode | Add glm-system-prompt.md as a skill in the GLM 5.2 agent config (or load glm-codegen and paste the bundle). |
| OpenRouter | Paste the bundle as system prompt, or load via your agent host's skill mechanism if available. |
| Host | Install skills | Tips |
|---|---|---|
| Cursor | npx skills add mihneaptu/caecus-skill -s image-to-prompt generate-and-verify | Attach the source image each round; use browser/screenshot tools for renders. |
| Claude Code | Same npx skills add command | Save the original design path early; re-open it every compare turn. |
| Codex | Same npx skills add command | Fresh chat per design keeps the visual compare honest. |
Full walkthrough: site guide.
I want to run the caecus pipeline on this design.
1. Use image-to-prompt on the attached image and write a complete Design_Brief.
2. I'll paste that brief to GLM 5.2; give me the brief only when it's ready.
3. When I bring back GLM's code, use generate-and-verify: render it, screenshot it,
and compare against the original image. Feedback to GLM must be design words only.
4. Repeat until it matches or we hit 5 rounds.
Pick a folder for this run (your workspace root). Create three subfolders if they are not there yet:
your-workspace/
├─ design/ : original image + Design_Brief (Orchestrator only)
├─ glm-5.2/ : code GLM writes (you save it here verbatim)
└─ screenshots/ : renders the Orchestrator captures for compare
GLM 5.2 never sees images, only the brief and design-feedback text you paste.
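If you would rather script the folder setup than create it by hand, a minimal sketch follows. The three subfolder names come from the layout above; `init_workspace` is a hypothetical helper name, not something caecus-skill ships.

```python
from pathlib import Path

# Subfolder names taken from the workspace layout above.
SUBFOLDERS = ("design", "glm-5.2", "screenshots")


def init_workspace(root: str) -> list[Path]:
    """Create the three run subfolders under root, skipping any that already exist."""
    base = Path(root)
    folders = []
    for name in SUBFOLDERS:
        folder = base / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if it is already there
        folders.append(folder)
    return folders


if __name__ == "__main__":
    for path in init_workspace("your-workspace"):
        print(path)
```

Running it twice is safe: existing folders and their contents are left untouched.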
- Give the Orchestrator your design (save the source in design/; or use a bundled reference image below).
- Paste the Design_Brief from design/ into GLM 5.2.
- Save GLM's code verbatim in glm-5.2/, run it, and give the Orchestrator a screenshot (saved in screenshots/).
- The Orchestrator compares the original and screenshot, then gives you design-only feedback text to paste back to GLM. Repeat until the match is good enough.
Try the bundled examples. Each ships a reference source image you can attach:
- Minimal starter: three-icon-example.md + three-icon-source.jpg
- Showcase (editorial landing): awwwards-landing-example.md + awwwards-landing-source.jpg (result: awwwards-landing-result.html)
Cursor, Claude Code, Codex (Orchestrator hosts); chat.z.ai, OpenCode, ZCode, OpenRouter (GLM 5.2). Any vision-capable agent host that loads Agent Skills and can view images should work.
Installing a skill writes only Markdown and referenced assets. Nothing executes. No install hooks, no build step, no secrets required. The workflow is a human copying a brief and code between two chat models, so the skills themselves make no network calls and need no API keys.
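If you want to check that claim yourself after installing, one way is to scan the installed skill folder for script-like files. The sketch below is an assumption-laden illustration: `find_script_like_files` is a hypothetical helper, the suffix list is a guess at what counts as "executable" in your environment, and you must pass in wherever your agent host actually placed the skills.

```python
from pathlib import Path

# Assumed list of script-like suffixes; extend it for your own environment.
SCRIPT_SUFFIXES = {
    ".sh", ".bash", ".ps1", ".bat", ".cmd",
    ".js", ".mjs", ".cjs", ".ts", ".py", ".rb", ".exe",
}


def find_script_like_files(skills_dir: str) -> list[Path]:
    """Return files under skills_dir whose suffix looks like a script or binary."""
    root = Path(skills_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SCRIPT_SUFFIXES
    )
```

An empty result is consistent with a Markdown-and-assets-only install; it is a quick sanity check, not a security audit.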
How is this different from "taste" or anti-slop design skills?
Those add good taste to a single agent and stop there: open-loop, with no specific source to match. caecus reproduces a particular design you hand it and verifies the render against that original through a screenshot-and-compare loop. It holds the result to a real source you provide. You can still pair it with a taste skill on the GLM side.
Why does install use -s? Why not npx skills add mihneaptu/caecus-skill?
The bare command installs every skill in the repo, including glm-codegen, which belongs on the GLM 5.2 side, not the Orchestrator. The -s flag installs only the skills you name, so listing just the Orchestrator skills keeps that side clean. Load glm-codegen separately on GLM.
Does it work with React, Vue, or Svelte?
The brief targets design intent, not one framework. Defaults are plain HTML/CSS for simple static designs and React + Tailwind for interactive ones; name any other stack in the brief's target_stack.
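The default rule in that answer can be written down as a tiny function. This is only a sketch of the decision, not how the skills implement it: `resolve_target_stack` is a hypothetical name, and the exact strings a brief uses in target_stack are assumed here.

```python
from typing import Optional

# Defaults named in the answer above; the exact strings are illustrative.
STATIC_DEFAULT = "HTML/CSS"
INTERACTIVE_DEFAULT = "React + Tailwind"


def resolve_target_stack(interactive: bool, requested: Optional[str] = None) -> str:
    """Pick a target_stack: an explicitly requested stack wins, else the default."""
    if requested and requested.strip():
        return requested.strip()
    return INTERACTIVE_DEFAULT if interactive else STATIC_DEFAULT
```

For example, `resolve_target_stack(True, "Vue 3")` keeps the stack you named, while `resolve_target_stack(False)` falls back to plain HTML/CSS.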
What is a SKILL.md?
A portable Markdown instruction file an agent loads automatically. Install it with npx skills add, copy it into your project, or paste it into a chat.
Does anything run on my machine when I install?
No. Installing writes only Markdown and referenced assets: no scripts, no build, no secrets. See Security & trust above.
This is a beta. Reports make it better.
This is an early beta and a solo project. If you have feedback or want to propose a skill, open an issue. Contribution guidelines will land as the project grows.
The collection is licensed under Apache License 2.0. See LICENSE. The license covers the code and content. It does not grant rights to the caecus name or brand.
The caecus name and brand are protected separately from the license. Forks and derivatives are welcome under Apache-2.0, but they must not imply they are the official project.
