caecus-skill

A vision-capable Orchestrator reads your design. GLM 5.2 writes the code. A verify loop closes the gap.

caecus-skill is a collection of Agent Skills that turn a visual design (an image, a video, or an inspiration source) into working frontend code through a closed-loop generate-and-verify cycle.

Beta: v0.1.0-beta.6, installed from mihneaptu/caecus-skill. The current release is tracked in CHANGELOG.md.

Docs site: mihneaptu.github.io/caecus-skill. Guide, docs, and changelog in one place.


What's different

  • Closed-loop verification. The Orchestrator renders the result and compares it to your original design, then sends GLM design feedback until they match. Open-loop generators skip this check.
  • Hard role boundary. The Orchestrator sees and judges. GLM 5.2 writes every line of code. Neither crosses into the other's work.
  • Fidelity to the source. You measure the result against the real design you handed in.
  • More than images. Video and inspiration sources work too.

How it works

  1. Design in. Hand the Orchestrator an image, video, or inspiration source.
  2. Brief out. The Orchestrator writes a Design_Brief in plain language.
  3. Code out. Paste the brief into GLM 5.2. It returns frontend code.
  4. Render and compare. Run the code. The Orchestrator compares the rendered result to the original.
  5. Iterate. If it does not match, the Orchestrator gives design-only feedback; GLM 5.2 regenerates. The loop runs until it matches or the default limit of five rounds is reached.

The rule: the Orchestrator talks about design. GLM 5.2 writes the code.

No API keys. No MCP. No CLI. You are the bridge between the two models.
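
The five steps above can be walked through by hand with a small shell helper. This is a minimal sketch, not something caecus-skill ships: the function name run_verify_loop and the MAX_ROUNDS variable are hypothetical, and the script only counts rounds while you carry text between the two models.

```shell
# Hypothetical helper: keeps count of verify rounds while you act as the bridge.
# Reads one answer per round from stdin; "y" means the Orchestrator judged the
# render a match. MAX_ROUNDS mirrors the default limit of five rounds.
MAX_ROUNDS=5

run_verify_loop() {
  round=1
  while [ "$round" -le "$MAX_ROUNDS" ]; do
    echo "Round $round: paste the brief or feedback into GLM 5.2, run the code, then ask the Orchestrator to compare."
    printf 'Does the render match the original? [y/N] '
    # At end of input the answer stays empty and counts as "no".
    read -r answer || answer=""
    if [ "$answer" = "y" ]; then
      echo "Matched in round $round."
      return 0
    fi
    round=$((round + 1))
  done
  echo "Stopped after $MAX_ROUNDS rounds without a match."
  return 1
}
```

Call run_verify_loop in a terminal and answer y once the Orchestrator reports a match. Any other answer starts the next round.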

Skills in the collection

The first four run on the Orchestrator. The last runs on the code-author side (GLM 5.2).

Skill | Side | What it does
image-to-prompt | Orchestrator | Image / screenshot / mockup → Design_Brief
video-to-prompt | Orchestrator | Video → Design_Brief
inspiration-to-prompt | Orchestrator | Inspiration source → Design_Brief (inspiration, not a copy)
generate-and-verify | Orchestrator | Brief → code → render → screenshot → verify loop
glm-codegen | Code author | Teaches GLM 5.2 the brief format, fidelity rules, and iteration protocol

Which skill should I use?

  • Start here: image-to-prompt + generate-and-verify. That is the whole loop: a still design in, verified code out.
  • Recreating a screen recording or a motion-heavy UI? Add video-to-prompt.
  • Building from a reference you admire rather than copying it 1:1? Add inspiration-to-prompt.
  • On the GLM 5.2 side, always load glm-codegen. It teaches GLM the brief format and keeps its output faithful. It is loaded on the code-author side, not with -s (see the FAQ).

Get started

What you need

  • A vision-capable Orchestrator with browser / screenshot tools. Recommended: GPT 5.5. The Orchestrator role is mostly agentic (drive a browser, render the code, screenshot it, run the feedback loop), and GPT 5.5 leads on computer use and autonomous task execution, which is that loop. Gemini 3.1 Pro is the strongest alternative: it leads on pure multimodal/vision, so it edges ahead on the visual-compare lens itself. Claude Opus 4.8 also works well and tops the general intelligence rankings. Whatever you pick must be able to view images and render/screenshot the result itself. It runs inside any agent host that loads Agent Skills (Claude Code, Cursor, Codex, and similar).
  • Access to GLM 5.2 via chat.z.ai, OpenCode, ZCode, OpenRouter, or a GLM Coding Plan.
  • Node.js 18+, so npx skills add runs.
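
To confirm the Node.js requirement before installing, a check along these lines may help. It is a sketch: the function name node_is_supported is hypothetical, and it assumes node --version prints a string of the form v18.19.0.

```shell
# Hypothetical helper: succeeds when a Node.js version string such as
# "v18.19.0" has a major version of 18 or higher.
node_is_supported() {
  version="${1#v}"          # drop the leading "v"
  major="${version%%.*}"    # keep the text before the first dot
  # A non-numeric value makes the comparison fail, which counts as unsupported.
  [ "$major" -ge 18 ] 2>/dev/null
}

if command -v node >/dev/null 2>&1 && node_is_supported "$(node --version)"; then
  echo "Node.js $(node --version) is new enough for npx skills add"
else
  echo "Install Node.js 18 or newer before running npx skills add"
fi
```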

Install the Orchestrator skills

npx skills add mihneaptu/caecus-skill -s image-to-prompt video-to-prompt inspiration-to-prompt generate-and-verify

Install only the ones you need:

npx skills add mihneaptu/caecus-skill -s image-to-prompt generate-and-verify

Set up GLM 5.2

glm-codegen is loaded on the GLM 5.2 side, not through the -s flag. Use the bundled system prompt at skills/glm-codegen/references/glm-system-prompt.md. One paste includes the skill plus the fidelity and anti-tell rules.

Host | How to load glm-codegen
chat.z.ai | Paste skills/glm-codegen/references/glm-system-prompt.md as a system prompt once per session.
OpenCode / ZCode | Add glm-system-prompt.md as a skill in the GLM 5.2 agent config (or load glm-codegen and paste the bundle).
OpenRouter | Paste the bundle as system prompt, or load via your agent host's skill mechanism if available.

Orchestrator hosts (quick notes)

Host | Install skills | Tips
Cursor | npx skills add mihneaptu/caecus-skill -s image-to-prompt generate-and-verify | Attach the source image each round; use browser/screenshot tools for renders.
Claude Code | Same npx skills add command | Save the original design path early; re-open it every compare turn.
Codex | Same npx skills add command | Fresh chat per design keeps the visual compare honest.

Full walkthrough: site guide.

Starter prompt (copy into the Orchestrator)

I want to run the caecus pipeline on this design.

1. Use image-to-prompt on the attached image and write a complete Design_Brief.
2. I'll paste that brief to GLM 5.2; give me the brief only when it's ready.
3. When I bring back GLM's code, use generate-and-verify: render it, screenshot it,
   and compare against the original image. Feedback to GLM must be design words only.
4. Repeat until it matches or we hit 5 rounds.

Set up your workspace

Pick a folder for this run (your workspace root). Create three subfolders if they are not there yet:

your-workspace/
├─ design/        : original image + Design_Brief (Orchestrator only)
├─ glm-5.2/       : code GLM writes (you save it here verbatim)
└─ screenshots/   : renders the Orchestrator captures for compare
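
One way to create that layout from a terminal is shown below. The name your-workspace is the placeholder from the tree above, so swap in any folder you like.

```shell
# "your-workspace" is a placeholder; mkdir -p leaves existing folders untouched.
workspace="your-workspace"
mkdir -p "$workspace/design" "$workspace/glm-5.2" "$workspace/screenshots"
ls "$workspace"
```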

GLM 5.2 never sees images, only the brief and design-feedback text you paste.

Run once

  1. Give the Orchestrator your design (save the source in design/; or use a bundled reference image below).
  2. Paste the Design_Brief from design/ into GLM 5.2.
  3. Save GLM's code verbatim in glm-5.2/, run it, and give the Orchestrator a screenshot (saved in screenshots/).
  4. The Orchestrator compares the original and screenshot, then gives you design-only feedback text to paste back to GLM. Repeat until the match is good enough.
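
If you want to keep every round's output rather than overwrite it, a per-round naming scheme is one option. The sketch below is an assumption, not a caecus-skill convention: the start_round function, the WORKSPACE variable, and the round-N names are all hypothetical.

```shell
# Hypothetical layout: one code folder and one screenshot name per round,
# so earlier attempts stay available for comparison.
WORKSPACE="${WORKSPACE:-your-workspace}"

start_round() {
  round="$1"
  mkdir -p "$WORKSPACE/glm-5.2/round-$round" "$WORKSPACE/screenshots"
  echo "Save GLM's code in: $WORKSPACE/glm-5.2/round-$round/"
  echo "Save the render as: $WORKSPACE/screenshots/round-$round.png"
}
```

Running start_round 1 before the first paste, then start_round 2 after the first round of feedback, keeps each attempt separate.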

Try the bundled examples. Each ships a reference source image you can attach:

Tested with

Cursor, Claude Code, Codex (Orchestrator hosts); chat.z.ai, OpenCode, ZCode, OpenRouter (GLM 5.2). Any vision-capable agent host that loads Agent Skills and can view images should work.

Security & trust

Installing a skill writes only Markdown and referenced assets. Nothing executes. No install hooks, no build step, no secrets required. The workflow is a human copying a brief and code between two chat models, so the skills themselves make no network calls and need no API keys.
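
To check this yourself after installing, you can list every installed file that is not Markdown and inspect what turns up. This is a sketch: the function name list_non_markdown is hypothetical, and the directory you pass in is wherever your agent host stores skills, which varies by host.

```shell
# Hypothetical helper: print each regular file under the given directory
# whose name does not end in .md, so referenced assets can be reviewed by hand.
list_non_markdown() {
  find "$1" -type f ! -name '*.md'
}
```

Call it as list_non_markdown followed by the path to your installed skills folder. Anything it prints should be a referenced asset such as an image.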

FAQ

How is this different from "taste" or anti-slop design skills? Those add good taste to a single agent and stop there: open-loop, with no specific source to match. caecus reproduces a particular design you hand it and verifies the render against that original through a screenshot-and-compare loop. It holds the result to a real source you provide. You can still pair it with a taste skill on the GLM side.

Why does install use -s? Why not npx skills add mihneaptu/caecus-skill? The bare command installs every skill in the repo, including glm-codegen, which belongs on the GLM 5.2 side, not the Orchestrator. The -s flag installs only the Orchestrator skills and keeps that side clean. Load glm-codegen separately on GLM.

Does it work with React, Vue, or Svelte? The brief targets design intent, not one framework. Defaults are plain HTML/CSS for simple static designs and React + Tailwind for interactive ones; name any other stack in the brief's target_stack.

What is a SKILL.md? A portable Markdown instruction file that an agent loads automatically. Install it with npx skills add, copy it into your project, or paste it into a chat.

Does anything run on my machine when I install? No. Installing writes only Markdown and referenced assets: no scripts, no build, no secrets. See Security & trust above.

Feedback

This is a beta. Reports make it better.

Give beta feedback →

Contributing

This is an early beta and a solo project. If you have feedback or want to propose a skill, open an issue. Contribution guidelines will land as the project grows.

License

The collection is licensed under Apache License 2.0. See LICENSE. The license covers the code and content. It does not grant rights to the caecus name or brand.

The caecus name & brand

The caecus name and brand are protected separately from the license. Forks and derivatives are welcome under Apache-2.0, but they must not imply they are the official project.
