Skip to content

feat(analyzer): add phase-1 structured skill summaries#211

Open
rodboev wants to merge 4 commits into
NVIDIA:mainfrom
rodboev:pr/structured-skill-role-summary
Open

feat(analyzer): add phase-1 structured skill summaries#211
rodboev wants to merge 4 commits into
NVIDIA:mainfrom
rodboev:pr/structured-skill-role-summary

Conversation

@rodboev

@rodboev rodboev commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

SkillSpector currently scans AISOP/AISP bundles as generic directories and never says that a structured workflow format was found or which workflow roles were present. This phase-1 slice teaches recursive discovery and build-context to recognize valid *.aisop.json bundles, then emits one LOW informational summary finding without changing scoring or existing rule behavior.

Refs #130

This follows the phased implementation sketch and AISOP/AISP bundle examples in #130.

Root cause

src/skillspector/multi_skill.py only recognizes root SKILL.md layouts or immediate subdirectories that contain SKILL.md or skill.md, so --recursive misses structured subdirectories that are expressed only through valid AISOP/AISP bundle files. src/skillspector/nodes/build_context.py builds generic scan context and never classifies those bundles, while src/skillspector/nodes/analyzers/__init__.py has no structured-skill analyzer in the registry-driven graph. The bundle parser also needed to follow the issue schema itself, with workflow nodes at user.content.functions and constraint strings in functions.*.constraints, rather than an invented nested aisop.functions shape. The result is simple: valid structured bundles can be scanned, but the report has no structured-skill summary surface at all.

Diff Notes

  • Add src/skillspector/structured_skill.py as the shared detector for valid *.aisop.json AISOP/AISP bundles and their summarized metadata, reading workflow nodes from user.content.functions and AISP resources from user.content.aisp_contract.resources.
  • Bound structured bundle metadata walking in src/skillspector/structured_skill.py and fail closed on over-nested bundles, so adversarial functions or resources trees are ignored instead of crashing build_context() or recursive skill detection.
  • Reuse that detector in src/skillspector/multi_skill.py so --recursive can treat immediate structured subdirectories as skills while preserving root SKILL.md precedence.
  • Populate structured_skill_context in src/skillspector/nodes/build_context.py and src/skillspector/state.py without changing the public report schema.
  • Add src/skillspector/nodes/analyzers/structured_skill_roles.py and register it in src/skillspector/nodes/analyzers/__init__.py so the graph emits one LOW SSR-1 structured summary finding.
  • Replay the branch onto current main and keep both the structured-skill assertions already on the branch and the newer upstream test additions in tests/nodes/test_build_context.py and tests/unit/test_cli.py.
  • Add focused coverage in tests/test_multi_skill.py, tests/nodes/test_build_context.py, tests/nodes/analyzers/test_registry.py, tests/nodes/analyzers/test_structured_skill_roles.py, and tests/unit/test_cli.py.

Scope

  • Phase 1 only. This PR detects valid AISOP/AISP bundles and summarizes them with one additive LOW finding.
  • No severity adjustments, no meta_analyzer changes, and no changes to existing static, behavioral, MCP, or semantic findings.
  • No report-schema additions beyond the existing Finding fields.
  • No Pi extension changes in extensions/skillspector.ts.
  • No full role-aware score downgrades, no malformed-bundle warning surface, and no claim of complete AISOP/AISP spec coverage.

Verification

  • python -m pytest tests/test_multi_skill.py tests/nodes/test_build_context.py tests/nodes/analyzers/test_registry.py tests/nodes/analyzers/test_structured_skill_roles.py tests/unit/test_cli.py -q - pass, 53 passed
  • uv run ruff check src/ tests/ - pass
  • uv run ruff format --check src/ tests/ - not clean locally; currently reports src/skillspector/nodes/analyzers/mcp_least_privilege.py
  • CI Lint & Test (Python 3.12), Lint & Test (Python 3.13), and DCO Check - pending rerun after push

Notes

  • This slice stops at detection and summary. Role-aware severity adjustments remain follow-up work on top of the new structured context.

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes — the phase-1 structured-skill summary is well-built and thoroughly tested, but it introduces an unhandled-crash (DoS) path on adversarial input, which matters because the scanner's whole job is to ingest untrusted skills.

Blocking — unbounded recursion → uncaught RecursionError in the core build_context path.
structured_skill.py's walkers (_extract_function_names, _extract_constraint_anchors._walk, _extract_resource_anchors._walk) recurse per nesting level with no depth bound. A crafted *.aisop.json with deeply nested functions/constraints/resources (~2k levels) raises RecursionError. _parse_bundle_path only wraps json.loads in try/except (OSError, json.JSONDecodeError) and then calls _parse_bundle_payload(...) outside that guard, so the error propagates through extract_structured_skill_context() into build_context() — a core node, not an isolated analyzer — and crashes the whole scan. (json.loads itself tolerates the nesting; the pure-Python walk is what blows the stack. Reproduced at depth≈2000.) _is_structured_skill_bundle() in multi_skill.detect_skills has the same exposure during detection.

Suggested fix: bound the nesting depth in the walkers (return/skip past a sane cap) and/or have _parse_bundle_path treat a bundle that raises during parsing as "not a valid bundle" (e.g. also catch RecursionError/ValueError around _parse_bundle_payload) so a malformed bundle fails closed to None instead of crashing the scan. A regression test with a deeply nested bundle asserting the scan completes would lock this in.

Everything else looks good — the two-message contract validation is appropriately defensive, the SSR-1 finding is correctly LOW/informational, and the build_context/detection/registry wiring and tests are solid. Happy to re-review once the recursion is bounded.

Comment thread src/skillspector/structured_skill.py Outdated
@rodboev rodboev force-pushed the pr/structured-skill-role-summary branch from 6efdb55 to 655d925 Compare June 27, 2026 18:56
@rodboev

rodboev commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Pushed 655d925. structured_skill.py now bounds nested workflow and resource walking, and _parse_bundle_path() treats an over-nested bundle as invalid instead of letting it crash the scan. The branch also adds focused regressions for both the build_context() path and recursive child-skill detection.

@rodboev rodboev force-pushed the pr/structured-skill-role-summary branch from 655d925 to b6c92de Compare June 30, 2026 00:25

@rng1995 rng1995 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previously reported recursion/DoS issue is resolved: all structured-metadata walkers now enforce a depth cap, parser failures are contained, and the new tests cover fail-closed behavior for functions and resources. However, this head no longer merges cleanly with current main; there are content conflicts in tests/nodes/test_build_context.py and tests/unit/test_cli.py. Please rebase or merge current main and preserve both sets of tests.

rodboev added 4 commits June 30, 2026 06:54
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
Signed-off-by: Rod Boev <rod.boev@gmail.com>
@rodboev rodboev force-pushed the pr/structured-skill-role-summary branch from b6c92de to 3c00617 Compare June 30, 2026 11:15
@rodboev

rodboev commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Rebased this branch onto current main and kept both sides of the newer test additions. The structured-skill coverage that was already on the branch is still there in tests/nodes/test_build_context.py and tests/unit/test_cli.py, and the accepted parser and analyzer changes are unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants