Skip to content

feat: add skills for specific tools#366

Merged
0xallam merged 45 commits into
mainfrom
load-skills
Mar 19, 2026
Merged

feat: add skills for specific tools#366
0xallam merged 45 commits into
mainfrom
load-skills

Conversation

@bearsyankees

Copy link
Copy Markdown
Collaborator

No description provided.

bearsyankees and others added 30 commits March 8, 2026 02:10
LiteLLM OTel callbacks are not enabled, so _build_trace_metadata()
and its test are dead code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all streaming event emission (agent.streaming.updated,
cleared, interrupted) — high-frequency noise with no observability
value. Revert unnecessary i->index, loc->location renames.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run names are auto-generated by generate_run_name() which already
slugifies the input. The extra sanitization layer was redundant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Traceloop uses raw print() statements for "exporting traces to a
custom exporter" and "Metrics are disabled" messages. Redirect
stdout during init to suppress them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Now that traces are always written to the run dir, the output path
is always relevant — not just when vulnerabilities are found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Base64 screenshot payloads from browser_action were bloating
events.jsonl (~10MB for small runs). Strip them at the sanitizer
level since DOM/HTML content is sufficient for debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docker SDK uses urllib3 for container API calls, which Traceloop
auto-instruments — flooding events.jsonl with noise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Implemented `load_skill` function to dynamically load skills into the current agent at runtime.
- Enhanced agent context management to track loaded skills.
- Created XML schema documentation for the new `load_skill` tool.
- Added new skills documentation for various tools including `ffuf`, `httpx`, `katana`, `naabu`, `nmap`, `nuclei`, `sqlmap`, and `subfinder`.
- Introduced runtime tooling logic to detect and manage tool usage in commands.
@greptile-apps

greptile-apps Bot commented Mar 17, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces auto-loading of tool-specific CLI skill context for eight security tools (nmap, nuclei, httpx, ffuf, subfinder, naabu, katana, sqlmap). Skills are injected lazily in two ways: automatically when the agent is about to run a recognised tool via terminal_execute (_inject_runtime_tooling_context in base_agent.py), and explicitly on demand through a new load_skill LLM-callable tool. A lightweight shell-segment parser (runtime_tooling.py) detects the actual tool binary even through wrappers like sudo, env, timeout, and background operators. Previous review concerns (skill-name format mismatch, frozen=True mutability, requested_skills type inconsistency, single & separator) have all been addressed in this revision.

Key changes:

  • strix/skills/runtime_tooling.py — new module: shell parser, get_tooling_preflight, build_tooling_preflight_message, and canonical_runtime_skill_name
  • strix/agents/base_agent.py_inject_runtime_tooling_context intercepts terminal actions before each LLM turn
  • strix/llm/llm.pyadd_runtime_skills / _get_skills_to_load enable post-init system-prompt updates
  • strix/tools/agents_graph/agents_graph_actions.pyload_skill tool for explicit runtime loading
  • 8 new Markdown skill files under strix/skills/tooling/
  • UI renderer update to display skill-loading events as dim status lines instead of chat bubbles

Issues found:

  • _consume_timeout does not skip the value of the -s/--signal and -k/--kill-after flags. For a command like timeout -s TERM 60 nmap target, the signal name TERM is returned as the detected command, causing the nmap skill to be silently missed.

Confidence Score: 4/5

  • Safe to merge; one skill-detection miss in an edge-case timeout flag pattern does not affect correctness or security.
  • The core architecture is sound and all previously flagged issues have been addressed. The one new logic bug (_consume_timeout not consuming the -s SIGNAL value) is a skill auto-loading miss rather than a correctness or security problem — the command still executes normally, the skill just isn't injected. All other parsing paths, deduplication logic, context state management, and the new load_skill tool behave correctly.
  • strix/skills/runtime_tooling.py — specifically _consume_timeout (lines 567–575)

Important Files Changed

Filename Overview
strix/skills/runtime_tooling.py New module implementing shell-command parsing for auto-detecting security tool usage and lazily injecting skill context. Core logic is well-structured with good separator handling (;, \n, `
strix/agents/base_agent.py Adds _inject_runtime_tooling_context which intercepts terminal actions before each LLM turn, detects tool commands, and auto-loads the matching skills into the system prompt. The deduplication logic (using canonical paths stored in runtime_skills_loaded) is correct and consistent with how load_skill writes context. Returns True to short-circuit the loop only when new skills were actually injected.
strix/llm/llm.py Adds _runtime_skills list and add_runtime_skills method to the LLM class, allowing dynamic post-init skill injection by re-loading the system prompt. Deduplication is in-place; _get_skills_to_load merges config skills, runtime skills, and the scan-mode skill with proper ordering. Looks correct.
strix/tools/agents_graph/agents_graph_actions.py Adds load_skill tool function for explicit runtime skill loading. Addresses previous review concerns: raw skill names are validated (via load_skills which resolves short names against categories), then canonicalised before being applied; the context is updated with canonical paths, keeping it consistent with what _inject_runtime_tooling_context reads. The exception handler now correctly re-parses skills as a fallback instead of returning a raw string.
strix/interface/tool_components/user_message_renderer.py Adds rendering for <runtime_tool_skill_context> messages and reclassifies <inter_agent_message> as a "Context:" header rather than "You:". Logic is clean; minor wording issue in _format_runtime_skill_context when multiple tools are listed.
tests/skills/test_runtime_tooling.py Good coverage of the main happy paths: new-skill loading, wrapper/assignment stripping, single-& background operator splitting, help-flag detection, non-tool mention filtering, already-loaded deduplication, and message content. Notably includes a test for the & separator which was flagged in a previous review thread.
tests/tools/test_load_skill_tool.py Unit tests for the load_skill tool using lightweight dummy agents. Covers success, short-name canonicalisation, invalid-skill error, and missing-agent-instance error. Uses a safe pattern of saving/restoring _agent_instances in finally blocks.

Comments Outside Diff (1)

  1. strix/skills/runtime_tooling.py, line 567-575 (link)

    P2 _consume_timeout mishandles -s/--signal flag

    timeout -s SIGNAL duration cmd is a common pattern in security tooling to send a specific termination signal. The current implementation only skips flag tokens that start with -, but doesn't skip the value that follows value-taking flags like -s/--signal and -k/--kill-after.

    For the token stream ['timeout', '-s', 'TERM', '60', 'nmap', 'target']:

    1. -s starts with - → consumed (index → 2)
    2. TERM doesn't start with - → while loop exits
    3. _TIMEOUT_VALUE_RE.match('TERM') fails → index stays at 2
    4. Back in _extract_command_and_args: TERM is treated as the command name

    Because "term" isn't in _TOOL_SKILL_PATHS, no false-positive occurs, but the real command (nmap) is never inspected — its skill won't be auto-loaded.

    A targeted fix analogous to _consume_sudo:

    _TIMEOUT_FLAGS_WITH_VALUE = {"-s", "--signal", "-k", "--kill-after"}
    
    def _consume_timeout(tokens: list[str], start_index: int) -> int:
        index = start_index
        while index < len(tokens):
            part = tokens[index]
            if part in _TIMEOUT_FLAGS_WITH_VALUE and index + 1 < len(tokens):
                index += 2
                continue
            if part.startswith("-"):
                index += 1
                continue
            break
    
        if index < len(tokens) and _TIMEOUT_VALUE_RE.match(tokens[index]):
            index += 1
    
        return index
    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: strix/skills/runtime_tooling.py
    Line: 567-575
    
    Comment:
    **`_consume_timeout` mishandles `-s`/`--signal` flag**
    
    `timeout -s SIGNAL duration cmd` is a common pattern in security tooling to send a specific termination signal. The current implementation only skips flag tokens that start with `-`, but doesn't skip the *value* that follows value-taking flags like `-s`/`--signal` and `-k`/`--kill-after`.
    
    For the token stream `['timeout', '-s', 'TERM', '60', 'nmap', 'target']`:
    1. `-s` starts with `-` → consumed (index → 2)
    2. `TERM` doesn't start with `-` → while loop exits
    3. `_TIMEOUT_VALUE_RE.match('TERM')` fails → index stays at 2
    4. Back in `_extract_command_and_args`: `TERM` is treated as the command name
    
    Because `"term"` isn't in `_TOOL_SKILL_PATHS`, no false-positive occurs, but the real command (`nmap`) is never inspected — its skill won't be auto-loaded.
    
    A targeted fix analogous to `_consume_sudo`:
    ```python
    _TIMEOUT_FLAGS_WITH_VALUE = {"-s", "--signal", "-k", "--kill-after"}
    
    def _consume_timeout(tokens: list[str], start_index: int) -> int:
        index = start_index
        while index < len(tokens):
            part = tokens[index]
            if part in _TIMEOUT_FLAGS_WITH_VALUE and index + 1 < len(tokens):
                index += 2
                continue
            if part.startswith("-"):
                index += 1
                continue
            break
    
        if index < len(tokens) and _TIMEOUT_VALUE_RE.match(tokens[index]):
            index += 1
    
        return index
    ```
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: strix/skills/runtime_tooling.py
Line: 567-575

Comment:
**`_consume_timeout` mishandles `-s`/`--signal` flag**

`timeout -s SIGNAL duration cmd` is a common pattern in security tooling to send a specific termination signal. The current implementation only skips flag tokens that start with `-`, but doesn't skip the *value* that follows value-taking flags like `-s`/`--signal` and `-k`/`--kill-after`.

For the token stream `['timeout', '-s', 'TERM', '60', 'nmap', 'target']`:
1. `-s` starts with `-` → consumed (index → 2)
2. `TERM` doesn't start with `-` → while loop exits
3. `_TIMEOUT_VALUE_RE.match('TERM')` fails → index stays at 2
4. Back in `_extract_command_and_args`: `TERM` is treated as the command name

Because `"term"` isn't in `_TOOL_SKILL_PATHS`, no false-positive occurs, but the real command (`nmap`) is never inspected — its skill won't be auto-loaded.

A targeted fix analogous to `_consume_sudo`:
```python
_TIMEOUT_FLAGS_WITH_VALUE = {"-s", "--signal", "-k", "--kill-after"}

def _consume_timeout(tokens: list[str], start_index: int) -> int:
    index = start_index
    while index < len(tokens):
        part = tokens[index]
        if part in _TIMEOUT_FLAGS_WITH_VALUE and index + 1 < len(tokens):
            index += 2
            continue
        if part.startswith("-"):
            index += 1
            continue
        break

    if index < len(tokens) and _TIMEOUT_VALUE_RE.match(tokens[index]):
        index += 1

    return index
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: strix/interface/tool_components/user_message_renderer.py
Line: 141-142

Comment:
**Singular "tool" label when multiple skills are loaded**

When multiple skills are loaded (e.g., `skill_names = "nmap, httpx"`), the rendered label reads *"getting context for tool nmap, httpx"*, which is grammatically incorrect. Consider pluralising when there is more than one tool name.

```suggestion
        count = skill_names.count(",") + 1
        label = "tools" if count > 1 else "tool"
        text.append(f"getting context for {label} {skill_names}", style="dim")
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: bdf8c04

Comment thread strix/tools/agents_graph/agents_graph_actions.py Outdated
Comment thread strix/skills/runtime_tooling.py Outdated
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
Comment thread strix/tools/agents_graph/agents_graph_actions.py Outdated
Comment thread strix/skills/runtime_tooling.py Outdated
Comment on lines +218 to +224
if char == "|" and index + 1 < len(command) and command[index + 1] == "|":
index += 1

segment = "".join(current).strip()
if segment:
segments.append(segment)
current = []

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Single & background operator not treated as a segment separator

_split_shell_segments only splits on && (logical AND), ;, \n, and |/||. A lone & (background execution) is not treated as a separator. As a result, a command like:

nmap -sV target.tld & sqlmap -u http://target.tld --forms

is emitted as a single segment. _extract_command_and_args then calls shlex.split on the whole string and returns nmap as the command—sqlmap is never visited, so its skill is not auto-loaded.

Consider adding & as a segment separator in the & character handling block:

Suggested change
if char == "|" and index + 1 < len(command) and command[index + 1] == "|":
index += 1
segment = "".join(current).strip()
if segment:
segments.append(segment)
current = []
if char == "&" and index + 1 < len(command) and command[index + 1] == "&":
segment = "".join(current).strip()
if segment:
segments.append(segment)
current = []
index += 2
continue
if char == "&":
segment = "".join(current).strip()
if segment:
segments.append(segment)
current = []
index += 1
continue
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/skills/runtime_tooling.py
Line: 218-224

Comment:
**Single `&` background operator not treated as a segment separator**

`_split_shell_segments` only splits on `&&` (logical AND), `;`, `\n`, and `|`/`||`. A lone `&` (background execution) is not treated as a separator. As a result, a command like:

```
nmap -sV target.tld & sqlmap -u http://target.tld --forms
```

is emitted as a single segment. `_extract_command_and_args` then calls `shlex.split` on the whole string and returns `nmap` as the command—`sqlmap` is never visited, so its skill is not auto-loaded.

Consider adding `&` as a segment separator in the `&` character handling block:

```suggestion
        if char == "&" and index + 1 < len(command) and command[index + 1] == "&":
            segment = "".join(current).strip()
            if segment:
                segments.append(segment)
            current = []
            index += 2
            continue

        if char == "&":
            segment = "".join(current).strip()
            if segment:
                segments.append(segment)
            current = []
            index += 1
            continue
```

How can I resolve this? If you propose a fix, please make it concise.
bearsyankees and others added 2 commits March 16, 2026 21:34
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
@0xallam 0xallam merged commit 8634159 into main Mar 19, 2026
1 check passed
@0xallam

0xallam commented Mar 19, 2026

Copy link
Copy Markdown
Member

Closes #278 #280

@0xallam 0xallam deleted the load-skills branch March 20, 2026 05:23
hkboujrida pushed a commit to hkboujrida/strix that referenced this pull request Apr 27, 2026
Co-authored-by: 0xallam <ahmed39652003@gmail.com>
Josz009 pushed a commit to Josz009/strix that referenced this pull request May 19, 2026
Co-authored-by: 0xallam <ahmed39652003@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants