Skip to content

perf(agent): wake on state change instead of 500ms polling#305

Merged
0xallam merged 6 commits into
usestrix:mainfrom
b4l4-sec:claude/improve-tool-efficiency-X7Qsk
May 4, 2026
Merged

perf(agent): wake on state change instead of 500ms polling#305
0xallam merged 6 commits into
usestrix:mainfrom
b4l4-sec:claude/improve-tool-efficiency-X7Qsk

Conversation

@b4l4-sec

@b4l4-sec b4l4-sec commented Feb 5, 2026

Copy link
Copy Markdown
Contributor

…p pre-filter, memory compression

  • Add string-similarity pre-filter to vulnerability deduplication to limit LLM comparisons to the top 10 most similar reports instead of all reports
  • Replace per-request httpx.AsyncClient with persistent connection pool per sandbox, eliminating repeated TCP/TLS handshake overhead
  • Execute independent tools concurrently via asyncio.gather while keeping state-modifying tools sequential
  • Lower memory compression threshold from 100K to 60K tokens and cache token counts to avoid redundant litellm.token_counter calls
  • Double compression chunk size from 10 to 20 messages to halve LLM calls
  • Replace asyncio.sleep(0.5) polling with event-based wake signaling in agent state for immediate response to state changes
…p pre-filter, memory compression

- Add string-similarity pre-filter to vulnerability deduplication to limit
  LLM comparisons to the top 10 most similar reports instead of all reports
- Replace per-request httpx.AsyncClient with persistent connection pool per
  sandbox, eliminating repeated TCP/TLS handshake overhead
- Execute independent tools concurrently via asyncio.gather while keeping
  state-modifying tools sequential
- Lower memory compression threshold from 100K to 60K tokens and cache
  token counts to avoid redundant litellm.token_counter calls
- Double compression chunk size from 10 to 20 messages to halve LLM calls
- Replace asyncio.sleep(0.5) polling with event-based wake signaling in
  agent state for immediate response to state changes

https://claude.ai/code/session_012JYGtxVh4zRbzXKarNmb11
@greptile-apps

greptile-apps Bot commented Feb 5, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the fixed asyncio.sleep(0.5) polling loop in _wait_for_input with an asyncio.Event-based wake signal, so the agent responds immediately when external input arrives rather than waiting up to 500 ms. The _wake_event is exposed via a new wait_for_wake method on AgentState and is signaled from both add_message (when waiting) and resume_from_waiting.

Confidence Score: 5/5

Safe to merge; the event-driven wake logic is sound and the only finding is a minor P2 spurious-wake on internal status messages.

No P0 or P1 issues found. The single P2 (add_message fires wake for assistant-role messages) causes one extra no-op loop iteration per waiting-state entry but does not break correctness or lose any wake signals.

strix/agents/state.py — add_message role-guard for the wake signal.

Important Files Changed

Filename Overview
strix/agents/state.py Adds asyncio.Event-based wake signaling to replace fixed sleep polls; minor logic issue where add_message fires the wake event for internally-generated assistant messages, causing a spurious loop iteration on every waiting-state entry.
strix/agents/base_agent.py Single-line change: replaces asyncio.sleep(0.5) with state.wait_for_wake(timeout=0.5) to enable immediate event-driven response; change is correct and well-integrated.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
strix/agents/state.py:49-56
**Spurious wake on every waiting-state entry**

`_enter_waiting_state` in `base_agent.py` calls `enter_waiting_state()` (which sets `waiting_for_input = True`) and then immediately calls `add_message("assistant", ...)`. Because `waiting_for_input` is already `True` at that point, `add_message` fires `_wake_event.set()` on every single status message the system adds when entering a waiting state. On the next loop iteration `wait_for_wake` returns instantly (consuming and clearing the event), and the agent loops back to sleep — defeating the optimization for that first cycle. The intent of the wake signal is to notify the agent of *external* input, not of its own internal status messages.

### Issue 2 of 2
strix/agents/state.py:53-56
Consider restricting the wake to non-assistant messages so the agent's own status notifications don't fire a spurious wake signal. This ensures `_wake_event` is only set when real external input (from a user or another agent) arrives while the state is waiting.

```suggestion
        self.messages.append(message)
        self.last_updated = datetime.now(UTC).isoformat()
        if self.waiting_for_input and role != "assistant":
            self._wake_event.set()
```

Reviews (2): Last reviewed commit: "use PrivateAttr for _wake_event (Pydanti..." | Re-trigger Greptile

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment thread strix/tools/executor.py Outdated
Comment thread strix/llm/memory_compressor.py Outdated
0xallam added 5 commits May 3, 2026 18:28
The pre-filter undermines the whole reason for using an LLM for
duplicate detection: semantic understanding. SequenceMatcher operates
character-by-character on strings — two SQL injection findings on
different endpoints can have very different titles but be the same
root issue, and two unrelated bugs can have similar descriptions if
they hit the same endpoint. Pre-filtering by string match throws
away the LLM's main advantage on this exact problem.

The arbitrary top-10 cutoff also silently drops any duplicate that
ranks 11th or lower by string similarity. With modern long-context
models (Sonnet 4.6 = 200K), sending all reports to the LLM is well
within budget; the cost reduction here doesn't justify the precision
loss.
- state.py + base_agent.py: the wake-event work is one logical unit;
  reverting state forces reverting base_agent (its only change calls
  state.wait_for_wake). The Pydantic asyncio.Event integration is
  fragile and the 500ms polling it replaces hasn't been a measured
  problem.
- memory_compressor.py: lowering the threshold from 100K to 60K is a
  workload-dependent change shipped without measurement; the token
  cache is unbounded; PR usestrix#381 has since changed this file in a
  different direction (reserved_tokens parameter), guaranteeing
  conflicts.
- executor.py: connection pooling never wires close_sandbox_client
  into sandbox teardown (resource leak). The parallel-tool execution
  has correctness risk — the sequential denylist is incomplete (file
  edits, notes, python, terminal can race) and the partition reorders
  calls, breaking LLM-intended ordering.

PR is now effectively empty; should be closed.
Previous revert commit was over-broad: it reverted the wake-event
work along with the rest. The wake event is the one piece of this
PR that's a clean win — replaces 500ms polling with immediate signal
on state change, no correctness risk, properly excluded from
serialization.
- Drop signal_wake() — defined but never called; the two real
  callers (add_message, resume_from_waiting) hit _wake_event.set()
  directly.
- Drop the bool return from wait_for_wake — the only caller in
  base_agent.py discards it.
- Drop docstrings (codebase style is sparse on docstrings) and the
  redundant self-explanatory comment on the field.
Underscore-prefixed fields with Field() raise NameError in Pydantic
2.x at class-definition time — the model wouldn't even import.
Switching to PrivateAttr is the correct spelling for runtime-only
private attributes, and as a bonus removes both the
arbitrary_types_allowed model_config (private attrs aren't validated)
and the exclude=True on the Field (private attrs are auto-excluded
from model_dump). Verified end-to-end: instantiates cleanly,
serializes without _wake_event, set/clear cycle works.
@0xallam

0xallam commented May 4, 2026

Copy link
Copy Markdown
Member
@0xallam 0xallam changed the title Improve tool efficiency: connection pooling, parallel execution, dedu… May 4, 2026
@0xallam 0xallam merged commit e1f38f8 into usestrix:main May 4, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants