perf(agent): wake on state change instead of 500ms polling#305
Conversation
…p pre-filter, memory compression - Add string-similarity pre-filter to vulnerability deduplication to limit LLM comparisons to the top 10 most similar reports instead of all reports - Replace per-request httpx.AsyncClient with persistent connection pool per sandbox, eliminating repeated TCP/TLS handshake overhead - Execute independent tools concurrently via asyncio.gather while keeping state-modifying tools sequential - Lower memory compression threshold from 100K to 60K tokens and cache token counts to avoid redundant litellm.token_counter calls - Double compression chunk size from 10 to 20 messages to halve LLM calls - Replace asyncio.sleep(0.5) polling with event-based wake signaling in agent state for immediate response to state changes https://claude.ai/code/session_012JYGtxVh4zRbzXKarNmb11
Greptile SummaryThis PR replaces the fixed Confidence Score: 5/5Safe to merge; the event-driven wake logic is sound and the only finding is a minor P2 spurious-wake on internal status messages. No P0 or P1 issues found. The single P2 (add_message fires wake for assistant-role messages) causes one extra no-op loop iteration per waiting-state entry but does not break correctness or lose any wake signals. strix/agents/state.py — add_message role-guard for the wake signal. Important Files Changed
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
strix/agents/state.py:49-56
**Spurious wake on every waiting-state entry**
`_enter_waiting_state` in `base_agent.py` calls `enter_waiting_state()` (which sets `waiting_for_input = True`) and then immediately calls `add_message("assistant", ...)`. Because `waiting_for_input` is already `True` at that point, `add_message` fires `_wake_event.set()` on every single status message the system adds when entering a waiting state. On the next loop iteration `wait_for_wake` returns instantly (consuming and clearing the event), and the agent loops back to sleep — defeating the optimization for that first cycle. The intent of the wake signal is to notify the agent of *external* input, not of its own internal status messages.
### Issue 2 of 2
strix/agents/state.py:53-56
Consider restricting the wake to non-assistant messages so the agent's own status notifications don't fire a spurious wake signal. This ensures `_wake_event` is only set when real external input (from a user or another agent) arrives while the state is waiting.
```suggestion
self.messages.append(message)
self.last_updated = datetime.now(UTC).isoformat()
if self.waiting_for_input and role != "assistant":
self._wake_event.set()
```
Reviews (2): Last reviewed commit: "use PrivateAttr for _wake_event (Pydanti..." | Re-trigger Greptile |
The pre-filter undermines the whole reason for using an LLM for duplicate detection: semantic understanding. SequenceMatcher operates character-by-character on strings — two SQL injection findings on different endpoints can have very different titles but be the same root issue, and two unrelated bugs can have similar descriptions if they hit the same endpoint. Pre-filtering by string match throws away the LLM's main advantage on this exact problem. The arbitrary top-10 cutoff also silently drops any duplicate that ranks 11th or lower by string similarity. With modern long-context models (Sonnet 4.6 = 200K), sending all reports to the LLM is well within budget; the cost reduction here doesn't justify the precision loss.
- state.py + base_agent.py: the wake-event work is one logical unit; reverting state forces reverting base_agent (its only change calls state.wait_for_wake). The Pydantic asyncio.Event integration is fragile and the 500ms polling it replaces hasn't been a measured problem. - memory_compressor.py: lowering the threshold from 100K to 60K is a workload-dependent change shipped without measurement; the token cache is unbounded; PR usestrix#381 has since changed this file in a different direction (reserved_tokens parameter), guaranteeing conflicts. - executor.py: connection pooling never wires close_sandbox_client into sandbox teardown (resource leak). The parallel-tool execution has correctness risk — the sequential denylist is incomplete (file edits, notes, python, terminal can race) and the partition reorders calls, breaking LLM-intended ordering. PR is now effectively empty; should be closed.
Previous revert commit was over-broad: it reverted the wake-event work along with the rest. The wake event is the one piece of this PR that's a clean win — replaces 500ms polling with immediate signal on state change, no correctness risk, properly excluded from serialization.
- Drop signal_wake() — defined but never called; the two real callers (add_message, resume_from_waiting) hit _wake_event.set() directly. - Drop the bool return from wait_for_wake — the only caller in base_agent.py discards it. - Drop docstrings (codebase style is sparse on docstrings) and the redundant self-explanatory comment on the field.
Underscore-prefixed fields with Field() raise NameError in Pydantic 2.x at class-definition time — the model wouldn't even import. Switching to PrivateAttr is the correct spelling for runtime-only private attributes, and as a bonus removes both the arbitrary_types_allowed model_config (private attrs aren't validated) and the exclude=True on the Field (private attrs are auto-excluded from model_dump). Verified end-to-end: instantiates cleanly, serializes without _wake_event, set/clear cycle works.
…p pre-filter, memory compression