Skip to content

Add OpenTelemetry observability with local JSONL traces and Traceloop option#347

Merged
0xallam merged 30 commits into
usestrix:mainfrom
bearsyankees:better-logs
Mar 9, 2026
Merged

Add OpenTelemetry observability with local JSONL traces and Traceloop option#347
0xallam merged 30 commits into
usestrix:mainfrom
bearsyankees:better-logs

Conversation

@bearsyankees

@bearsyankees bearsyankees commented Mar 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • add OpenTelemetry/OpenLLMetry observability wired through Tracer
  • write full run telemetry to strix_runs/<run_name>/events.jsonl
  • support optional remote OTLP export via TRACELOOP_BASE_URL + TRACELOOP_API_KEY (+ optional TRACELOOP_HEADERS)
  • keep PostHog flow in place while adding OTel in parallel
  • enrich actor metadata so events with agent_id include agent_name when available

Config / Docs

  • add and document telemetry env vars and dual-write behavior

Tests

  • add config tests for telemetry env handling
  • add tracer tests for JSONL writing, remote setup, redaction, single run.completed, streaming throttling, and agent-name enrichment
  • add LiteLLM OTel callback test to ensure otel callback is added without clobbering existing callbacks

Validation

  • ran targeted lint and tests:
    • uvx poetry run ruff check on changed telemetry/test files
    • uvx poetry run pytest tests/config/test_config_telemetry.py tests/telemetry/test_tracer.py tests/llm/test_llm_otel.py
  • ran quick scans against https://github.com/usestrix/strix and https://app.strix.ai to verify runtime telemetry output in events.jsonl
@bearsyankees bearsyankees changed the title Add OpenTelemetry observability with local JSONL traces Mar 8, 2026
@greptile-apps

greptile-apps Bot commented Mar 8, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces a dual-write OpenTelemetry observability layer — local events.jsonl per run via a custom JsonlSpanExporter, and optional remote OTLP export through Traceloop — alongside a TelemetrySanitizer for credential redaction. The overall design is sound with good test coverage.

Key remaining issues:

  1. Bearer token regex incomplete (strix/telemetry/utils.py:33): The sensitive token pattern [a-z0-9._-]+ omits standard Base64 characters +, /, =, causing partial redaction of OAuth tokens and similar credentials in JSONL output. A token like Bearer ya29.A0AXzc1v+abc/def= would leave +abc/def= visible in logs.

  2. Missing diagnostic logging on run directory fallback (strix/telemetry/utils.py:117): When sanitize_run_dir_name receives input that becomes empty after sanitization (e.g., "..."), it silently generates a run-{digest} fallback directory with no warning. Operators cannot diagnose why their configured run name was discarded.

Both issues are straightforward to fix and do not block core functionality, but the credential redaction gap is a security concern for users with Base64-encoded tokens.

Confidence Score: 4/5

  • Safe to merge with two minor fixes: Bearer token regex needs Base64 characters for credential redaction, and diagnostic logging should be added on directory fallback.
  • The PR introduces well-designed telemetry infrastructure with good test coverage. Core functionality is solid: correct per-file locking, proper cache invalidation on run rename, and clean separation of concerns. Two verified issues remain: (1) Bearer token regex is incomplete (security/redaction gap), (2) missing diagnostic log on directory name fallback. Both are straightforward fixes that do not block functionality. No correctness bugs in span export, event emission, or state management logic.
  • strix/telemetry/utils.py (fix Bearer token regex and add diagnostic logging on name fallback)

Last reviewed commit: 9f42719

Comment thread strix/telemetry/tracer.py
Comment thread strix/telemetry/tracer.py Outdated
Comment thread strix/telemetry/tracer.py Outdated
Comment thread strix/telemetry/tracer.py Outdated
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
Comment thread strix/telemetry/tracer.py
Comment thread strix/telemetry/tracer.py Outdated
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
Comment thread strix/llm/llm.py Outdated
Comment thread strix/telemetry/tracer.py Outdated
Comment thread strix/telemetry/tracer.py Outdated
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
@greptile-apps

greptile-apps Bot commented Mar 8, 2026

Copy link
Copy Markdown
Contributor

Looks good! All three changes check out:

  1. Telemetry gating: _ensure_litellm_otel_callback() returns early when disabled, and Tracer._setup_telemetry() skips all OTel setup when self._telemetry_enabled is False.

  2. Empty directory cleanup: After removing expired events.jsonl files, the pruner now calls events_file.parent.rmdir() which will clean up empty run directories.

  3. Redundant sanitization removed: _append_event_record now just writes the record directly - sanitization happens once in _emit_event before the record is passed down.

Nice cleanup! ✅

Comment thread strix/telemetry/tracer.py
Comment thread strix/llm/llm.py Outdated
Comment thread strix/telemetry/tracer.py Outdated
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
Comment thread pyproject.toml
Comment thread strix/telemetry/tracer.py Outdated
Comment thread strix/telemetry/tracer.py Outdated
Comment thread strix/telemetry/tracer.py Outdated
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
Comment thread strix/telemetry/tracer.py Outdated
Comment thread strix/telemetry/tracer.py Outdated
Comment thread pyproject.toml
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
Comment thread strix/telemetry/tracer.py
Comment thread strix/telemetry/tracer.py
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
Comment thread strix/telemetry/utils.py
Comment thread strix/telemetry/tracer.py
@bearsyankees

Copy link
Copy Markdown
Collaborator Author
Comment thread strix/telemetry/utils.py
Comment on lines +31 to +38
_SENSITIVE_TOKEN_PATTERN = re.compile(
r"(?i)\b("
r"bearer\s+[a-z0-9._-]+|"
r"sk-[a-z0-9_-]{8,}|"
r"gh[pousr]_[a-z0-9_-]{12,}|"
r"xox[baprs]-[a-z0-9-]{12,}"
r")\b"
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bearer token regex misses standard Base64 characters, leaving tokens partially un-redacted.

_SENSITIVE_TOKEN_PATTERN uses [a-z0-9._-]+ for the bearer-token value (line 33), but standard Base64 encoding also uses +, /, and = (padding). Any OAuth 2.0 access_token or similar that contains these characters will only be partially matched — the regex stops at the first +, /, or =, leaving the remainder of the token un-redacted in the JSONL output.

For example, Bearer ya29.A0AXzc1v+abc/def= would match only Bearer ya29.A0AXzc1v, leaving +abc/def= in plaintext.

Consider broadening the character class to cover standard Base64:

Suggested change
_SENSITIVE_TOKEN_PATTERN = re.compile(
r"(?i)\b("
r"bearer\s+[a-z0-9._-]+|"
r"sk-[a-z0-9_-]{8,}|"
r"gh[pousr]_[a-z0-9_-]{12,}|"
r"xox[baprs]-[a-z0-9-]{12,}"
r")\b"
)
_SENSITIVE_TOKEN_PATTERN = re.compile(
r"(?i)\b("
r"bearer\s+[a-z0-9._\-+/]+=*|"
r"sk-[a-z0-9_-]{8,}|"
r"gh[pousr]_[a-z0-9_-]{12,}|"
r"xox[baprs]-[a-z0-9-]{12,}"
r")\b"
)
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/telemetry/utils.py
Line: 31-38

Comment:
Bearer token regex misses standard Base64 characters, leaving tokens partially un-redacted.

`_SENSITIVE_TOKEN_PATTERN` uses `[a-z0-9._-]+` for the bearer-token value (line 33), but standard Base64 encoding also uses `+`, `/`, and `=` (padding). Any OAuth 2.0 `access_token` or similar that contains these characters will only be partially matched — the regex stops at the first `+`, `/`, or `=`, leaving the remainder of the token un-redacted in the JSONL output.

For example, `Bearer ya29.A0AXzc1v+abc/def=` would match only `Bearer ya29.A0AXzc1v`, leaving `+abc/def=` in plaintext.

Consider broadening the character class to cover standard Base64:

```suggestion
_SENSITIVE_TOKEN_PATTERN = re.compile(
    r"(?i)\b("
    r"bearer\s+[a-z0-9._\-+/]+=*|"
    r"sk-[a-z0-9_-]{8,}|"
    r"gh[pousr]_[a-z0-9_-]{12,}|"
    r"xox[baprs]-[a-z0-9-]{12,}"
    r")\b"
)
```

How can I resolve this? If you propose a fix, please make it concise.
Comment thread strix/telemetry/utils.py Outdated
Comment on lines +112 to +126
def sanitize_run_dir_name(run_dir_name: str) -> str:
normalized = run_dir_name.strip()
digest = f"{zlib.crc32(normalized.encode('utf-8')):08x}"

sanitized = re.sub(r"[^A-Za-z0-9._-]+", "-", normalized).strip(".-")
if not sanitized:
sanitized = f"run-{digest}"
elif sanitized != normalized:
sanitized = f"{sanitized}-{digest}"

if len(sanitized) > 80:
prefix = sanitized[:71].rstrip(".-")
sanitized = f"{prefix}-{digest}" if prefix else f"run-{digest}"

return sanitized

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitize_run_dir_name silently produces a fallback directory when the input becomes empty after sanitization, with no warning logged.

When the function applies strip(".-") on a string like "...", the result becomes empty. The fallback to f"run-{digest}" occurs (line 117–118) without any log output to help operators diagnose why their configured run name was not usable.

Consider adding a warning when the name becomes fully sanitized:

Suggested change
def sanitize_run_dir_name(run_dir_name: str) -> str:
normalized = run_dir_name.strip()
digest = f"{zlib.crc32(normalized.encode('utf-8')):08x}"
sanitized = re.sub(r"[^A-Za-z0-9._-]+", "-", normalized).strip(".-")
if not sanitized:
sanitized = f"run-{digest}"
elif sanitized != normalized:
sanitized = f"{sanitized}-{digest}"
if len(sanitized) > 80:
prefix = sanitized[:71].rstrip(".-")
sanitized = f"{prefix}-{digest}" if prefix else f"run-{digest}"
return sanitized
if not sanitized:
sanitized = f"run-{digest}"
logger.warning("run_dir_name %r was fully sanitized; using fallback %r", run_dir_name, sanitized)
elif sanitized != normalized:
Prompt To Fix With AI
This is a comment left during a code review.
Path: strix/telemetry/utils.py
Line: 112-126

Comment:
`sanitize_run_dir_name` silently produces a fallback directory when the input becomes empty after sanitization, with no warning logged.

When the function applies `strip(".-")` on a string like `"..."`, the result becomes empty. The fallback to `f"run-{digest}"` occurs (line 117–118) without any log output to help operators diagnose why their configured run name was not usable.

Consider adding a warning when the name becomes fully sanitized:

```suggestion
    if not sanitized:
        sanitized = f"run-{digest}"
        logger.warning("run_dir_name %r was fully sanitized; using fallback %r", run_dir_name, sanitized)
    elif sanitized != normalized:
```

How can I resolve this? If you propose a fix, please make it concise.
bearsyankees and others added 12 commits March 8, 2026 21:18
LiteLLM OTel callbacks are not enabled, so _build_trace_metadata()
and its test are dead code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all streaming event emission (agent.streaming.updated,
cleared, interrupted) — high-frequency noise with no observability
value. Revert unnecessary i->index, loc->location renames.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run names are auto-generated by generate_run_name() which already
slugifies the input. The extra sanitization layer was redundant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Traceloop uses raw print() statements for "exporting traces to a
custom exporter" and "Metrics are disabled" messages. Redirect
stdout during init to suppress them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Now that traces are always written to the run dir, the output path
is always relevant — not just when vulnerabilities are found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Base64 screenshot payloads from browser_action were bloating
events.jsonl (~10MB for small runs). Strip them at the sanitizer
level since DOM/HTML content is sufficient for debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docker SDK uses urllib3 for container API calls, which Traceloop
auto-instruments — flooding events.jsonl with noise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@0xallam 0xallam merged commit a60cb4b into usestrix:main Mar 9, 2026
hkboujrida pushed a commit to hkboujrida/strix that referenced this pull request Apr 27, 2026
Co-authored-by: 0xallam <ahmed39652003@gmail.com>
Josz009 pushed a commit to Josz009/strix that referenced this pull request May 19, 2026
Co-authored-by: 0xallam <ahmed39652003@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants