Skip to content

Simplify LLM configuration layer (closes #504)#524

Merged
0xallam merged 11 commits into
mainfrom
fix/deepseek-auth-and-oai-compatible
Jun 8, 2026
Merged

Simplify LLM configuration layer (closes #504)#524
0xallam merged 11 commits into
mainfrom
fix/deepseek-auth-and-oai-compatible

Conversation

@0xallam

@0xallam 0xallam commented Jun 7, 2026

Copy link
Copy Markdown
Member

Closes #504.

Strips the LLM-config layer to a thin shim over the OpenAI Agents SDK + LiteLLM. No hand-rolled provider tables, no shorthand expansion, no per-provider gating.

What changes

  • One key, any provider. LLM_API_KEY is mirrored into the provider-specific env var (DEEPSEEK_API_KEY, ANTHROPIC_API_KEY, TOGETHERAI_API_KEY, ...) via litellm.validate_environment. No local provider→env-var map.
  • No litellm/ prefix in user config. StrixProvider (a 5-line MultiProvider subclass) routes any non-OpenAI prefix through LiteLLM. Users type deepseek/deepseek-v4-pro. Old litellm/... configs still work.
  • No tool_choice="required". Modern thinking models (Anthropic extended thinking, DeepSeek /beta, Bedrock newer Claude) reject it; convergence relies on _finish_tool_use_behavior + _append_noninteractive_tool_required_message (already in place).
  • No bare-name shorthand expansion. Users supply <provider>/<model>. Bare unknown names route to the SDK's OpenAI default; pre-warm-up panel warns when the bare name isn't a known OpenAI model and exits before the doomed call.

Side stability fixes

  • Catch the LiteLLM end-of-stream RuntimeError: cannot schedule new futures after shutdown race (background success_handler.submit losing to atexit during interpreter teardown — stream content was already delivered).
  • Silence LiteLLM stdout banners via suppress_debug_info, disable_streaming_logging, turn_off_message_logging.
  • Dedupe pass also routes via StrixProvider (was stock MultiProvider, broke non-OpenAI configs).
  • Bump litellm 1.83.7 → 1.88.0.
LiteLLM's per-provider branches (deepseek, anthropic, groq, etc.)
don't consult ``litellm.api_key`` (the module global Strix sets).
They only check the per-call ``api_key`` kwarg and the
``<PROVIDER>_API_KEY`` env var. The SDK's LitellmModel passes
``api_key=None`` by default, so requests went out with an empty
bearer and DeepSeek (and friends) returned 401.

Mirror the user's LLM_API_KEY into the provider-specific env var
(``DEEPSEEK_API_KEY`` for ``deepseek/...``, ``ANTHROPIC_API_KEY``
for ``anthropic/...``, etc.) using LiteLLM's documented convention.
``os.environ.setdefault`` is used so an explicit user env is never
clobbered. The OpenAI branch was already working via
``set_default_openai_key`` + the existing ``litellm.api_key`` global
fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the hand-rolled LLM configuration layer with a thin StrixProvider subclass of MultiProvider, relying on LiteLLM's own validate_environment for key mirroring and its routing table for provider dispatch. The change eliminates shorthand expansion, per-provider env-var maps, and the universal tool_choice="required" that modern thinking models reject.

  • StrixProvider routes any non-openai/litellm/any-llm prefix directly to LiteLLM, so users type anthropic/claude-opus-4-7 instead of litellm/anthropic/claude-opus-4-7; old litellm/… strings continue to work.
  • Key mirroring now calls litellm.validate_environment to discover the correct provider-specific env var name instead of uppercasing the prefix, avoiding the TOGETHER_AI_API_KEY / TOGETHERAI_API_KEY class of mismatches flagged in prior review threads.
  • Stability fixes include suppressing LiteLLM's stdout banners, catching the ThreadPoolExecutor shutdown race on stream teardown, and routing the dedupe path through StrixProvider (was stock MultiProvider, breaking non-OpenAI configs).

Confidence Score: 5/5

Safe to merge; all changes are well-scoped simplifications with no functional regressions on the main paths.

The core routing change is straightforward and well-tested by the warm-up path. Key mirroring now delegates to litellm's own environment validation rather than a fragile hand-rolled map, fixing the previously flagged provider naming bugs. The two concerns raised are maintenance-oriented: the _supports_responses_custom_tools function uses supports_reasoning as a proxy (a reasonable assumption for current OpenAI models), and the shutdown-race guard matches on a string literal. Neither affects correctness today.

strix/config/models.py — the _supports_responses_custom_tools coupling between supports_reasoning and Responses API capability detection is worth revisiting if new OpenAI models diverge from the current pattern.

Important Files Changed

Filename Overview
strix/config/models.py Introduces StrixProvider and _mirror_api_key_to_provider_env; replaces hand-rolled normalisation with litellm.validate_environment. The _supports_responses_custom_tools helper conflates supports_reasoning with Responses API custom tool support.
strix/core/execution.py Adds LiteLLM shutdown-race guard around the stream event loop; fragile string-matching on RuntimeError message.
strix/core/runner.py Replaces normalize_model_name with a plain strip() and injects StrixProvider via RunConfig; logic straightforward and consistent with the rest of the change.
strix/interface/main.py Adds pre-warm-up check for bare non-OpenAI names and exits early with a clear user-facing message; UX guidance updated to use provider/model form.
strix/report/dedupe.py Switches from MultiProvider+normalize_model_name to StrixProvider, fixing non-OpenAI routing in the dedupe path.
strix/agents/factory.py Removes reset_tool_choice=interactive; consistent with the universal removal of tool_choice="required" in this PR.
strix/core/inputs.py Simplifies make_model_settings by dropping the model_supports_reasoning guard; reasoning params are now forwarded unconditionally and silently dropped via drop_params for unsupported models.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
strix/config/models.py:125-130
**`supports_reasoning` used as a proxy for Responses API custom tool support**

`_supports_responses_custom_tools` returns `supports_reasoning` from litellm's model cost table, but the two capabilities are distinct. Today the correlation holds (o3/o4-mini have `supports_reasoning=True` and also accept native Responses API capabilities like `Filesystem`/`Shell`), but a new OpenAI model could support one without the other. If that happens, `uses_chat_completions_tool_schema` will silently mis-format tool definitions for the new model, sending Responses-API capability schemas when function-tool schemas are required (or vice versa), causing tool invocation failures at runtime with no obvious error at startup.

### Issue 2 of 2
strix/core/execution.py:359-361
**Fragile substring match on a runtime error message**

Catching `RuntimeError` by checking for `"after shutdown"` in the message string is fragile in two ways: if LiteLLM ever rewords the message (e.g. "cannot schedule new futures after interpreter shutdown"), the race exception will propagate as an unhandled error instead of being silently swallowed; and if some other `RuntimeError` happens to contain "after shutdown", it will be incorrectly suppressed. Consider matching against a specific exception type or a more structured sentinel if LiteLLM exposes one; alternatively, tightening the guard slightly reduces the blast radius.

```suggestion
                except RuntimeError as stream_exc:
                    if not any(
                        phrase in str(stream_exc)
                        for phrase in ("after shutdown", "after interpreter shutdown")
                    ):
                        raise
```

Reviews (6): Last reviewed commit: "Use function-tool schema for non-reasoni..." | Re-trigger Greptile

Comment thread strix/config/models.py Outdated
normalize_model_name expands `claude-*` and `gemini-*` shorthands into
`litellm/anthropic/...` and `litellm/gemini/...` at routing time, but
the mirror helper was looking at the raw pre-normalization name — bare
shorthands had no `/` and hit the early return, so ANTHROPIC_API_KEY /
GEMINI_API_KEY were never populated for those users.

Run the same normalization inside the mirror helper so the provider
prefix is consistent with what LiteLLM actually sees downstream.
Comment thread strix/config/models.py Outdated
0xallam added 6 commits June 7, 2026 22:58
Naively uppercasing the routing prefix breaks for providers whose
LiteLLM env var name doesn't match the prefix verbatim:
  together_ai/...  needs TOGETHERAI_API_KEY  (no underscore)
  perplexity/...   needs PERPLEXITYAI_API_KEY

Ask LiteLLM directly via litellm.validate_environment(model=...) which
env vars it consults for the chosen provider, then setdefault each one
to LLM_API_KEY. This is the SDK-blessed lookup and stays correct for
every provider LiteLLM supports without a hand-maintained name map.

Lowercase the routed model name before lookup so mixed-case user input
(e.g. Together_AI/...) still resolves.
A bare model name without a provider prefix routes through the SDK's
default OpenAI provider, so configuring STRIX_LLM=deepseek-v4-pro with
LLM_API_KEY=<deepseek key> sends that key to api.openai.com and
surfaces a confusing "Incorrect API key" error pointing at the OpenAI
dashboard.

When warm-up fails with an OpenAI-shaped error AND the configured
model is still unprefixed after normalize_model_name, append a hint
that points the user at the '<provider>/<model>' form with concrete
examples.
When the user opts into reasoning_effort but the configured model
isn't in litellm.model_cost at all (private SKUs, fresh releases the
registry hasn't picked up — e.g. deepseek/deepseek-v4-pro), we can't
confirm thinking support and were sending tool_choice="required",
which thinking-mode endpoints reject ("Thinking mode does not support
this tool_choice").

Add model_known_to_registry() and split the decision: when the user
wants reasoning AND the model is either confirmed-reasoning OR
unknown-to-registry, drop tool_choice. The Reasoning(effort=...) param
still only attaches for confirmed-reasoning models, so we don't send
reasoning hints to known non-reasoning models.

Known non-reasoning models (gpt-4o, registry-confirmed) keep
tool_choice="required" unchanged.
Users had to type STRIX_LLM=litellm/deepseek/deepseek-chat — the
litellm/ wrapper was Strix-internal plumbing surfacing in user config.

Add StrixProvider, a MultiProvider subclass that routes any non-OpenAI
prefix (deepseek/, anthropic/, groq/, xai/, mistral/, openrouter/, …)
through LitellmProvider with the prefix preserved. normalize_model_name
no longer adds litellm/ to anything; bare claude-* / gemini-* shorthands
expand to anthropic/<model> / gemini/<model> instead of the wrapped form.

Wire StrixProvider into warm_up_llm and RunConfig.model_provider.
litellm/<provider>/<model> and any-llm/<provider>/<model> still resolve
unchanged for users on older config.

Refresh stale model names in the env-validation messages and the
warm-up hint (gpt-5.4, claude-opus-4-7, deepseek-reasoner).

Verified 24-case end-to-end matrix: OpenAI direct vs. LitellmProvider
routing, env-var mirroring via validate_environment, supports_reasoning
detection, and tool_choice gating all behave correctly across modern
providers including the user's unknown DeepSeek SKU.
Drop every hand-rolled provider table and per-model gating that had
accumulated in the model-handling layer:

  * normalize_model_name no longer auto-prefixes bare claude-* / gemini-*
    names. Users supply the full <provider>/<model> form. The function
    became literally model_name.strip(), so callers now inline that and
    the function is removed.
  * tool_choice="required" is gone everywhere. Thinking-mode endpoints
    (Anthropic, DeepSeek /beta) reject it; modern reasoning models don't
    need it; non-interactive runs already have
    _append_noninteractive_tool_required_message as the convergence
    backstop. model_supports_reasoning, model_known_to_registry, and
    _model_cost_entry were only used to gate this and follow it out.
  * Reasoning(effort=...) is now attached whenever
    STRIX_REASONING_EFFORT is non-none. litellm.drop_params=True absorbs
    it for non-reasoning models.
  * Warm-up's bare-name OpenAI 401 hint is removed (false-positive prone,
    relied on substring matching).
  * reset_tool_choice on SandboxAgent is no-op now (no tool_choice gets
    set) and is removed.
  * report/dedupe.py was still routing through stock MultiProvider, so
    non-OpenAI configs failed the dedupe LLM pass; switch it to
    StrixProvider.

Verified end-to-end against modern provider strings (openai/gpt-5.4,
anthropic/claude-opus-4-7, deepseek/deepseek-reasoner,
gemini/gemini-2.5-pro, groq/, xai/, mistral/, together_ai/, perplexity/,
openrouter/, litellm/ legacy form, and whitespace-padded input): 18/18
cases route correctly, env vars mirror via litellm.validate_environment,
and ModelSettings carries no tool_choice. mypy strict passes.
Warn on bare unknown model names before warm-up. is_known_openai_bare_model
consults litellm.model_cost and matches only entries whose
litellm_provider == "openai". When the configured STRIX_LLM has no
provider prefix, isn't a known OpenAI model, and no LLM_API_BASE is
set, show a clear panel pointing the user at the <provider>/<model>
form and exit before issuing the doomed request — no more chasing an
"Incorrect API key" 401 from OpenAI when the user actually meant
deepseek/, anthropic/, etc. Custom-base configs are still allowed
through unconfirmed.

Disable LiteLLM's message-logging and streaming-logging knobs to cut
noise and skip one of the two end-of-stream submit paths. The other
path at streaming_handler.py:2206 schedules work on a global
ThreadPoolExecutor that loses to atexit shutdown when the interpreter
is winding down; the SDK's stream consumer surfaces that as a fatal
"cannot schedule new futures after shutdown" RuntimeError even though
the actual stream content was already delivered. Catch and swallow
that specific RuntimeError in _run_cycle so the scan isn't killed by
an upstream end-of-stream logging race.
Comment thread strix/interface/main.py
0xallam added 2 commits June 8, 2026 02:38
litellm.suppress_debug_info silences two unsolicited print() calls in
LiteLLM core: the "Provider List: https://docs.litellm.ai/docs/providers"
banner emitted by get_llm_provider_logic and the "Give Feedback /
Get Help" + "If you need to debug this error, use litellm._turn_on_debug()"
pair emitted by exception_mapping_utils on every LiteLLM exception.
Both are unconditional print() calls, not logger output, so log-level
config can't catch them. LiteLLM's own router and proxy_server set the
same flag for the same reason.
@0xallam 0xallam changed the title fix: mirror LLM_API_KEY into provider-specific env vars (closes #504) Jun 7, 2026
OpenAI's Responses API rejects tools[i].type="custom" on non-reasoning
models like gpt-4o (400 with code=unknown_parameter, param=tools).
Strix's SDK-native Filesystem capability registers CustomTool entries
by default, so a bare STRIX_LLM=gpt-4o run failed at the first tool
invocation even though warm-up (a tool-less call) succeeded.

uses_chat_completions_tool_schema now consults
litellm.model_cost[<name>].supports_reasoning for OpenAI routes and
flips to the chat-completions function-tool schema for models that
don't carry the reasoning flag. Same registry-lookup pattern as
is_known_openai_bare_model. Non-OpenAI prefixes and configs with
LLM_API_BASE are unchanged (still function tools).
@0xallam 0xallam merged commit dcf3155 into main Jun 8, 2026
2 checks passed
@0xallam 0xallam deleted the fix/deepseek-auth-and-oai-compatible branch June 8, 2026 00:36
0xallam added a commit that referenced this pull request Jun 8, 2026
The two-bucket design (estimated + observed) shipped in #524 was
over-engineered for what the user actually wants (one accurate total)
and had a real sum-vs-total inconsistency flagged by Greptile when a
provider populates input_tokens/output_tokens but leaves total_tokens
unset.

Collapse LLMUsageLedger to a single _total_cost. record() still skips
the tokens-times-registry estimate for LiteLLM-routed models so we do
not double-count the callback, but everything lands in one bucket.
to_record() apportions the total to agents by token share, using
_resolve_total_tokens (which falls back to input + output when
total_tokens is 0). The denominator and per-agent numerators read from
the same map, so per-agent shares now sum exactly to the total. The
cost_source label is removed.

litellm_cost_callback and its private extractor move from a separate
report/cost_capture.py module into report/state.py, collapsed into a
single self-contained adapter at the bottom of that file. cost_capture
is deleted.

ReportState gains a public record_observed_llm_cost(cost) method so
the callback no longer reaches into report_state._llm_usage as a
private attribute.

hydrate() loads the persisted cost back into _total_cost with no
attempt to bucket it as estimated vs observed, which removes the
resume drift where a previously-observed run picked up a "mixed"
label on resume.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant