Simplify LLM configuration layer (closes #504) by 0xallam · Pull Request #524 · usestrix/strix

0xallam · 2026-06-07T19:42:37Z

Closes #504.

Strips the LLM-config layer to a thin shim over the OpenAI Agents SDK + LiteLLM. No hand-rolled provider tables, no shorthand expansion, no per-provider gating.

What changes

One key, any provider. LLM_API_KEY is mirrored into the provider-specific env var (DEEPSEEK_API_KEY, ANTHROPIC_API_KEY, TOGETHERAI_API_KEY, ...) via litellm.validate_environment. No local provider→env-var map.
No litellm/ prefix in user config. StrixProvider (a 5-line MultiProvider subclass) routes any non-OpenAI prefix through LiteLLM. Users type deepseek/deepseek-v4-pro. Old litellm/... configs still work.
No tool_choice="required". Modern thinking models (Anthropic extended thinking, DeepSeek /beta, Bedrock newer Claude) reject it; convergence relies on _finish_tool_use_behavior + _append_noninteractive_tool_required_message (already in place).
No bare-name shorthand expansion. Users supply <provider>/<model>. Bare unknown names route to the SDK's OpenAI default; pre-warm-up panel warns when the bare name isn't a known OpenAI model and exits before the doomed call.

Side stability fixes

Catch the LiteLLM end-of-stream RuntimeError: cannot schedule new futures after shutdown race (background success_handler.submit losing to atexit during interpreter teardown — stream content was already delivered).
Silence LiteLLM stdout banners via suppress_debug_info, disable_streaming_logging, turn_off_message_logging.
Dedupe pass also routes via StrixProvider (was stock MultiProvider, broke non-OpenAI configs).
Bump litellm 1.83.7 → 1.88.0.

LiteLLM's per-provider branches (deepseek, anthropic, groq, etc.) don't consult ``litellm.api_key`` (the module global Strix sets). They only check the per-call ``api_key`` kwarg and the ``<PROVIDER>_API_KEY`` env var. The SDK's LitellmModel passes ``api_key=None`` by default, so requests went out with an empty bearer and DeepSeek (and friends) returned 401. Mirror the user's LLM_API_KEY into the provider-specific env var (``DEEPSEEK_API_KEY`` for ``deepseek/...``, ``ANTHROPIC_API_KEY`` for ``anthropic/...``, etc.) using LiteLLM's documented convention. ``os.environ.setdefault`` is used so an explicit user env is never clobbered. The OpenAI branch was already working via ``set_default_openai_key`` + the existing ``litellm.api_key`` global fallback. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

greptile-apps · 2026-06-07T19:45:31Z

Greptile Summary

This PR replaces the hand-rolled LLM configuration layer with a thin StrixProvider subclass of MultiProvider, relying on LiteLLM's own validate_environment for key mirroring and its routing table for provider dispatch. The change eliminates shorthand expansion, per-provider env-var maps, and the universal tool_choice="required" that modern thinking models reject.

StrixProvider routes any non-openai/litellm/any-llm prefix directly to LiteLLM, so users type anthropic/claude-opus-4-7 instead of litellm/anthropic/claude-opus-4-7; old litellm/… strings continue to work.
Key mirroring now calls litellm.validate_environment to discover the correct provider-specific env var name instead of uppercasing the prefix, avoiding the TOGETHER_AI_API_KEY / TOGETHERAI_API_KEY class of mismatches flagged in prior review threads.
Stability fixes include suppressing LiteLLM's stdout banners, catching the ThreadPoolExecutor shutdown race on stream teardown, and routing the dedupe path through StrixProvider (was stock MultiProvider, breaking non-OpenAI configs).

Confidence Score: 5/5

Safe to merge; all changes are well-scoped simplifications with no functional regressions on the main paths.

The core routing change is straightforward and well-tested by the warm-up path. Key mirroring now delegates to litellm's own environment validation rather than a fragile hand-rolled map, fixing the previously flagged provider naming bugs. The two concerns raised are maintenance-oriented: the _supports_responses_custom_tools function uses supports_reasoning as a proxy (a reasonable assumption for current OpenAI models), and the shutdown-race guard matches on a string literal. Neither affects correctness today.

strix/config/models.py — the _supports_responses_custom_tools coupling between supports_reasoning and Responses API capability detection is worth revisiting if new OpenAI models diverge from the current pattern.

Important Files Changed

Filename	Overview
strix/config/models.py	Introduces StrixProvider and _mirror_api_key_to_provider_env; replaces hand-rolled normalisation with litellm.validate_environment. The _supports_responses_custom_tools helper conflates supports_reasoning with Responses API custom tool support.
strix/core/execution.py	Adds LiteLLM shutdown-race guard around the stream event loop; fragile string-matching on RuntimeError message.
strix/core/runner.py	Replaces normalize_model_name with a plain strip() and injects StrixProvider via RunConfig; logic straightforward and consistent with the rest of the change.
strix/interface/main.py	Adds pre-warm-up check for bare non-OpenAI names and exits early with a clear user-facing message; UX guidance updated to use provider/model form.
strix/report/dedupe.py	Switches from MultiProvider+normalize_model_name to StrixProvider, fixing non-OpenAI routing in the dedupe path.
strix/agents/factory.py	Removes reset_tool_choice=interactive; consistent with the universal removal of tool_choice="required" in this PR.
strix/core/inputs.py	Simplifies make_model_settings by dropping the model_supports_reasoning guard; reasoning params are now forwarded unconditionally and silently dropped via drop_params for unsupported models.

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
strix/config/models.py:125-130
**`supports_reasoning` used as a proxy for Responses API custom tool support**

`_supports_responses_custom_tools` returns `supports_reasoning` from litellm's model cost table, but the two capabilities are distinct. Today the correlation holds (o3/o4-mini have `supports_reasoning=True` and also accept native Responses API capabilities like `Filesystem`/`Shell`), but a new OpenAI model could support one without the other. If that happens, `uses_chat_completions_tool_schema` will silently mis-format tool definitions for the new model, sending Responses-API capability schemas when function-tool schemas are required (or vice versa), causing tool invocation failures at runtime with no obvious error at startup.

### Issue 2 of 2
strix/core/execution.py:359-361
**Fragile substring match on a runtime error message**

Catching `RuntimeError` by checking for `"after shutdown"` in the message string is fragile in two ways: if LiteLLM ever rewords the message (e.g. "cannot schedule new futures after interpreter shutdown"), the race exception will propagate as an unhandled error instead of being silently swallowed; and if some other `RuntimeError` happens to contain "after shutdown", it will be incorrectly suppressed. Consider matching against a specific exception type or a more structured sentinel if LiteLLM exposes one; alternatively, tightening the guard slightly reduces the blast radius.

```suggestion
                except RuntimeError as stream_exc:
                    if not any(
                        phrase in str(stream_exc)
                        for phrase in ("after shutdown", "after interpreter shutdown")
                    ):
                        raise
```

_{Reviews (6): Last reviewed commit: "Use function-tool schema for non-reasoni..." | Re-trigger Greptile}

normalize_model_name expands `claude-*` and `gemini-*` shorthands into `litellm/anthropic/...` and `litellm/gemini/...` at routing time, but the mirror helper was looking at the raw pre-normalization name — bare shorthands had no `/` and hit the early return, so ANTHROPIC_API_KEY / GEMINI_API_KEY were never populated for those users. Run the same normalization inside the mirror helper so the provider prefix is consistent with what LiteLLM actually sees downstream.

Naively uppercasing the routing prefix breaks for providers whose LiteLLM env var name doesn't match the prefix verbatim: together_ai/... needs TOGETHERAI_API_KEY (no underscore) perplexity/... needs PERPLEXITYAI_API_KEY Ask LiteLLM directly via litellm.validate_environment(model=...) which env vars it consults for the chosen provider, then setdefault each one to LLM_API_KEY. This is the SDK-blessed lookup and stays correct for every provider LiteLLM supports without a hand-maintained name map. Lowercase the routed model name before lookup so mixed-case user input (e.g. Together_AI/...) still resolves.

A bare model name without a provider prefix routes through the SDK's default OpenAI provider, so configuring STRIX_LLM=deepseek-v4-pro with LLM_API_KEY=<deepseek key> sends that key to api.openai.com and surfaces a confusing "Incorrect API key" error pointing at the OpenAI dashboard. When warm-up fails with an OpenAI-shaped error AND the configured model is still unprefixed after normalize_model_name, append a hint that points the user at the '<provider>/<model>' form with concrete examples.

When the user opts into reasoning_effort but the configured model isn't in litellm.model_cost at all (private SKUs, fresh releases the registry hasn't picked up — e.g. deepseek/deepseek-v4-pro), we can't confirm thinking support and were sending tool_choice="required", which thinking-mode endpoints reject ("Thinking mode does not support this tool_choice"). Add model_known_to_registry() and split the decision: when the user wants reasoning AND the model is either confirmed-reasoning OR unknown-to-registry, drop tool_choice. The Reasoning(effort=...) param still only attaches for confirmed-reasoning models, so we don't send reasoning hints to known non-reasoning models. Known non-reasoning models (gpt-4o, registry-confirmed) keep tool_choice="required" unchanged.

Users had to type STRIX_LLM=litellm/deepseek/deepseek-chat — the litellm/ wrapper was Strix-internal plumbing surfacing in user config. Add StrixProvider, a MultiProvider subclass that routes any non-OpenAI prefix (deepseek/, anthropic/, groq/, xai/, mistral/, openrouter/, …) through LitellmProvider with the prefix preserved. normalize_model_name no longer adds litellm/ to anything; bare claude-* / gemini-* shorthands expand to anthropic/<model> / gemini/<model> instead of the wrapped form. Wire StrixProvider into warm_up_llm and RunConfig.model_provider. litellm/<provider>/<model> and any-llm/<provider>/<model> still resolve unchanged for users on older config. Refresh stale model names in the env-validation messages and the warm-up hint (gpt-5.4, claude-opus-4-7, deepseek-reasoner). Verified 24-case end-to-end matrix: OpenAI direct vs. LitellmProvider routing, env-var mirroring via validate_environment, supports_reasoning detection, and tool_choice gating all behave correctly across modern providers including the user's unknown DeepSeek SKU.

Drop every hand-rolled provider table and per-model gating that had accumulated in the model-handling layer: * normalize_model_name no longer auto-prefixes bare claude-* / gemini-* names. Users supply the full <provider>/<model> form. The function became literally model_name.strip(), so callers now inline that and the function is removed. * tool_choice="required" is gone everywhere. Thinking-mode endpoints (Anthropic, DeepSeek /beta) reject it; modern reasoning models don't need it; non-interactive runs already have _append_noninteractive_tool_required_message as the convergence backstop. model_supports_reasoning, model_known_to_registry, and _model_cost_entry were only used to gate this and follow it out. * Reasoning(effort=...) is now attached whenever STRIX_REASONING_EFFORT is non-none. litellm.drop_params=True absorbs it for non-reasoning models. * Warm-up's bare-name OpenAI 401 hint is removed (false-positive prone, relied on substring matching). * reset_tool_choice on SandboxAgent is no-op now (no tool_choice gets set) and is removed. * report/dedupe.py was still routing through stock MultiProvider, so non-OpenAI configs failed the dedupe LLM pass; switch it to StrixProvider. Verified end-to-end against modern provider strings (openai/gpt-5.4, anthropic/claude-opus-4-7, deepseek/deepseek-reasoner, gemini/gemini-2.5-pro, groq/, xai/, mistral/, together_ai/, perplexity/, openrouter/, litellm/ legacy form, and whitespace-padded input): 18/18 cases route correctly, env vars mirror via litellm.validate_environment, and ModelSettings carries no tool_choice. mypy strict passes.

Warn on bare unknown model names before warm-up. is_known_openai_bare_model consults litellm.model_cost and matches only entries whose litellm_provider == "openai". When the configured STRIX_LLM has no provider prefix, isn't a known OpenAI model, and no LLM_API_BASE is set, show a clear panel pointing the user at the <provider>/<model> form and exit before issuing the doomed request — no more chasing an "Incorrect API key" 401 from OpenAI when the user actually meant deepseek/, anthropic/, etc. Custom-base configs are still allowed through unconfirmed. Disable LiteLLM's message-logging and streaming-logging knobs to cut noise and skip one of the two end-of-stream submit paths. The other path at streaming_handler.py:2206 schedules work on a global ThreadPoolExecutor that loses to atexit shutdown when the interpreter is winding down; the SDK's stream consumer surfaces that as a fatal "cannot schedule new futures after shutdown" RuntimeError even though the actual stream content was already delivered. Catch and swallow that specific RuntimeError in _run_cycle so the scan isn't killed by an upstream end-of-stream logging race.

litellm.suppress_debug_info silences two unsolicited print() calls in LiteLLM core: the "Provider List: https://docs.litellm.ai/docs/providers" banner emitted by get_llm_provider_logic and the "Give Feedback / Get Help" + "If you need to debug this error, use litellm._turn_on_debug()" pair emitted by exception_mapping_utils on every LiteLLM exception. Both are unconditional print() calls, not logger output, so log-level config can't catch them. LiteLLM's own router and proxy_server set the same flag for the same reason.

OpenAI's Responses API rejects tools[i].type="custom" on non-reasoning models like gpt-4o (400 with code=unknown_parameter, param=tools). Strix's SDK-native Filesystem capability registers CustomTool entries by default, so a bare STRIX_LLM=gpt-4o run failed at the first tool invocation even though warm-up (a tool-less call) succeeded. uses_chat_completions_tool_schema now consults litellm.model_cost[<name>].supports_reasoning for OpenAI routes and flips to the chat-completions function-tool schema for models that don't carry the reasoning flag. Same registry-lookup pattern as is_known_openai_bare_model. Non-OpenAI prefixes and configs with LLM_API_BASE are unchanged (still function tools).

The two-bucket design (estimated + observed) shipped in #524 was over-engineered for what the user actually wants (one accurate total) and had a real sum-vs-total inconsistency flagged by Greptile when a provider populates input_tokens/output_tokens but leaves total_tokens unset. Collapse LLMUsageLedger to a single _total_cost. record() still skips the tokens-times-registry estimate for LiteLLM-routed models so we do not double-count the callback, but everything lands in one bucket. to_record() apportions the total to agents by token share, using _resolve_total_tokens (which falls back to input + output when total_tokens is 0). The denominator and per-agent numerators read from the same map, so per-agent shares now sum exactly to the total. The cost_source label is removed. litellm_cost_callback and its private extractor move from a separate report/cost_capture.py module into report/state.py, collapsed into a single self-contained adapter at the bottom of that file. cost_capture is deleted. ReportState gains a public record_observed_llm_cost(cost) method so the callback no longer reaches into report_state._llm_usage as a private attribute. hydrate() loads the persisted cost back into _total_cost with no attempt to bucket it as estimated vs observed, which removes the resume drift where a previously-observed run picked up a "mixed" label on resume.

greptile-apps Bot reviewed Jun 7, 2026

View reviewed changes

Comment thread strix/config/models.py Outdated

greptile-apps Bot reviewed Jun 7, 2026

View reviewed changes

Comment thread strix/config/models.py Outdated

0xallam added 6 commits June 7, 2026 22:58

greptile-apps Bot reviewed Jun 7, 2026

View reviewed changes

Comment thread strix/interface/main.py

0xallam mentioned this pull request Jun 7, 2026

[BUG] LLM Connection to Bedrock API not working #511

Closed

0xallam added 2 commits June 8, 2026 02:38

Bump litellm 1.83.7 -> 1.88.0

3b40a6c

0xallam changed the title ~~fix: mirror LLM_API_KEY into provider-specific env vars (closes #504)~~ Jun 7, 2026

0xallam merged commit dcf3155 into main Jun 8, 2026
2 checks passed

0xallam deleted the fix/deepseek-auth-and-oai-compatible branch June 8, 2026 00:36

0xallam mentioned this pull request Jun 8, 2026

Simplify cost ledger to one bucket #531

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Simplify LLM configuration layer (closes #504)#524

Simplify LLM configuration layer (closes #504)#524
0xallam merged 11 commits into
mainfrom
fix/deepseek-auth-and-oai-compatible

0xallam commented Jun 7, 2026 •

edited

Loading

greptile-apps Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Labels

1 participant

Uh oh!

Conversation

0xallam commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes

Side stability fixes

greptile-apps Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Labels

1 participant

0xallam commented Jun 7, 2026 •

edited

Loading

greptile-apps Bot commented Jun 7, 2026 •

edited

Loading