Skip to content

fix: graceful fallback for newer model families in count_message_tokens()#235

Open
ArivunidhiA wants to merge 1 commit into
AgentOps-AI:mainfrom
ArivunidhiA:fix/handle-new-model-families-228-231
Open

fix: graceful fallback for newer model families in count_message_tokens()#235
ArivunidhiA wants to merge 1 commit into
AgentOps-AI:mainfrom
ArivunidhiA:fix/handle-new-model-families-228-231

Conversation

@ArivunidhiA

Copy link
Copy Markdown

Fixes #228, #231

Problem

count_message_tokens() raises KeyError for any model not in its hardcoded if/elif chain — including gpt-5-nano, gpt-4.1, gpt-4.5-preview, Gemini models, etc.

Fix

Replace raise KeyError with a graceful fallback using the same token overhead all modern OpenAI models use (tokens_per_message=3, tokens_per_name=1). A warning is logged for visibility. Encoding is already handled correctly by tiktoken's own fallback.

Before:

count_message_tokens([{"role": "user", "content": "Hello"}], "gpt-5-nano")
# KeyError: num_tokens_from_messages() is not implemented for model gpt-5-nano

After:

count_message_tokens([{"role": "user", "content": "Hello"}], "gpt-5-nano")
# 8

Changes

  • `tokencost/costs

Problem

count_message_tokens() raises KeyError for any model not in its hardcoded if/elif chain — including gpt-5-nano, gpt-4.1, gpt-4.5-preview, Gemini models, etc.

Fix

Replace `raise Ke133 tests pass. Zero new lint errors.

…ns()

Replace raise KeyError with graceful fallback using default token-per-message
overhead (tokens_per_message=3, tokens_per_name=1) for models not in the
hardcoded if/elif chain. This matches the format used by all modern OpenAI
chat models since gpt-4-0613.

Fixes AgentOps-AI#228, AgentOps-AI#231
Copilot AI review requested due to automatic review settings March 22, 2026 06:46

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates count_message_tokens() to avoid raising KeyError for unknown/newer chat model identifiers by falling back to the modern default message overhead, and expands tests to cover the new behavior (addressing #228 and #231).

Changes:

  • Replace the unknown-model KeyError in count_message_tokens() with a warning + default overhead (tokens_per_message=3, tokens_per_name=1).
  • Update existing tests that expected KeyError and add new coverage for newer/unknown model names and prefixes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
tokencost/costs.py Adds a graceful fallback path for unknown/new model token counting instead of raising KeyError.
tests/test_costs.py Adjusts expectations for unknown models and adds new tests covering fallback behavior for newer model families.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tokencost/costs.py
Comment on lines +263 to +271
# For newer model families (gpt-5, gpt-4.1, gpt-4.5, gemini, etc.)
# and any unknown models, use the modern default token format.
# All recent OpenAI chat models use tokens_per_message=3, tokens_per_name=1.
logger.warning(
f"Warning: model {model} is not explicitly handled for token counting. "
"Using default token-per-message overhead."
)
tokens_per_message = 3
tokens_per_name = 1

Copilot AI Mar 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new fallback block is intended to cover model families like gpt-4.1 / gpt-4.5-*, but those strings currently match the earlier elif "gpt-4" in model: branch and get redirected to gpt-4-0613. That bypasses this fallback and can also force the wrong tiktoken encoding for gpt-4.1/gpt-4.5 if they use a different encoding than gpt-4-0613. Consider handling gpt-4.1 / gpt-4.5 explicitly before the generic "gpt-4" in model check, or tightening that check (e.g., only gpt-4 + optional - suffixes).

Copilot uses AI. Check for mistakes.
Comment thread tokencost/costs.py
Comment on lines +266 to 269
logger.warning(
f"Warning: model {model} is not explicitly handled for token counting. "
"Using default token-per-message overhead."
)

Copilot AI Mar 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.warning(...) already conveys severity; prefixing the message itself with "Warning:" makes logs noisier and inconsistent with other warnings in this module (some include the prefix, some don’t). Consider removing the redundant "Warning:" prefix and keeping the message content focused on the condition and fallback being used.

Copilot uses AI. Check for mistakes.
Comment thread tests/test_costs.py
Comment on lines +97 to 105
"""Empty input (not a list) should raise an error during encoding"""
# Empty string is not a valid message list; the function will attempt
# to iterate over it and fail or produce 0 tokens.
# With the graceful fallback, an empty string iterates as zero messages.
result = count_message_tokens("", "")
# Empty string iterates as zero messages, plus 3 for reply priming
assert result == 3


Copilot AI Mar 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test now passes a str for messages (despite the function contract being List[Dict[str, str]]) and asserts a numeric result. That’s brittle: an empty string happens to iterate to zero messages and returns 3, but any non-empty string will iterate characters and then crash at message.items(). It would be better to have count_message_tokens() validate messages is a list of dicts and raise TypeError for invalid inputs, and update this test to assert that error instead of relying on the empty-string edge case.

Suggested change
"""Empty input (not a list) should raise an error during encoding"""
# Empty string is not a valid message list; the function will attempt
# to iterate over it and fail or produce 0 tokens.
# With the graceful fallback, an empty string iterates as zero messages.
result = count_message_tokens("", "")
# Empty string iterates as zero messages, plus 3 for reply priming
assert result == 3
"""Empty input (not a list) should raise a TypeError."""
# Empty string is not a valid message list; count_message_tokens should
# validate the input type and raise a TypeError for invalid inputs.
with pytest.raises(TypeError):
count_message_tokens("", "")
Copilot uses AI. Check for mistakes.
Comment thread tests/test_costs.py
Comment on lines +370 to +371
assert isinstance(result, int)
assert result == 4 # "Hello, world!" = 4 tokens in cl100k_base

Copilot AI Mar 22, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring/comment for this new test says the new models work “via cl100k_base fallback”, but count_string_tokens() will use tiktoken.encoding_for_model(model) when available (no fallback). To keep this test accurate and resilient to future tiktoken model mappings, consider deriving the expected token count using the same encoding-resolution logic as the production function (try encoding_for_model, else cl100k_base) rather than hard-coding 4.

Suggested change
assert isinstance(result, int)
assert result == 4 # "Hello, world!" = 4 tokens in cl100k_base
import tiktoken
try:
encoding = tiktoken.encoding_for_model(model)
except KeyError:
encoding = tiktoken.get_encoding("cl100k_base")
expected = len(encoding.encode(STRING))
assert isinstance(result, int)
assert result == expected
Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants