feat(agents): add streaming support for completions (GAP-004)#2277

Draft
Hartorn wants to merge 1 commit into main from feat/agents-gap004-streaming
Conversation


@Hartorn Hartorn commented Feb 26, 2026

  • StreamChunk model with field descriptions
  • BaseGenerator.stream() with fallback to complete()
  • LiteLLMGenerator.stream() with real litellm streaming and proper delta-merging for incremental tool_calls
  • ChatWorkflow.stream_steps() async context manager
  • _StepRunner.execute_streaming() sharing _build_step() with execute() to reduce duplication
  • Refactored LiteLLMGenerator with _build_params() for DRY
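The first two items above — a chunk model plus a base-class `stream()` that falls back to `complete()` — can be sketched roughly as follows. This is a hypothetical, self-contained approximation: plain dataclasses stand in for the PR's actual Pydantic model, and `EchoGenerator` is an invented subclass used only to exercise the fallback.

```python
import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncIterator
from dataclasses import dataclass, field


@dataclass
class StreamChunk:
    """One incremental piece of a streamed completion (illustrative shape)."""
    delta: str = ""                                   # text emitted since the previous chunk
    tool_calls: list = field(default_factory=list)    # completed tool calls, if any
    finished: bool = False                            # True on the final chunk


class BaseGenerator(ABC):
    @abstractmethod
    async def complete(self, messages: list[dict]) -> str: ...

    async def stream(self, messages: list[dict]) -> AsyncIterator[StreamChunk]:
        """Default: no native streaming -- run complete() and yield one final chunk."""
        text = await self.complete(messages)
        yield StreamChunk(delta=text, finished=True)


class EchoGenerator(BaseGenerator):
    """Invented toy subclass: echoes the last user message."""
    async def complete(self, messages: list[dict]) -> str:
        return messages[-1]["content"]


async def main() -> list[StreamChunk]:
    gen = EchoGenerator()
    return [c async for c in gen.stream([{"role": "user", "content": "hi"}])]


chunks = asyncio.run(main())
print(chunks[0].delta)  # "hi"
```

A generator with native streaming (like the LiteLLM one below) would override `stream()` and yield many small chunks instead of one final one.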

Made-with: Cursor

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

Checklist

  • I've read the CODE_OF_CONDUCT.md document.
  • I've read the CONTRIBUTING.md guide.
  • I've written tests for all new methods and classes that I created.
  • I've written the docstring in Google format for all the methods and classes that I used.
  • I've updated pdm.lock by running pdm update-lock (only applicable when pyproject.toml has been
    modified)
@Hartorn Hartorn self-assigned this Feb 26, 2026
@Hartorn Hartorn marked this pull request as draft February 26, 2026 13:36
@gemini-code-assist
Contributor

Summary of Changes

Hello @Hartorn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent framework by integrating streaming capabilities for language model completions. This allows for more interactive and responsive agent interactions, as users can receive partial responses and observe the agent's thought process in real-time. The changes involve introducing new data models for stream chunks, extending the base generator with streaming methods, and providing a concrete streaming implementation for LiteLLM, along with a new workflow context manager to expose these features.

Highlights

  • Streaming Completion Support: Introduced a new StreamChunk model to represent incremental tokens and tool calls during streaming completions, enabling real-time feedback from language models.
  • Base Generator Streaming API: Added an asynchronous stream() method to the BaseGenerator class, which provides a default fallback to the existing complete() method for generators that do not natively support streaming.
  • LiteLLM Streaming Implementation: Implemented native streaming for LiteLLMGenerator, including sophisticated logic to merge incremental tool_calls deltas received from LiteLLM into complete ToolCall objects at the end of the stream.
  • Workflow Streaming Context Manager: Provided a new stream_steps() asynchronous context manager in ChatWorkflow that allows consumers to receive both StreamChunk objects during completion and WorkflowStep objects for tool execution results, offering a granular view of the workflow progression.
  • Code Refactoring for Reusability: Refactored _StepRunner to share the _build_step() method between execute() and execute_streaming() and introduced _build_params() in LiteLLMGenerator to reduce code duplication and improve maintainability.
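The delta-merging described in the LiteLLM highlight above is the trickiest part: OpenAI-style streams split each tool call across many chunks, sending `id` and `function.name` once and `function.arguments` as string fragments, keyed by an `index`. The sketch below illustrates that accumulation pattern; `merge_tool_call_deltas` and the dict shapes are illustrative, not the PR's actual code.

```python
def merge_tool_call_deltas(deltas: list[dict]) -> list[dict]:
    """Merge per-chunk tool_call deltas into complete tool calls, keyed by index."""
    calls: dict[int, dict] = {}
    for delta in deltas:
        idx = delta["index"]
        call = calls.setdefault(idx, {"id": "", "name": "", "arguments": ""})
        if delta.get("id"):                 # id arrives once, on the first delta
            call["id"] = delta["id"]
        fn = delta.get("function") or {}
        if fn.get("name"):                  # name also arrives once
            call["name"] = fn["name"]
        call["arguments"] += fn.get("arguments") or ""  # arguments arrive in fragments
    return [calls[i] for i in sorted(calls)]


# Example: one tool call streamed across three chunks.
deltas = [
    {"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"city": '}},
    {"index": 0, "function": {"arguments": '"Paris"}'}},
]
merged = merge_tool_call_deltas(deltas)
print(merged)  # [{'id': 'call_1', 'name': 'get_weather', 'arguments': '{"city": "Paris"}'}]
```

Concatenating the argument fragments only at the end of the stream is what lets the merged `arguments` string parse as valid JSON.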


Changelog
  • libs/giskard-agents/src/giskard/agents/__init__.py
    • Imported StreamChunk from giskard.agents.generators.
    • Added StreamChunk to the module's __all__ export list.
  • libs/giskard-agents/src/giskard/agents/generators/__init__.py
    • Imported StreamChunk from the local base module.
    • Added StreamChunk to the module's __all__ export list.
  • libs/giskard-agents/src/giskard/agents/generators/base.py
    • Imported AsyncIterator from collections.abc.
    • Imported ToolCall from ..tools.
    • Defined the StreamChunk Pydantic model to represent incremental streaming data.
    • Added an asynchronous stream method to BaseGenerator with a default implementation that falls back to complete().
  • libs/giskard-agents/src/giskard/agents/generators/litellm_generator.py
    • Imported AsyncIterator and Any for type hinting.
    • Imported Function, ToolCall from ..tools.tool.
    • Imported StreamChunk from the local base module.
    • Extracted parameter building logic into a new _build_params method to reduce duplication.
    • Implemented the stream method for LiteLLMGenerator to handle actual LiteLLM streaming responses.
    • Added logic within the stream method to accumulate and merge incremental tool_calls deltas from LiteLLM.
  • libs/giskard-agents/src/giskard/agents/workflow.py
    • Imported StreamChunk from giskard.agents.generators.
    • Extracted the _build_step method from _StepRunner to centralize step creation logic.
    • Added an execute_streaming method to _StepRunner to process workflow steps with streaming completions.
    • Implemented stream_steps as an asynchronous context manager in ChatWorkflow to expose streaming capabilities to the user.
  • libs/giskard-agents/tests/test_streaming.py
    • Added a new test file to cover the streaming functionality.
    • Included tests for BaseGenerator's stream fallback, StreamChunk delta accumulation, and tool call handling in streaming.
    • Added a test for ChatWorkflow.stream_steps() to verify it yields both StreamChunk and WorkflowStep objects.
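The stream_steps() consumer pattern the tests verify — a single event stream mixing token chunks and completed tool steps — might look like the hypothetical sketch below. StreamChunk and WorkflowStep here are minimal stand-ins for the real models, and stream_steps is a toy context manager approximating the ChatWorkflow API.

```python
import asyncio
from contextlib import asynccontextmanager


class StreamChunk:  # stand-in for the real model
    def __init__(self, delta: str):
        self.delta = delta


class WorkflowStep:  # stand-in: emitted after each tool execution
    def __init__(self, name: str):
        self.name = name


@asynccontextmanager
async def stream_steps():
    """Toy approximation: yields an async iterator of mixed events."""
    async def events():
        yield StreamChunk("Looking up ")
        yield StreamChunk("the weather...")
        yield WorkflowStep("get_weather")   # tool result arrives as a full step
        yield StreamChunk("It is sunny.")
    yield events()


async def consume() -> tuple[str, list[str]]:
    text, tools = "", []
    async with stream_steps() as events:
        async for event in events:
            if isinstance(event, StreamChunk):
                text += event.delta          # render tokens incrementally
            else:
                tools.append(event.name)     # record executed tools
    return text, tools


text, tools = asyncio.run(consume())
print(text)   # Looking up the weather...It is sunny.
print(tools)  # ['get_weather']
```

Dispatching on the event type is what gives consumers the "granular view" mentioned above: partial text for live rendering, full steps for tool results.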
Activity
  • The pull request was created by Hartorn, introducing streaming support for agent completions.
  • The author provided a detailed list of changes in the PR body, outlining the new models, methods, and refactorings.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces comprehensive streaming support for completions, which is a valuable addition to the agents' capabilities. The changes are well-structured, including a new StreamChunk model, a default streaming implementation in BaseGenerator, and a specialized LiteLLMGenerator streaming method that correctly handles incremental tool call deltas. The refactoring of _build_params and _build_step improves code organization and reduces duplication. Furthermore, the ChatWorkflow.stream_steps() context manager provides a clean API for consuming streaming events, and the new test suite adequately covers the new functionality. Overall, the implementation is robust and well-thought-out.
