AI prompt editor and evaluations tooling now supports multi-turn conversations
You can now save and evaluate multi-turn conversations in the GitHub Models prompt editor and evaluations tooling!
Until now, the evaluations tooling only supported a single user prompt. With this update, you can include up to four rounds of user and assistant messages directly in your .prompt.yml
file and test how models respond at the end of a longer interaction. In the API, you can include unlimited pairings.
This is especially useful for:
- Testing memory and context retention. For example, in the case of a travel bot, does it still recommend snowy places by turn four after the user says “I want a cold destination” in turn two?
- Ensuring consistent behavior as instructions evolve. For example, a shopping assistant where the user first says “make it under $100,” then later changes it to “under $200,” and the assistant correctly adjusts its recommendations.
- Evaluating real-world chat flows. For example, a customer support agent that needs to escalate properly after several back-and-forth troubleshooting steps.
Start building AI apps with GitHub Models today
GitHub Models and all our AI development tooling are available now to all GitHub users in public preview. This includes prompt editing and lightweight evaluations. Try our tools out by enabling them in your repository or organization, or learn more in our documentation.
Help us shape what’s next
We’re just getting started, and your feedback helps guide our roadmap. Join the community discussion to share your thoughts and connect with other developers building the future of AI on GitHub.