Introduction
Ragas is a library that helps you move from "vibe checks" to systematic evaluation loops for your AI applications. It provides the tools to evaluate Large Language Model (LLM) applications with ease and confidence.
Why Ragas?
Traditional evaluation metrics don't capture what matters for LLM applications. Manual evaluation doesn't scale. Ragas solves this by combining LLM-driven metrics with systematic experimentation to create a continuous improvement loop.
Key Features
- Experiments-first approach: Evaluate changes consistently with experiments. Make changes, run evaluations, observe results, and iterate to improve your LLM application.
- Ragas Metrics: Create custom metrics tailored to your specific use case with simple decorators, or use our library of available metrics. Learn more about metrics in Ragas.
- Easy to integrate: Built-in dataset management, result tracking, and integration with popular frameworks like LangChain, LlamaIndex, and more.
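To make the "custom metrics with simple decorators" idea concrete, here is a minimal, self-contained sketch of the pattern. Note that `metric`, `MetricResult`, and `exact_match` are illustrative names invented for this example, not Ragas's actual API; consult the metrics documentation for the real interface.

```python
# Illustrative sketch only: `metric`, `MetricResult`, and `exact_match`
# are hypothetical stand-ins for a decorator-based metric API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class MetricResult:
    name: str
    score: float


def metric(fn: Callable[..., float]) -> Callable[..., MetricResult]:
    """Wrap a plain scoring function into a named metric."""
    def wrapper(**row) -> MetricResult:
        return MetricResult(name=fn.__name__, score=fn(**row))
    return wrapper


@metric
def exact_match(response: str, expected: str) -> float:
    # 1.0 if the model's answer matches the reference (case-insensitive), else 0.0
    return 1.0 if response.strip().lower() == expected.strip().lower() else 0.0


# A tiny evaluation dataset: each row pairs a model response with a reference.
dataset = [
    {"response": "Paris", "expected": "paris"},
    {"response": "Lyon", "expected": "Paris"},
]

results = [exact_match(**row) for row in dataset]
avg = sum(r.score for r in results) / len(results)
print(avg)  # 0.5
```

The decorator keeps the scoring logic a plain function while attaching the bookkeeping (metric name, structured result) needed to aggregate scores across a dataset, which is the shape an experiment loop builds on.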
- Get Started: Start evaluating in 5 minutes with our quickstart guide.
- Core Concepts: Understand experiments, metrics, and datasets, the building blocks of effective evaluation.
- How-to Guides: Integrate Ragas into your workflow with practical guides for specific use cases.
- References: API documentation and technical details for diving deeper.
Want help improving your AI application using evals?
Over the past two years, we have seen and helped improve many AI applications using evals.
We are compressing this knowledge into a product to replace vibe checks with eval loops so that you can focus on building great AI applications.
If you want help improving and scaling your AI application using evals, book a slot or drop us a line: founders@explodinggradients.com.