Introduction
Ragas is a library that helps you move from "vibe checks" to systematic evaluation loops for your AI applications. It provides the tools to evaluate Large Language Model (LLM) applications with ease and confidence.
Why Ragas?
Traditional evaluation metrics don't capture what matters for LLM applications. Manual evaluation doesn't scale. Ragas solves this by combining LLM-driven metrics with systematic experimentation to create a continuous improvement loop.
Key Features
- Experiments-first approach: Evaluate changes consistently with experiments. Make changes, run evaluations, observe results, and iterate to improve your LLM application.
- Ragas Metrics: Create custom metrics tailored to your specific use case with simple decorators, or use our library of available metrics. Learn more about metrics in Ragas.
- Easy to integrate: Built-in dataset management, result tracking, and integration with popular frameworks like LangChain, LlamaIndex, and more.
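To make the "custom metrics with simple decorators" idea concrete, here is a minimal, self-contained sketch of the pattern. Note that `metric`, `MetricResult`, and `exact_match` are illustrative names invented for this example, not Ragas's actual API; consult the metrics documentation for the real interface.

```python
# Illustrative sketch only: `metric`, `MetricResult`, and `exact_match`
# are hypothetical stand-ins for a decorator-based metric API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class MetricResult:
    name: str
    score: float


def metric(fn: Callable[..., float]) -> Callable[..., MetricResult]:
    """Wrap a plain scoring function into a named metric."""
    def wrapper(**row) -> MetricResult:
        return MetricResult(name=fn.__name__, score=fn(**row))
    return wrapper


@metric
def exact_match(response: str, expected: str) -> float:
    # 1.0 if the model's answer matches the reference (case-insensitive), else 0.0
    return 1.0 if response.strip().lower() == expected.strip().lower() else 0.0


# A tiny evaluation dataset: each row pairs a model response with a reference.
dataset = [
    {"response": "Paris", "expected": "paris"},
    {"response": "Lyon", "expected": "Paris"},
]

results = [exact_match(**row) for row in dataset]
avg = sum(r.score for r in results) / len(results)
print(avg)  # 0.5
```

The decorator keeps the scoring logic a plain function while attaching the bookkeeping (metric name, structured result) needed to aggregate scores across a dataset, which is the shape an experiment loop builds on.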
- Get Started: Start evaluating in 5 minutes with our quickstart guide.
- Core Concepts: Understand experiments, metrics, and datasets, the building blocks of effective evaluation.
- How-to Guides: Integrate Ragas into your workflow with practical guides for specific use cases.
- References: API documentation and technical details for diving deeper.
Want help improving your AI application using evals?
Over the past two years, we have seen and helped improve many AI applications using evals.
We are compressing this knowledge into a product to replace vibe checks with eval loops so that you can focus on building great AI applications.
If you want help improving and scaling your AI application using evals, book a slot or drop us a line: founders@explodinggradients.com.