GitHub - ccarvalho-eng/aludel: LLM Evaluation for Phoenix Apps

A Phoenix-native workbench for comparing providers, tracking prompt history, and running regression suites.

Aludel gives teams a clean way to evaluate prompt and model behavior without inventing their own tooling first.

Compare the same prompt across OpenAI, Anthropic, Gemini, and Ollama.
Inspect output, latency, token usage, and cost side by side.
Version prompts and see how changes affect results over time.
Run evaluation suites with assertions and document attachments.
Use it inside an existing Phoenix app or run it standalone.

Why Aludel

Most teams evaluating LLM behavior end up with some combination of scripts, spreadsheets, and ad hoc dashboards. Aludel brings that work into one place with a UI that is practical enough for day-to-day iteration.

Provider comparison: run the same input across models and vendors in one view.
Prompt history: keep prompt changes traceable instead of losing them in copy-pasted variants.
Regression coverage: turn important scenarios into repeatable suites with assertions.
Phoenix-native deployment: mount it in your app or run it as a standalone dashboard.

Structured Output Scoring

Suites support strict string assertions and structured JSON checks.

For structured outputs, use json_deep_compare to score partial matches instead of forcing all-or-nothing pass/fail outcomes.

[
  {
    "type": "json_deep_compare",
    "expected": {
      "status": "ok",
      "customer": {
        "name": "Jane",
        "tier": "gold"
      }
    },
    "threshold": 75.0
  }
]

Aludel stores field-level comparison details, per-test match scores, and suite-run average scores so prompt evolution and exports can track structured output quality over time.
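To make the threshold concrete, here is a minimal sketch of one way a partial-match score could be computed: count the leaf fields in the expected JSON and report the percentage that appear with the same value in the actual output. This README does not describe Aludel's actual scoring algorithm, so treat this as an assumption; DeepCompareSketch and its functions are hypothetical names, not part of Aludel.

```elixir
defmodule DeepCompareSketch do
  @moduledoc false
  # Hypothetical illustration only; not Aludel's implementation.

  # Flatten a nested map into a list of {path, leaf_value} pairs.
  def leaves(value, path \\ [])

  def leaves(map, path) when is_map(map) and map_size(map) > 0 do
    Enum.flat_map(map, fn {key, value} -> leaves(value, path ++ [key]) end)
  end

  def leaves(value, path), do: [{path, value}]

  # Percentage of expected leaves found with the same value in `actual`.
  def score(expected, actual) do
    expected_leaves = leaves(expected)

    matched =
      Enum.count(expected_leaves, fn {path, value} ->
        fetch(actual, path) == {:ok, value}
      end)

    matched / length(expected_leaves) * 100
  end

  # Walk `path` through nested maps, returning :error if any key is missing.
  defp fetch(value, []), do: {:ok, value}

  defp fetch(map, [key | rest]) when is_map(map) do
    case Map.fetch(map, key) do
      {:ok, inner} -> fetch(inner, rest)
      :error -> :error
    end
  end

  defp fetch(_other, _path), do: :error
end

expected = %{"status" => "ok", "customer" => %{"name" => "Jane", "tier" => "gold"}}
actual = %{"status" => "ok", "customer" => %{"name" => "Jane", "tier" => "silver"}}

score = DeepCompareSketch.score(expected, actual)
IO.puts("score=#{Float.round(score, 1)} pass=#{score >= 75.0}")
```

In this sketch two of the three expected leaves match, so the score is about 66.7 and the test would fall below a threshold of 75.0. A strict string assertion would report the same output as a plain failure with no indication of how close it was.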

Quick Start

Embed in an existing Phoenix app

Requirements:

Elixir and Phoenix
PostgreSQL 12+

Aludel depends on PostgreSQL-specific features, including JSONB, percentile_disc(), and DATE()-based aggregations. SQLite and MySQL are not supported.

1. Add the dependency

def deps do
  [
    {:aludel, "~> 0.1"}
  ]
end

mix deps.get

2. Configure the repo

config :aludel, repo: YourApp.Repo

3. Install and run migrations

mix aludel.install
mix ecto.migrate

4. Mount the dashboard

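# In your router module (for example, lib/your_app_web/router.ex)
use YourAppWeb, :router
import Aludel.Web.Router

# Mount the dashboard only in development builds
if Mix.env() == :dev do
  scope "/dev" do
    pipe_through :browser
    aludel_dashboard "/aludel"
  end
end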

5. Start using it

Visit your configured path, for example http://localhost:4000/dev/aludel.

Standalone mode

If you want to run Aludel by itself:

git clone https://github.com/ccarvalho-eng/aludel.git
cd aludel/standalone
mix deps.get
mix ecto.create
mix ecto.migrate
mix phx.server

To populate the local database with sample prompts, providers, and suites:

mix aludel.seed

Visit http://localhost:4000.

Provider support

Aludel supports OpenAI, Anthropic, Google Gemini, and Ollama.

Provider	API key required	Notes
OpenAI	Yes	Configure with `OPENAI_API_KEY`
Anthropic	Yes	Configure with `ANTHROPIC_API_KEY`
Google Gemini	Yes	Configure with `GOOGLE_API_KEY`
Ollama	No	Runs locally

For embedded apps, configure provider keys in config/runtime.exs:

# In config/runtime.exs
config :aludel, :llm,
  openai_api_key: System.get_env("OPENAI_API_KEY"),
  anthropic_api_key: System.get_env("ANTHROPIC_API_KEY"),
  google_api_key: System.get_env("GOOGLE_API_KEY")

Ollama runs locally and does not require an API key.
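Because System.get_env/1 returns nil for an unset variable, a missing key may only surface when a provider call fails. The sketch below shows one way to list which keys are absent from the :llm config at startup. ProviderKeyCheck is a hypothetical helper, not part of Aludel, and it assumes only the three config keys shown above.

```elixir
defmodule ProviderKeyCheck do
  # Hypothetical helper, not part of Aludel. The key names mirror the
  # `config :aludel, :llm` example above.
  @keys [:openai_api_key, :anthropic_api_key, :google_api_key]

  # Return the config keys whose value is nil or an empty string.
  def missing(llm_config) when is_list(llm_config) do
    Enum.filter(@keys, fn key -> Keyword.get(llm_config, key) in [nil, ""] end)
  end
end

missing = ProviderKeyCheck.missing(Application.get_env(:aludel, :llm, []))
IO.inspect(missing, label: "providers without an API key")
```

A missing key is not necessarily an error: if you only use Ollama or a single hosted provider, the other keys can stay unset.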

Document Storage

Uploaded test case documents go through Aludel.Storage.

Development uses the local filesystem adapter from config/dev.exs.
Production uses config/runtime.exs and requires ALUDEL_STORAGE_BACKEND.

Development storage

Development stores uploaded documents on the local filesystem.

Production storage

Set ALUDEL_STORAGE_BACKEND to aws or gcs.

For AWS S3:

export ALUDEL_STORAGE_BACKEND=aws
export AWS_S3_BUCKET=aludel-uploads
export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...

For Google Cloud Storage:

export ALUDEL_STORAGE_BACKEND=gcs
export GCS_BUCKET=aludel-uploads
export GOOGLE_APPLICATION_CREDENTIALS=/absolute/path/to/service-account.json

If your GCS bucket requires requester-pays access, also set:

export GCS_USER_PROJECT=your-billing-project-id

The GCS adapter uses Goth with standard Google application credentials. GOOGLE_APPLICATION_CREDENTIALS_JSON also works if you prefer inline JSON.
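The production settings above can be sanity-checked before boot with a small script. The sketch below is a hypothetical helper, not part of Aludel. It assumes the AWS variables listed above are all required, which may not hold if credentials come from an instance role or another source. For GCS it checks only the bucket, since credentials can be supplied in more than one way.

```elixir
defmodule StorageEnvCheck do
  # Hypothetical helper, not part of Aludel. The variable names come from
  # the export examples above; which ones are strictly required is an assumption.
  @required %{
    "aws" => ~w(AWS_S3_BUCKET AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY),
    "gcs" => ~w(GCS_BUCKET)
  }

  # `env` is a map of variable name => value, as returned by System.get_env/0.
  def check(env) when is_map(env) do
    backend = Map.get(env, "ALUDEL_STORAGE_BACKEND")

    case Map.fetch(@required, backend) do
      {:ok, vars} ->
        case Enum.reject(vars, &present?(env, &1)) do
          [] -> :ok
          missing -> {:error, {:missing, missing}}
        end

      :error ->
        {:error, {:unsupported_backend, backend}}
    end
  end

  # A variable counts as present only if it is set and non-empty.
  defp present?(env, name), do: Map.get(env, name, "") != ""
end

IO.inspect(StorageEnvCheck.check(System.get_env()))
```

Taking the environment as a plain map keeps the check easy to run against sample values without touching the real environment.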

Documentation

The README is intentionally optimized for first contact. For deeper setup, usage, and contribution details, see the repository's other documentation, such as CONTRIBUTING.md and CHANGELOG.md.

Development

For local development:

mix deps.get
mix compile
mix test
mix precommit

If you are changing frontend assets:

mix assets.build
mix compile --force

For standalone development, run the app from the standalone directory:

cd standalone
mix phx.server

If you change frontend assets, rebuild them from the repo root and restart the standalone server:

mix assets.build
mix compile --force

License

Apache License 2.0

