[[evals]]
model = "gemini:gemini-1.5-pro-latest"
prompt = """
Question: Is the following statement true or false? "Python was created before Java."
Think through this step by step:
Step 1: When was Python created?
Step 2: When was Java created?
Step 3: Compare the dates
Format your response as:
1. [Step 1 reasoning]
2. [Step 2 reasoning]
3. [Step 3 reasoning]
Final Answer: True/False - [brief explanation]
"""
judge_model = "gemini:gemini-1.5-pro-latest"
expected = "True - Python (1991) was created before Java (1995)"
re: gemini
✅ Use natural numbered steps in your prompts ("1. ... 2. ... 3. ...")
✅ Your parsing code already handles this - markdown list parsing will catch Gemini's numbered output
✅ Use Claude or GPT-4 as the judge model - they're better at evaluating reasoning quality
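
For illustration, here is a minimal sketch of what parsing that numbered output could look like. It is not the project's actual parsing code: `parse_response`, `STEP_RE`, and `FINAL_RE` are hypothetical names, and it assumes the model follows the format from the prompt above (numbered lines followed by a `Final Answer:` line, possibly wrapped in markdown bold).

```python
import re

# Hypothetical helpers; the names and regexes are assumptions for this sketch.
# A numbered step looks like "1. some reasoning" or "1) some reasoning".
STEP_RE = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")
# The final line may be wrapped in markdown bold, e.g. "**Final Answer:** True - ...".
FINAL_RE = re.compile(
    r"^\s*\**\s*Final Answer\s*:?\s*\**\s*:?\s*(True|False)\b\s*(?:-\s*)?(.*)$",
    re.IGNORECASE,
)


def parse_response(text):
    """Split a model response into numbered steps and a final True/False verdict."""
    steps = []
    verdict = None  # stays None if no "Final Answer" line is found
    explanation = ""
    for line in text.splitlines():
        final = FINAL_RE.match(line)
        if final:
            verdict = final.group(1).lower() == "true"
            explanation = final.group(2).strip()
            continue
        step = STEP_RE.match(line)
        if step:
            steps.append(step.group(2))
    return {"steps": steps, "verdict": verdict, "explanation": explanation}


sample = (
    "1. Python was first released in 1991.\n"
    "2. Java was released in 1995.\n"
    "3. 1991 is earlier than 1995.\n"
    "**Final Answer:** True - Python (1991) predates Java (1995)"
)
result = parse_response(sample)
```

Leaving `verdict` as `None` when no final line is found lets the caller tell a malformed response apart from a `False` answer.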