docs: add benchmarks directory #291

Merged: 0xallam merged 1 commit into main from adding-benchmarks on Jan 23, 2026

@0xallam 0xallam commented Jan 23, 2026

Member

Summary

  • Add benchmarks directory with XBEN results overview
  • Links to full benchmark repo at usestrix/benchmarks

Details

  • 96% success rate on XBOW benchmark (100/104 challenges)
  • Strix v0.4.0 in black-box mode
  • Performance breakdown by difficulty level
  • Resource usage stats
@0xallam 0xallam merged commit 2bc1e5e into main Jan 23, 2026
2 checks passed
@0xallam 0xallam deleted the adding-benchmarks branch January 23, 2026 19:04
greptile-apps Bot commented Jan 23, 2026

Contributor

Greptile Summary

This PR adds a new benchmarks directory with documentation of Strix's performance on the XBEN security benchmark suite. The documentation includes a 96% success rate (100/104 challenges) achieved by Strix v0.4.0 in black-box mode, performance breakdown by difficulty level, and resource usage statistics.

Changes:

  • New benchmarks/README.md file documenting XBEN benchmark results
  • References to the external usestrix/benchmarks repository for full details
  • Performance metrics and visualization using a mermaid pie chart
  • Links to external benchmark resources on GitHub

Issue found:

  • Inconsistent benchmark naming: line 13 refers to "XBOW benchmark" but should be "XBEN" to match the heading, table, and link text.

Confidence Score: 4/5

  • This PR is safe to merge with a minor naming inconsistency that should be corrected.
  • The PR adds documentation of benchmark results, which is a low-risk change. The content is clear and informative, with proper links to external resources. One naming issue was found: the benchmark is called "XBOW" in the description but "XBEN" elsewhere. This is a simple fix that should be made before merging. The rest of the content, including statistics, tables, and formatting, is accurate and well-structured.
  • benchmarks/README.md - correct the "XBOW" naming to "XBEN" on line 13

Important Files Changed

| Filename | Overview |
|---|---|
| benchmarks/README.md | New benchmarks directory with documentation of Strix's performance on XBEN benchmark. Contains 96% success rate results, performance breakdown by difficulty, and resource usage metrics. One naming inconsistency found: "XBOW" vs "XBEN" on line 13 should be corrected for consistency. |

@greptile-apps greptile-apps Bot left a comment

1 file reviewed, 1 comment

Comment thread benchmarks/README.md

### XBEN

The [XBOW benchmark](https://github.com/usestrix/benchmarks/tree/main/XBEN) is a set of 104 web security challenges designed to evaluate autonomous penetration testing agents. Each challenge follows a CTF format where the agent must discover and exploit vulnerabilities to extract a hidden flag.

syntax: The benchmark is referred to as "XBOW" here but the heading and links use "XBEN". This inconsistency may confuse readers. Should be consistent - likely should be "XBEN" based on the table and link text above.

Suggested change
- The [XBOW benchmark](https://github.com/usestrix/benchmarks/tree/main/XBEN) is a set of 104 web security challenges designed to evaluate autonomous penetration testing agents. Each challenge follows a CTF format where the agent must discover and exploit vulnerabilities to extract a hidden flag.
+ The [XBEN benchmark](https://github.com/usestrix/benchmarks/tree/main/XBEN) is a set of 104 web security challenges designed to evaluate autonomous penetration testing agents. Each challenge follows a CTF format where the agent must discover and exploit vulnerabilities to extract a hidden flag.
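
A naming mismatch like the one flagged in this review can also be caught mechanically. The following is a minimal sketch, not part of the PR or of Greptile's tooling. It assumes the project settles on "XBEN" and wants "XBOW" flagged. The function name `find_name_mismatches` and the sample text are hypothetical.

```python
import re


def find_name_mismatches(text: str, unwanted: str = "XBOW") -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing the unwanted name.

    Line numbers are 1-based. Matching is whole-word and case-sensitive.
    Assumption: "XBOW" is the variant to flag; pass a different value if not.
    """
    pattern = re.compile(rf"\b{re.escape(unwanted)}\b")
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical README excerpt modeled on the reviewed line.
    sample = (
        "### XBEN\n"
        "\n"
        "The [XBOW benchmark](https://github.com/usestrix/benchmarks/tree/main/XBEN) "
        "is a set of 104 web security challenges.\n"
    )
    for number, line in find_name_mismatches(sample):
        print(f"line {number}: {line}")
```

To check a real file, read it first (for example with `pathlib.Path("benchmarks/README.md").read_text()`) and pass the text to the function. Whether "XBOW" should be flagged at all depends on which name the maintainers prefer.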