VibeBench


CLI benchmark for evaluating AI coding tools on real engineering tasks.

Unlike model-centric leaderboards, VibeBench treats each tool variant (e.g., claude-code with sonnet-4.1, codex with gpt-5-reasoned) as a distinct system evaluated against reproducible pass criteria.

End-to-end evaluation: prompt → tool run → patch extraction → validation.
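The pipeline above can be sketched as a small harness. This is a minimal illustration, not the VibeBench implementation: `run_tool` is a hypothetical stub standing in for invoking a real CLI coding tool, and the validation step here only checks that a well-formed unified diff was produced.

```python
import re
from dataclasses import dataclass

@dataclass
class EvalResult:
    tool: str
    passed: bool
    patch: str

def run_tool(tool: str, prompt: str) -> str:
    # Hypothetical stub: a real harness would shell out to the CLI tool
    # and capture its raw output.
    return (
        "Here is my fix:\n"
        "```diff\n"
        "--- a/app.py\n"
        "+++ b/app.py\n"
        "@@ -1 +1 @@\n"
        "-print('hllo')\n"
        "+print('hello')\n"
        "```\n"
    )

def extract_patch(output: str) -> str:
    # Pull the first fenced diff block out of the tool's raw output.
    m = re.search(r"```diff\n(.*?)```", output, re.DOTALL)
    return m.group(1) if m else ""

def validate(patch: str) -> bool:
    # Toy pass criterion: output contains a well-formed unified diff.
    # A real benchmark would apply the patch and run the task's tests.
    return patch.startswith("--- ") and "+++ " in patch and "@@" in patch

def evaluate(tool: str, prompt: str) -> EvalResult:
    # prompt -> tool run -> patch extraction -> validation
    output = run_tool(tool, prompt)
    patch = extract_patch(output)
    return EvalResult(tool=tool, passed=validate(patch), patch=patch)

result = evaluate("claude-code sonnet-4.1", "Fix the typo in app.py")
print(result.tool, result.passed)
```

Keeping each stage a separate function is what makes per-variant results reproducible: the same extraction and validation logic runs against every tool's output.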

Get Notified When We Launch

We're finalizing our evaluation methodology, dataset, and scoring system. Register your interest to be the first to know when the full benchmark is live.

Register Your Interest

We'll only use your email to notify you about VibeBench updates.

AI Companies: Help Us Benchmark Your Tools

Sponsor VibeBench by providing free API credits or tool access. Get early access to detailed evaluation results and featured placement on our leaderboard.