OpenAI News, 2026-02-23

Why we no longer evaluate SWE-bench Verified

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.


