RepoGenesis evaluates systems on 106 repos (60 Python, 46 Java) using three metrics: Pass@1 (test pass rate), AC (API Coverage), and DSR (Deployment Success Rate).

Verified evaluates systems on 30 verified repos (22 Python, 8 Java) using the full Docker-based harness and all three metrics: Pass@1, AC, and DSR.
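
As a rough illustration of how the three metrics named above could be aggregated into one leaderboard row, here is a minimal Python sketch. It is not the RepoGenesis harness: the `RepoResult` record, its field names, and the choice to average each ratio across repos are all assumptions made for this example, since the text above gives only the metric names and their expansions.

```python
from dataclasses import dataclass


@dataclass
class RepoResult:
    """Hypothetical per-repo outcome; field names are illustrative only."""
    tests_passed: int
    tests_total: int
    apis_implemented: int
    apis_required: int
    deployed: bool


def _ratio(numerator: int, denominator: int) -> float:
    # Treat an empty denominator as 0.0 rather than raising.
    return numerator / denominator if denominator else 0.0


def summarize(results: list[RepoResult]) -> dict[str, float]:
    """Average each metric over repos (assumed aggregation, not confirmed)."""
    n = len(results)
    if n == 0:
        return {"pass_at_1": 0.0, "ac": 0.0, "dsr": 0.0}
    return {
        # Pass@1: test pass rate of the single generated attempt per repo.
        "pass_at_1": sum(_ratio(r.tests_passed, r.tests_total) for r in results) / n,
        # AC: share of required API endpoints that were implemented.
        "ac": sum(_ratio(r.apis_implemented, r.apis_required) for r in results) / n,
        # DSR: share of repos whose service deployed successfully.
        "dsr": sum(1 for r in results if r.deployed) / n,
    }
```

Calling `summarize` on a list of such records returns one dictionary with the three scores as fractions between 0 and 1; a real harness may weight repos or tests differently.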
# Model Category Lang Pass@1 AC DSR Org Date