Official Leaderboards
The first benchmark to evaluate LLMs on repository-level microservice generation from README specifications.
RepoGenesis evaluates 106 repositories (60 Python, 46 Java) on three metrics: Pass@1 (test pass rate), AC (API Coverage), and DSR (Deployment Success Rate).
The Verified leaderboard evaluates systems on 30 verified repositories (22 Python, 8 Java) using the full Docker-based harness and all three metrics: Pass@1, AC, and DSR.
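As a rough illustration of how the three leaderboard columns might be aggregated, the sketch below averages per-repository results into Pass@1, AC, and DSR. The `RepoResult` record, its field names, and the choice to take an unweighted mean over repositories are all assumptions made for this example; the actual harness may define and aggregate the metrics differently.

```python
from dataclasses import dataclass


@dataclass
class RepoResult:
    """Hypothetical per-repository record; field names are illustrative only."""
    tests_passed: int          # test cases that passed for the generated repo
    tests_total: int           # test cases in the repo's test suite
    endpoints_implemented: int  # API endpoints from the README that were found
    endpoints_specified: int    # API endpoints the README specifies
    deployed: bool             # whether the generated service started successfully


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, treating an empty denominator as 0.0."""
    return numerator / denominator if denominator else 0.0


def summarize(results: list[RepoResult]) -> dict[str, float]:
    """Average each metric over repositories (assumed unweighted mean)."""
    n = len(results)
    if n == 0:
        return {"pass@1": 0.0, "ac": 0.0, "dsr": 0.0}
    return {
        # Pass@1: mean fraction of tests passed per repository
        "pass@1": sum(_ratio(r.tests_passed, r.tests_total) for r in results) / n,
        # AC: mean fraction of specified endpoints that were implemented
        "ac": sum(_ratio(r.endpoints_implemented, r.endpoints_specified) for r in results) / n,
        # DSR: fraction of repositories that deployed successfully
        "dsr": sum(1 for r in results if r.deployed) / n,
    }


if __name__ == "__main__":
    demo = [
        RepoResult(8, 10, 4, 5, True),
        RepoResult(5, 10, 5, 5, False),
    ]
    print(summarize(demo))
```

Averaging per repository rather than pooling all tests keeps a single large repository from dominating the score; whether the benchmark does this is not stated here.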
| # | Model | Category | Lang | Pass@1 | AC | DSR | Org | Date |
|---|---|---|---|---|---|---|---|---|