Inference Index
Evaluation

SWE-Bench

2,294 real GitHub issues from 12 popular Python repos. A widely used benchmark for evaluating coding agents.

Size: 2.3K issues
Format: jsonl
License: MIT
Maintainer: Princeton NLP

What it’s for

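SWE-Bench evaluates coding agents on real-world software engineering tasks. Each of its 2,294 tasks is a real GitHub issue from one of 12 popular Python repos, and the agent must produce a code change that resolves the issue.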
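
Because the listed format is jsonl (one JSON object per line), a first look at the data can be as simple as streaming the records and counting them per repository. The sketch below is illustrative only: the `repo` and `instance_id` keys are assumptions about the record schema, and the sample rows are made up for the example rather than taken from the dataset.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


def issues_per_repo(lines: Iterable[str], repo_field: str = "repo") -> Counter:
    """Count records per repository.

    `repo_field` is an assumed key name; adjust it to the actual schema.
    """
    return Counter(rec.get(repo_field, "<unknown>") for rec in iter_records(lines))


# Hypothetical sample rows, not real dataset entries.
sample = [
    '{"instance_id": "example__repo-1", "repo": "example/repo"}',
    "",
    '{"instance_id": "example__repo-2", "repo": "example/repo"}',
    '{"instance_id": "other__lib-7", "repo": "other/lib"}',
]
counts = issues_per_repo(sample)
```

The same function accepts an open file handle, since iterating a text file yields lines; for instance `issues_per_repo(open(path))` with `path` pointing at a local copy of the jsonl file.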