SWE-Bench
2,294 real GitHub issues from 12 popular Python repos. A widely used benchmark dataset for evaluating coding agents.
Size
2.3K issues
Format
jsonl
License
MIT
Maintainer
Princeton NLP
What it’s for
Evaluating coding agents on real GitHub issues drawn from 12 popular Python repos.
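Because the listing gives the format as jsonl, a local copy can be read one JSON object per line. The following is a minimal sketch, not an official loader: the helper names `load_jsonl` and `count_by_repo` are hypothetical, and the `repo` field name is an assumption about the record schema that should be checked against the actual file.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records


def count_by_repo(records, key="repo"):
    """Tally records per repository.

    `key` is an assumed field name; records missing it are
    grouped under "<unknown>".
    """
    return Counter(record.get(key, "<unknown>") for record in records)
```

Calling `count_by_repo(load_jsonl(path))` on a downloaded file (the path is whatever you saved it as) gives a quick sanity check: if the schema assumption holds, the counts should cover 12 repositories and sum to 2,294.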