Inference Index

Evaluation

SWE-bench

2,294 real GitHub issues drawn from 12 popular Python repositories. A widely used benchmark for evaluating coding agents.

Size: 2.3K issues

Format: jsonl

License: MIT

Maintainer: Princeton NLP

What it’s for

Evaluating coding agents on real-world software engineering tasks. Each instance pairs a GitHub issue with the repository it was filed against, and the agent must produce a patch that resolves the issue.