The Stack v2
The largest open-source code dataset. 67.5TB of permissively licensed source code across 600+ languages.
Size
67.5TB
Format
parquet
License
Multiple (per-file)
Maintainer
BigCode / HuggingFace
What it’s for
The largest open-source code dataset. 67.5TB of permissively licensed source code across 600+ languages.