Inference Index
Code

The Stack v2

The largest open-source code dataset. 67.5TB of permissively licensed source code across 600+ languages.

Size: 67.5TB
Format: Parquet
License: Multiple (per-file)
Maintainer: BigCode / Hugging Face
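Because the license is recorded per file rather than for the dataset as a whole, one way to respect it downstream is to filter each file on its own license metadata. The sketch below is a minimal illustration, not BigCode's tooling. It assumes hypothetical record fields (`path`, `language`, `detected_licenses`) and an illustrative SPDX allowlist. The actual Parquet schema may differ, so check the dataset card before relying on any column name.

```python
from dataclasses import dataclass, field


# Hypothetical per-file record. The real column names in the Parquet
# metadata may differ; check the dataset card before relying on them.
@dataclass
class FileRecord:
    path: str
    language: str
    detected_licenses: list[str] = field(default_factory=list)


# Illustrative allowlist of SPDX identifiers (an assumption, not
# BigCode's own license list).
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def keep(record: FileRecord, allowed: set[str] = ALLOWED) -> bool:
    """Keep a file only if it has at least one detected license and
    every detected license is on the allowlist."""
    return bool(record.detected_licenses) and all(
        lic in allowed for lic in record.detected_licenses
    )


def filter_records(records: list[FileRecord],
                   allowed: set[str] = ALLOWED) -> list[FileRecord]:
    """Return only the records whose licenses all pass the allowlist."""
    return [r for r in records if keep(r, allowed)]


if __name__ == "__main__":
    sample = [
        FileRecord("a.py", "Python", ["MIT"]),
        FileRecord("b.js", "JavaScript", ["MIT", "GPL-3.0-only"]),
        FileRecord("c.go", "Go", []),
    ]
    for r in filter_records(sample):
        print(r.path)  # prints "a.py"
```

Requiring every detected license to be allowed (rather than any one of them) is the conservative choice for multi-licensed files. Files with no detected license are dropped here, which is a policy decision you may want to change.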

What it’s for

Pretraining data for large language models of code (code LLMs).

Known training usage