arXiv Dataset
Full-text snapshot of the arXiv preprint archive. A common substrate for AI science-assistant startups.
Size
2.3M papers
Format
parquet
License
CC0 (metadata) + per-paper licenses (full text)
Maintainer
arXiv / Cornell
What it’s for
Source corpus for tools that search, summarize, or reason over scientific literature, in particular AI science-assistant products built on the full text of arXiv preprints.