Dataset:

vesteinn/babylm

ATTENTION

This is preprocessed data for the BabyLM challenge (https://babylm.github.io/). If you want the raw, unprocessed files, you should download them directly.
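For readers who want to try the preprocessed data, here is a minimal sketch that uses the Hugging Face `datasets` library. The repository name `vesteinn/babylm` comes from this page. Everything else is an assumption: the helper `load_babylm` is hypothetical, and the page does not say which configurations or splits exist, so the call may need a `name=` or `split=` argument.

```python
DATASET_ID = "vesteinn/babylm"  # repository name as given on this page


def load_babylm(**kwargs):
    """Load the preprocessed BabyLM data from the Hugging Face Hub.

    Hypothetical helper, not part of the dataset. Assumes the repository
    loads with default settings; pass `name=` or `split=` through kwargs
    if it turns out to define several configurations or splits.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, **kwargs)


# Example usage (needs network access and `pip install datasets`):
#   dataset = load_babylm()
#   print(dataset)  # lists whatever splits and columns the repository provides
```

Printing the returned object is a quick way to see which splits and columns are actually present before relying on any of them.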