Dataset: biglam/gutenberg-poetry-corpus
Tasks: Text Generation
Sub-tasks: language-modeling
Languages: en
Multilinguality: monolingual
Size: 1M<n<10M
Language creators: found
Annotations creators: no-annotation
License: cc0-1.0

This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration, more information on how the data was mined, instructions for creating your own version of the corpus, and examples of projects that use it.
This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line is paired with a gutenberg_id (1,191 unique values) identifying the Project Gutenberg book it was taken from.
Dataset({
    features: ['line', 'gutenberg_id'],
    num_rows: 3085117
})
A row of data looks like this:
{'line': 'And retreated, baffled, beaten,', 'gutenberg_id': 19}
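Because every row carries only a line and its gutenberg_id, a common first step is to regroup the lines by book. The following is a minimal sketch of that idea using only the Python standard library. The helper name `group_lines_by_book` and the sample rows are hypothetical; only the first sample row comes from this card, and the other two are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows in the documented {'line', 'gutenberg_id'} shape.
# Only the first row appears in the dataset card; the other two are
# invented here for illustration.
sample_rows = [
    {"line": "And retreated, baffled, beaten,", "gutenberg_id": 19},
    {"line": "A made-up second line for book 19", "gutenberg_id": 19},
    {"line": "A made-up line for another book", "gutenberg_id": 42},
]


def group_lines_by_book(rows):
    """Map each gutenberg_id to the list of its lines, in input order."""
    books = defaultdict(list)
    for row in rows:
        books[row["gutenberg_id"]].append(row["line"])
    return dict(books)


books = group_lines_by_book(sample_rows)
for gutenberg_id, lines in books.items():
    # Print each book ID with the number of lines collected for it.
    print(gutenberg_id, len(lines))
```

To run the same grouping on the real corpus, the rows could presumably be obtained with `load_dataset("biglam/gutenberg-poetry-corpus", split="train")` from the Hugging Face `datasets` library; the split name `train` is an assumption, not something this card states.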