Dataset:

biglam/gutenberg-poetry-corpus

Languages:

en

Multilinguality:

monolingual

Size Categories:

1M<n<10M

Language Creators:

found

Annotations Creators:

no-annotation

License:

cc0-1.0

Allison Parrish's Gutenberg Poetry Corpus

This corpus was originally published under the CC0 license by Allison Parrish. Please visit Allison's fantastic accompanying GitHub repository for usage inspiration, more information on how the data was mined, instructions for creating your own version of the corpus, and examples of projects that use it.

This dataset contains 3,085,117 lines of poetry from hundreds of Project Gutenberg books. Each line has a corresponding gutenberg_id (1191 unique values) identifying the Project Gutenberg book it came from.

Dataset({
    features: ['line', 'gutenberg_id'],
    num_rows: 3085117
})

A row of data looks like this:

{'line': 'And retreated, baffled, beaten,', 'gutenberg_id': 19}
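
Because every row pairs a line with its gutenberg_id, one natural first step is to regroup lines by book. The following is a minimal sketch of that idea, not part of the original corpus tooling: the helper name `group_lines_by_book` is hypothetical, and apart from the example row shown above, the sample rows are made-up placeholders rather than real corpus data. It assumes only that rows are dicts with the `line` and `gutenberg_id` keys listed in the features.

```python
from collections import defaultdict


def group_lines_by_book(rows):
    """Collect poetry lines per gutenberg_id, preserving row order.

    `rows` is any iterable of dicts with 'line' and 'gutenberg_id' keys.
    (Hypothetical helper, written for illustration only.)
    """
    books = defaultdict(list)
    for row in rows:
        books[row["gutenberg_id"]].append(row["line"])
    return dict(books)


# The first row is the example from the dataset card; the remaining rows
# are placeholders invented for this sketch, not actual corpus lines.
sample_rows = [
    {"line": "And retreated, baffled, beaten,", "gutenberg_id": 19},
    {"line": "placeholder line one", "gutenberg_id": 19},
    {"line": "placeholder line two", "gutenberg_id": 20},
]

books = group_lines_by_book(sample_rows)
for book_id, lines in sorted(books.items()):
    print(book_id, len(lines))
```

The helper accepts any iterable of row dicts, so it should also work on rows iterated from the loaded dataset, though at roughly three million rows the resulting dictionary holds every line in memory.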