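Dataset: billsum
License: cc0-1.0
Preprint: arxiv:1910.00523
Source datasets: original
Annotations creators: found
Language creators: found
Size: 10K<n<100K
Multilinguality: monolingual
Language: en
Task: summarization

BillSum is a dataset for summarizing US Congressional and California state bills.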
There are several features: "text" (the bill text), "summary" (a summary of the bill), and "title" (the bill title).
An example of 'train' looks as follows.
{ "summary": "some summary", "text": "some text.", "title": "An act to amend Section xxx." }
The data fields are the same among all splits.
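As a minimal sketch of the record layout above, the following checks that a record carries the three string fields shown in the example. The helper name `validate_record` and the constant `EXPECTED_FIELDS` are hypothetical and are not part of the dataset or of any library.

```python
import json

# Field names taken from the example record above.
EXPECTED_FIELDS = ("summary", "text", "title")


def validate_record(record):
    """Return the record unchanged if it has the three expected string fields."""
    missing = [name for name in EXPECTED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name in EXPECTED_FIELDS:
        if not isinstance(record[name], str):
            raise TypeError(f"field {name!r} should be a string")
    return record


# The placeholder record from the card, parsed from its JSON form.
example = json.loads(
    '{ "summary": "some summary", "text": "some text.", '
    '"title": "An act to amend Section xxx." }'
)
validate_record(example)
```

Because the fields are the same among all splits, the same check would apply to records from any of them.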
name | train | ca_test | test |
---|---|---|---|
default | 18949 | 1237 | 3269 |
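To put the split sizes in perspective, here is a small sketch that totals the counts from the table and derives each split's share. The names `SPLIT_SIZES` and `split_shares` are illustrative only.

```python
# Example counts copied from the table above.
SPLIT_SIZES = {"train": 18949, "ca_test": 1237, "test": 3269}


def split_shares(sizes):
    """Map each split name to its fraction of all examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total_examples = sum(SPLIT_SIZES.values())
shares = split_shares(SPLIT_SIZES)

for name, share in shares.items():
    print(f"{name}: {SPLIT_SIZES[name]} examples ({share:.1%})")
```

The total across the three splits is 23455, which is consistent with the 10K<n<100K size category listed in the metadata.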
The data consists of three parts: US training bills, US test bills, and California test bills. The US bills were collected from the Govinfo service provided by the United States Government Publishing Office (GPO) under the CC0-1.0 license. The California bills, from the 2015-2016 session, are available from the legislature’s website.
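The three parts correspond to the split names in the table. The sketch below assumes the Hugging Face `datasets` library is installed and that the dataset is still published under the identifier "billsum"; both are assumptions and are not stated in this card. The function `load_billsum` and the mapping `SPLIT_DESCRIPTIONS` are hypothetical names.

```python
# Split names from the table, paired with the parts described above.
SPLIT_DESCRIPTIONS = {
    "train": "US training bills",
    "test": "US test bills",
    "ca_test": "California test bills",
}


def load_billsum(dataset_id="billsum"):
    """Load all splits. Assumes the `datasets` package and network access."""
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset

    # The identifier is an assumption; adjust it if the dataset has moved.
    return load_dataset(dataset_id)
```

Under those assumptions, `load_billsum()["ca_test"]` would return the California test split, and indexing it with `[0]` would yield one record with the fields described earlier.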
Who are the source language producers?

@misc{kornilova2019billsum,
  title={BillSum: A Corpus for Automatic Summarization of US Legislation},
  author={Anastassia Kornilova and Vlad Eidelman},
  year={2019},
  eprint={1910.00523},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @thomwolf, @jplu, @lewtun for adding this dataset.