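Dataset: billsum
License:
ArXiv: arxiv:1910.00523
Source datasets: original
Annotations creators: found
Language creators: found
Size categories: 10K<n<100K
Multilinguality: monolingual
Language: English
Tasks: summarization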
BillSum is a dataset for summarizing US Congressional and California state bills.
Each example has three string fields: text (the bill text), summary (a summary of the bill) and title (the bill's title).
An example of 'train' looks as follows.
{
"summary": "some summary",
"text": "some text.",
"title": "An act to amend Section xxx."
}
The data fields are the same across all splits.
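As a minimal sketch, assuming each record is stored as a JSON object shaped like the example above, the following Python parses one record and checks that it carries exactly the three string fields. The `BillSumExample` class and `parse_example` function are hypothetical helpers for illustration, not part of any official BillSum loader.

```python
import json
from dataclasses import dataclass

# Field names taken from the example record; the helpers below are hypothetical.
FIELDS = ("text", "summary", "title")


@dataclass(frozen=True)
class BillSumExample:
    text: str
    summary: str
    title: str


def parse_example(raw: str) -> BillSumExample:
    """Parse one JSON record and check it has exactly the three string fields."""
    record = json.loads(raw)
    if set(record) != set(FIELDS):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for name in FIELDS:
        if not isinstance(record[name], str):
            raise TypeError(f"field {name!r} must be a string")
    return BillSumExample(**record)


raw = '{"summary": "some summary", "text": "some text.", "title": "An act to amend Section xxx."}'
example = parse_example(raw)
print(example.title)  # An act to amend Section xxx.
```

Because the fields are the same across all splits, the same check can be applied unchanged to records from train, test and ca_test.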
Number of examples per split:
| config | train | ca_test | test |
|---|---|---|---|
| default | 18949 | 1237 | 3269 |
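The split sizes in the table can be totalled to see how the examples are distributed. This short sketch simply copies the three counts from the table into a dictionary (the name `SPLIT_SIZES` is illustrative) and prints each split's share of the whole.

```python
# Split sizes copied from the table above (default config).
SPLIT_SIZES = {"train": 18949, "ca_test": 1237, "test": 3269}

total = sum(SPLIT_SIZES.values())
print(f"total: {total}")  # total: 23455
for split, n in SPLIT_SIZES.items():
    # Right-align split names and show each split as a percentage of the total.
    print(f"{split:>8}: {n:6d} ({n / total:.1%})")
```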
The data consists of three parts: US training bills, US test bills and California test bills. The US bills were collected from the GovInfo service provided by the United States Government Publishing Office (GPO) under the CC0-1.0 license. The California bills, from the 2015-2016 session, are available from the legislature’s website.
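Combining the split sizes with the sources described above shows how many examples come from each jurisdiction. In this sketch the `SPLIT_SOURCE` mapping and the "US"/"CA" labels are illustrative assumptions that encode the text: train and test hold US Congressional bills, while ca_test holds California bills.

```python
from collections import Counter

# Hypothetical mapping from split name to the bill source described above.
SPLIT_SOURCE = {
    "train": "US",    # US Congressional bills from GovInfo (GPO)
    "test": "US",     # US Congressional bills from GovInfo (GPO)
    "ca_test": "CA",  # California bills, 2015-2016 session
}
# Split sizes copied from the default-config table.
SPLIT_SIZES = {"train": 18949, "ca_test": 1237, "test": 3269}

# Sum the example counts per source jurisdiction.
by_source = Counter()
for split, n in SPLIT_SIZES.items():
    by_source[SPLIT_SOURCE[split]] += n

print(dict(by_source))  # {'US': 22218, 'CA': 1237}
```

Since California bills appear only in ca_test, a model trained on the train split sees no California bills during training.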
@misc{kornilova2019billsum,
title={BillSum: A Corpus for Automatic Summarization of US Legislation},
author={Anastassia Kornilova and Vlad Eidelman},
year={2019},
eprint={1910.00523},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Thanks to @thomwolf, @jplu and @lewtun for adding this dataset.