Dataset:
aquamuse
AQuaMuSe is a novel, scalable approach to automatically mining dual query-based multi-document summarization datasets for extractive and abstractive summaries, using a question answering dataset (Google Natural Questions) and a large document corpus (Common Crawl).
This dataset contains versions of the automatically generated datasets for abstractive and extractive query-based multi-document summarization, as described in the AQuaMuSe paper.
en : English
Example:
{
  'input_urls': ['https://boxofficebuz.com/person/19653-charles-michael-davis'],
  'query': 'who is the actor that plays marcel on the originals',
  'target': "In February 2013, it was announced that Davis was cast in a lead role on The CW's new show The Originals, a spinoff of The Vampire Diaries, centered on the Original Family as they move to New Orleans, where Davis' character (a vampire named Marcel) currently rules."
}
input_urls : a list of string features.
List of URLs to input documents pointing to Common Crawl to be summarized.
Dependencies: Document URLs reference the Common Crawl June 2017 Archive.
query : a string feature.
Input query to be used as summarization context. This is derived from Natural Questions user queries.
target : a string feature.
Summarization target, derived from Natural Questions long answers.
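The field descriptions above can be turned into a simple record check. The following is a minimal sketch, not part of the official tooling: the helper name `validate_record` and the `EXPECTED_FIELDS` table are hypothetical and introduced only for illustration. It assumes a record is a plain Python dictionary shaped like the example shown earlier.

```python
from typing import Any, Dict, List

# Field names and types as described in this card (hypothetical lookup table).
EXPECTED_FIELDS = {"input_urls": list, "query": str, "target": str}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of schema problems; an empty list means none were found."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    # Every entry of input_urls should itself be a string (a URL).
    urls = record.get("input_urls")
    if isinstance(urls, list) and not all(isinstance(u, str) for u in urls):
        problems.append("input_urls should contain only strings")
    return problems


# The example record from this card.
example = {
    "input_urls": ["https://boxofficebuz.com/person/19653-charles-michael-davis"],
    "query": "who is the actor that plays marcel on the originals",
    "target": "In February 2013, it was announced that Davis was cast in a lead "
              "role on The CW's new show The Originals, a spinoff of The Vampire "
              "Diaries, centered on the Original Family as they move to New "
              "Orleans, where Davis' character (a vampire named Marcel) "
              "currently rules.",
}

print(validate_record(example))
```

A check like this only confirms the shape of a record. It does not fetch the documents behind `input_urls`, which still have to be resolved against the Common Crawl archive.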
The dataset was generated automatically for abstractive and extractive query-based multi-document summarization, as described in the AQuaMuSe paper.
[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
[More Information Needed]
The dataset curator is sayalikulkarni, a contributor to the official GitHub repository for this dataset and one of the authors of the dataset's paper. Account handles for the other authors who took part in curating this dataset are not currently available, so the paper's authors are listed here: Sayali Kulkarni, Sheide Chammas, Wan Zhu, Fei Sha, and Eugene Ie.
[More Information Needed]
@misc{kulkarni2020aquamuse,
  title={AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization},
  author={Sayali Kulkarni and Sheide Chammas and Wan Zhu and Fei Sha and Eugene Ie},
  year={2020},
  eprint={2010.12694},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @Karthik-Bhaskar for adding this dataset.