Dataset: GEM/xwikis
Tasks: Summarization
Multilinguality: unknown
Language creators: unknown
Annotation creators: found
Source datasets: original
Preprint: arxiv:2202.09583
License: cc-by-sa-4.0
You can find the main data card on the GEM Website.
The XWikis Corpus provides datasets with different language pairs and directions for cross-lingual and multi-lingual abstractive document summarisation.
You can load the dataset via:
import datasets
data = datasets.load_dataset('GEM/xwikis')
The data loader can be found here.
Website:
Paper: https://arxiv.org/abs/2202.09583
Authors: Laura Perez-Beltrachini (University of Edinburgh)
BibTeX:
@InProceedings{clads-emnlp,
  author = "Laura Perez-Beltrachini and Mirella Lapata",
  title = "Models and Datasets for Cross-Lingual Summarisation",
  booktitle = "Proceedings of The 2021 Conference on Empirical Methods in Natural Language Processing",
  year = "2021",
  address = "Punta Cana, Dominican Republic",
}
Contact Name: Laura Perez-Beltrachini
Contact Email: lperez@ed.ac.uk
Has a Leaderboard? no
yes
Covered Languages: German, English, French, Czech, Chinese
License: cc-by-sa-4.0: Creative Commons Attribution Share Alike 4.0 International
Intended Use: Cross-lingual and multi-lingual abstractive summarisation of single long input documents.
Primary Task: Summarization
Communicative Goal: Entity-descriptive summarisation, that is, generating a summary that conveys the most salient facts of a document related to a given entity.
academic
Dataset Creators: Laura Perez-Beltrachini (University of Edinburgh)
Who added the Dataset to GEM? Laura Perez-Beltrachini (University of Edinburgh) and Ronald Cardenas (University of Edinburgh)
For each language pair and direction there is a train/valid/test split. The test split is a sample of 7k titles drawn from the intersection of titles that exist in all four languages (cs, fr, en, de). The remaining data is split randomly into train and valid.
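The splitting scheme described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' actual procedure: the function name `build_splits`, the `valid_fraction` parameter, and the per-language (rather than per-language-pair) bookkeeping are assumptions made for the example, and the source does not state the train/valid ratio.

```python
import random


def build_splits(titles_by_lang, test_size, valid_fraction=0.05, seed=0):
    """Hypothetical helper: sample a shared test set, then split the rest.

    titles_by_lang maps a language code to the set of article titles
    available in that language. valid_fraction is an assumed value.
    """
    rng = random.Random(seed)
    # Only titles present in every language are eligible for the test set.
    shared = sorted(set.intersection(*titles_by_lang.values()))
    test = set(rng.sample(shared, min(test_size, len(shared))))

    splits = {"test": test}
    for lang, titles in titles_by_lang.items():
        # Everything outside the test set is shuffled and split randomly.
        rest = sorted(titles - test)
        rng.shuffle(rest)
        n_valid = int(len(rest) * valid_fraction)
        splits[lang] = {"valid": rest[:n_valid], "train": rest[n_valid:]}
    return splits


# Toy usage with four languages that share the titles A, B and C.
toy = {
    "cs": {"A", "B", "C", "D"},
    "fr": {"A", "B", "C", "E"},
    "en": {"A", "B", "C", "F"},
    "de": {"A", "B", "C", "G"},
}
splits = build_splits(toy, test_size=2, valid_fraction=0.5, seed=1)
```

Sampling the test set from the shared titles first keeps the same test documents comparable across language pairs, and removing them before the random train/valid split avoids leakage into training.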
no
no
Additional Splits? no
ROUGE
Previous results available? yes
Other Evaluation Approaches: ROUGE-1/2/L
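For readers unfamiliar with the metric, the following is a minimal sketch of ROUGE-1/2/L F1 scores. It assumes lowercased whitespace tokenisation with no stemming, which is not suitable for Chinese, and it is not the evaluation script used for the reported results; an established ROUGE package may give different numbers. The function names are hypothetical.

```python
from collections import Counter


def _ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap, ref_total, cand_total):
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n):
    """ROUGE-N F1 from clipped n-gram overlap."""
    ref = _ngrams(reference.lower().split(), n)
    cand = _ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # counts clipped to the minimum
    return _f1(overlap, sum(ref.values()), sum(cand.values()))


def _lcs_len(a, b):
    # Length of the longest common subsequence, row-by-row dynamic programme.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 from the longest common subsequence of tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return _f1(_lcs_len(ref, cand), len(ref), len(cand))


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
scores = {
    "rouge1": rouge_n(reference, candidate, 1),
    "rouge2": rouge_n(reference, candidate, 2),
    "rougeL": rouge_l(reference, candidate),
}
```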
no
Found
Where was it found? Single website
Data Validation: other
Was Data Filtered? not filtered
found
Annotation Service? no
Annotation Values: The input documents have section structure information.
Any Quality Control? validated by another rater
Quality Control Details: Bilingual annotators assessed the content overlap of source document and target summaries.
no
no PII
no
no
no
no
public domain
Copyright Restrictions on the Language Data: public domain