Dataset: GEM/e2e_nlg
Language: en
Machine Processing: unknown
Language Creators: unknown
Annotation Creators: none
Source Datasets: original
Other: data-to-text
License: cc-by-sa-4.0
Task: Table-to-Text

You can find the main data card on the GEM website.
The E2E NLG dataset is an English benchmark dataset for data-to-text models that verbalize a set of 2-9 key-value attribute pairs in the restaurant domain. The version used for GEM is the cleaned E2E NLG dataset, which filters examples with hallucinations and outputs that don't fully cover all input attributes.
You can load the dataset via:

import datasets
data = datasets.load_dataset('GEM/e2e_nlg')

The data loader can be found here.
Website and Papers: First data release, Detailed E2E Challenge writeup, Cleaned E2E version
Authors: Jekaterina Novikova, Ondrej Dusek and Verena Rieser
BibTeX:
@inproceedings{e2e_cleaned,
  address = {Tokyo, Japan},
  title = {Semantic {Noise} {Matters} for {Neural} {Natural} {Language} {Generation}},
  url = {https://www.aclweb.org/anthology/W19-8652/},
  booktitle = {Proceedings of the 12th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2019)},
  author = {Dušek, Ondřej and Howcroft, David M and Rieser, Verena},
  year = {2019},
  pages = {421--426},
}
Contact Name: Ondrej Dusek
Contact Email: odusek@ufal.mff.cuni.cz
Has a Leaderboard? no
Covered Dialects: Dialect-specific data was not collected; the language is general British English.
Covered Languages: English
Whose Language? The original dataset was collected using the CrowdFlower (now Appen) platform with native English speakers (self-reported). No demographic information was provided, but the collection was geographically limited to English-speaking countries.
License: cc-by-sa-4.0: Creative Commons Attribution Share Alike 4.0 International
Intended Use: The dataset was collected to test neural models on a very well-specified realization task.
Primary Task: Data-to-Text
Communicative Goal: Producing a text informing about/recommending a restaurant, given all and only the attributes specified in the input.
Organization Type: academic
Curation Organization(s): Heriot-Watt University
Dataset Creators: Jekaterina Novikova, Ondrej Dusek and Verena Rieser
Funding: This research received funding from the EPSRC projects DILiGENt (EP/M005429/1) and MaDrIgAL (EP/N017536/1).
Who added the Dataset to GEM? Simon Mille wrote the initial data card and Yacine Jernite the data loader. Sebastian Gehrmann migrated the data card to the v2 format and moved the data loader to the Hub.
The data is in a CSV format, with the following fields:
There are additional fields (fixed, orig_mr) indicating whether the data was modified in the cleaning process and what the original MR was before cleaning, but these aren't used for NLG.
The MR has a flat structure: attribute-value pairs are comma-separated, with values enclosed in brackets (see the example instance below). There are 8 attributes:
The same MR is often repeated multiple times with different synonymous references.
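To make the format concrete, a flat MR can be parsed with a single regular expression. This is an illustrative sketch, not part of the dataset tooling; the function name parse_mr is our own:

```python
import re

def parse_mr(mr: str) -> dict:
    """Parse a flat E2E MR such as 'name[Alimentum], area[riverside]'
    into an attribute -> value dictionary."""
    # Each pair looks like 'attr[value]'; pairs are comma-separated,
    # so we capture everything up to the opening bracket as the attribute.
    return {k.strip(): v for k, v in re.findall(r"([^,\[]+)\[([^\]]*)\]", mr)}

example = "name[Alimentum], area[riverside], familyFriendly[yes], near[Burger King]"
print(parse_mr(example))
# {'name': 'Alimentum', 'area': 'riverside', 'familyFriendly': 'yes', 'near': 'Burger King'}
```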
How were labels chosen? The source MRs were generated automatically at random from a set of valid attribute values. The labels were crowdsourced and are natural language.
Example Instance:
{
  "input": "name[Alimentum], area[riverside], familyFriendly[yes], near[Burger King]",
  "target": "Alimentum is a kids friendly place in the riverside area near Burger King."
}

Data Splits
| Split | MRs | Distinct MRs | References |
|---|---|---|---|
| Training | 12,568 | 8,362 | 33,525 |
| Development | 1,484 | 1,132 | 4,299 |
| Test | 1,847 | 1,358 | 4,693 |
| Total | 15,899 | 10,852 | 42,517 |
“Distinct MRs” are MRs that remain distinct even if restaurant/place names (attributes name and near) are delexicalized, i.e., replaced with a placeholder.
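The delexicalization used for this statistic can be sketched as follows. This is a minimal illustration under our own assumptions, not the original counting script, and the helper name delexicalize is ours:

```python
import re

def delexicalize(mr: str, text: str):
    """Replace the values of the name/near attributes with placeholders
    in both the MR and the reference text (sketch only)."""
    attrs = {k.strip(): v for k, v in re.findall(r"([^,\[]+)\[([^\]]*)\]", mr)}
    for slot in ("name", "near"):
        if slot in attrs:
            mr = mr.replace(f"{slot}[{attrs[slot]}]", f"{slot}[X-{slot}]")
            text = text.replace(attrs[slot], f"X-{slot}")
    return mr, text

mr = "name[Alimentum], area[riverside], near[Burger King]"
ref = "Alimentum is in the riverside area near Burger King."
print(delexicalize(mr, ref))
# ('name[X-name], area[riverside], near[X-near]', 'X-name is in the riverside area near X-near.')
```

Counting unique delexicalized MRs then yields the “Distinct MRs” column above.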
Splitting Criteria: The data are divided so that MRs in different splits do not overlap.
The E2E dataset is one of the largest limited-domain NLG datasets and is frequently used as a data-to-text generation benchmark. The E2E Challenge included 20 systems of very different architectures, with system outputs available for download.
Similar Datasets: yes
Unique Language Coverage: no
Difference from other GEM datasets: The dataset is much cleaner than comparable datasets, and it is also a relatively easy task, making for a straightforward evaluation.
Ability that the Dataset measures: Surface realization.
yes
Additional Splits? yes
Split Information: 4 special test sets for E2E were added to the GEM evaluation suite.
| Input length | Frequency (English) |
|---|---|
| 2 | 5 |
| 3 | 120 |
| 4 | 389 |
| 5 | 737 |
| 6 | 1187 |
| 7 | 1406 |
| 8 | 774 |
| 9 | 73 |
| 10 | 2 |
Generalization and robustness
Surface realization.
Metrics: BLEU, METEOR, ROUGE
Proposed Evaluation: The official evaluation script combines the MT-Eval and COCO Captioning libraries to compute the metrics listed above.
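For orientation, the n-gram-overlap idea behind BLEU can be reproduced in a few lines of standard-library Python. This is an illustrative sketch only; for numbers comparable to published results, the official e2e-metrics script should be used:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, references, max_n=4):
    """Sentence-level BLEU with multiple references (illustrative only)."""
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        # Clip each hypothesis n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for r in refs:
            for g, c in ngram_counts(r, n).items():
                max_ref[g] = max(max_ref[g], c)
        overlap = sum(min(c, max_ref[g]) for g, c in hyp_ngrams.items())
        precisions.append(overlap / max(sum(hyp_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty against the closest reference length.
    closest = min(refs, key=lambda r: abs(len(r) - len(hyp)))
    bp = min(1.0, math.exp(1 - len(closest) / len(hyp)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A hypothesis identical to a reference scores 1.0; one sharing no words scores 0.0.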
yes
Other Evaluation Approaches: Most previous results, including the shared task results, used the library provided by the dataset creators. The shared task also conducted a human evaluation using the following two criteria:
The shared task writeup has in-depth evaluations of systems (https://www.sciencedirect.com/science/article/pii/S0885230819300919).
The dataset was collected to showcase/test neural NLG models. It is larger and contains more lexical richness and syntactic variation than previous closed-domain NLG datasets.
Communicative Goal: Producing a text informing about/recommending a restaurant, given all and only the attributes specified in the input.
Sourced from Different Sources: no
Crowdsourced
Where was it crowdsourced? Other crowdworker platform
Language Producers: Human references describing the MRs were collected by crowdsourcing on the CrowdFlower (now Appen) platform, with either textual or pictorial MRs as input. The pictorial MRs, used in 20% of cases, yield higher lexical variation but introduce noise.
Topics Covered: The dataset is focused on descriptions of restaurants.
Data Validation: validated by data curator
Data Preprocessing: There were basic checks (length, valid characters, repetition).
Was Data Filtered? algorithmically
Filter Criteria: The cleaned version of the dataset used in GEM was filtered algorithmically. The creators used regular expressions to match each human-generated reference to a more accurate input when attributes were hallucinated or dropped. Additionally, train-test overlap stemming from this transformation was removed. As a result, the data is much cleaner than the original dataset but not perfect (about 20% of instances may have misaligned slots, compared to 40% in the original data).
none
Annotation Service? no
yes
Consent Policy Details: Since a crowdsourcing platform was used, the involved raters waived their rights to the data and are aware that the produced annotations can be publicly released.
no PII
Justification for no PII: The dataset is artificial and does not contain any description of people.
no
no
no
no
Are the Language Producers Representative of the Language? The source data is generated randomly, so it should not contain biases. The human references may be biased by the workers' demographics, but this was not investigated during data collection.
open license - commercial use allowed
Copyright Restrictions on the Language Data: open license - commercial use allowed
The cleaned version still has data points with hallucinated or omitted attributes.
Unsuited Applications: The data only pertains to the restaurant domain and the included attributes. A model cannot be expected to handle other domains or attributes.