Dataset: imppres
Tasks: text-classification
Languages: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: machine-generated
Annotations creators: machine-generated
Source datasets: original
License: cc-by-nc-4.0

Over 25k semiautomatically generated sentence pairs illustrating well-studied pragmatic inference types. IMPPRES is an NLI dataset following the format of SNLI (Bowman et al., 2015), MultiNLI (Williams et al., 2018) and XNLI (Conneau et al., 2018), which was created to evaluate how well trained NLI models recognize several classes of presuppositions and scalar implicatures.
Natural Language Inference.
English.
The data consists of 2 configurations: implicature and presupposition. Each configuration consists of several different sub-datasets:
Presupposition
Implicature
Each sentence type in IMPPRES is generated according to a template that specifies the linear order of the constituents in the sentence. The constituents are sampled from a vocabulary of over 3000 lexical items annotated with grammatical features needed to ensure wellformedness. We semiautomatically generate IMPPRES using a codebase developed by Warstadt et al. (2019a) and significantly expanded for the BLiMP dataset (Warstadt et al., 2019b).
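To make the idea of template-based generation concrete, here is a toy sketch in Python. It is not the codebase of Warstadt et al.; the vocabulary, the single `number` feature, the template, and the function names `sample_constituents` and `realize` are all hypothetical and far simpler than the real system. It only illustrates how a template fixes the linear order of constituents and how feature annotations can enforce agreement.

```python
import random

# Toy vocabulary (hypothetical). The real vocabulary has over 3000 items
# annotated with many more grammatical features.
VOCAB = {
    "det": [
        {"form": "that", "number": "sg"},
        {"form": "those", "number": "pl"},
    ],
    "noun": [
        {"form": "teenager", "number": "sg"},
        {"form": "teenagers", "number": "pl"},
        {"form": "guy", "number": "sg"},
        {"form": "guys", "number": "pl"},
    ],
    "verb": [{"form": "yell"}, {"form": "boast"}],
}

# The template specifies the linear order of the constituents.
TEMPLATE = ("det", "noun", "modal", "verb")


def sample_constituents(rng):
    """Sample lexical items, keeping determiner and noun in agreement."""
    noun = rng.choice(VOCAB["noun"])
    matching = [d for d in VOCAB["det"] if d["number"] == noun["number"]]
    det = rng.choice(matching)
    verb = rng.choice(VOCAB["verb"])
    return {"det": det["form"], "noun": noun["form"], "verb": verb["form"]}


def realize(constituents, modal):
    """Fill the template slots in order and format the result as a sentence."""
    slots = dict(constituents, modal=modal)
    sentence = " ".join(slots[name] for name in TEMPLATE)
    return sentence[0].upper() + sentence[1:] + "."


rng = random.Random(0)
items = sample_constituents(rng)
# A premise/hypothesis pair shares lexical items and differs only in the modal.
premise = realize(items, "couldn't")
hypothesis = realize(items, "could")
print(premise)
print(hypothesis)
```

Sampling the constituents once and realizing them twice keeps the two sentences of a pair lexically identical apart from the manipulated element, which is the property a controlled diagnostic pair needs.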
Here is an instance of the raw presupposition data from any sub-dataset:
{
  "sentence1": "All ten guys that proved to boast might have been divorcing.",
  "sentence2": "There are exactly ten guys that proved to boast.",
  "trigger": "modal",
  "presupposition": "positive",
  "gold_label": "entailment",
  "UID": "all_n_presupposition",
  "pairID": "9e",
  "paradigmID": 0
}
and the raw implicature data from any sub-dataset:
{
  "sentence1": "That teenager couldn't yell.",
  "sentence2": "That teenager could yell.",
  "gold_label_log": "contradiction",
  "gold_label_prag": "contradiction",
  "spec_relation": "negation",
  "item_type": "control",
  "trigger": "modal",
  "lexemes": "can - have to"
}
Presupposition
The raw data fields in the presupposition sub-datasets differ slightly from the fields exposed by HuggingFace Datasets. When working with the HF Dataset, the following mapping applies (HF field -> raw field):
"premise" -> "sentence1" "hypothesis"-> "sentence2" "trigger" -> "trigger" or "Not_In_Example" "trigger1" -> "trigger1" or "Not_In_Example" "trigger2" -> "trigger2" or "Not_In_Example" "presupposition" -> "presupposition" or "Not_In_Example" "gold_label" -> "gold_label" "UID" -> "UID" "pairID" -> "pairID" "paradigmID" -> "paradigmID"
Most of the raw fields remain unchanged. The trigger fields, however, required a new mapping. Some examples in the dataset have only the trigger field, while others have the trigger1 and trigger2 fields without the trigger or presupposition field. Most examples look like the one in the Data Instances section above. Occasionally, however, an example will look like this:
{
  'sentence1': 'Did that committee know when Lissa walked through the cafe?',
  'sentence2': 'That committee knew when Lissa walked through the cafe.',
  'trigger1': 'interrogative',
  'trigger2': 'unembedded',
  'gold_label': 'neutral',
  'control_item': True,
  'UID': 'question_presupposition',
  'pairID': '1821n',
  'paradigmID': 95
}
In this example, trigger1 and trigger2 appear in place of presupposition and trigger, which maintains the length of the dictionary. To account for such examples, the mapping above ensures that all examples accessed through the HF Datasets interface have the same size and the same fields. If an example has no value for one of the fields, the field is kept in the dictionary and given the value Not_In_Example.
To illustrate this point, the example given in the Data Instances section above would look like the following in the HF Datasets:
{
  "premise": "All ten guys that proved to boast might have been divorcing.",
  "hypothesis": "There are exactly ten guys that proved to boast.",
  "trigger": "modal",
  "trigger1": "Not_In_Example",
  "trigger2": "Not_In_Example",
  "presupposition": "positive",
  "gold_label": "entailment",
  "UID": "all_n_presupposition",
  "pairID": "9e",
  "paradigmID": 0
}
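The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the loader's actual code: the names `to_hf_presupposition`, `FIELD_MAP`, `OPTIONAL` and `MISSING` are hypothetical, and dropping raw fields that the mapping does not list (such as control_item) is an assumption.

```python
# Hypothetical sketch of the presupposition field mapping; not the loader's code.
MISSING = "Not_In_Example"

# HF field name -> raw field name, as listed in the mapping above.
FIELD_MAP = {
    "premise": "sentence1",
    "hypothesis": "sentence2",
    "trigger": "trigger",
    "trigger1": "trigger1",
    "trigger2": "trigger2",
    "presupposition": "presupposition",
    "gold_label": "gold_label",
    "UID": "UID",
    "pairID": "pairID",
    "paradigmID": "paradigmID",
}

# Fields that some raw examples lack and that receive the placeholder value.
OPTIONAL = {"trigger", "trigger1", "trigger2", "presupposition"}


def to_hf_presupposition(raw):
    """Return a record with the same ten fields for every raw example."""
    record = {}
    for hf_name, raw_name in FIELD_MAP.items():
        if hf_name in OPTIONAL:
            record[hf_name] = raw.get(raw_name, MISSING)
        else:
            record[hf_name] = raw[raw_name]  # required; KeyError if absent
    return record


raw_question = {
    "sentence1": "Did that committee know when Lissa walked through the cafe?",
    "sentence2": "That committee knew when Lissa walked through the cafe.",
    "trigger1": "interrogative",
    "trigger2": "unembedded",
    "gold_label": "neutral",
    "control_item": True,
    "UID": "question_presupposition",
    "pairID": "1821n",
    "paradigmID": 95,
}

hf_question = to_hf_presupposition(raw_question)
print(hf_question["trigger"], hf_question["trigger1"])
# prints: Not_In_Example interrogative
```

Using a placeholder string rather than omitting the key keeps every record the same shape, which is what a fixed-schema table format needs.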
Below is a description of the fields:
"premise": The premise. "hypothesis": The hypothesis. "trigger": A detailed discussion of trigger types appears in the paper. "trigger1": A detailed discussion of trigger types appears in the paper. "trigger2": A detailed discussion of trigger types appears in the paper. "presupposition": positive or negative. "gold_label": Corresponds to entailment, contradiction, or neutral. "UID": Unique id. "pairID": Sentence pair ID. "paradigmID": ?
It is not immediately clear what the difference between trigger, trigger1, and trigger2 is, or what the paradigmID refers to.
Implicature
The implicature fields only have the mapping below:
"premise" -> "sentence1" "hypothesis"-> "sentence2"
Here is a description of the fields:
"premise": The premise. "hypothesis": The hypothesis. "gold_label_log": Gold label for a logical reading of the sentence pair. "gold_label_prag": Gold label for a pragmatic reading of the sentence pair. "spec_relation": ? "item_type": ? "trigger": A detailed discussion of trigger types appears in the paper. "lexemes": ?
As the dataset was created to test already trained models, the only split that exists is for testing.
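Since the data is meant for testing, a typical use is to score a trained model's predictions. Each implicature record carries two gold labels, gold_label_log and gold_label_prag, so a prediction can be scored against both readings. The sketch below is one possible way to do that, not an official evaluation script: the function name `score_implicature` is hypothetical, labels are assumed to be plain strings as in the raw data, and the second example record is made up for illustration.

```python
def score_implicature(examples, predictions):
    """Return accuracy against the logical and the pragmatic gold labels."""
    if not examples or len(examples) != len(predictions):
        raise ValueError("need one prediction per example and at least one example")
    n = len(examples)
    log_hits = sum(p == ex["gold_label_log"] for ex, p in zip(examples, predictions))
    prag_hits = sum(p == ex["gold_label_prag"] for ex, p in zip(examples, predictions))
    return {"logical": log_hits / n, "pragmatic": prag_hits / n}


examples = [
    {   # Control item from the data instance shown earlier (HF field names).
        "premise": "That teenager couldn't yell.",
        "hypothesis": "That teenager could yell.",
        "gold_label_log": "contradiction",
        "gold_label_prag": "contradiction",
    },
    {   # Made-up item where the two readings disagree (illustrative only).
        "premise": "Some guests left.",
        "hypothesis": "Not all guests left.",
        "gold_label_log": "neutral",
        "gold_label_prag": "entailment",
    },
]

# Hypothetical model outputs, one label per example.
predictions = ["contradiction", "entailment"]

scores = score_implicature(examples, predictions)
print(scores)
# prints: {'logical': 0.5, 'pragmatic': 1.0}
```

A model whose pragmatic accuracy exceeds its logical accuracy on items where the two labels differ is drawing the implicature rather than reading the pair strictly logically.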
IMPPRES was created to evaluate how well trained NLI models recognize several classes of presuppositions and scalar implicatures.
[More Information Needed]
Who are the source language producers?
[More Information Needed]
[More Information Needed]
Who are the annotators?
The annotations were generated semi-automatically.
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
IMPPRES is available under a Creative Commons Attribution-NonCommercial 4.0 International Public License ("The License"). You may not use these files except in compliance with the License. Please see the LICENSE file for more information before you use the dataset.
@inproceedings{jeretic-etal-2020-natural,
    title = "Are Natural Language Inference Models {IMPPRESsive}? {L}earning {IMPlicature} and {PRESupposition}",
    author = "Jereti\v{c}, Paloma and Warstadt, Alex and Bhooshan, Suvrat and Williams, Adina",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.768",
    doi = "10.18653/v1/2020.acl-main.768",
    pages = "8690--8705",
    abstract = "Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer whether a sentence entails another. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of 32K semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI appears to contain very few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences. It reliably treats scalar implicatures triggered by {``}some{''} as entailments. For some presupposition triggers like {``}only{''}, BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences.",
}
Thanks to @aclifton314 for adding this dataset.