Dataset: anli
Task: text classification
Language: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
Preprint: arxiv:1910.14599
License: cc-by-nc-4.0

The Adversarial Natural Language Inference (ANLI) dataset is a new large-scale NLI benchmark, collected via an iterative, adversarial human-and-model-in-the-loop procedure. ANLI is much more difficult than its predecessors, including SNLI and MNLI. It contains three rounds, and each round has train/dev/test splits.
English
An example of 'train_r2' looks as follows.
This example was too long and was cropped:
{
  "hypothesis": "Idris Sultan was born in the first month of the year preceding 1994.",
  "label": 0,
  "premise": "\"Idris Sultan (born January 1993) is a Tanzanian Actor and comedian, actor and radio host who won the Big Brother Africa-Hotshot...",
  "reason": "",
  "uid": "ed5c37ab-77c5-4dbc-ba75-8fd617b19712"
}
The data fields are the same among all splits.
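As a rough illustration of how such a record might be read, the sketch below parses the cropped 'train_r2' example and renders its integer label as a name. The label encoding (0 = entailment, 1 = neutral, 2 = contradiction) is an assumption that is not stated in the text above, and `LABEL_NAMES` and `describe` are hypothetical names introduced only for this example.

```python
import json

# ASSUMPTION: this label encoding is not given in the text above.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}

# The cropped 'train_r2' record shown above; the premise is truncated in the source.
RAW_EXAMPLE = r'''
{
  "hypothesis": "Idris Sultan was born in the first month of the year preceding 1994.",
  "label": 0,
  "premise": "\"Idris Sultan (born January 1993) is a Tanzanian Actor and comedian, actor and radio host who won the Big Brother Africa-Hotshot...",
  "reason": "",
  "uid": "ed5c37ab-77c5-4dbc-ba75-8fd617b19712"
}
'''


def describe(record: dict) -> str:
    """Return a one-line summary with the label rendered as a name."""
    label = LABEL_NAMES.get(record["label"], "unknown")
    return f'{record["uid"]}: {label}'


example = json.loads(RAW_EXAMPLE)
summary = describe(example)
print(summary)
```

Under the assumed encoding, label 0 reads as entailment, which is consistent with the example: a birth date of January 1993 supports the hypothesis.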
name | train_r1 | dev_r1 | train_r2 | dev_r2 | train_r3 | dev_r3 | test_r1 | test_r2 | test_r3 |
---|---|---|---|---|---|---|---|---|---|
plain_text | 16946 | 1000 | 45460 | 1000 | 100459 | 1200 | 1000 | 1000 | 1200 |
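The split sizes in the table can be checked against the stated size category with a few lines of arithmetic. This is a minimal sketch: the numbers are copied from the table, while `SPLIT_SIZES`, `split_name`, and `total_for` are hypothetical helper names used only here.

```python
# Split sizes copied from the table above (config "plain_text").
SPLIT_SIZES = {
    "train_r1": 16946, "dev_r1": 1000, "test_r1": 1000,
    "train_r2": 45460, "dev_r2": 1000, "test_r2": 1000,
    "train_r3": 100459, "dev_r3": 1200, "test_r3": 1200,
}


def split_name(kind: str, round_id: int) -> str:
    """Build a split name such as 'train_r2' from its kind and round."""
    return f"{kind}_r{round_id}"


def total_for(kind: str) -> int:
    """Sum the sizes of one kind of split across the three rounds."""
    return sum(SPLIT_SIZES[split_name(kind, r)] for r in (1, 2, 3))


totals = {kind: total_for(kind) for kind in ("train", "dev", "test")}
grand_total = sum(SPLIT_SIZES.values())
print(totals, grand_total)
```

The training splits sum to 162,865 examples and all nine splits to 169,265, which falls inside the 100K<n<1M size category listed in the metadata.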
Creative Commons Attribution-NonCommercial 4.0 (cc-by-nc-4.0)
@InProceedings{nie2019adversarial,
    title = {Adversarial NLI: A New Benchmark for Natural Language Understanding},
    author = {Nie, Yixin and Williams, Adina and Dinan, Emily and Bansal, Mohit and Weston, Jason and Kiela, Douwe},
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    year = "2020",
    publisher = "Association for Computational Linguistics",
}
Thanks to @thomwolf, @easonnie, @lhoestq, and @patrickvonplaten for adding this dataset.