Dataset: xcsr
Task: question-answering
Sub-task: multiple-choice-qa
Multilinguality: multilingual
Size: 1K<n<10K
Annotation creators: crowdsourced
Preprint: arxiv:2106.06937
License: mit

To evaluate multilingual language models (ML-LMs) for commonsense reasoning in a cross-lingual zero-shot transfer setting (X-CSR), i.e., training in English and testing in other languages, we create two benchmark datasets, namely X-CSQA and X-CODAH. Specifically, we automatically translate the original CSQA and CODAH datasets, which only have English versions, into 15 other languages, forming development and test sets for studying X-CSR. As our goal is to evaluate different ML-LMs in a unified evaluation protocol for X-CSR, we argue that such translated examples, although they might contain noise, can serve as a starting benchmark for meaningful analysis until human-translated datasets become available in the future.
Leaderboard: https://inklab.usc.edu//XCSR/leaderboard
The 16 languages covered by X-CSR: {en, zh, de, es, fr, it, jap, nl, pl, pt, ru, ar, vi, hi, sw, ur}.
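For convenience, the language codes above can be enumerated in code. A minimal sketch follows, assuming per-language configuration names of the form "X-CSQA-<lang>" and "X-CODAH-<lang>"; these name patterns are an assumption for illustration, not confirmed by this card:

```python
# The 16 X-CSR language codes listed above. The configuration-name
# patterns below are an assumption, not taken from this card.
XCSR_LANGS = ["en", "zh", "de", "es", "fr", "it", "jap", "nl",
              "pl", "pt", "ru", "ar", "vi", "hi", "sw", "ur"]

xcsqa_configs = [f"X-CSQA-{lang}" for lang in XCSR_LANGS]
xcodah_configs = [f"X-CODAH-{lang}" for lang in XCSR_LANGS]
```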
An example of the X-CSQA dataset:
{ "id": "be1920f7ba5454ad", # an id shared by all languages "lang": "en", # one of the 16 language codes. "question": { "stem": "What will happen to your knowledge with more learning?", # question text "choices": [ {"label": "A", "text": "headaches" }, {"label": "B", "text": "bigger brain" }, {"label": "C", "text": "education" }, {"label": "D", "text": "growth" }, {"label": "E", "text": "knowing more" } ] }, "answerKey": "D" # hidden for test data. }
An example of the X-CODAH dataset:
{ "id": "b8eeef4a823fcd4b", # an id shared by all languages "lang": "en", # one of the 16 language codes. "question_tag": "o", # one of 6 question types "question": { "stem": " ", # always a blank as a dummy question "choices": [ {"label": "A", "text": "Jennifer loves her school very much, she plans to drop every courses."}, {"label": "B", "text": "Jennifer loves her school very much, she is never absent even when she's sick."}, {"label": "C", "text": "Jennifer loves her school very much, she wants to get a part-time job."}, {"label": "D", "text": "Jennifer loves her school very much, she quits school happily."} ] }, "answerKey": "B" # hidden for test data. }
The details of the dataset construction, especially the translation procedures, can be found in section A of the appendix of the paper.
```bibtex
# X-CSR
@inproceedings{lin-etal-2021-common,
    title = "Common Sense Beyond {E}nglish: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning",
    author = "Lin, Bill Yuchen and Lee, Seyeon and Qiao, Xiaoyang and Ren, Xiang",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.102",
    doi = "10.18653/v1/2021.acl-long.102",
    pages = "1274--1287",
    abstract = "Commonsense reasoning research has so far been limited to English. We aim to evaluate and improve popular multilingual language models (ML-LMs) to help advance commonsense reasoning (CSR) beyond English. We collect the Mickey corpus, consisting of 561k sentences in 11 different languages, which can be used for analyzing and improving ML-LMs. We propose Mickey Probe, a language-general probing task for fairly evaluating the common sense of popular ML-LMs across different languages. In addition, we also create two new datasets, X-CSQA and X-CODAH, by translating their English versions to 14 other languages, so that we can evaluate popular ML-LMs for cross-lingual commonsense reasoning. To improve the performance beyond English, we propose a simple yet effective method {---} multilingual contrastive pretraining (MCP). It significantly enhances sentence representations, yielding a large performance gain on both benchmarks (e.g., +2.7{\%} accuracy for X-CSQA over XLM-R{\_}L).",
}

# CSQA
@inproceedings{Talmor2019commonsenseqaaq,
    address = {Minneapolis, Minnesota},
    author = {Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan},
    booktitle = {Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)},
    doi = {10.18653/v1/N19-1421},
    pages = {4149--4158},
    publisher = {Association for Computational Linguistics},
    title = {CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge},
    url = {https://www.aclweb.org/anthology/N19-1421},
    year = {2019}
}

# CODAH
@inproceedings{Chen2019CODAHAA,
    address = {Minneapolis, USA},
    author = {Chen, Michael and D{'}Arcy, Mike and Liu, Alisa and Fernandez, Jared and Downey, Doug},
    booktitle = {Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for {NLP}},
    doi = {10.18653/v1/W19-2008},
    pages = {63--69},
    publisher = {Association for Computational Linguistics},
    title = {CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense},
    url = {https://www.aclweb.org/anthology/W19-2008},
    year = {2019}
}
```
Thanks to Bill Yuchen Lin, Seyeon Lee, Xiaoyang Qiao, and Xiang Ren for adding this dataset.