Dataset: xquad
Tasks: question answering
Sub-tasks: extractive-qa
Multilinguality: multilingual
Language creators: expert-generated
Annotations creators: expert-generated
Source datasets: extended|squad
ArXiv: arxiv:1910.11856
License: cc-by-sa-4.0

XQuAD (Cross-lingual Question Answering Dataset) is a benchmark dataset for evaluating cross-lingual question answering performance. The dataset consists of a subset of 240 paragraphs and 1190 question-answer pairs from the development set of SQuAD v1.1 (Rajpurkar et al., 2016) together with their professional translations into ten languages: Spanish, German, Greek, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, and Hindi. Consequently, the dataset is entirely parallel across 11 languages (English plus the ten translations).
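As a rough illustration of how one language configuration might be loaded, the sketch below uses the `load_dataset` function from the Hugging Face `datasets` library. The helper names `config_name` and `load_xquad` are hypothetical, and the dataset id `"xquad"` is taken from the metadata above; it is an assumption that this id still resolves, since some hosts expect a namespaced id.

```python
# Configurations listed in this card's split table.
LANGUAGES = ["ar", "de", "el", "en", "es"]


def config_name(lang):
    """Build a configuration name such as 'xquad.de' from a language code."""
    return f"xquad.{lang}"


def load_xquad(lang):
    """Load the validation split for one language (requires network access)."""
    # Imported lazily so that config_name() works without the library installed.
    from datasets import load_dataset

    # Assumption: the dataset id "xquad" resolves as written in the metadata.
    return load_dataset("xquad", config_name(lang), split="validation")
```

Calling `load_xquad("de")` would then be expected to return the German validation split, provided the `datasets` package is installed and the dataset id resolves.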
xquad.ar, xquad.de, xquad.el, xquad.en, xquad.es
An example of 'validation' looks as follows. The same cropped example is shown for each of these configurations.
This example was too long and was cropped:
{
  "answers": {
    "answer_start": [527],
    "text": ["136"]
  },
  "context": "\"Die Verteidigung der Panthers gab nur 308 Punkte ab und belegte den sechsten Platz in der Liga, während sie die NFL mit 24 Inte...",
  "id": "56beb4343aeaaa14008c925c",
  "question": "Wie viele Sacks erzielte Jared Allen in seiner Karriere?"
}
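To make the relationship between `answers` and `context` concrete, here is a minimal sketch that slices each answer out of the context. It assumes that `answer_start` is a character offset into `context`, as in SQuAD v1.1. The function names and the toy record are hypothetical; the toy context is invented because the real one above is cropped.

```python
def extract_answers(record):
    """Slice each answer out of the context using its character offset."""
    context = record["context"]
    answers = record["answers"]
    spans = []
    for start, text in zip(answers["answer_start"], answers["text"]):
        spans.append(context[start:start + len(text)])
    return spans


def offsets_are_consistent(record):
    """Return True if every offset points at its stated answer text."""
    return extract_answers(record) == list(record["answers"]["text"])


# Hypothetical toy record with the same field layout as the example above.
toy_record = {
    "id": "toy-0001",
    "context": "Jared Allen retired with 136 career sacks.",
    "question": "How many career sacks did Jared Allen have?",
    "answers": {"answer_start": [25], "text": ["136"]},
}

print(extract_answers(toy_record))
```

A check like `offsets_are_consistent` can be useful after any preprocessing that alters the context string, since changes to whitespace or normalization would shift the offsets.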
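The data fields are the same among all splits. As the example shows, each record has the fields id, context, question, and answers (which holds answer_start and text).

name | validation |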
---|---|
xquad.ar | 1190 |
xquad.de | 1190 |
xquad.el | 1190 |
xquad.en | 1190 |
xquad.es | 1190 |
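Because every configuration listed has the same number of validation examples and the dataset is described as parallel, records can in principle be paired across languages. The sketch below groups records by `id`. It is an assumption that translated records keep the `id` of their English counterpart, and the function name `align_by_id` and the toy data are hypothetical.

```python
from collections import defaultdict


def align_by_id(records_by_lang):
    """Map each id to {lang: record}, keeping only ids found in every language."""
    grouped = defaultdict(dict)
    for lang, records in records_by_lang.items():
        for record in records:
            grouped[record["id"]][lang] = record
    all_langs = set(records_by_lang)
    return {
        record_id: by_lang
        for record_id, by_lang in grouped.items()
        if set(by_lang) == all_langs
    }


# Hypothetical toy data: "q2" has no German counterpart, so it is dropped.
toy = {
    "en": [
        {"id": "q1", "question": "How many sacks?"},
        {"id": "q2", "question": "Who won?"},
    ],
    "de": [{"id": "q1", "question": "Wie viele Sacks?"}],
}
aligned = align_by_id(toy)
print(sorted(aligned))
```

Dropping ids that are missing from any language keeps the aligned set strictly parallel, which matters when comparing per-example scores across languages.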
@article{Artetxe:etal:2019,
  author        = {Mikel Artetxe and Sebastian Ruder and Dani Yogatama},
  title         = {On the cross-lingual transferability of monolingual representations},
  journal       = {CoRR},
  volume        = {abs/1910.11856},
  year          = {2019},
  archivePrefix = {arXiv},
  eprint        = {1910.11856}
}
Thanks to @lewtun, @patrickvonplaten, and @thomwolf for adding this dataset.