Dataset: sberquad
Tasks: Question Answering
Sub-tasks: extractive-qa
Languages: ru
Multilinguality: monolingual
Size categories: 10K<n<100K
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1912.09723
License: unknown

Sber Question Answering Dataset (SberQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text (a span) from the corresponding reading passage, or the question may be unanswerable. SberQuAD is an original Russian-language analogue of SQuAD and was presented at the Sberbank Data Science Journey 2017.
The text of the dataset is in Russian.
Example data instance: { "context": "Первые упоминания о строении человеческого тела встречаются в Древнем Египте...", "id": 14754, "qas": [ { "id": 60544, "question": "Где встречаются первые упоминания о строении человеческого тела?", "answers": [{"answer_start": 60, "text": "в Древнем Египте"}] } ] }
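Each record nests one or more questions under a shared context, and each answer is located by its offset. The sketch below, a minimal illustration rather than an official loader, flattens the data instance above into one example per question and checks that slicing the context at `answer_start` recovers the answer text. It assumes `answer_start` is a 0-based character offset into `context` (which matches the example: "в Древнем Египте" begins at character 60). The helper names `iter_examples` and `extract_span` are hypothetical, and representing an unanswerable question as an empty `answers` list is an assumption, since the card shows no such instance.

```python
from typing import Iterator

# Data instance from the card (context truncated with "..." as in the original).
record = {
    "context": "Первые упоминания о строении человеческого тела встречаются в Древнем Египте...",
    "id": 14754,
    "qas": [
        {
            "id": 60544,
            "question": "Где встречаются первые упоминания о строении человеческого тела?",
            "answers": [{"answer_start": 60, "text": "в Древнем Египте"}],
        }
    ],
}


def iter_examples(rec: dict) -> Iterator[dict]:
    """Hypothetical helper: yield one flat example per question, copying the shared context."""
    for qa in rec["qas"]:
        yield {
            "id": qa["id"],
            "context": rec["context"],
            "question": qa["question"],
            # Assumption: an unanswerable question would carry an empty list here.
            "answers": qa["answers"],
        }


def extract_span(context: str, answer: dict) -> str:
    """Hypothetical helper: slice the answer out of the context by its character offset."""
    start = answer["answer_start"]
    return context[start : start + len(answer["text"])]


examples = list(iter_examples(record))
for ex in examples:
    for ans in ex["answers"]:
        # The span recovered from the offset should equal the stored answer text.
        assert extract_span(ex["context"], ans) == ans["text"]

print(extract_span(examples[0]["context"], examples[0]["answers"][0]))
```

Checking offsets this way is a cheap sanity test before training an extractive model, because a misaligned `answer_start` silently teaches the model the wrong span.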
Data splits (number of examples):

| name | train | validation | test |
|---|---|---|---|
| plain_text | 45328 | 5036 | 23936 |
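As a quick check on the split table, the short sketch below totals the three `plain_text` splits and prints each split's share of the data. The numbers are taken directly from the table; the variable names are illustrative only.

```python
# Split sizes for the plain_text configuration, as listed in the table.
splits = {"train": 45328, "validation": 5036, "test": 23936}

total = sum(splits.values())
print(f"total: {total}")
for name, n in splits.items():
    # Share of all examples that fall in this split.
    print(f"{name}: {n} ({n / total:.1%})")
```

The splits add up to 74,300 examples (roughly 61.0% train, 6.8% validation, 32.2% test), which is consistent with the 10K<n<100K size category in the metadata.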
@InProceedings{sberquad,
  doi = {10.1007/978-3-030-58219-7_1},
  author = {Pavel Efimov and Andrey Chertok and Leonid Boytsov and Pavel Braslavski},
  title = {SberQuAD -- Russian Reading Comprehension Dataset: Description and Analysis},
  booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction},
  year = {2020},
  publisher = {Springer International Publishing},
  pages = {3--15}
}
Thanks to @alenusch for adding this dataset.