数据集:
squad
任务:
子任务:
extractive-qa语言:
计算机处理:
monolingual大小:
10K<n<100K批注创建人:
crowdsourced源数据集:
extended|wikipedia预印本库:
arxiv:1606.05250许可:
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
An example of 'train' looks as follows.
{
"answers": {
"answer_start": [1],
"text": ["This is a test text"]
},
"context": "This is a test context.",
"id": "1",
"question": "Is this a test?",
"title": "train test"
}
The data fields are the same among all splits.
plain_text| name | train | validation |
|---|---|---|
| plain_text | 87599 | 10570 |
@article{2016arXiv160605250R,
author = {{Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev},
Konstantin and {Liang}, Percy},
title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
journal = {arXiv e-prints},
year = 2016,
eid = {arXiv:1606.05250},
pages = {arXiv:1606.05250},
archivePrefix = {arXiv},
eprint = {1606.05250},
}
Thanks to @lewtun , @albertvillanova , @patrickvonplaten , @thomwolf for adding this dataset.