Dataset: squad_v2
Task: question-answering
Language: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:1606.05250
License: cc-by-sa-4.0
SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
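The abstention requirement can be illustrated with a minimal sketch. It assumes a hypothetical model that returns candidate answer spans with scores, plus a separate "no answer" score; the function name `choose_answer`, the score convention, and the `threshold` parameter are illustrative assumptions, not something the dataset prescribes.

```python
from typing import List, Optional, Tuple


def choose_answer(
    candidates: List[Tuple[str, float]],
    no_answer_score: float,
    threshold: float = 0.0,
) -> Optional[str]:
    """Return the best-scoring span text, or None to abstain.

    Abstains when there are no candidates, or when the no-answer score
    exceeds the best span score by more than `threshold` (an assumed,
    tunable margin).
    """
    if not candidates:
        return None
    best_text, best_score = max(candidates, key=lambda pair: pair[1])
    if no_answer_score - best_score > threshold:
        return None
    return best_text
```

Raising `threshold` makes this sketch abstain less often; in practice such a margin would be tuned on the validation split.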
An example of 'validation' looks as follows.
This example was too long and was cropped:
{
    "answers": {
        "answer_start": [94, 87, 94, 94],
        "text": ["10th and 11th centuries", "in the 10th and 11th centuries", "10th and 11th centuries", "10th and 11th centuries"]
    },
    "context": "\"The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave thei...",
    "id": "56ddde6b9a695914005b9629",
    "question": "When were the Normans in Normandy?",
    "title": "Normans"
}
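The relationship between `answer_start`, `text`, and `context` in the record above can be checked with a short sketch. Two assumptions are made here: the leading escaped quote in the cropped display is treated as a display artifact and left out of the context string, and unanswerable questions are assumed to carry empty `text` and `answer_start` lists. The helper names `is_answerable` and `spans_match` are hypothetical.

```python
from typing import Any, Dict, List


def is_answerable(record: Dict[str, Any]) -> bool:
    """Return True if the record carries at least one reference answer."""
    return len(record["answers"]["text"]) > 0


def spans_match(record: Dict[str, Any]) -> List[bool]:
    """For each reference answer, check that `text` occurs in `context`
    at the character offset given by `answer_start`."""
    context = record["context"]
    answers = record["answers"]
    return [
        context[start:start + len(text)] == text
        for start, text in zip(answers["answer_start"], answers["text"])
    ]


# Stand-in for the cropped record: the context is cut where the display
# cuts it, and the leading escaped quote is dropped (assumption).
record = {
    "answers": {
        "answer_start": [94, 87, 94, 94],
        "text": [
            "10th and 11th centuries",
            "in the 10th and 11th centuries",
            "10th and 11th centuries",
            "10th and 11th centuries",
        ],
    },
    "context": (
        "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) "
        "were the people who in the 10th and 11th centuries gave thei"
    ),
    "id": "56ddde6b9a695914005b9629",
    "question": "When were the Normans in Normandy?",
    "title": "Normans",
}

print(is_answerable(record), spans_match(record))
```

Offsets are character positions into `context`, so the same check applies to any record regardless of split.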
The data fields are the same among all splits.
name | train | validation |
---|---|---|
squad_v2 | 130319 | 11873 |
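The split sizes in the table can be compared against a local copy with a sketch like the following. It assumes the third-party `datasets` package is installed and that the dataset is reachable under the identifier `squad_v2` (newer Hub versions may expect a namespaced identifier); the helper names `split_size_mismatches` and `load_and_check` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Split sizes as listed in the table above.
EXPECTED_SPLIT_SIZES: Dict[str, int] = {"train": 130319, "validation": 11873}


def split_size_mismatches(
    observed: Dict[str, int],
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {split: (expected, observed)} for every split whose size differs.

    A split missing from `observed` is reported with None as its size.
    """
    mismatches = {}
    for name, expected in EXPECTED_SPLIT_SIZES.items():
        actual = observed.get(name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


def load_and_check() -> Dict[str, Tuple[int, Optional[int]]]:
    """Download the dataset and compare its split sizes with the table.

    Assumes `datasets` is installed and the Hub is reachable.
    """
    from datasets import load_dataset  # lazy import: only needed for this call

    dataset = load_dataset("squad_v2")
    return split_size_mismatches(
        {name: len(split) for name, split in dataset.items()}
    )
```

Calling `load_and_check()` returns an empty dictionary when the downloaded splits match the table.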
@article{2016arXiv160605250R,
    author = {{Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev}, Konstantin and {Liang}, Percy},
    title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
    journal = {arXiv e-prints},
    year = 2016,
    eid = {arXiv:1606.05250},
    pages = {arXiv:1606.05250},
    archivePrefix = {arXiv},
    eprint = {1606.05250},
}
Thanks to @lewtun, @albertvillanova, @patrickvonplaten, and @thomwolf for adding this dataset.