Dataset: squad_v2
Tasks:
Languages:
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: crowdsourced
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1606.05250
License:
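SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with 50,000 unanswerable questions written by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, a system must not only answer questions when possible, but also determine when the paragraph supports no answer and abstain from answering.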
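The abstention requirement can be illustrated with a minimal sketch. It assumes a hypothetical reader model that produces one score for its best answer span and one score for "no answer"; the function name, the score arguments, and the `threshold` parameter are illustrative assumptions, not part of this dataset or of any specific library.

```python
def answer_or_abstain(
    best_span: str,
    span_score: float,
    no_answer_score: float,
    threshold: float = 0.0,
) -> str:
    """Return the predicted span, or "" to abstain.

    Hypothetical decision rule: abstain when the no-answer score beats
    the best span score by more than `threshold`. An empty string is
    used here to mean "the paragraph does not support an answer".
    """
    if no_answer_score - span_score > threshold:
        return ""
    return best_span


# Confident span: the answer is returned.
print(repr(answer_or_abstain("10th and 11th centuries", 5.0, 1.0)))
# No-answer score dominates: the system abstains.
print(repr(answer_or_abstain("10th and 11th centuries", 1.0, 5.0)))
```

In such a setup the threshold would typically be tuned on the validation split, trading off wrong answers on unanswerable questions against missed answers on answerable ones.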
An example of 'validation' looks as follows.
This example was too long and was cropped:
{
"answers": {
"answer_start": [94, 87, 94, 94],
"text": ["10th and 11th centuries", "in the 10th and 11th centuries", "10th and 11th centuries", "10th and 11th centuries"]
},
"context": "\"The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave thei...",
"id": "56ddde6b9a695914005b9629",
"question": "When were the Normans in Normandy?",
"title": "Normans"
}
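The relationship between the fields above can be checked with a short sketch: each `answer_start` is treated as a character offset into `context` at which the matching `text` entry begins. The helper names are hypothetical, the context string is shortened to end at "centuries" and omits the leading quotation mark shown in the cropped display, and the second record (id, question, and empty answer lists standing for an unanswerable question) is an assumption made for illustration.

```python
from typing import Any, Dict, List


def is_answerable(example: Dict[str, Any]) -> bool:
    """Treat a record with at least one reference answer as answerable."""
    return len(example["answers"]["text"]) > 0


def misaligned_answers(example: Dict[str, Any]) -> List[int]:
    """Return indices of answers whose offset does not point at their text."""
    context = example["context"]
    answers = example["answers"]
    bad = []
    for i, (start, text) in enumerate(zip(answers["answer_start"], answers["text"])):
        if context[start:start + len(text)] != text:
            bad.append(i)
    return bad


example = {
    "answers": {
        "answer_start": [94, 87, 94, 94],
        "text": [
            "10th and 11th centuries",
            "in the 10th and 11th centuries",
            "10th and 11th centuries",
            "10th and 11th centuries",
        ],
    },
    # Shortened context: stops right after the answer span.
    "context": "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) "
               "were the people who in the 10th and 11th centuries",
    "id": "56ddde6b9a695914005b9629",
    "question": "When were the Normans in Normandy?",
    "title": "Normans",
}

# Hypothetical record: empty lists are assumed to mark an unanswerable question.
unanswerable = {
    "answers": {"answer_start": [], "text": []},
    "context": example["context"],
    "id": "hypothetical-id",
    "question": "A question the paragraph does not answer?",
    "title": "Normans",
}

print(is_answerable(example), misaligned_answers(example))
print(is_answerable(unanswerable), misaligned_answers(unanswerable))
```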
The data fields are the same among all splits.
name | train | validation |
---|---|---|
squad_v2 | 130319 | 11873 |
@article{2016arXiv160605250R,
author = {{Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev},
Konstantin and {Liang}, Percy},
title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
journal = {arXiv e-prints},
year = 2016,
eid = {arXiv:1606.05250},
pages = {arXiv:1606.05250},
archivePrefix = {arXiv},
eprint = {1606.05250},
}
Thanks to @lewtun, @albertvillanova, @patrickvonplaten, @thomwolf for adding this dataset.