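Dataset: squad_v2
Task: question answering
Language: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:1606.05250
License: cc-by-sa-4.0

SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with 50,000 unanswerable questions written by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, a system must not only answer questions when possible, but also determine when the paragraph supports no answer and abstain from answering.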
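The "answer or abstain" requirement can be illustrated with a small scoring sketch. It assumes that an abstention is represented by an empty prediction string and that an unanswerable question has an empty list of gold answers. The function name `exact_match` is hypothetical, and the comparison is deliberately simplified: it omits the text normalization that the official evaluation applies before comparing strings.

```python
from typing import List


def exact_match(prediction: str, gold_answers: List[str]) -> bool:
    """Simplified exact-match check with abstention (illustrative only).

    Assumption: an empty prediction means the system abstained, and an
    empty gold list means the question is unanswerable.
    """
    if not gold_answers:
        # Unanswerable question: only an abstention counts as correct.
        return prediction == ""
    # Answerable question: the prediction must equal one of the gold answers.
    return prediction in gold_answers


gold = ["10th and 11th centuries", "in the 10th and 11th centuries"]
print(exact_match("10th and 11th centuries", gold))  # correct answer
print(exact_match("", gold))                         # wrongly abstained
print(exact_match("", []))                           # correctly abstained
```

Under this scheme a system is penalized both for answering an unanswerable question and for abstaining on an answerable one.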
An example from the 'validation' split looks as follows.
This example was too long and was cropped:

    {
        "answers": {
            "answer_start": [94, 87, 94, 94],
            "text": ["10th and 11th centuries", "in the 10th and 11th centuries", "10th and 11th centuries", "10th and 11th centuries"]
        },
        "context": "\"The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave thei...",
        "id": "56ddde6b9a695914005b9629",
        "question": "When were the Normans in Normandy?",
        "title": "Normans"
    }
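The record above can be read as follows: each entry in `answers["answer_start"]` is a character offset into `context`, and the matching entry in `answers["text"]` is the answer span starting there. The sketch below checks this on the visible prefix of the cropped context. It assumes that the leading escaped quote in the cropped display is an artifact and not part of the actual context, and that unanswerable questions carry empty `text` and `answer_start` lists. The helper names `is_answerable` and `spans_match` are hypothetical.

```python
# Visible prefix of the cropped context from the example record; the leading
# escaped quote is assumed to be a display artifact and is dropped here.
example = {
    "id": "56ddde6b9a695914005b9629",
    "title": "Normans",
    "question": "When were the Normans in Normandy?",
    "context": (
        "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) "
        "were the people who in the 10th and 11th centuries"
    ),
    "answers": {
        "answer_start": [94, 87, 94, 94],
        "text": [
            "10th and 11th centuries",
            "in the 10th and 11th centuries",
            "10th and 11th centuries",
            "10th and 11th centuries",
        ],
    },
}


def is_answerable(record: dict) -> bool:
    """Assumption: unanswerable questions have an empty answers list."""
    return len(record["answers"]["text"]) > 0


def spans_match(record: dict) -> bool:
    """Check that every answer text occurs in the context at its offset."""
    context = record["context"]
    answers = record["answers"]
    return all(
        context[start:start + len(text)] == text
        for start, text in zip(answers["answer_start"], answers["text"])
    )


print(is_answerable(example))
print(spans_match(example))
```

A check like `spans_match` is a cheap sanity test after any preprocessing step that could shift character offsets, such as whitespace stripping.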
The data fields are the same among all splits.
squad_v2

name | train | validation |
---|---|---|
squad_v2 | 130319 | 11873 |
@article{2016arXiv160605250R,
    author = {{Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev}, Konstantin and {Liang}, Percy},
    title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
    journal = {arXiv e-prints},
    year = 2016,
    eid = {arXiv:1606.05250},
    pages = {arXiv:1606.05250},
    archivePrefix = {arXiv},
    eprint = {1606.05250},
}
Thanks to @lewtun, @albertvillanova, @patrickvonplaten, and @thomwolf for adding this dataset.