Dataset: boolq
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.
An example of 'validation' looks as follows.
This example was too long and was cropped:
{
"answer": false,
"passage": "\"All biomass goes through at least some of these steps: it needs to be grown, collected, dried, fermented, distilled, and burned...",
"question": "does ethanol take more energy make that produces"
}
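To make the text-pair classification setup concrete, here is a minimal sketch of how one record could be mapped to a (first text, second text, label) triple. The helper name `to_text_pair`, the `BoolQExample` type, and the label convention (1 for true, 0 for false) are assumptions for illustration, not part of the dataset; the record is the cropped validation example shown above.

```python
from typing import Tuple, TypedDict


class BoolQExample(TypedDict):
    """Shape of one record, taken from the example above."""
    question: str
    passage: str
    answer: bool


def to_text_pair(example: BoolQExample) -> Tuple[str, str, int]:
    """Hypothetical helper: turn a record into (text_a, text_b, label).

    The question is the first text, the passage the second, and the
    boolean answer becomes an integer label (assumed: True -> 1, False -> 0).
    """
    label = 1 if example["answer"] else 0
    return example["question"], example["passage"], label


# The cropped validation example; the passage is truncated in the source.
example: BoolQExample = {
    "answer": False,
    "passage": "\"All biomass goes through at least some of these steps: "
               "it needs to be grown, collected, dried, fermented, "
               "distilled, and burned...",
    "question": "does ethanol take more energy make that produces",
}

text_a, text_b, label = to_text_pair(example)
print(text_a)
print(label)
```

A classifier for natural language inference could consume such pairs directly, with the two-way yes/no label taking the place of the usual entailment classes.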
The data fields are the same among all splits.
| name | train | validation |
|---|---|---|
| default | 9427 | 3270 |
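The two splits listed in the table do not add up to the 15942 examples stated in the description. The short check below works through the arithmetic; the interpretation that the remainder belongs to a split not shown in this table (for instance a held-out test set) is an assumption, since the card does not say.

```python
# Split sizes copied from the table above.
split_sizes = {"train": 9427, "validation": 3270}

# Total number of examples stated in the dataset description.
total_stated = 15942

# Examples covered by the listed splits, and the part not accounted for.
listed = sum(split_sizes.values())
unlisted = total_stated - listed

# Share of each listed split, relative to the listed examples only.
fractions = {name: n / listed for name, n in split_sizes.items()}

print(listed)    # 12697
print(unlisted)  # 3245
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```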
BoolQ is released under the Creative Commons Share-Alike 3.0 license.
@inproceedings{clark2019boolq,
title = {BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions},
author = {Clark, Christopher and Lee, Kenton and Chang, Ming-Wei and Kwiatkowski, Tom and Collins, Michael and Toutanova, Kristina},
booktitle = {NAACL},
year = {2019},
}
Thanks to @lewtun, @lhoestq, @thomwolf, @patrickvonplaten, and @albertvillanova for adding this dataset.