Dataset: boolq
Tasks: Text Classification
Languages: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
License: cc-by-sa-3.0

BoolQ is a question answering dataset for yes/no questions containing 15942 examples. The questions are naturally occurring: they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context. The text-pair classification setup is similar to existing natural language inference tasks.
An example of 'validation' looks as follows.
This example was too long and was cropped: { "answer": false, "passage": "\"All biomass goes through at least some of these steps: it needs to be grown, collected, dried, fermented, distilled, and burned...", "question": "does ethanol take more energy make that produces" }
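To make the text-pair classification framing concrete, here is a minimal sketch that turns one record into a (question, passage) pair plus an integer label. The helper name `to_text_pair` and the 1 = yes / 0 = no label encoding are assumptions made for illustration; they are not part of the dataset itself. The passage is kept in its cropped form, as shown above.

```python
from typing import Tuple, TypedDict


class BoolQExample(TypedDict):
    """Shape of one record, based on the fields shown in the example above."""
    question: str
    passage: str
    answer: bool


def to_text_pair(example: BoolQExample) -> Tuple[Tuple[str, str], int]:
    """Return ((question, passage), label).

    The label encoding (1 for yes/True, 0 for no/False) is an assumption
    chosen for this sketch, not something the dataset prescribes.
    """
    return (example["question"], example["passage"]), int(example["answer"])


# The cropped validation example from the card; the trailing "..." is the crop.
example: BoolQExample = {
    "answer": False,
    "passage": (
        '"All biomass goes through at least some of these steps: it needs to be '
        "grown, collected, dried, fermented, distilled, and burned..."
    ),
    "question": "does ethanol take more energy make that produces",
}

pair, label = to_text_pair(example)
print(pair[0])  # the question
print(label)    # integer label derived from the boolean answer
```

A pair in this form can be handed to any sentence-pair classifier, in the same way that premise/hypothesis pairs are in natural language inference.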
The data fields are the same among all splits.
name | train | validation |
---|---|---|
default | 9427 | 3270 |
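The split sizes can be checked against the 15942 examples reported in the description with some quick arithmetic. The sketch below uses only the numbers stated in this card; the variable names are illustrative.

```python
# Split sizes from the table above and the total from the description.
SPLIT_SIZES = {"train": 9427, "validation": 3270}
TOTAL_REPORTED = 15942

listed = sum(SPLIT_SIZES.values())       # examples covered by the listed splits
not_listed = TOTAL_REPORTED - listed     # examples not accounted for in the table
fractions = {name: n / listed for name, n in SPLIT_SIZES.items()}

print(f"listed splits: {listed}")
print(f"not listed in the table: {not_listed}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of the listed examples")
```

The two listed splits sum to 12697, so 3245 of the reported examples are not covered by this table.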
BoolQ is released under the Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0) license.
@inproceedings{clark2019boolq,
  title = {BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions},
  author = {Clark, Christopher and Lee, Kenton and Chang, Ming-Wei and Kwiatkowski, Tom and Collins, Michael and Toutanova, Kristina},
  booktitle = {NAACL},
  year = {2019},
}
Thanks to @lewtun, @lhoestq, @thomwolf, @patrickvonplaten, @albertvillanova for adding this dataset.