Dataset: web_questions
Sub-tasks: open-domain-qa
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotations creators: crowdsourced
Source datasets: original
This dataset consists of 5,810 question/answer pairs. The questions are intended to be answerable using Freebase, a large knowledge graph, and most of them center on a single named entity. They are popular questions asked on the web (at least as of 2013).
An example from the 'train' split looks as follows.
{
"answers": ["Jamaican Creole English Language", "Jamaican English"],
"question": "what does jamaican people speak?",
"url": "http://www.freebase.com/view/en/jamaica"
}
The data fields are the same among all splits.
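
As a rough illustration of how a record with these fields might be consumed, the sketch below types the three fields seen in the sample record and checks a predicted answer against the list of gold answers. The field types are inferred from the sample only, and the names `WebQuestionsRecord`, `normalize`, and `matches_any_answer` are hypothetical helpers. The lowercase exact-match rule is an assumption made for the example, not the dataset's official evaluation.

```python
from typing import List, TypedDict


class WebQuestionsRecord(TypedDict):
    """One example; field types are inferred from the sample record."""
    url: str            # Freebase URL associated with the question
    question: str       # natural-language question
    answers: List[str]  # one or more acceptable answer strings


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an illustrative choice, not an official rule)."""
    return " ".join(text.lower().split())


def matches_any_answer(prediction: str, record: WebQuestionsRecord) -> bool:
    """Return True if the prediction equals any gold answer after normalization."""
    gold = {normalize(answer) for answer in record["answers"]}
    return normalize(prediction) in gold


example: WebQuestionsRecord = {
    "answers": ["Jamaican Creole English Language", "Jamaican English"],
    "question": "what does jamaican people speak?",
    "url": "http://www.freebase.com/view/en/jamaica",
}

print(matches_any_answer("jamaican english", example))
print(matches_any_answer("Spanish", example))
```

Because `answers` is a list, a prediction is compared against every gold string rather than a single reference.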
| name | train | test |
|---|---|---|
| default | 3778 | 2032 |
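
The split sizes in the table can be checked after loading the data. The sketch below assumes the Hugging Face `datasets` library is installed and that the dataset is reachable under the identifier `web_questions` (newer Hub versions may require a namespaced identifier). `EXPECTED_SIZES`, `check_split_sizes`, and `load_split_sizes` are hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Split sizes as listed in the table for the "default" configuration.
EXPECTED_SIZES: Dict[str, int] = {"train": 3778, "test": 2032}


def check_split_sizes(sizes: Dict[str, int]) -> List[str]:
    """Return the names of splits whose size differs from the table."""
    return [name for name, n in EXPECTED_SIZES.items() if sizes.get(name) != n]


def load_split_sizes(dataset_id: str = "web_questions") -> Dict[str, int]:
    """Load the dataset and report the number of rows per split.

    Requires network access and the `datasets` package; the identifier
    is an assumption and may need adjusting.
    """
    from datasets import load_dataset  # imported lazily so the helper above has no dependency

    dataset = load_dataset(dataset_id)
    return {name: dataset[name].num_rows for name in dataset}


if __name__ == "__main__":
    mismatched = check_split_sizes(load_split_sizes())
    print("all split sizes match" if not mismatched else f"mismatched splits: {mismatched}")
```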
@inproceedings{berant-etal-2013-semantic,
title = "Semantic Parsing on {F}reebase from Question-Answer Pairs",
author = "Berant, Jonathan and
Chou, Andrew and
Frostig, Roy and
Liang, Percy",
booktitle = "Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing",
month = oct,
year = "2013",
address = "Seattle, Washington, USA",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/D13-1160",
pages = "1533--1544",
}
Thanks to @thomwolf, @mariamabarham, @lewtun for adding this dataset.