Dataset: web_questions
Tasks: question answering
Sub-tasks: open-domain-qa
Languages: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
License: unknown

This dataset consists of 6,642 question/answer pairs. The questions are intended to be answerable using Freebase, a large knowledge graph, and most of them center on a single named entity. They are popular questions asked on the web (at least as of 2013).
An example of 'train' looks as follows.
{ "answers": ["Jamaican Creole English Language", "Jamaican English"], "question": "what does jamaican people speak?", "url": "http://www.freebase.com/view/en/jamaica" }
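The data fields are the same among all splits: "question" (a string), "answers" (a list of strings), and "url" (a string; a Freebase URL in the example above).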
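To make the record layout concrete, here is a minimal sketch that parses the 'train' example shown above and checks a candidate answer against the list of gold answers. The helpers `normalize` and `matches_any_answer` are hypothetical names introduced for illustration, and the lowercase-and-collapse-whitespace comparison is an assumption, not the official evaluation procedure for this dataset.

```python
import json

# Record copied from the 'train' example above.
RAW = (
    '{"answers": ["Jamaican Creole English Language", "Jamaican English"], '
    '"question": "what does jamaican people speak?", '
    '"url": "http://www.freebase.com/view/en/jamaica"}'
)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an illustrative choice, not an official rule)."""
    return " ".join(text.lower().split())


def matches_any_answer(record: dict, prediction: str) -> bool:
    """Hypothetical helper: True if the prediction equals any gold answer after normalization."""
    gold = {normalize(answer) for answer in record["answers"]}
    return normalize(prediction) in gold


record = json.loads(RAW)
print(record["question"])
print(matches_any_answer(record, "jamaican english"))
print(matches_any_answer(record, "Spanish"))
```

Because a question can have several acceptable answers, the sketch compares against the whole "answers" list rather than a single string.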
name | train | test |
---|---|---|
default | 3778 | 2032 |
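As a quick sanity check on the table, the following sketch totals the two splits and computes each split's share. The variable names are illustrative. Note that the listed splits sum to 5,810, which is lower than the 6,642 pairs quoted in the description; the source does not explain the difference.

```python
# Split sizes taken from the table above.
SPLIT_SIZES = {"train": 3778, "test": 2032}

total = sum(SPLIT_SIZES.values())
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}

for name, size in SPLIT_SIZES.items():
    print(f"{name}: {size} examples ({fractions[name]:.1%} of the listed splits)")
print(f"total: {total}")
```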
@inproceedings{berant-etal-2013-semantic,
    title = "Semantic Parsing on {F}reebase from Question-Answer Pairs",
    author = "Berant, Jonathan and Chou, Andrew and Frostig, Roy and Liang, Percy",
    booktitle = "Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing",
    month = oct,
    year = "2013",
    address = "Seattle, Washington, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D13-1160",
    pages = "1533--1544",
}
Thanks to @thomwolf, @mariamabarham, and @lewtun for adding this dataset.