Dataset: lc_quad
Task: question answering
Language: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
License: cc-by-3.0

LC-QuAD 2.0 is a Large Question Answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically their 2018 versions. Please see our paper for details about the dataset creation process and framework.
An example of 'train' looks as follows.
This example was too long and was cropped:

    {
        "NNQT_question": "What is the {periodical literature} for {mouthpiece} of {Delta Air Lines}",
        "paraphrased_question": "What is Delta Air Line's periodical literature mouthpiece?",
        "question": "What periodical literature does Delta Air Lines use as a moutpiece?",
        "sparql_dbpedia18": "\"select distinct ?obj where { ?statement <http://www.w3.org/1999/02/22-rdf-syntax-ns#subject> <http://wikidata.dbpedia.org/resou...",
        "sparql_wikidata": " select distinct ?obj where { wd:Q188920 wdt:P2813 ?obj . ?obj wdt:P31 wd:Q1002697 } ",
        "subgraph": "simple question right",
        "template": " <S P ?O ; ?O instanceOf Type>",
        "template_index": 65,
        "uid": 19719
    }
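As a rough illustration of how a record like the one above might be inspected, the sketch below pulls the Wikidata entity and property IDs out of the `sparql_wikidata` field. The helper name `wikidata_ids` is hypothetical and not part of the dataset or any library. It also assumes that only the `wd:` and `wdt:` prefixes matter, which holds for this example but may not cover every query in the dataset.

```python
import re

# Record copied from the example above (only the fields used here).
record = {
    "question": "What periodical literature does Delta Air Lines use as a moutpiece?",
    "sparql_wikidata": " select distinct ?obj where { wd:Q188920 wdt:P2813 ?obj . ?obj wdt:P31 wd:Q1002697 } ",
    "template_index": 65,
    "uid": 19719,
}


def wikidata_ids(sparql: str) -> dict:
    """Hypothetical helper: list entity (Q) and property (P) IDs in order of appearance.

    Assumption: entities use the `wd:` prefix and direct properties use `wdt:`,
    as in the example record. Other prefixes are ignored.
    """
    entities = re.findall(r"\bwd:(Q\d+)", sparql)
    properties = re.findall(r"\bwdt:(P\d+)", sparql)
    return {"entities": entities, "properties": properties}


ids = wikidata_ids(record["sparql_wikidata"])
print(ids)
```

For the example record this yields the entities `Q188920` and `Q1002697` and the properties `P2813` and `P31`, matching the template `<S P ?O ; ?O instanceOf Type>`: one subject, one property, and a type constraint on the answer.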
The data fields are the same among all splits.
default

name | train | test |
---|---|---|
default | 19293 | 4781 |
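The split sizes in the table can be turned into proportions with a few lines of arithmetic. This is only a sketch using the two counts shown above. The variable names are illustrative.

```python
# Split sizes copied from the table above.
splits = {"train": 19293, "test": 4781}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} examples ({fractions[name]:.1%} of {total})")
```

The two splits listed here sum to 24,074 examples, so the train split is roughly 80% of the listed data and the test split roughly 20%.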
LC-QuAD 2.0 is licensed under a Creative Commons Attribution 3.0 Unported License.
    @inproceedings{dubey2017lc2,
      title={LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia},
      author={Dubey, Mohnish and Banerjee, Debayan and Abdelkawi, Abdelrahman and Lehmann, Jens},
      booktitle={Proceedings of the 18th International Semantic Web Conference (ISWC)},
      year={2019},
      organization={Springer}
    }
Thanks to @lewtun, @thomwolf, and @patrickvonplaten for adding this dataset.