Dataset: wiki_qa
Tasks: question answering
Sub-tasks: open-domain-qa
Languages: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
License: other

Wiki Question Answering corpus from Microsoft.
The WikiQA corpus is a publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.
An example of 'train' looks as follows.
{ "answer": "Glacier caves are often called ice caves , but this term is properly used to describe bedrock caves that contain year-round ice.", "document_title": "Glacier cave", "label": 0, "question": "how are glacier caves formed?", "question_id": "Q1" }
The data fields are the same among all splits.
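The 'train' example above suggests five fields per record. A minimal sketch of a typed record and a parser for it follows; the names `WikiQARecord`, `REQUIRED_FIELDS`, and `parse_record` are hypothetical and not part of the dataset or any library. The field types are inferred from the single example shown, and the reading of `label` as 1 for a sentence that answers the question and 0 otherwise is an assumption, since only a `label` of 0 appears in the card.

```python
import json
from dataclasses import dataclass

# Field names taken from the 'train' example in this card.
REQUIRED_FIELDS = ("question_id", "question", "document_title", "answer", "label")


@dataclass(frozen=True)
class WikiQARecord:
    """One question/sentence pair (hypothetical helper type)."""
    question_id: str
    question: str
    document_title: str
    answer: str
    label: int  # assumed: 1 = sentence answers the question, 0 = it does not


def parse_record(raw: str) -> WikiQARecord:
    """Parse one JSON-encoded record and check that all fields are present."""
    data = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return WikiQARecord(**{name: data[name] for name in REQUIRED_FIELDS})


# The example record from this card, written as a JSON string.
example = (
    '{"answer": "Glacier caves are often called ice caves , but this term is '
    'properly used to describe bedrock caves that contain year-round ice.", '
    '"document_title": "Glacier cave", "label": 0, '
    '"question": "how are glacier caves formed?", "question_id": "Q1"}'
)

record = parse_record(example)
print(record.question_id, record.label, record.document_title)
```

Checking for missing fields up front gives a clear error if a record from another source does not follow the same layout.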
name | train | validation | test |
---|---|---|---|
default | 20360 | 2733 | 6165 |
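As a quick sanity check on the split sizes in the table, the sketch below totals the rows and computes each split's share. The names `SPLIT_SIZES` and `split_fractions` are hypothetical; only the three counts come from this card.

```python
# Row counts copied from the split table in this card.
SPLIT_SIZES = {"train": 20360, "validation": 2733, "test": 6165}


def split_fractions(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of rows."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

print(f"total rows: {total}")
for name, fraction in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]} rows ({fraction:.1%})")
```

The total falls inside the 10K<n<100K size category listed in the metadata.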
MICROSOFT RESEARCH DATA LICENSE AGREEMENT FOR MICROSOFT RESEARCH WIKIQA CORPUS
These license terms are an agreement between Microsoft Corporation (or based on where you live, one of its affiliates) and you. Please read them. They apply to the data associated with this license above, which includes the media on which you received it, if any. The terms also apply to any Microsoft:
DISTRIBUTION REQUIREMENTS: a. If you distribute the Dataset or any derivative works of the Dataset, you will distribute them under the same terms and conditions as in this Agreement, and you will not grant other rights to the Dataset or derivative works that are different from those provided by this Agreement. b. If you have created derivative works of the Dataset, and distribute such derivative works, you will cause the modified files to carry prominent notices so that recipients know that they are not receiving the original Dataset. Such notices must state: (i) that you have changed the Dataset; and (ii) the date of any changes.
DISTRIBUTION RESTRICTIONS. You may not: (a) alter any copyright, trademark or patent notice in the Dataset; (b) use Microsoft’s trademarks in a way that suggests your derivative works or modifications come from or are endorsed by Microsoft; (c) include the Dataset in malicious, deceptive or unlawful programs.
OWNERSHIP. Microsoft retains all right, title, and interest in and to any Dataset provided to you under this Agreement. You acquire no interest in the Dataset you may receive under the terms of this Agreement.
LICENSE TO MICROSOFT. Microsoft is granted back, without any restrictions or limitations, a non-exclusive, perpetual, irrevocable, royalty-free, assignable and sub-licensable license, to reproduce, publicly perform or display, use, modify, post, distribute, make and have made, sell and transfer your modifications to and/or derivative works of the Dataset, for any purpose.
FEEDBACK. If you give feedback about the Dataset to Microsoft, you give to Microsoft, without charge, the right to use, share and commercialize your feedback in any way and for any purpose. You also give to third parties, without charge, any patent rights needed for their products, technologies and services to use or interface with any specific parts of a Microsoft dataset or service that includes the feedback. You will not give feedback that is subject to a license that requires Microsoft to license its Dataset or documentation to third parties because we include your feedback in them. These rights survive this Agreement.
EXPORT RESTRICTIONS. The Dataset is subject to United States export laws and regulations. You must comply with all domestic and international export laws and regulations that apply to the Dataset. These laws include restrictions on destinations, end users and end use. For additional information, see www.microsoft.com/exporting.
ENTIRE AGREEMENT. This Agreement, and the terms for supplements, updates, Internet-based services and support services that you use, are the entire agreement for the Dataset.
SUPPORT SERVICES. Because this data is “as is,” we may not provide support services for it.
APPLICABLE LAW. a. United States. If you acquired the software in the United States, Washington state law governs the interpretation of this agreement and applies to claims for breach of it, regardless of conflict of laws principles. The laws of the state where you live govern all other claims, including claims under state consumer protection laws, unfair competition laws, and in tort. b. Outside the United States. If you acquired the software in any other country, the laws of that country apply.
LEGAL EFFECT. This Agreement describes certain legal rights. You may have other rights under the laws of your country. You may also have rights with respect to the party from whom you acquired the Dataset. This Agreement does not change your rights under the laws of your country if the laws of your country do not permit it to do so.
DISCLAIMER OF WARRANTY. The Dataset is licensed “as-is.” You bear the risk of using it. Microsoft gives no express warranties, guarantees or conditions. You may have additional consumer rights or statutory guarantees under your local laws which this agreement cannot change. To the extent permitted under your local laws, Microsoft excludes the implied warranties of merchantability, fitness for a particular purpose and non-infringement.
LIMITATION ON AND EXCLUSION OF REMEDIES AND DAMAGES. YOU CAN RECOVER FROM MICROSOFT AND ITS SUPPLIERS ONLY DIRECT DAMAGES UP TO U.S. $5.00. YOU CANNOT RECOVER ANY OTHER DAMAGES, INCLUDING CONSEQUENTIAL, LOST PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES.
This limitation applies to
It also applies even if Microsoft knew or should have known about the possibility of the damages. The above limitation or exclusion may not apply to you because your country may not allow the exclusion or limitation of incidental, consequential or other damages.
@inproceedings{yang-etal-2015-wikiqa,
    title = "{W}iki{QA}: A Challenge Dataset for Open-Domain Question Answering",
    author = "Yang, Yi and Yih, Wen-tau and Meek, Christopher",
    booktitle = "Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing",
    month = sep,
    year = "2015",
    address = "Lisbon, Portugal",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D15-1237",
    doi = "10.18653/v1/D15-1237",
    pages = "2013--2018",
}
Thanks to @patrickvonplaten, @mariamabarham, @lewtun, @thomwolf for adding this dataset.