Dataset: covid_qa_deepset
Tasks: question-answering
Languages: en
Multilinguality: monolingual
Size categories: 1K<n<10K
Language creators: found
Annotations creators: expert-generated
Source datasets: original
License: apache-2.0

COVID-QA is a Question Answering dataset consisting of 2,019 question/answer pairs annotated by volunteer biomedical experts on scientific articles related to COVID-19. A total of 147 scientific articles from the CORD-19 dataset were annotated by 15 experts.
The text in the dataset is in English.
What do the instances that comprise the dataset represent? Each represents a question, a context (a document passage from the CORD-19 dataset), and an answer.
How many instances are there in total? 2,019 instances.
What data does each instance consist of? Each instance is a question, a set of answers, and an id associated with each answer.
The data was annotated in SQuAD style, where each row contains a question, a context passage, and a set of answers with their associated ids.
data/COVID-QA.json: 2,019 question/answer pairs annotated by volunteer biomedical experts on scientific articles related to COVID-19.
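As a rough illustration of what "SQuAD style" implies for a file such as data/COVID-QA.json, the sketch below flattens a nested SQuAD-format structure into one row per question and checks that each answer's character offset points at its text in the context. It assumes the file follows the standard SQuAD nesting (data, paragraphs, qas, answers with text and answer_start). The sample record, the `document_id` field, and the helper names `flatten_squad` and `spans_are_consistent` are hypothetical and written only for this example; they are not taken from the dataset.

```python
import json

# Hypothetical, hand-written sample in SQuAD-style layout.
# It is NOT an excerpt from data/COVID-QA.json.
SAMPLE = json.loads("""
{
  "data": [
    {
      "paragraphs": [
        {
          "document_id": 1,
          "context": "Coronaviruses are enveloped RNA viruses.",
          "qas": [
            {
              "id": 10,
              "question": "What kind of viruses are coronaviruses?",
              "is_impossible": false,
              "answers": [
                {"text": "enveloped RNA viruses", "answer_start": 18}
              ]
            }
          ]
        }
      ]
    }
  ]
}
""")


def flatten_squad(squad):
    """Turn nested SQuAD-style JSON into one flat row per question."""
    rows = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                rows.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "answers": {
                        "text": [a["text"] for a in qa["answers"]],
                        "answer_start": [a["answer_start"] for a in qa["answers"]],
                    },
                })
    return rows


def spans_are_consistent(row):
    """Check that every answer_start offset points at the answer text."""
    context = row["context"]
    answers = row["answers"]
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


rows = flatten_squad(SAMPLE)
for row in rows:
    print(row["id"], row["question"], spans_are_consistent(row))
```

To apply the same functions to the real file, one would replace `SAMPLE` with the result of `json.load` on data/COVID-QA.json, assuming its layout matches the nesting sketched here.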
The initial data comes from 147 scientific articles in the CORD-19 dataset. Questions and answers were annotated afterwards.
Who are the source language producers? [More Information Needed]
While annotators were volunteers, they were required to have at least a Master's degree in biomedical sciences. The annotation team was led by a medical doctor (G.A.R.) who vetted the volunteers' credentials and manually verified each question/answer pair produced. We used an existing, web-based annotation tool that had been created by deepset and is available in their neural search framework, Haystack.
Who are the annotators? The annotators are 15 volunteer biomedical experts who annotated scientific articles related to COVID-19.
The dataset aims to help build question answering models serving clinical and scientific researchers, public health authorities, and frontline workers. These QA systems can help them find answers and patterns in research papers by locating relevant answers to common questions from scientific articles.
The authors listed on the homepage maintain and support the dataset.
The COVID-QA dataset is licensed under the Apache License 2.0.
@inproceedings{moller2020covid,
  title={COVID-QA: A Question Answering Dataset for COVID-19},
  author={M{\"o}ller, Timo and Reina, Anthony and Jayakumar, Raghavan and Pietsch, Malte},
  booktitle={Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020},
  year={2020}
}
Thanks to @olinguyen for adding this dataset.