Dataset: covid_qa_ucsd
Tasks: Question Answering
Sub-tasks: closed-domain-qa
Multilinguality: monolingual
Annotation creators: found
Source datasets: original
ArXiv: arxiv:2005.05442
License: license:unknown
COVID-Dialogue-Dataset-English is an English medical dialogue dataset about COVID-19 and other types of pneumonia. Patients who are concerned that they may be infected with COVID-19 or another type of pneumonia consult doctors, who provide advice. There are 603 consultations.
COVID-Dialogue-Dataset-Chinese is a Chinese medical dialogue dataset about COVID-19 and other types of pneumonia. Patients who are concerned that they may be infected with COVID-19 or another type of pneumonia consult doctors, who provide advice. There are 1393 consultations.
Each language is provided as a single text file: COVID-Dialogue-Dataset-Chinese.txt for Chinese and COVID-Dialogue-Dataset-English.txt for English.
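This card does not document the layout inside these text files, so the following is only a hedged sketch under a hypothetical layout: each consultation starts with a line `id=<number>`, the next non-empty line is the source URL, and each dialogue turn starts with a line reading exactly `Patient:` or `Doctor:`, followed by the utterance text. It turns that text into records shaped like the example dialogue in this card. The function name `parse_consultations` is illustrative; check the real files before relying on this.

```python
import re
from typing import Dict, List

# HYPOTHETICAL layout (not confirmed by this card): "id=<n>" starts a
# consultation, the first non-empty line after it is the URL, and lines
# reading "Patient:" or "Doctor:" start a new dialogue turn.
ID_LINE = re.compile(r"^id=(\d+)$")
SPEAKERS = ("Patient:", "Doctor:")


def parse_consultations(text: str) -> List[Dict]:
    """Split raw text into records shaped like the card's example."""
    records: List[Dict] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ID_LINE.match(line)
        if match:
            current = {"dialogue_id": int(match.group(1)),
                       "dialogue_url": None,
                       "dialogue_turns": []}
            records.append(current)
        elif current is None:
            continue  # ignore anything before the first id line
        elif current["dialogue_url"] is None:
            current["dialogue_url"] = line
        elif line in SPEAKERS:
            # Start a new turn; the speaker label drops the trailing colon.
            current["dialogue_turns"].append(
                {"speaker": line[:-1], "utterance": ""})
        elif current["dialogue_turns"]:
            # Join multi-line utterances with a single space.
            turn = current["dialogue_turns"][-1]
            turn["utterance"] = (turn["utterance"] + " " + line).strip()
        # Any other line between the URL and the first turn
        # (e.g. a description) is skipped in this sketch.
    return records


sample = """id=1
https://example.com/consult/1
Dialogue
Patient:
I have a mild fever.
Doctor:
Rest and monitor
your symptoms.
"""
print(parse_consultations(sample))
```

The sample text and URL are made up purely to exercise the parser.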
The dataset is used for QA tasks. A COVID-19 dialogue generation model is also available for the Chinese data. More information is available in the arXiv preprint (arxiv:2005.05442).
Monolingual: each dataset is in a single language, English (EN) or Chinese (ZH).
An example dialogue:
{
  'dialogue_id': 602,
  'dialogue_url': 'https://www.healthtap.com/member/fg?page=/search/covid',
  'dialogue_turns': [
    {'speaker': 'Patient', 'utterance': 'Can coronavirus symptoms be mild for some people versus severe? For example, could it just involve being very fatigued, low grade fever for a few days and not the extreme symptoms? Or is it always a full blown cold and struggle to breathe?Can coronavirus symptoms be mild for some people versus severe? For example, could it just involve being very fatigued, low grade fever for a few days and not the extreme symptoms? Or is it always a full blown cold and struggle to breathe?'},
    {'speaker': 'Doctor', 'utterance': 'In brief: Symptoms vary. Some may have no symptoms at all. Some can be life threatening. Would you like to video or text chat with me?'}
  ]
}
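Because each record stores the conversation as an ordered list of `dialogue_turns`, one simple way to derive question-answer pairs is to pair every `Patient` turn with the `Doctor` turn that immediately follows it. This is an illustrative sketch, not necessarily how the prepared QA data was built; the function `to_qa_pairs` and the toy record (same schema as the example, placeholder content) are hypothetical.

```python
from typing import Dict, List, Tuple


def to_qa_pairs(record: Dict) -> List[Tuple[str, str]]:
    """Pair each Patient utterance with the Doctor reply that follows it."""
    pairs: List[Tuple[str, str]] = []
    turns = record["dialogue_turns"]
    # Walk over adjacent turns: (t0, t1), (t1, t2), ...
    for current, nxt in zip(turns, turns[1:]):
        if current["speaker"] == "Patient" and nxt["speaker"] == "Doctor":
            pairs.append((current["utterance"], nxt["utterance"]))
    return pairs


# Toy record following the example's schema (placeholder content).
toy = {
    "dialogue_id": 0,
    "dialogue_url": "https://example.com/consult/0",
    "dialogue_turns": [
        {"speaker": "Patient", "utterance": "First question?"},
        {"speaker": "Doctor", "utterance": "First answer."},
        {"speaker": "Patient", "utterance": "Follow-up question?"},
        {"speaker": "Doctor", "utterance": "Follow-up answer."},
    ],
}

print(to_qa_pairs(toy))
# [('First question?', 'First answer.'), ('Follow-up question?', 'Follow-up answer.')]
```

Pairing only adjacent Patient/Doctor turns skips unanswered patient turns instead of guessing which reply belongs to them.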
The English dataset is built from icliniq.com, healthcaremagic.com, and healthtap.com; all copyrights of the data belong to these websites.
The Chinese dataset is built from Haodf.com; all copyrights of the data belong to Haodf.com.
Each consultation consists of the following:
Only the following fields were used to generate the QA data:
In the prepared dataset, these fields are arranged as follows, and each item is represented with these parameters:
The original data has no predefined splits.
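Since no splits are provided, users have to create their own. A minimal sketch, assuming a hypothetical 80/20 ratio and a fixed seed so the split is reproducible; the function name `split_ids` and these parameter values are illustrative choices, not part of the dataset.

```python
import random
from typing import List, Sequence, Tuple


def split_ids(ids: Sequence[int], train_fraction: float = 0.8,
              seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle dialogue ids with a fixed seed and cut them into two lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # local RNG; global state untouched
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# The English file has 603 consultations; ids 0..602 are assumed here.
train, test = split_ids(range(603))
print(len(train), len(test))  # 482 121
```

Splitting by dialogue id, rather than by individual turns or QA pairs, keeps all turns of one consultation in the same split and so avoids leakage between training and evaluation.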
@article{ju2020CovidDialog,
  title={CovidDialog: Medical Dialogue Datasets about COVID-19},
  author={Ju, Zeqian and Chakravorty, Subrato and He, Xuehai and Chen, Shu and Yang, Xingyi and Xie, Pengtao},
  journal={https://github.com/UCSD-AI4H/COVID-Dialogue},
  year={2020}
}
Thanks to @vrindaprabhu for adding this dataset.