Dataset:
FreedomIntelligence/huatuo_consultation_qa
We collected data from a medical consultation website that hosts a large number of online consultation records from medical experts. Each record is a QA pair: a patient asks a question and a doctor answers it. Basic information about each doctor (name, hospital, and department) was also recorded.
We crawled patients' questions and doctors' answers directly as QA pairs, obtaining 32,708,346 pairs. We then removed the QA pairs containing special characters and removed duplicate pairs, which left 25,341,578 QA pairs.
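The two cleaning steps above (dropping pairs with special characters, then dropping repeated pairs) can be sketched as follows. This is only an illustration, not the authors' pipeline: the source does not define "special characters", so the allowed-character set in `ALLOWED` is an assumption, and the function name `clean_pairs` is hypothetical.

```python
import re

# Assumption: a "special character" is anything outside CJK ideographs,
# ASCII letters and digits, whitespace, and common punctuation.
ALLOWED = re.compile(r"[\u4e00-\u9fff0-9A-Za-z\s,.?!;:()，。？！；：（）、]+")


def clean_pairs(pairs):
    """Filter (question, answer) pairs, keeping first occurrences only."""
    seen = set()
    kept = []
    for question, answer in pairs:
        # Step 1: drop the pair if either side contains a disallowed character.
        if not (ALLOWED.fullmatch(question) and ALLOWED.fullmatch(answer)):
            continue
        # Step 2: drop exact repeats (after trimming surrounding whitespace).
        key = (question.strip(), answer.strip())
        if key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

Keeping a set of already-seen pairs preserves the original order and needs only one pass, but it holds every kept pair in memory; at the scale of tens of millions of pairs a hash-based or on-disk approach may be more practical.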
Please note that we are unable to provide the text data directly, so the answer part of this dataset is given as a URL. If you need text data, you can refer to the other two parts of our open-source datasets (huatuo_encyclopedia_qa and huatuo_knowledge_graph_qa), or use the URLs to collect the data yourself.
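Because the answer field holds a URL rather than text, a loader may want to check which records still need to be fetched. The sketch below is a minimal illustration under assumptions: the record keys `question` and `answer` are hypothetical and may not match the dataset's actual schema, and no fetching is performed.

```python
from urllib.parse import urlparse


def is_url(value):
    """Return True if value looks like an http(s) URL."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_by_answer_type(records):
    """Split records into those whose answer is a URL and those with text.

    Assumption: each record is a dict with an "answer" key (hypothetical name).
    """
    url_records, text_records = [], []
    for record in records:
        target = url_records if is_url(record["answer"]) else text_records
        target.append(record)
    return url_records, text_records
```

Records in the first list are the ones whose answer text would have to be collected from the linked page.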
....
@misc{li2023huatuo26m,
  title={Huatuo-26M, a Large-scale Chinese Medical QA Dataset},
  author={Jianquan Li and Xidong Wang and Xiangbo Wu and Zhiyi Zhang and Xiaolong Xu and Jie Fu and Prayag Tiwari and Xiang Wan and Benyou Wang},
  year={2023},
  eprint={2305.01526},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}