Dataset:
zhengyun21/PMC-Patients
PMC-Patients is a first-of-its-kind dataset consisting of 167k patient summaries extracted from case reports in PubMed Central (PMC), together with 3.1M patient-article relevance annotations and 293k patient-patient similarity annotations defined by the PubMed citation graph.
This is purely the patient summary dataset with relational annotations. For the ReCDS benchmark, refer to this dataset.
Based on PMC-Patients, we define two tasks to benchmark Retrieval-based Clinical Decision Support (ReCDS) systems: Patient-to-Article Retrieval (PAR) and Patient-to-Patient Retrieval (PPR). For details, please refer to our paper and leaderboard.
English (en).
This file contains all information about the patient summaries in PMC-Patients. It is a list of dicts with the following keys:
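As a minimal sketch of working with a file laid out this way, the Python below reads a list of dicts and reports how often each key occurs, without assuming any particular key names. It assumes the file is stored as JSON, which the description above does not confirm. The helper names `load_patients` and `key_counts`, the file name `sample.json`, and the toy records are all hypothetical and are not part of PMC-Patients.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_patients(path):
    """Read a JSON file and check that it holds a list of dicts.

    Assumption: the file is JSON; adjust the reader if it is not.
    """
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON list of dicts")
    return records


def key_counts(records):
    """Count how many records contain each key."""
    return Counter(key for record in records for key in record)


if __name__ == "__main__":
    # Hypothetical toy records standing in for real patient summaries.
    sample = [{"id": "a", "text": "first summary"}, {"id": "b"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.json"
        path.write_text(json.dumps(sample), encoding="utf-8")
        records = load_patients(path)
        print(len(records), "records")
        for key, count in key_counts(records).items():
            print(f"{key}: present in {count} record(s)")
```

Counting keys across all records, rather than reading them from the first record only, also reveals fields that are present in some records and absent in others.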
If you are interested in how PMC-Patients was collected or in reproducing our baselines, please refer to this repository.
If you find PMC-Patients helpful in your research, please cite our work:
@misc{zhao2023pmcpatients,
  title={PMC-Patients: A Large-scale Dataset of Patient Summaries and Relations for Benchmarking Retrieval-based Clinical Decision Support Systems},
  author={Zhengyun Zhao and Qiao Jin and Fangyuan Chen and Tuorui Peng and Sheng Yu},
  year={2023},
  eprint={2202.13876},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}