Dataset:
bigbio/ehr_rel
EHR-Rel is a novel open-source biomedical concept relatedness dataset consisting of 3630 concept pairs, six times more than the largest existing dataset. Instead of manually selecting and pairing concepts, as done in previous work, the dataset is sampled from electronic health records (EHRs) to ensure that the concepts are relevant to the EHR concept retrieval task. A detailed analysis of the concepts in the dataset reveals far broader coverage than existing datasets.
@inproceedings{schulz-etal-2020-biomedical,
  title = {Biomedical Concept Relatedness {--} A large {EHR}-based benchmark},
  author = {Schulz, Claudia and Levy-Kramer, Josh and Van Assel, Camille and Kepes, Miklos and Hammerla, Nils},
  booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
  month = {dec},
  year = {2020},
  address = {Barcelona, Spain (Online)},
  publisher = {International Committee on Computational Linguistics},
  url = {https://aclanthology.org/2020.coling-main.577},
  doi = {10.18653/v1/2020.coling-main.577},
  pages = {6565--6575},
}