Dataset:
bigbio/ehr_rel
EHR-Rel is a novel open-source biomedical concept relatedness dataset consisting of 3630 concept pairs, six times more than the largest existing dataset. Instead of manually selecting and pairing concepts as done in previous work, the dataset is sampled from EHRs to ensure concepts are relevant for the EHR concept retrieval task. A detailed analysis of the concepts in the dataset reveals far greater coverage than existing datasets.
@inproceedings{schulz-etal-2020-biomedical,
    title = {Biomedical Concept Relatedness {--} A large {EHR}-based benchmark},
    author = {Schulz, Claudia  and
      Levy-Kramer, Josh  and
      Van Assel, Camille  and
      Kepes, Miklos  and
      Hammerla, Nils},
    booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
    month = {dec},
    year = {2020},
    address = {Barcelona, Spain (Online)},
    publisher = {International Committee on Computational Linguistics},
    url = {https://aclanthology.org/2020.coling-main.577},
    doi = {10.18653/v1/2020.coling-main.577},
    pages = {6565--6575},
    }