Dataset:
bigbio/n2c2_2018_track1
Track 1 of the 2018 National NLP Clinical Challenges shared tasks focused on identifying which patients in a corpus of longitudinal medical records meet and do not meet identified selection criteria.
This shared task aimed to determine whether NLP systems could be trained to identify if patients met or did not meet a set of selection criteria taken from real clinical trials. The selected criteria required measurement detection (“Any HbA1c value between 6.5 and 9.5%”), inference (“Use of aspirin to prevent myocardial infarction”), temporal reasoning (“Diagnosis of ketoacidosis in the past year”), and expert judgment to assess (“Major diabetes-related complication”). For the corpus, we used the dataset of American English, longitudinal clinical narratives from the 2014 i2b2/UTHealth shared task 4.
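To make the measurement-detection criterion above concrete, here is a minimal rule-based sketch for “Any HbA1c value between 6.5 and 9.5%”. It is not the method used by the task organizers or by any participating system. The regular expression, the function names `hba1c_values` and `meets_hba1c_criterion`, and the choice to treat both range boundaries as inclusive are all assumptions made for illustration.

```python
import re

# Hypothetical pattern (an assumption, not part of the dataset): an HbA1c
# mention ("HbA1c", "HgbA1c", or "A1c") followed, within at most 20
# non-digit characters on the same line, by a number such as 7 or 7.2.
HBA1C_PATTERN = re.compile(
    r"\b(?:hb|hgb)?a1c\b[^0-9\n]{0,20}(\d{1,2}(?:\.\d+)?)",
    re.IGNORECASE,
)


def hba1c_values(text):
    """Return every numeric value found right after an HbA1c mention."""
    return [float(m.group(1)) for m in HBA1C_PATTERN.finditer(text)]


def meets_hba1c_criterion(text, low=6.5, high=9.5):
    """Return True if any extracted value lies in [low, high].

    Inclusive boundaries are an assumption; the criterion text only
    says "between 6.5 and 9.5%".
    """
    return any(low <= value <= high for value in hba1c_values(text))


if __name__ == "__main__":
    note = "HbA1c 7.2% today; prior A1c was 10.1"
    print(hba1c_values(note))
    print(meets_hba1c_criterion(note))
```

A pattern like this ignores negation, units, dates that happen to follow the mention, and values reported in tables, so it would only serve as a rough baseline. The inference, temporal-reasoning, and expert-judgment criteria cannot be reduced to a single pattern in the same way.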
The final selected 13 selection criteria are as follows:
The training set consists of 202 patient records with document-level annotations and 10 records with textual spans indicating the annotators’ evidence for their annotations, while the test set contains 86 records.
Note:
@article{DBLP:journals/jamia/StubbsFSHU19,
  author    = {Amber Stubbs and
               Michele Filannino and
               Ergin Soysal and
               Samuel Henry and
               Ozlem Uzuner},
  title     = {Cohort selection for clinical trials: n2c2 2018 shared task track 1},
  journal   = {J. Am. Medical Informatics Assoc.},
  volume    = {26},
  number    = {11},
  pages     = {1163--1171},
  year      = {2019},
  url       = {https://doi.org/10.1093/jamia/ocz163},
  doi       = {10.1093/jamia/ocz163},
  timestamp = {Mon, 15 Jun 2020 16:56:11 +0200},
  biburl    = {https://dblp.org/rec/journals/jamia/StubbsFSHU19.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}