Dataset Card for GENETAG

Named entity recognition (NER) is an important first step for text mining the biomedical literature. Evaluating the performance of biomedical NER systems is impossible without a standardized test corpus. The annotation of such a corpus for gene/protein name NER is a difficult process due to the complexity of gene/protein names. We describe the construction and annotation of GENETAG, a corpus of 20K MEDLINE® sentences for gene/protein NER. 15K GENETAG sentences were used for the BioCreAtIvE Task 1A Competition.

Citation Information

@article{Tanabe2005,
  author    = {Lorraine Tanabe and Natalie Xie and Lynne H Thom and Wayne Matten and W John Wilbur},
  title     = {{GENETAG}: a tagged corpus for gene/protein named entity recognition},
  journal   = {{BMC} Bioinformatics},
  volume    = {6},
  year      = {2005},
  url       = {https://doi.org/10.1186/1471-2105-6-S1-S3},
  doi       = {10.1186/1471-2105-6-s1-s3}
}

Author:

bigbio

Dataset size:

36.72 KB