Dataset:
bigbio/genetag
Named entity recognition (NER) is an important first step for text mining the biomedical literature. Evaluating the performance of biomedical NER systems is impossible without a standardized test corpus. The annotation of such a corpus for gene/protein name NER is a difficult process due to the complexity of gene/protein names. We describe the construction and annotation of GENETAG, a corpus of 20K MEDLINE® sentences for gene/protein NER. 15K GENETAG sentences were used for the BioCreAtIvE Task 1A Competition.
@article{Tanabe2005,
  author  = {Lorraine Tanabe and Natalie Xie and Lynne H Thom and Wayne Matten and W John Wilbur},
  title   = {{GENETAG}: a tagged corpus for gene/protein named entity recognition},
  journal = {{BMC} Bioinformatics},
  volume  = {6},
  year    = {2005},
  url     = {https://doi.org/10.1186/1471-2105-6-S1-S3},
  doi     = {10.1186/1471-2105-6-s1-s3}
}