Dataset:
bigbio/nlmchem
The NLM-Chem corpus consists of 150 full-text articles from the PubMed Central Open Access dataset, drawn from 67 different chemical journals, and aims to cover the general distribution of chemical name usage in the biomedical literature. Articles were selected so that human annotation would be most valuable, meaning that they were rich in bio-entities and that current state-of-the-art named entity recognition systems disagreed on bio-entity recognition.
@Article{islamaj2021nlm,
  title={NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature},
  author={Islamaj, Rezarta and Leaman, Robert and Kim, Sun and Kwon, Dongseop and Wei, Chih-Hsuan and Comeau, Donald C and Peng, Yifan and Cissel, David and Coss, Cathleen and Fisher, Carol and others},
  journal={Scientific Data},
  volume={8},
  number={1},
  pages={1--12},
  year={2021},
  publisher={Nature Publishing Group}
}