Dataset:
tner/fin
FIN NER dataset formatted as part of the TNER project. The FIN dataset contains only training (FIN5) and test (FIN3) sets, so we randomly sample half the size of the test instances from the training set to create a validation set.
An example from the train split looks as follows.
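The random sampling described above could be sketched roughly as follows. This is a minimal illustration, not the TNER project's actual script: the function name `sample_validation`, the `seed` parameter, the toy data, and the choice to remove sampled instances from the training set are all assumptions.

```python
import random


def sample_validation(train, n_validation, seed=0):
    """Randomly pick n_validation instances from train (hypothetical helper).

    Returns (remaining_train, validation). Removing the sampled instances
    from the training set is an assumption; the card does not say so.
    """
    if n_validation > len(train):
        raise ValueError("n_validation exceeds the number of training instances")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    val_idx = set(rng.sample(range(len(train)), n_validation))
    validation = [x for i, x in enumerate(train) if i in val_idx]
    remaining = [x for i, x in enumerate(train) if i not in val_idx]
    return remaining, validation


# Toy stand-in data, not the real FIN instances.
toy_train = [{"id": i} for i in range(20)]
toy_test_size = 6

# Validation size is taken as half the test size, as the description suggests.
remaining, validation = sample_validation(toy_train, toy_test_size // 2)
print(len(remaining), len(validation))  # 17 3
```

Fixing the seed keeps the split stable across runs, which matters when a validation set is used for model selection.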
{ "tags": [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "tokens": ["1", ".", "1", ".", "4", "Borrower", "engages", "in", "criminal", "conduct", "or", "is", "involved", "in", "criminal", "activities", ";"] }
The label2id dictionary can be found here.
{ "O": 0, "B-PER": 1, "B-LOC": 2, "B-ORG": 3, "B-MISC": 4, "I-PER": 5, "I-LOC": 6, "I-ORG": 7, "I-MISC": 8 }
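To read the integer `tags` of an instance, the `label2id` dictionary can be inverted and applied token by token. The sketch below uses only the example and mapping shown above; the helper name `decode_tags` is hypothetical and not part of TNER.

```python
# Label mapping copied from the dataset card.
label2id = {
    "O": 0, "B-PER": 1, "B-LOC": 2, "B-ORG": 3, "B-MISC": 4,
    "I-PER": 5, "I-LOC": 6, "I-ORG": 7, "I-MISC": 8,
}
# Invert the mapping so integer tags can be turned back into label strings.
id2label = {idx: label for label, idx in label2id.items()}

# The train example shown in the card.
example = {
    "tags": [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "tokens": ["1", ".", "1", ".", "4", "Borrower", "engages", "in",
               "criminal", "conduct", "or", "is", "involved", "in",
               "criminal", "activities", ";"],
}


def decode_tags(instance, id2label):
    """Pair each token with its label string (hypothetical helper)."""
    return [(tok, id2label[tag])
            for tok, tag in zip(instance["tokens"], instance["tags"])]


pairs = decode_tags(example, id2label)
# Keep only tokens that are part of an entity.
entities = [(tok, label) for tok, label in pairs if label != "O"]
print(entities)  # [('Borrower', 'B-ORG')]
```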
name | train | validation | test |
---|---|---|---|
fin | 1014 | 303 | 150 |
@inproceedings{salinas-alvarado-etal-2015-domain,
    title = "Domain Adaption of Named Entity Recognition to Support Credit Risk Assessment",
    author = "Salinas Alvarado, Julio Cesar and Verspoor, Karin and Baldwin, Timothy",
    booktitle = "Proceedings of the Australasian Language Technology Association Workshop 2015",
    month = dec,
    year = "2015",
    address = "Parramatta, Australia",
    url = "https://aclanthology.org/U15-1010",
    pages = "84--90",
}