Dataset:
tner/ontonotes5
OntoNotes 5 NER dataset, formatted as part of the TNER project.
An example from the train split looks as follows.
{ 'tags': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 0, 0, 0, 0, 11, 12, 12, 12, 12, 0, 0, 7, 0, 0, 0, 0, 0], 'tokens': ['``', 'It', "'s", 'very', 'costly', 'and', 'time', '-', 'consuming', ',', "''", 'says', 'Phil', 'Rosen', ',', 'a', 'partner', 'in', 'Fleet', '&', 'Leasing', 'Management', 'Inc.', ',', 'a', 'Boston', 'car', '-', 'leasing', 'company', '.'] }
The label2id dictionary is reproduced below.
{ "O": 0, "B-CARDINAL": 1, "B-DATE": 2, "I-DATE": 3, "B-PERSON": 4, "I-PERSON": 5, "B-NORP": 6, "B-GPE": 7, "I-GPE": 8, "B-LAW": 9, "I-LAW": 10, "B-ORG": 11, "I-ORG": 12, "B-PERCENT": 13, "I-PERCENT": 14, "B-ORDINAL": 15, "B-MONEY": 16, "I-MONEY": 17, "B-WORK_OF_ART": 18, "I-WORK_OF_ART": 19, "B-FAC": 20, "B-TIME": 21, "I-CARDINAL": 22, "B-LOC": 23, "B-QUANTITY": 24, "I-QUANTITY": 25, "I-NORP": 26, "I-LOC": 27, "B-PRODUCT": 28, "I-TIME": 29, "B-EVENT": 30, "I-EVENT": 31, "I-FAC": 32, "B-LANGUAGE": 33, "I-PRODUCT": 34, "I-ORDINAL": 35, "I-LANGUAGE": 36 }
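The integer `tags` in a record can be mapped back to BIO labels by inverting label2id, and consecutive `B-`/`I-` tags can then be grouped into entity spans. The following is a minimal sketch using the train example above. The helper name `extract_entities` is hypothetical (not part of TNER), and `label2id` here is only the subset of the mapping needed for this one example.

```python
# Subset of the label2id mapping from the card; only the labels that
# occur in the example record are included (an assumption for brevity).
label2id = {"O": 0, "B-PERSON": 4, "I-PERSON": 5, "B-GPE": 7, "B-ORG": 11, "I-ORG": 12}
id2label = {idx: label for label, idx in label2id.items()}

# The train example shown in the card.
example = {
    "tags": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 0, 0, 0, 0,
             11, 12, 12, 12, 12, 0, 0, 7, 0, 0, 0, 0, 0],
    "tokens": ["``", "It", "'s", "very", "costly", "and", "time", "-",
               "consuming", ",", "''", "says", "Phil", "Rosen", ",", "a",
               "partner", "in", "Fleet", "&", "Leasing", "Management",
               "Inc.", ",", "a", "Boston", "car", "-", "leasing",
               "company", "."],
}


def extract_entities(tokens, tags, id2label):
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    Hypothetical helper: an I- tag that does not continue an open span
    of the same type is skipped rather than starting a new span.
    """
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        label = id2label[tag]
        if label.startswith("I-") and current_type == label[2:]:
            # Continuation of the currently open span.
            current_tokens.append(token)
            continue
        # Any other label closes the open span, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if label.startswith("B-"):
            current_type, current_tokens = label[2:], [token]
    # Flush a span that runs to the end of the sentence.
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


entities = extract_entities(example["tokens"], example["tags"], id2label)
for entity_type, text in entities:
    print(f"{entity_type}: {text}")
```

For real use, the full label2id dictionary above would replace the subset, since other records contain labels that this sketch does not cover.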
name | train | validation | test |
---|---|---|---|
ontonotes5 | 59924 | 8528 | 8262 |
@inproceedings{hovy-etal-2006-ontonotes,
    title = "{O}nto{N}otes: The 90{\%} Solution",
    author = "Hovy, Eduard and Marcus, Mitchell and Palmer, Martha and Ramshaw, Lance and Weischedel, Ralph",
    booktitle = "Proceedings of the Human Language Technology Conference of the {NAACL}, Companion Volume: Short Papers",
    month = jun,
    year = "2006",
    address = "New York City, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N06-2015",
    pages = "57--60",
}