Dataset:
tner/tweebank_ner
The TweeBank NER dataset, formatted as part of the TNER project.
An example from the train split looks as follows.
{ 'tokens': ['RT', '@USER2362', ':', 'Farmall', 'Heart', 'Of', 'The', 'Holidays', 'Tabletop', 'Christmas', 'Tree', 'With', 'Lights', 'And', 'Motion', 'URL1087', '#Holiday', '#Gifts'], 'tags': [8, 8, 8, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] }
The label2id dictionary can be found here.
{ "B-LOC": 0, "B-MISC": 1, "B-ORG": 2, "B-PER": 3, "I-LOC": 4, "I-MISC": 5, "I-ORG": 6, "I-PER": 7, "O": 8 }
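The integer tags can be mapped back to label strings by inverting the label2id dictionary. The following is a minimal sketch, assuming the tags follow the usual BIO convention (a `B-` tag opens an entity, matching `I-` tags continue it). The helper names `decode_tags` and `extract_entities` are hypothetical and are not part of the TNER library.

```python
# label2id copied from the dataset card.
LABEL2ID = {
    "B-LOC": 0, "B-MISC": 1, "B-ORG": 2, "B-PER": 3,
    "I-LOC": 4, "I-MISC": 5, "I-ORG": 6, "I-PER": 7, "O": 8,
}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}


def decode_tags(tags):
    """Convert integer tag ids into label strings (hypothetical helper)."""
    return [ID2LABEL[t] for t in tags]


def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    Assumption: an I- tag that does not continue an open entity of the
    same type is ignored rather than treated as a new entity.
    """
    entities = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, decode_tags(tags)):
        prefix, _, etype = label.partition("-")
        if prefix == "I" and etype == current_type:
            current_tokens.append(token)
            continue
        # Any other label closes the entity that was open, if there was one.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))
        if prefix == "B":
            current_type, current_tokens = etype, [token]
        else:
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# The train example shown in the dataset card.
example = {
    "tokens": ["RT", "@USER2362", ":", "Farmall", "Heart", "Of", "The",
               "Holidays", "Tabletop", "Christmas", "Tree", "With", "Lights",
               "And", "Motion", "URL1087", "#Holiday", "#Gifts"],
    "tags": [8, 8, 8, 2, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
}

labels = decode_tags(example["tags"])
entities = extract_entities(example["tokens"], example["tags"])
print(entities)
```

For the example above, the only non-`O` tag is `2` (`B-ORG`) on the token `Farmall`, so the sketch yields a single `ORG` entity.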
name | train | validation | test |
---|---|---|---|
tweebank_ner | 1639 | 710 | 1201 |
@article{DBLP:journals/corr/abs-2201-07281,
  author     = {Hang Jiang and Yining Hua and Doug Beeferman and Deb Roy},
  title      = {Annotating the Tweebank Corpus on Named Entity Recognition and Building {NLP} Models for Social Media Analysis},
  journal    = {CoRR},
  volume     = {abs/2201.07281},
  year       = {2022},
  url        = {https://arxiv.org/abs/2201.07281},
  eprinttype = {arXiv},
  eprint     = {2201.07281},
  timestamp  = {Fri, 21 Jan 2022 13:57:15 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2201-07281.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}