Dataset: DFKI-SLT/few-nerd
Task: Token classification
Language: en
Multilinguality: monolingual
Size: 100K<n<1M
Language creators: found
Annotation creators: expert-generated
Source datasets: extended|wikipedia
License: cc-by-sa-4.0
This script loads the Few-NERD dataset from https://ningding97.github.io/fewnerd/ .
Few-NERD is a large-scale, fine-grained, manually annotated named entity recognition dataset containing 8 coarse-grained types, 66 fine-grained types, 188,200 sentences, 491,711 entities, and 4,601,223 tokens. Three benchmark tasks are built on it: one supervised (Few-NERD (SUP)) and two few-shot (Few-NERD (INTRA) and Few-NERD (INTER)).
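A minimal loading sketch using the Hugging Face `datasets` library. It assumes the package is installed and that the Hub repository `DFKI-SLT/few-nerd` exposes the three benchmark settings under the configuration names `supervised`, `intra`, and `inter`; those names and the helper `load_few_nerd` are assumptions for illustration, not something this card defines.

```python
# Assumed Hub configuration names for the SUP, INTRA and INTER settings.
FEW_NERD_CONFIGS = ("supervised", "intra", "inter")


def load_few_nerd(config: str = "supervised"):
    """Hypothetical helper: load one Few-NERD setting from the Hub (needs network access)."""
    if config not in FEW_NERD_CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {FEW_NERD_CONFIGS}")
    # Imported lazily so this module can be imported even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("DFKI-SLT/few-nerd", config)
```

For example, `load_few_nerd("inter")` would fetch the INTER setting as a `DatasetDict`; printing it shows the available splits and their sizes.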
NER tags use the IO tagging scheme. The original data uses a 2-column CoNLL-style format, with empty lines to separate sentences. DOCSTART information is not provided since the sentences are randomly ordered.
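The IO scheme and the 2-column layout can be illustrated with a small stdlib-only reader. This sketch assumes the two columns are tab-separated and that labels are written as `coarse-fine` strings such as `person-actor` (or `O` for non-entity tokens); all function names are hypothetical. Because IO has no `B-` prefix, consecutive tokens with the same label are merged into one entity span, so two adjacent entities of the same type cannot be told apart.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, label) pairs


def read_conll_two_column(lines: Iterable[str]) -> List[Sentence]:
    """Group `token<TAB>label` lines into sentences; a blank line ends a sentence."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        token, label = line.split("\t")
        current.append((token, label))
    if current:  # the input may not end with a blank line
        sentences.append(current)
    return sentences


def io_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, label) for maximal runs of the same non-"O" label."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    for i, label in enumerate(labels + ["O"]):  # sentinel "O" closes a trailing span
        if start is not None and label != labels[start]:
            spans.append((start, i, labels[start]))
            start = None
        if start is None and label != "O":
            start = i
    return spans


def coarse(label: str) -> str:
    """Map an assumed "coarse-fine" label such as "person-actor" to "person"."""
    return label.split("-", 1)[0]
```

Splitting on the first `-` only keeps fine-grained names that themselves contain slashes or further text intact on the right-hand side.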
For more details see https://ningding97.github.io/fewnerd/ and https://aclanthology.org/2021.acl-long.248/ .
The dataset is in English.
Size of downloaded dataset files:
Size of the generated dataset:
Total amount of disk used: 366.8 MB
An example from the 'train' split looks as follows.
{ 'id': '1', 'tokens': ['It', 'starred', 'Hicks', "'s", 'wife', ',', 'Ellaline', 'Terriss', 'and', 'Edmund', 'Payne', '.'], 'ner_tags': [0, 0, 7, 0, 0, 0, 7, 7, 0, 7, 7, 0], 'fine_ner_tags': [0, 0, 51, 0, 0, 0, 50, 50, 0, 50, 50, 0] }
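The integer tags in this example can be turned back into entity spans by grouping runs of identical non-zero ids, which is how the IO scheme marks entities. The sketch assumes id `0` denotes the `O` (non-entity) label, as the example suggests; `tag_spans` is a hypothetical helper.

```python
from itertools import groupby
from typing import List, Tuple

example = {
    "id": "1",
    "tokens": ["It", "starred", "Hicks", "'s", "wife", ",", "Ellaline", "Terriss",
               "and", "Edmund", "Payne", "."],
    "ner_tags": [0, 0, 7, 0, 0, 0, 7, 7, 0, 7, 7, 0],
    "fine_ner_tags": [0, 0, 51, 0, 0, 0, 50, 50, 0, 50, 50, 0],
}


def tag_spans(tokens: List[str], tags: List[int]) -> List[Tuple[str, int]]:
    """Group maximal runs of the same non-zero tag id into (span_text, tag_id) pairs."""
    spans = []
    pos = 0
    for tag, run in groupby(tags):
        length = len(list(run))
        if tag != 0:  # assumed: 0 is the "O" label
            spans.append((" ".join(tokens[pos:pos + length]), tag))
        pos += length
    return spans


print(tag_spans(example["tokens"], example["ner_tags"]))
# [('Hicks', 7), ('Ellaline Terriss', 7), ('Edmund Payne', 7)]
print(tag_spans(example["tokens"], example["fine_ner_tags"]))
# [('Hicks', 51), ('Ellaline Terriss', 50), ('Edmund Payne', 50)]
```

With the `datasets` library, the ids can be mapped back to label names through the `ClassLabel` feature, for example `ds["train"].features["ner_tags"].feature.int2str(7)`, where `ds` is an already loaded `DatasetDict` (an assumed variable).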
The data fields are the same among all splits.
| Task | Train (sentences) | Dev (sentences) | Test (sentences) |
|---|---|---|---|
| SUP | 131767 | 18824 | 37648 |
| INTRA | 99519 | 19358 | 44059 |
| INTER | 130112 | 18817 | 14007 |
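A quick arithmetic check on the split table, assuming each cell counts sentences: summing the three splits per task gives the size of each benchmark setting. The dictionary below simply copies the table's values; `split_totals` is a hypothetical helper.

```python
# Split sizes as listed in the table: (Train, Dev, Test).
SPLITS = {
    "SUP": (131767, 18824, 37648),
    "INTRA": (99519, 19358, 44059),
    "INTER": (130112, 18817, 14007),
}


def split_totals(splits):
    """Total number of sentences per task across Train, Dev and Test."""
    return {task: sum(sizes) for task, sizes in splits.items()}


for task, total in split_totals(SPLITS).items():
    train = SPLITS[task][0]
    print(f"{task:<5} total={total:>7,}  train share={train / total:.1%}")
```

This gives 188,239 sentences for SUP, close to the 188,200 sentences quoted in the summary, and 162,936 for both INTRA and INTER.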
@inproceedings{ding-etal-2021-nerd,
    title = "Few-{NERD}: A Few-shot Named Entity Recognition Dataset",
    author = "Ding, Ning and Xu, Guangwei and Chen, Yulin and Wang, Xiaobin and Han, Xu and Xie, Pengjun and Zheng, Haitao and Liu, Zhiyuan",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.248",
    doi = "10.18653/v1/2021.acl-long.248",
    pages = "3198--3213",
}