数据集:
DFKI-SLT/mobie
语言:
de计算机处理:
monolingual大小:
10K<n<100K语言创建人:
found批注创建人:
expert-generated源数据集:
original许可:
cc-by-4.0This script is for loading the MobIE dataset from https://github.com/dfki-nlp/mobie .
MobIE is a German-language dataset which is human-annotated with 20 coarse- and fine-grained entity types and entity linking information for geographically linkable entities. The dataset consists of 3,232 social media texts and traffic reports with 91K tokens, and contains 20.5K annotated entities, 13.1K of which are linked to a knowledge base. A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types, while the remaining documents are annotated using a weakly-supervised labeling approach implemented with the Snorkel framework. The dataset combines annotations for NER, EL and RE, and thus can be used for joint and multi-task learning of these fundamental information extraction tasks.
This version of the dataset loader provides NER tags only. NER tags use the BIO tagging scheme.
For more details see https://github.com/dfki-nlp/mobie and https://aclanthology.org/2021.konvens-1.22/ .
German
An example of 'train' looks as follows.
{ 'id': 'http://www.ndr.de/nachrichten/verkehr/index.html#2@2016-05-04T21:02:14.000+02:00', 'tokens': ['Vorsicht', 'bitte', 'auf', 'der', 'A28', 'Leer', 'Richtung', 'Oldenburg', 'zwischen', 'Zwischenahner', 'Meer', 'und', 'Neuenkruge', 'liegen', 'Gegenstände', '!'], 'ner_tags': [0, 0, 0, 0, 19, 13, 0, 13, 0, 11, 12, 0, 11, 0, 0, 0] }
The data fields are the same among all splits.
Train | Dev | Test | |
---|---|---|---|
MobIE | 4785 | 1082 | 1210 |
@inproceedings{hennig-etal-2021-mobie, title = "{M}ob{IE}: A {G}erman Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain", author = "Hennig, Leonhard and Truong, Phuc Tran and Gabryszak, Aleksandra", booktitle = "Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)", month = "6--9 " # sep, year = "2021", address = {D{\"u}sseldorf, Germany}, publisher = "KONVENS 2021 Organizers", url = "https://aclanthology.org/2021.konvens-1.22", pages = "223--227", }