Dataset: id_nergrit_corpus
Task: token classification
Language: id
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: expert-generated
Annotation creators: expert-generated
Source datasets: original
License: other
Nergrit Corpus is a collection of Indonesian datasets for Named Entity Recognition, Statement Extraction, and Sentiment Analysis, developed by PT Gria Inovasi Teknologi (GRIT).
[More Information Needed]
Indonesian
The data consists of sentences separated by empty lines; each line within a sentence holds a tab-separated token and tag.
{'id': '0', 'tokens': ['Gubernur', 'Bank', 'Indonesia', 'menggelar', 'konferensi', 'pers'], 'ner_tags': [9, 28, 28, 38, 38, 38]}
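The raw layout described above (one token and its tag per line, separated by a tab, with an empty line between sentences) could be read with a few lines of Python. This is a minimal sketch, not the loader used for this dataset: the function name `parse_tagged_text` is hypothetical, the sketch assumes the raw files store tags as label strings such as `B-NOR` rather than integer ids, and the sample text splits the example sentence in two purely to show the blank-line handling.

```python
from typing import Dict, Iterator, List


def parse_tagged_text(text: str) -> Iterator[Dict[str, object]]:
    """Yield one record per sentence from tab-separated token/tag lines.

    Hypothetical helper: assumes the token is the first column, the tag is
    the last column, and an empty line ends a sentence.
    """
    tokens: List[str] = []
    tags: List[str] = []
    index = 0
    for line in text.splitlines():
        if not line.strip():
            # An empty line closes the current sentence, if there is one.
            if tokens:
                yield {"id": str(index), "tokens": tokens, "ner_tags": tags}
                index += 1
                tokens, tags = [], []
            continue
        columns = line.split("\t")
        tokens.append(columns[0])
        tags.append(columns[-1])
    # Emit the last sentence when the text does not end with an empty line.
    if tokens:
        yield {"id": str(index), "tokens": tokens, "ner_tags": tags}


# Illustrative input only: the example sentence split into two "sentences".
sample = (
    "Gubernur\tB-NOR\n"
    "Bank\tI-NOR\n"
    "Indonesia\tI-NOR\n"
    "\n"
    "menggelar\tO\n"
    "konferensi\tO\n"
    "pers\tO\n"
)

records = list(parse_tagged_text(sample))
for record in records:
    print(record)
```

Keeping the parser as a generator means a large file can be processed one sentence at a time instead of being held in memory as a full list.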
[More Information Needed]
Named Entity Recognition
The ner_tags correspond to this list:
"B-CRD", "B-DAT", "B-EVT", "B-FAC", "B-GPE", "B-LAN", "B-LAW", "B-LOC", "B-MON", "B-NOR", "B-ORD", "B-ORG", "B-PER", "B-PRC", "B-PRD", "B-QTY", "B-REG", "B-TIM", "B-WOA", "I-CRD", "I-DAT", "I-EVT", "I-FAC", "I-GPE", "I-LAN", "I-LAW", "I-LOC", "I-MON", "I-NOR", "I-ORD", "I-ORG", "I-PER", "I-PRC", "I-PRD", "I-QTY", "I-REG", "I-TIM", "I-WOA", "O"
The ner_tags have the same format as in the CoNLL shared task: a B denotes the first item of a phrase and an I any non-initial word. The dataset contains the following 19 entity types:
'CRD': Cardinal
'DAT': Date
'EVT': Event
'FAC': Facility
'GPE': Geopolitical Entity
'LAW': Law Entity (such as Undang-Undang)
'LOC': Location
'MON': Money
'NOR': Political Organization
'ORD': Ordinal
'ORG': Organization
'PER': Person
'PRC': Percent
'PRD': Product
'QTY': Quantity
'REG': Religion
'TIM': Time
'WOA': Work of Art
'LAN': Language
Sentiment Analysis
The ner_tags correspond to this list:
"B-NEG", "B-NET", "B-POS", "I-NEG", "I-NET", "I-POS", "O"
Statement Extraction
The ner_tags correspond to this list:
"B-BREL", "B-FREL", "B-STAT", "B-WHO", "I-BREL", "I-FREL", "I-STAT", "I-WHO", "O"
The ner_tags have the same format as in the CoNLL shared task: a B denotes the first item of a phrase and an I any non-initial word.
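To make the integer ids and the B/I convention concrete, the sketch below decodes the example record from this card with the Named Entity Recognition tag list and then groups consecutive B/I tags into spans. The helper name `bio_to_spans` is hypothetical, and treating a stray I tag (one with no matching B before it) as the start of a new span is an assumption of this sketch, not a rule stated by the dataset. The same grouping would apply to the Sentiment Analysis and Statement Extraction tags, since they use the same format.

```python
from typing import List, Tuple

# Named Entity Recognition tag list from this card; the index is the tag id.
NER_LABELS = [
    "B-CRD", "B-DAT", "B-EVT", "B-FAC", "B-GPE", "B-LAN", "B-LAW", "B-LOC",
    "B-MON", "B-NOR", "B-ORD", "B-ORG", "B-PER", "B-PRC", "B-PRD", "B-QTY",
    "B-REG", "B-TIM", "B-WOA", "I-CRD", "I-DAT", "I-EVT", "I-FAC", "I-GPE",
    "I-LAN", "I-LAW", "I-LOC", "I-MON", "I-NOR", "I-ORD", "I-ORG", "I-PER",
    "I-PRC", "I-PRD", "I-QTY", "I-REG", "I-TIM", "I-WOA", "O",
]


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- labelled tokens into (type, text) spans (hypothetical helper)."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []
    for token, label in zip(tokens, labels):
        prefix, _, span_type = label.partition("-")
        if prefix == "I" and span_type == current_type:
            # Continuation of the span that is currently open.
            current_tokens.append(token)
            continue
        # Anything else closes the open span, if there is one.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # B starts a span; a stray I is treated as a start (assumption).
            current_type, current_tokens = span_type, [token]
        else:
            # "O": the token is outside any span.
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# The example record shown earlier in this card.
example = {
    "id": "0",
    "tokens": ["Gubernur", "Bank", "Indonesia", "menggelar", "konferensi", "pers"],
    "ner_tags": [9, 28, 28, 38, 38, 38],
}

labels = [NER_LABELS[tag_id] for tag_id in example["ner_tags"]]
spans = bio_to_spans(example["tokens"], labels)
print(labels)
print(spans)
```

For the example record, ids 9, 28, and 38 decode to B-NOR, I-NOR, and O, so the three tokens "Gubernur Bank Indonesia" form a single NOR span and the remaining tokens fall outside any entity.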
The dataset is split into train, validation, and test sets.
[More Information Needed]
[More Information Needed]
Who are the source language producers?
[More Information Needed]
[More Information Needed]
Who are the annotators?
The annotators are listed in the Nergrit Corpus repository.
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Thanks to @cahya-wirawan for adding this dataset.