Dataset:

kor_ner

Task:

Token classification

Subtask:

named-entity-recognition

Language:

Multilinguality:

monolingual

Size:

1K<n<10K

Language creators:

other

Annotation creators:

expert-generated

Source datasets:

original

License:

mit


Dataset Card for KorNER

Dataset Summary

[More Information Needed]

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

Each row consists of the following fields:

text: The full text, as is
annot_text: Annotated text including POS-tagged information
tokens: An ordered list of tokens from the full text
pos_tags: Part-of-speech tags for each token
ner_tags: Named entity recognition tags for each token

Note that by design, tokens, pos_tags, and ner_tags always have the same length, with one tag of each kind per token.
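The equal-length guarantee can be checked directly on a row. The following is a minimal sketch, assuming a row is available as a plain Python dict keyed by the field names above. The helper name `check_row` and the sample row are hypothetical and invented for illustration; they are not taken from the dataset.

```python
from typing import Dict, List


def check_row(row: Dict[str, List]) -> bool:
    """Return True if the per-token fields of a row all have the same length."""
    return len(row["tokens"]) == len(row["pos_tags"]) == len(row["ner_tags"])


# Hypothetical row, made up for illustration only.
# annot_text is left out because its exact format is not described in this card.
example_row = {
    "tokens": ["서울", "에", "가", "ㄴ다"],
    "pos_tags": ["NNP", "JKB", "VV", "EF"],
    "ner_tags": ["B_LC", "O", "O", "O"],
}

print(check_row(example_row))
```

A check like this is useful as a sanity test after any custom re-tokenization, since a length mismatch would silently misalign tags and tokens.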

pos_tags corresponds to the list below:

['SO', 'SS', 'VV', 'XR', 'VCP', 'JC', 'VCN', 'JKB', 'MM', 'SP', 'XSN', 'SL', 'NNP', 'NP', 'EP', 'JKQ', 'IC', 'XSA', 'EC', 'EF', 'SE', 'XPN', 'ETN', 'SH', 'XSV', 'MAG', 'SW', 'ETM', 'JKO', 'NNB', 'MAJ', 'NNG', 'JKV', 'JKC', 'VA', 'NR', 'JKG', 'VX', 'SF', 'JX', 'JKS', 'SN']
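If the tags are stored as integer class indices in the order of the list above (an assumption; this card does not state the storage format), they can be converted to and from label strings as sketched below. The names `POS_LABELS`, `decode_pos`, and `encode_pos` are hypothetical helpers, not part of the dataset or any library.

```python
from typing import List

# Label list copied from the card; the index-equals-position mapping is an assumption.
POS_LABELS = ['SO', 'SS', 'VV', 'XR', 'VCP', 'JC', 'VCN', 'JKB', 'MM', 'SP',
              'XSN', 'SL', 'NNP', 'NP', 'EP', 'JKQ', 'IC', 'XSA', 'EC', 'EF',
              'SE', 'XPN', 'ETN', 'SH', 'XSV', 'MAG', 'SW', 'ETM', 'JKO', 'NNB',
              'MAJ', 'NNG', 'JKV', 'JKC', 'VA', 'NR', 'JKG', 'VX', 'SF', 'JX',
              'JKS', 'SN']

# Reverse lookup from label string to index.
POS_TO_ID = {label: i for i, label in enumerate(POS_LABELS)}


def decode_pos(ids: List[int]) -> List[str]:
    """Map integer class indices to POS label strings."""
    return [POS_LABELS[i] for i in ids]


def encode_pos(labels: List[str]) -> List[int]:
    """Map POS label strings back to integer class indices."""
    return [POS_TO_ID[label] for label in labels]


print(decode_pos([12, 7, 2, 19]))  # ['NNP', 'JKB', 'VV', 'EF']
```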

ner_tags corresponds to the list below:

["I", "O", "B_OG", "B_TI", "B_LC", "B_DT", "B_PS"]

The prefix B denotes the first item of a phrase, and I denotes any non-initial item. In addition, OG represents an organization; TI, time; LC, location; DT, date; and PS, person.
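Because the I tag carries no entity type of its own, each I token takes the type of the B tag that opened its phrase. The sketch below groups tokens into entity phrases on that basis. It assumes the tags are given as label strings and that O marks a token outside any entity (the usual convention, though not stated in this card). The function name `extract_entities` and the sample sentence are hypothetical.

```python
from typing import List, Tuple


def extract_entities(tokens: List[str], ner_tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Group tokens into (entity_type, phrase_tokens) pairs from B_XX / I / O tags."""
    entities: List[Tuple[str, List[str]]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, ner_tags):
        if tag.startswith("B_"):
            # A new phrase starts; close the previous one if it is still open.
            if current_type is not None:
                entities.append((current_type, current_tokens))
            current_type = tag[2:]
            current_tokens = [token]
        elif tag == "I" and current_type is not None:
            # Continuation of the open phrase; the type comes from its B tag.
            current_tokens.append(token)
        else:
            # "O" (assumed: outside any entity) or a stray "I" with no open phrase.
            if current_type is not None:
                entities.append((current_type, current_tokens))
            current_type = None
            current_tokens = []

    # Close a phrase that runs to the end of the sentence.
    if current_type is not None:
        entities.append((current_type, current_tokens))
    return entities


# Hypothetical example, made up for illustration only.
tokens = ["삼성", "전자", "는", "서울", "에"]
tags = ["B_OG", "I", "O", "B_LC", "O"]
print(extract_entities(tokens, tags))
```

A stray I with no preceding B is treated here as outside any entity; a stricter variant could raise an error instead.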

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

[More Information Needed]

Contributions

Thanks to @jaketae for adding this dataset.

Author:

Anonymous

Dataset size:

14.47 KB