Dataset:

turkish_ner

Tasks:

Token Classification

Sub-tasks:

named-entity-recognition

Languages:

Turkish

Multilinguality:

monolingual

Size Categories:

100K<n<1M

Language Creators:

expert-generated

Annotations Creators:

machine-generated

Source Datasets:

original

ArXiv:

arxiv:1702.02363

License:

cc-by-4.0

Dataset Card for turkish_ner

Dataset Summary

An automatically annotated Turkish corpus for named entity recognition and text categorization, built using large-scale gazetteers. The constructed gazetteers contain approximately 300K entities with thousands of fine-grained entity types under 25 different domains.
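
To make the idea of gazetteer-based automatic annotation concrete, here is a minimal sketch. It is not the curators' actual pipeline: the toy gazetteer entries, the entity type names, the greedy longest-match strategy, and the BIO tag scheme are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer mapping token sequences to entity types.
# The real gazetteers reportedly hold about 300K entities; these three
# entries and their type names are invented for illustration only.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Mustafa", "Kemal", "Atatürk"): "person",
    ("Ankara",): "location",
    ("Türk", "Hava", "Yolları"): "organization",
}

# Longest entry length, used to bound the lookup window.
MAX_LEN = max(len(entry) for entry in GAZETTEER)


def annotate(tokens: List[str]) -> List[str]:
    """Tag tokens with BIO labels using greedy longest-match lookup."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


tokens = "Mustafa Kemal Atatürk Ankara şehrine geldi".split()
for token, tag in zip(tokens, annotate(tokens)):
    print(f"{token}\t{tag}")
```

Exact string matching like this ignores Turkish suffixation (for example "Ankara'da"), which a realistic annotator would need to handle.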

Supported Tasks and Leaderboards

[More Information Needed]

Languages

Turkish

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

[More Information Needed]

Data Splits

The dataset provides a single split: the training set.

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

H. Bahadir Sahin, Caglar Tirkaz, Eray Yildiz, Mustafa Tolga Eren and Omer Ozan Sonmez

Licensing Information

Creative Commons Attribution 4.0 International

Citation Information

@article{DBLP:journals/corr/SahinTYES17,
  author        = {H. Bahadir Sahin and Caglar Tirkaz and Eray Yildiz and Mustafa Tolga Eren and Omer Ozan Sonmez},
  title         = {Automatically Annotated Turkish Corpus for Named Entity Recognition and Text Categorization using Large-Scale Gazetteers},
  journal       = {CoRR},
  volume        = {abs/1702.02363},
  year          = {2017},
  url           = {http://arxiv.org/abs/1702.02363},
  archivePrefix = {arXiv},
  eprint        = {1702.02363},
  timestamp     = {Mon, 13 Aug 2018 16:46:36 +0200},
  biburl        = {https://dblp.org/rec/journals/corr/SahinTYES17.bib},
  bibsource     = {dblp computer science bibliography, https://dblp.org}
}

Contributions

Thanks to @merveenoyan for adding this dataset.

Author:

Anonymous

Dataset size:

15.16 KB