Dataset:

bianet

Tasks:

translation

Multilinguality:

translation

Language creators:

found

Annotations creators:

found

Source datasets:

original

Dataset Card for Bianet

Dataset Summary

Bianet is a parallel news corpus in Turkish, Kurdish and English. It collects 3,214 Turkish articles from the Bianet online newspaper together with their sentence-aligned Kurdish or English translations.

3 languages, 3 bitexts

Total number of files: 6

Total number of tokens: 2.25M

Total number of sentence fragments: 0.14M
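Because each bitext is sentence-aligned, line i of the source-language file corresponds to line i of the target-language file. The sketch below shows one way such a pair of files could be turned into translation records. The function name `pair_bitext`, the nested `{"translation": {...}}` record layout, and the sample sentences are assumptions made for illustration; they are not taken from the corpus or from this card.

```python
from typing import Dict, Iterable, List


def pair_bitext(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    src_lang: str,
    tgt_lang: str,
) -> List[Dict[str, Dict[str, str]]]:
    """Pair two sentence-aligned line sequences into translation records.

    Hypothetical helper: the record layout is an assumed convention,
    not a format documented by this dataset card.
    """
    src = [line.strip() for line in src_lines]
    tgt = [line.strip() for line in tgt_lines]
    # Aligned files must have the same number of lines, otherwise the
    # line-by-line correspondence is broken.
    if len(src) != len(tgt):
        raise ValueError("aligned files must have the same number of lines")
    records = []
    for s, t in zip(src, tgt):
        # Skip pairs where either side is empty after stripping whitespace.
        if s and t:
            records.append({"translation": {src_lang: s, tgt_lang: t}})
    return records


# Made-up sentences for illustration only; they are not taken from the corpus.
tr_lines = ["Merhaba dünya.\n", "Bu bir örnek cümledir.\n"]
en_lines = ["Hello world.\n", "This is an example sentence.\n"]
pairs = pair_bitext(tr_lines, en_lines, "tr", "en")
```

In practice the two line sequences would come from open file handles over the two files of a bitext. Raising an error on a length mismatch surfaces a misalignment early instead of silently producing wrong pairs.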

Supported Tasks and Leaderboards

[More Information Needed]

Languages

The corpus covers three languages: Turkish, Kurdish and English.

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

[More Information Needed]

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

CC-BY-SA-4.0

Citation Information

@InProceedings{ATAMAN18.6,
  author = {Duygu Ataman},
  title = {Bianet: A Parallel News Corpus in Turkish, Kurdish and English},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Jinhua Du and Mihael Arcan and Qun Liu and Hitoshi Isahara},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-15-3},
  language = {english}
}

Contributions

Thanks to @param087 for adding this dataset.