Dataset:

msr_zhen_translation_parity

Tasks:

translation

Languages:

Multilinguality:

monolingual translation

Size:

1K<n<10K

Language creators:

expert-generated machine-generated

Annotation creators:

no-annotation

Source datasets:

extended|other-newstest2017

License:

ms-pl


Dataset Card for msr_zhen_translation_parity

Translator Human Parity Data

Repository:
Paper: Achieving Human Parity on Automatic Chinese to English News Translation

Leaderboard:
Point of Contact:

Dataset Summary

Human evaluation results and translation output for the Translator Human Parity Data release, as described in https://blogs.microsoft.com/ai/machine-translation-news-test-set-human-parity/

The Translator Human Parity Data release contains all human evaluation results and translations related to our paper "Achieving Human Parity on Automatic Chinese to English News Translation", published on March 14, 2018. We have released this data to:

allow external validation of our claim of having achieved human parity;

foster future research by releasing two additional human references for the Reference-WMT test set.

The dataset includes:

two new references for the Chinese-English language pair of WMT17, one based on human translation from scratch (Reference-HT), the other based on human post-editing (Reference-PE);

human parity translations generated by our research systems Combo-4, Combo-5, and Combo-6, as well as translation output from the online machine translation service Online-A-1710, collected on October 16, 2017.

The data package provided with the study also includes all data points collected in the human evaluation campaigns, but these are not parsed or exposed as features of this dataset.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

This dataset contains 6 additional English translations for the Chinese-English language pair of WMT17.

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

As mentioned in the summary, this dataset provides 6 additional English translations for the Chinese-English language pair of WMT17.

Data fields are named exactly as in the associated paper for easier cross-referencing.

Reference-HT: human translation from scratch.
Reference-PE: human post-editing.
Combo-4, Combo-5, Combo-6: three translations by research systems.
Online-A-1710: a translation from an anonymous online machine translation service.

All data fields of a record are translations for the same Chinese source sentence.
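
To make the record layout concrete, here is a minimal Python sketch. It assumes each record is a flat mapping from the six field names listed above to an English string; that layout, the helper names `missing_fields` and `split_record`, and the placeholder strings are all illustrative assumptions, not part of the dataset or of any library.

```python
from typing import Dict, List

# Field names as listed in this dataset card.
REFERENCE_FIELDS: List[str] = ["Reference-HT", "Reference-PE"]
SYSTEM_FIELDS: List[str] = ["Combo-4", "Combo-5", "Combo-6", "Online-A-1710"]
ALL_FIELDS: List[str] = REFERENCE_FIELDS + SYSTEM_FIELDS


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the expected field names that are absent from a record."""
    return [name for name in ALL_FIELDS if name not in record]


def split_record(record: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group a record's translations into human references and system outputs."""
    return {
        "references": {name: record[name] for name in REFERENCE_FIELDS},
        "systems": {name: record[name] for name in SYSTEM_FIELDS},
    }


# Hypothetical record: the strings are placeholders, not real dataset content.
example: Dict[str, str] = {
    name: f"<English translation from {name}>" for name in ALL_FIELDS
}
```

Grouping the two human references apart from the four system outputs mirrors the distinction drawn in the summary, and `missing_fields` gives a quick sanity check before comparing translations of the same source sentence.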

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

Citation information is available from the paper "Achieving Human Parity on Automatic Chinese to English News Translation".

Contributions

Thanks to @leoxzhao for adding this dataset.

Author:

Anonymous

Dataset size:

13.43 KB