Dataset Card for msr_zhen_translation_parity

Translator Human Parity Data

  • Repository:
  • Paper: Achieving Human Parity on Automatic Chinese to English News Translation
  • Leaderboard:
  • Point of Contact:

Dataset Summary

Human evaluation results and translation output for the Translator Human Parity Data release, as described in https://blogs.microsoft.com/ai/machine-translation-news-test-set-human-parity/

The Translator Human Parity Data release contains all human evaluation results and translations related to our paper "Achieving Human Parity on Automatic Chinese to English News Translation", published on March 14, 2018. We have released this data to:

  • allow external validation of our claim of having achieved human parity;
  • foster future research by releasing two additional human references for the Reference-WMT test set.

The dataset includes:

  • two new references for the Chinese-English language pair of WMT17, one based on human translation from scratch (Reference-HT), the other based on human post-editing (Reference-PE);

  • human parity translations generated by our research systems Combo-4, Combo-5, and Combo-6, as well as translation output from an online machine translation service (Online-A-1710), collected on October 16, 2017;

  • all data points collected in the human evaluation campaigns. These are part of the data package released with the study, but they are not parsed or provided as usable features of this dataset.
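Because the release adds two human references to the existing WMT17 reference, one natural use is multi-reference evaluation. The sketch below is a minimal illustration, assuming each record is a plain Python dict keyed by the field names of this dataset and that the original WMT17 reference (Reference-WMT, which is not a field here) is loaded separately as a list aligned with the records. The function `build_reference_streams` and the placeholder strings are hypothetical. It reshapes per-segment records into one sentence list ("stream") per reference.

```python
from typing import Mapping, Optional, Sequence

# The two new references released with this dataset (field names as in the paper).
NEW_REFERENCE_FIELDS = ("Reference-HT", "Reference-PE")


def build_reference_streams(
    records: Sequence[Mapping[str, str]],
    wmt_references: Optional[Sequence[str]] = None,
) -> list[list[str]]:
    """Return one sentence list ("stream") per reference, aligned by segment.

    Assumes `records` is in source-sentence order. `wmt_references` is a
    hypothetical input: the original WMT17 reference, loaded separately,
    since it is not a field of this dataset.
    """
    streams: list[list[str]] = []
    if wmt_references is not None:
        # The external reference must cover exactly the same segments.
        if len(wmt_references) != len(records):
            raise ValueError("WMT17 reference and records differ in length")
        streams.append(list(wmt_references))
    for field in NEW_REFERENCE_FIELDS:
        streams.append([record[field] for record in records])
    return streams


if __name__ == "__main__":
    # Placeholder strings, not real data from the release.
    demo = [
        {"Reference-HT": "ht sentence 1", "Reference-PE": "pe sentence 1"},
        {"Reference-HT": "ht sentence 2", "Reference-PE": "pe sentence 2"},
    ]
    for stream in build_reference_streams(demo, ["wmt sentence 1", "wmt sentence 2"]):
        print(stream)
```

This list-of-streams layout is the shape sacreBLEU's `corpus_bleu(hypotheses, references)` accepts for multiple references. Treat that as a pointer rather than a tested recipe for reproducing the paper's numbers.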

Supported Tasks and Leaderboards

[More Information Needed]

Languages

This dataset contains six additional English translations for the Chinese-English language pair of WMT17.

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

As mentioned in the summary, this dataset provides six additional English translations for the Chinese-English language pair of WMT17.

Data fields are named exactly as in the associated paper, for easier cross-referencing.

  • Reference-HT: human translation from scratch.
  • Reference-PE: human post-editing.
  • Combo-4, Combo-5, Combo-6: translations produced by the three research systems.
  • Online-A-1710: translation from an anonymous online machine translation service.

All data fields of a record are translations of the same Chinese source sentence.
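Because every record aligns six translations of one source sentence, per-system analysis usually starts by turning rows into columns. The following is a minimal sketch, assuming records arrive as plain dicts keyed by the six field names listed above; `to_columns` and the placeholder rows are hypothetical, not part of the release.

```python
from typing import Mapping, Sequence

# Field names exactly as listed under Data Fields.
REFERENCE_FIELDS = ("Reference-HT", "Reference-PE")
SYSTEM_FIELDS = ("Combo-4", "Combo-5", "Combo-6", "Online-A-1710")
ALL_FIELDS = REFERENCE_FIELDS + SYSTEM_FIELDS


def to_columns(records: Sequence[Mapping[str, str]]) -> dict[str, list[str]]:
    """Transpose row-wise records into one list per field.

    Every record holds translations of the same Chinese source sentence,
    so position i in each returned list refers to the same segment.
    Raises KeyError if a record lacks any of the six fields.
    """
    columns: dict[str, list[str]] = {field: [] for field in ALL_FIELDS}
    for index, record in enumerate(records):
        missing = [field for field in ALL_FIELDS if field not in record]
        if missing:
            raise KeyError(f"record {index} is missing fields: {missing}")
        for field in ALL_FIELDS:
            columns[field].append(record[field])
    return columns


if __name__ == "__main__":
    # Placeholder rows; real records carry English translations.
    rows = [{field: f"{field} output {i}" for field in ALL_FIELDS} for i in range(2)]
    columns = to_columns(rows)
    for system in SYSTEM_FIELDS:
        print(system, columns[system])
```

When the data is already loaded as a Hugging Face `datasets.Dataset`, indexing with a column name such as `ds["Combo-4"]` returns that column directly. The transposition above is mainly useful for plain row dicts, for example records read from JSON Lines, and the field check catches misaligned exports early.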

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

Citation information is available from the paper Achieving Human Parity on Automatic Chinese to English News Translation.

Contributions

Thanks to @leoxzhao for adding this dataset.