Dataset:

mt_eng_vietnamese

Task:

Translation

Languages:

en vi

Multilinguality:

multilingual

Size:

100K<n<1M

Language creators:

found

Annotation creators:

found

Source datasets:

original

Dataset Card for mt_eng_vietnamese

Dataset Summary

A preprocessed version of the IWSLT'15 English-Vietnamese machine translation dataset.

Supported Tasks and Leaderboards

Machine Translation

Languages

English, Vietnamese

Dataset Structure

Data Instances

An example from the dataset:

{
  'translation': {
    'en': 'In 4 minutes , atmospheric chemist Rachel Pike provides a glimpse of the massive scientific effort behind the bold headlines on climate change , with her team -- one of thousands who contributed -- taking a risky flight over the rainforest in pursuit of data on a key molecule .', 
    'vi': 'Trong 4 phút , chuyên gia hoá học khí quyển Rachel Pike giới thiệu sơ lược về những nỗ lực khoa học miệt mài đằng sau những tiêu đề táo bạo về biến đổi khí hậu , cùng với đoàn nghiên cứu của mình -- hàng ngàn người đã cống hiến cho dự án này -- một chuyến bay mạo hiểm qua rừng già để tìm kiếm thông tin về một phân tử then chốt .'
    }
}
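In the example above, punctuation such as `,` and `.` is separated from neighboring words by spaces, so the text appears to be pre-tokenized. Assuming that holds across the corpus (an assumption drawn only from this one example), plain whitespace splitting is enough for quick sentence-length statistics. A minimal sketch; `token_lengths` is a hypothetical helper, and the record uses a prefix of the example pair:

```python
from typing import Dict, Tuple


def token_lengths(example: Dict[str, Dict[str, str]]) -> Tuple[int, int]:
    """Return (English token count, Vietnamese token count) for one record.

    Assumption: the text is already tokenized with spaces, as in the
    example record, so str.split() yields the tokens.
    """
    pair = example["translation"]
    return len(pair["en"].split()), len(pair["vi"].split())


# A prefix of the example pair shown above, in the same nested shape.
record = {
    "translation": {
        "en": "In 4 minutes , atmospheric chemist Rachel Pike",
        "vi": "Trong 4 phút , chuyên gia hoá học khí quyển Rachel Pike",
    }
}
print(token_lengths(record))  # (8, 12)
```

Vietnamese writing separates syllables with spaces (e.g. "chuyên gia"), so the Vietnamese count is a syllable count rather than a word count; keep that in mind when comparing lengths across the two sides.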

Data Fields

  • translation:
    • en: text in English
    • vi: text in Vietnamese
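Because both languages sit under the same `translation` field, one record can serve either translation direction. A minimal sketch that checks the two fields listed above and returns a (source, target) pair; `to_pair` and `LANGS` are hypothetical names, not part of any library:

```python
from typing import Dict, Tuple

LANGS = ("en", "vi")  # the two keys listed under `translation`


def to_pair(
    example: Dict[str, Dict[str, str]], src: str = "en", tgt: str = "vi"
) -> Tuple[str, str]:
    """Return (source_text, target_text) for one record.

    Raises ValueError if the direction is not en->vi or vi->en, or if
    translation.en / translation.vi is missing or not a string.
    """
    if {src, tgt} != set(LANGS):
        raise ValueError(f"unsupported direction {src}->{tgt}")
    pair = example["translation"]
    for lang in LANGS:
        if not isinstance(pair.get(lang), str):
            raise ValueError(f"missing or non-string field: translation.{lang}")
    return pair[src], pair[tgt]


# A prefix of the example pair from Data Instances.
record = {"translation": {"en": "In 4 minutes ,", "vi": "Trong 4 phút ,"}}
print(to_pair(record))              # ('In 4 minutes ,', 'Trong 4 phút ,')
print(to_pair(record, "vi", "en"))  # ('Trong 4 phút ,', 'In 4 minutes ,')
```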

Data Splits

Number of examples (translation pairs) per split:

  • train: 133,318
  • validation: 1,269
  • test: 1,269
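For a sense of scale, the split sizes listed above can be summed and turned into shares of the whole corpus. A small arithmetic check using only the numbers stated in this card:

```python
# Split sizes as listed in this card (number of examples).
SPLITS = {"train": 133_318, "validation": 1_269, "test": 1_269}

total = sum(SPLITS.values())
shares = {name: n / total for name, n in SPLITS.items()}

print(total)  # 135856
for name, share in shares.items():
    print(f"{name}: {share:.1%}")  # train: 98.1%, validation: 0.9%, test: 0.9%
```

The validation and test splits are each under 1% of the examples, so nearly all of the data is available for training.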

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

@inproceedings{Luong-Manning:iwslt15,
        Address = {Da Nang, Vietnam},
        Author = {Luong, Minh-Thang and Manning, Christopher D.},
        Booktitle = {International Workshop on Spoken Language Translation},
        Title = {Stanford Neural Machine Translation Systems for Spoken Language Domain},
        Year = {2015}}

Contributions

Thanks to @Nilanshrajput for adding this dataset.