Dataset: un_pc
Tasks: translation
Multilinguality: multilingual
Size: 10M<n<100M
Language creators: found
Annotation creators: found
Source datasets: original
License: unknown

This parallel corpus consists of manually translated UN documents from the last 25 years (1990 to 2014) for the six official UN languages: Arabic, Chinese, English, French, Russian, and Spanish. It covers 6 languages and 15 bitexts, one for each unordered pair of languages.
The underlying task is machine translation.
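The count of 15 bitexts follows from pairing each of the six languages with every other one. The following is a minimal sketch of that enumeration, not part of the dataset's own tooling. The two-letter language codes and the `xx-yy` pair naming are assumptions made for illustration, and `language_pairs` is a hypothetical helper.

```python
from itertools import combinations

# Two-letter codes for the six official UN languages.
# Assumption: the card names the languages only; these codes are illustrative.
UN_LANGUAGES = ["ar", "en", "es", "fr", "ru", "zh"]


def language_pairs(languages):
    """Return every unordered pair of distinct languages as 'xx-yy' strings.

    The languages are sorted first so that each pair has a single canonical
    spelling (for example 'en-fr', never 'fr-en').
    """
    return [f"{a}-{b}" for a, b in combinations(sorted(languages), 2)]


pairs = language_pairs(UN_LANGUAGES)
print(len(pairs))
print(pairs)
```

Each bitext can be used in either translation direction, so the 15 unordered pairs correspond to 30 possible source-to-target directions.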
[More Information Needed]
Who are the source language producers?
[More Information Needed]
[More Information Needed]
Who are the annotators?
[More Information Needed]
@inproceedings{ziemski-etal-2016-united,
    title = "The {U}nited {N}ations Parallel Corpus v1.0",
    author = "Ziemski, Micha{\l} and Junczys-Dowmunt, Marcin and Pouliquen, Bruno",
    booktitle = "Proceedings of the Tenth International Conference on Language Resources and Evaluation ({LREC}'16)",
    month = may,
    year = "2016",
    address = "Portoro{\v{z}}, Slovenia",
    publisher = "European Language Resources Association (ELRA)",
    url = "https://www.aclweb.org/anthology/L16-1561",
    pages = "3530--3534",
    abstract = "This paper describes the creation process and statistics of the official United Nations Parallel Corpus, the first parallel corpus composed from United Nations documents published by the original data creator. The parallel corpus presented consists of manually translated UN documents from the last 25 years (1990 to 2014) for the six official UN languages, Arabic, Chinese, English, French, Russian, and Spanish. The corpus is freely available for download under a liberal license. Apart from the pairwise aligned documents, a fully aligned subcorpus for the six official UN languages is distributed. We provide baseline BLEU scores of our Moses-based SMT systems trained with the full data of language pairs involving English and for all possible translation directions of the six-way subcorpus.",
}
Thanks to @patil-suraj for adding this dataset.