Dataset: opus_dogc
Task: Translation
Multilinguality: translation
Size: 1M<n<10M
Language creators: expert-generated
Annotation creators: no-annotation
Source datasets: original
License: cc0-1.0
OPUS DOGC is a collection of documents from the Official Journal of the Government of Catalonia, in Catalan and Spanish, provided by Antoni Oliver Gonzalez of the Universitat Oberta de Catalunya.
The dataset is multilingual, with parallel text in Catalan and Spanish.
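Each Catalan segment in the corpus is paired with its Spanish counterpart. The Python sketch below shows one way to turn such records into aligned sentence pairs. It assumes each record stores a `translation` mapping keyed by the language codes `ca` and `es`. This is a layout commonly used for OPUS translation datasets on the Hugging Face Hub, but this card does not confirm it. The function name `parallel_pairs` is hypothetical.

```python
from typing import Iterable, Iterator, Mapping, Tuple

# Assumed (hypothetical) record layout: each record holds a "translation"
# mapping keyed by language code ("ca" = Catalan, "es" = Spanish).
Record = Mapping[str, Mapping[str, str]]


def parallel_pairs(
    records: Iterable[Record], src: str = "ca", tgt: str = "es"
) -> Iterator[Tuple[str, str]]:
    """Yield stripped (source, target) pairs, skipping incomplete records."""
    for record in records:
        pair = record.get("translation", {})
        source, target = pair.get(src), pair.get(tgt)
        # Skip records where either side is missing or empty.
        if source and target:
            yield source.strip(), target.strip()


if __name__ == "__main__":
    # Toy records for illustration only; they are not taken from the corpus.
    sample = [
        {"translation": {"ca": "Bon dia.", "es": "Buenos días."}},
        {"translation": {"ca": "Gràcies.", "es": ""}},
    ]
    for ca, es in parallel_pairs(sample):
        print(f"{ca} -> {es}")
```

You can swap `src` and `tgt` to get Spanish-to-Catalan pairs. The dataset may be available through the Hugging Face `datasets` library under the name shown above, but this is an assumption, as is the existence of a `train` split. If both hold, you could pass a split loaded with `load_dataset("opus_dogc", split="train")` directly, because iterating a `Dataset` yields plain dicts.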
[More Information Needed]
A data instance contains the following fields:
[More Information Needed]
Who are the source language producers? [More Information Needed]
[More Information Needed]
Who are the annotators? [More Information Needed]
The dataset is in the public domain under CC0 1.0.
@inproceedings{tiedemann-2012-parallel,
    title = "Parallel Data, Tools and Interfaces in {OPUS}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Eighth International Conference on Language Resources and Evaluation ({LREC}'12)",
    month = may,
    year = "2012",
    address = "Istanbul, Turkey",
    publisher = "European Language Resources Association (ELRA)",
    url = "http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf",
    pages = "2214--2218",
    abstract = "This paper presents the current status of OPUS, a growing language resource of parallel corpora and related tools. The focus in OPUS is to provide freely available data sets in various formats together with basic annotation to be useful for applications in computational linguistics, translation studies and cross-linguistic corpus studies. In this paper, we report about new data sets and their features, additional annotation tools and models provided from the website and essential interfaces and on-line services included in the project.",
}
Thanks to @albertvillanova for adding this dataset.