Dataset: scielo
Tasks: translation
Multilinguality: multilingual
Size: 100K<n<1M
Language creators: found
Annotation creators: found
Source datasets: original
arXiv: 1905.01852
License: unknown

A parallel corpus of full-text scientific articles collected from the SciELO database in the following languages: English, Portuguese and Spanish. The corpus is sentence-aligned for all language pairs, and a small subset of sentences is also trilingually aligned. Alignment was carried out using the Hunalign algorithm.
The underlying task is machine translation.
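As a minimal sketch of how the sentence-aligned data might be prepared for a translation model, the snippet below turns records into (source, target) pairs. It assumes each record carries a `translation` field mapping language codes to text, as is common for Hugging Face translation datasets; the helper names `to_pairs` and `load_scielo_pairs`, the `"en-pt"` configuration name, and the `train` split are assumptions to verify against the actual dataset, and the sample rows are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed record layout: one dict per aligned sentence, with a "translation"
# field mapping language codes to text. Check the real dataset features
# before relying on this shape.
Record = Dict[str, Dict[str, str]]


def to_pairs(records: Iterable[Record], src: str, tgt: str) -> List[Tuple[str, str]]:
    """Extract (source, target) sentence pairs, skipping incomplete rows."""
    pairs = []
    for record in records:
        translation = record.get("translation", {})
        source, target = translation.get(src), translation.get(tgt)
        if source and target:
            pairs.append((source.strip(), target.strip()))
    return pairs


def load_scielo_pairs(config: str = "en-pt", src: str = "en", tgt: str = "pt"):
    """Hypothetical loader; the config name and split are assumptions."""
    from datasets import load_dataset  # requires `pip install datasets`

    dataset = load_dataset("scielo", config, split="train")
    return to_pairs(dataset, src, tgt)


# Made-up rows for illustration only; not taken from the corpus.
sample = [
    {"translation": {"en": "The results were significant. ",
                     "pt": "Os resultados foram significativos."}},
    {"translation": {"en": "Methods", "pt": ""}},  # dropped: empty target
]
print(to_pairs(sample, "en", "pt"))
```

Rows with a missing or empty side are dropped rather than padded, since automatic alignment can leave unmatched sentences.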
[More Information Needed]
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
@inproceedings{soares2018large,
  title={A Large Parallel Corpus of Full-Text Scientific Articles},
  author={Soares, Felipe and Moreira, Viviane and Becker, Karin},
  booktitle={Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018)},
  year={2018}
}
Thanks to @patil-suraj for adding this dataset.