Dataset: hrenwac_para
Task: translation
Size: 10K<n<100K
Language creators: found
Annotation creators: no-annotation
Source datasets: original
License: cc-by-sa-3.0

The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-level domain (Croatia). The corpus was built with Spidextor ( https://github.com/abumatran/spidextor ), a tool that glues together the output of SpiderLing, used for crawling, and Bitextor, used for bitext extraction. The extracted bitext is about 80% accurate at the segment level and about 84% accurate at the word level.
The dataset is bilingual, covering Croatian and English.
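As a rough illustration of working with a bilingual corpus like this one, the sketch below turns records into (Croatian, English) sentence pairs. It assumes each record follows the common Hugging Face `translation` layout, a dict with `hr` and `en` keys; the card itself does not state this layout, so treat it as an assumption. The helper name `to_pairs` and the sample sentences are made up for illustration and are not taken from the corpus.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed record shape (not confirmed by the card):
# {"translation": {"hr": "<Croatian text>", "en": "<English text>"}}
Record = Dict[str, Dict[str, str]]


def to_pairs(
    records: Iterable[Record], src: str = "hr", tgt: str = "en"
) -> List[Tuple[str, str]]:
    """Return (source, target) pairs, skipping records with an empty side."""
    pairs: List[Tuple[str, str]] = []
    for rec in records:
        translation = rec.get("translation", {})
        source_text = translation.get(src, "").strip()
        target_text = translation.get(tgt, "").strip()
        # Keep only segments where both languages are present.
        if source_text and target_text:
            pairs.append((source_text, target_text))
    return pairs


# Made-up sample records, not drawn from hrenWaC.
sample = [
    {"translation": {"hr": "Dobar dan.", "en": "Good day."}},
    {"translation": {"hr": "  ", "en": "Orphan segment."}},
]
pairs = to_pairs(sample)
```

Since the card reports segment-level accuracy of around 80%, dropping empty sides as above only removes the most obvious noise; misaligned but non-empty pairs would still pass through.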
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
The dataset is released under the CC-BY-SA 3.0 license.
@misc{11356/1058,
  title = {Croatian-English parallel corpus {hrenWaC} 2.0},
  author = {Ljube{\v s}i{\'c}, Nikola and Espl{\`a}-Gomis, Miquel and Ortiz Rojas, Sergio and Klubi{\v c}ka, Filip and Toral, Antonio},
  url = {http://hdl.handle.net/11356/1058},
  note = {Slovenian language resource repository {CLARIN}.{SI}},
  copyright = {{CLARIN}.{SI} User Licence for Internet Corpora},
  year = {2016}
}
Thanks to @IvanZidov for adding this dataset.