Dataset: id_panl_bppt
Tasks:
Multilinguality: translation
Size: 10K<n<100K
Language creators: expert-generated
Annotation creators: expert-generated
Source datasets: original
License:
Parallel Text Corpora for Multi-Domain Translation System, created by BPPT (Indonesian Agency for the Assessment and Application of Technology) for the PAN Localization Project (A Regional Initiative to Develop Local Language Computing Capacity in Asia). The dataset contains around 24K sentences divided into 4 different topics (Economy, International, Science and Technology, and Sport).
[More Information Needed]
Indonesian
[More Information Needed]
An example of the dataset:
{
'id': '0',
'topic': 0,
'translation':
{
'en': 'Minister of Finance Sri Mulyani Indrawati said that a sharp correction of the composite
inde x by up to 4 pct in Wedenesday?s trading was a mere temporary effect of regional factors like
decline in plantation commodity prices and the financial crisis in Thailand.',
'id': 'Menteri Keuangan Sri Mulyani mengatakan koreksi tajam pada Indeks Harga Saham Gabungan
IHSG hingga sekitar 4 persen dalam perdagangan Rabu 10/1 hanya efek sesaat dari faktor-faktor regional
seperti penurunan harga komoditi perkebunan dan krisis finansial di Thailand.'
}
}
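To illustrate the record layout shown above, here is a minimal sketch of pulling the English-Indonesian sentence pair out of a record and grouping pairs by their integer `topic` label. The helper names `to_pair` and `group_by_topic` are hypothetical, the sample text is shortened for brevity, and the card does not state which integer corresponds to which topic name, so no such mapping is assumed.

```python
from collections import defaultdict

# Hypothetical record mirroring the example above (sentences shortened here).
record = {
    "id": "0",
    "topic": 0,
    "translation": {
        "en": "Minister of Finance Sri Mulyani Indrawati said ...",
        "id": "Menteri Keuangan Sri Mulyani mengatakan ...",
    },
}


def to_pair(rec, src="en", tgt="id"):
    """Return the (source, target) sentences of one record as a tuple."""
    translation = rec["translation"]
    return translation[src], translation[tgt]


def group_by_topic(records):
    """Map each integer topic label to the list of sentence pairs under it."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["topic"]].append(to_pair(rec))
    return dict(groups)


if __name__ == "__main__":
    english, indonesian = to_pair(record)
    print(english)
    print(indonesian)
    print(sorted(group_by_topic([record]).keys()))
```

Note that the top-level `'id'` field is the record identifier, while the `'id'` key inside `'translation'` is the Indonesian language code; the two are unrelated.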
The dataset is split into train, validation, and test sets.
[More Information Needed]
[More Information Needed]
Who are the source language producers?
[More Information Needed]
[More Information Needed]
Who are the annotators?
[More Information Needed]
@inproceedings{id_panl_bppt,
author = {PAN Localization - BPPT},
title = {Parallel Text Corpora, English Indonesian},
year = {2009},
url = {http://digilib.bppt.go.id/sampul/p92-budiono.pdf},
}
Thanks to @cahya-wirawan for adding this dataset.