Dataset:
albertvillanova/sat
The SAT (Style Augmented Translation) dataset contains roughly 3.3 million English-Vietnamese text pairs.
The languages in the dataset are English (`en`) and Vietnamese (`vi`). A data instance looks like this:
{
    'translation': {
        'en': 'Rachel Pike : The science behind a climate headline',
        'vi': 'Khoa học đằng sau một tiêu đề về khí hậu'
    }
}
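As a minimal sketch of how a record in this shape could be consumed, the snippet below unpacks the nested `translation` field into a source/target pair. The helper name `to_pair` and its `src`/`tgt` parameters are hypothetical, introduced only for illustration; they are not part of the dataset or of any library. The sample record is the one shown above.

```python
from typing import Dict, Tuple

# Sample record copied from the data instance above.
example: Dict[str, Dict[str, str]] = {
    "translation": {
        "en": "Rachel Pike : The science behind a climate headline",
        "vi": "Khoa học đằng sau một tiêu đề về khí hậu",
    }
}


def to_pair(record: Dict[str, Dict[str, str]],
            src: str = "en",
            tgt: str = "vi") -> Tuple[str, str]:
    """Hypothetical helper: return (source_text, target_text) from one record."""
    translation = record["translation"]
    return translation[src], translation[tgt]


en_text, vi_text = to_pair(example)
print(en_text)
print(vi_text)
```

Swapping the `src` and `tgt` arguments would yield the pair in the Vietnamese-to-English direction from the same record.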
The dataset is split into "train" and "test".
| | train | test |
|---|---|---|
| Number of examples | 3359574 | 7221 |
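To put the split sizes in proportion, the following sketch computes each split's share of the total directly from the counts in the table. It uses only the two reported numbers; the variable names are illustrative.

```python
# Split sizes as reported in the table above.
split_sizes = {"train": 3_359_574, "test": 7_221}

# Total number of examples across both splits.
total = sum(split_sizes.values())

# Fraction of the whole dataset held by each split.
fractions = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name}: {count:,} examples ({fractions[name]:.2%} of {total:,})")
```

The test split is a very small slice of the data, so evaluation numbers computed on it rest on a few thousand pairs rather than on a sample proportional to the training set.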
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
Unknown.
Unknown.
Thanks to @albertvillanova for adding this dataset.