数据集:
csebuetnlp/BanglaParaphrase
任务:
语言:
计算机处理:
monolingual大小:
100K<n<1M语言创建人:
found批注创建人:
found源数据集:
original预印本库:
arxiv:2210.05109许可:
We present BanglaParaphrase, a high quality synthetic Bangla paraphrase dataset containing about 466k paraphrase pairs. The paraphrases ensures high quality by being semantically coherent and syntactically diverse.
from datasets import load_dataset
from datasets import load_dataset
ds = load_dataset("csebuetnlp/BanglaParaphrase")
One example from the train part of the dataset is given below in JSON format.
{
"source": "বেশিরভাগ সময় প্রকৃতির দয়ার ওপরেই বেঁচে থাকতেন উপজাতিরা।",
"target": "বেশিরভাগ সময়ই উপজাতিরা প্রকৃতির দয়ার উপর নির্ভরশীল ছিল।"
}
Dataset with train-dev-test example counts are given below:
| Language | ISO 639-1 Code | Train | Validation | Test |
|---|---|---|---|---|
| Bengali | bn | 419, 967 | 233, 31 | 233, 32 |
Contents of this repository are restricted to only non-commercial research purposes under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0) . Copyright of the dataset contents belongs to the original copyright holders.
@article{akil2022banglaparaphrase,
title={BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset},
author={Akil, Ajwad and Sultana, Najrin and Bhattacharjee, Abhik and Shahriyar, Rifat},
journal={arXiv preprint arXiv:2210.05109},
year={2022}
}