This is a dataset of paraphrases created by ChatGPT.
A model based on this dataset is available: model
We used this prompt to generate the paraphrases: Generate 5 similar paraphrases for this question, show it like a numbered list without commentaries: {text}
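As a minimal sketch of how the `{text}` placeholder in the prompt could be filled for one sample, assuming plain Python string formatting (the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical and not part of the dataset):

```python
# Prompt text copied from the dataset description; {text} is the placeholder.
PROMPT_TEMPLATE = (
    "Generate 5 similar paraphrases for this question, "
    "show it like a numbered list without commentaries: {text}"
)


def build_prompt(text: str) -> str:
    """Insert one source sample into the prompt template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(text=text)


# Illustrative input only, not a row from the dataset.
print(build_prompt("What is the capital of France?"))
```

How the prompt was actually sent to the model is not described here, so this only illustrates the template substitution.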
This dataset is based on the Quora paraphrase question dataset, texts from SQuAD 2.0, and the CNN news dataset.
We generated 5 paraphrases for each sample; in total, the dataset has about 420k rows. Each row gives 6 texts (the original plus its 5 paraphrases), so you can make 6 x 5 = 30 ordered pairs from each row. In this way you can make 12.6 million training pairs (6 x 5 x 420,000 = 12.6 million directed pairs, or 6 x 5 x 420,000 / 2 = 6.3 million unique pairs).
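The pair counts above can be checked with a short sketch. It assumes each row provides one original text and a list of 5 paraphrases; the function names and the example sentences are hypothetical and chosen only for illustration:

```python
from itertools import combinations, permutations


def directed_pairs(text, paraphrases):
    """All ordered (source, target) pairs among the original and its paraphrases."""
    texts = [text, *paraphrases]
    return list(permutations(texts, 2))


def unique_pairs(text, paraphrases):
    """All unordered pairs, i.e. each pair counted once regardless of direction."""
    texts = [text, *paraphrases]
    return list(combinations(texts, 2))


# Made-up example row, not taken from the dataset.
row_text = "How do I learn Python?"
row_paraphrases = [
    "What is the best way to learn Python?",
    "How can I start learning Python?",
    "What should I do to learn Python?",
    "How does one learn Python?",
    "What are good ways to study Python?",
]

print(len(directed_pairs(row_text, row_paraphrases)))  # 30
print(len(unique_pairs(row_text, row_paraphrases)))    # 15
print(30 * 420_000, 15 * 420_000)                      # 12600000 6300000
```

Directed pairs are useful when the training objective treats source and target differently; unique pairs avoid counting the same two sentences twice.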
Data is based on OpenAI’s gpt-3.5-turbo, whose terms of use prohibit developing models that compete with OpenAI. So if you use this dataset to train a model, don't compete with OpenAI.
@inproceedings{chatgpt_paraphrases_dataset,
  author={Vladimir Vorobev and Maxim Kuznetsov},
  title={ChatGPT paraphrases dataset},
  year={2023}
}