Manually paraphrased corpus in Spanish
This corpus is designed to assess the similarity between pairs of texts and to evaluate different similarity measures, both for whole documents and for individual sentences. It is built around a Spanish blog article about sushi. Several volunteers (undergraduate, graduate, and Ph.D. students) were asked to intentionally reformulate or paraphrase this article. The paraphrasing was carried out on two levels, according to the rules:
If you use the corpus, please cite the following articles:
Gómez-Adorno H., Bel-Enguix G., Sierra G., Torres-Moreno JM., Martinez R., Serrano P. (2020) Evaluation of Similarity Measures in a Benchmark for Spanish Paraphrasing Detection. In: Martínez-Villaseñor L., Herrera-Alcántara O., Ponce H., Castro-Espinoza F.A. (eds) Advances in Computational Intelligence. MICAI 2020. Lecture Notes in Computer Science, vol 12469. Springer, Cham. https://doi.org/10.1007/978-3-030-60887-3_19
Castro, B., Sierra, G., Torres-Moreno, J.M., Da Cunha, I.: El discurso y la semántica como recursos para la detección de similitud textual. In: Proceedings of the III RST Meeting (8th Brazilian Symposium in Information and Human Language Technology, STIL 2011). Brazilian Computer Society, Cuiabá (2011)
This work was partially supported by CONACYT project A1-S-27780 and UNAM-PAPIIT projects IA401219, TA100520, and AG400119.
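As a rough illustration of the kind of similarity measure the corpus is meant to evaluate, the following minimal sketch computes a word-level Jaccard similarity between an original sentence and a paraphrase. It is not taken from the cited articles: the function names `tokenize` and `jaccard_similarity` are hypothetical, the tokenization is a simple assumption, and the two example sentences are invented rather than drawn from the corpus.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    # \w matches Unicode letters in Python 3, so accented Spanish
    # characters such as "é" and "ó" stay inside their words.
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| over the word sets of two texts."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        # Convention assumed here: two empty texts count as identical.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Invented example pair, NOT taken from the corpus.
original = "El sushi es un plato de origen japonés"
paraphrase = "El sushi es una comida que proviene de Japón"

score = jaccard_similarity(original, paraphrase)
print(f"Jaccard similarity: {score:.2f}")
```

A purely lexical measure like this one tends to score heavily reworded paraphrases low even when the meaning is preserved, which is the kind of behavior a manually paraphrased corpus can help expose.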