Dataset:
metaeval/sts-companion
https://ixa2.si.ehu.eus/stswiki/index.php/STSbenchmark
The companion datasets to the STS Benchmark comprise the rest of the English datasets used in the STS tasks organized in the context of SemEval between 2012 and 2017. The authors collated two datasets: one with pairs of sentences related to machine translation evaluation, and another with the remaining datasets, which can be used for domain adaptation studies.
@inproceedings{cer-etal-2017-semeval,
    title = "{S}em{E}val-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation",
    author = "Cer, Daniel and Diab, Mona and Agirre, Eneko and Lopez-Gazpio, I{\~n}igo and Specia, Lucia",
    booktitle = "Proceedings of the 11th International Workshop on Semantic Evaluation ({S}em{E}val-2017)",
    month = aug,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/S17-2001",
    doi = "10.18653/v1/S17-2001",
    pages = "1--14",
}