Dataset:

tasksource/parade

Tasks:

Sentence similarity

Text classification

Language:

https://github.com/heyunh2015/PARADE_dataset

@inproceedings{he-etal-2020-parade,
    title = "{PARADE}: {A} {N}ew {D}ataset for {P}araphrase {I}dentification {R}equiring {C}omputer {S}cience {D}omain {K}nowledge",
    author = "He, Yun  and
      Wang, Zhuoer  and
      Zhang, Yin  and
      Huang, Ruihong  and
      Caverlee, James",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.611",
    doi = "10.18653/v1/2020.emnlp-main.611",
    pages = "7572--7582",
    abstract = "We present a new benchmark dataset called PARADE for paraphrase identification that requires specialized domain knowledge. PARADE contains paraphrases that overlap very little at the lexical and syntactic level but are semantically equivalent based on computer science domain knowledge, as well as non-paraphrases that overlap greatly at the lexical and syntactic level but are not semantically equivalent based on this domain knowledge. Experiments show that both state-of-the-art neural models and non-expert human annotators have poor performance on PARADE. For example, BERT after fine-tuning achieves an F1 score of 0.709, which is much lower than its performance on other paraphrase identification datasets. PARADE can serve as a resource for researchers interested in testing models that incorporate domain knowledge. We make our data and code freely available.",
}
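
The abstract above reports model quality on PARADE as an F1 score. As a minimal sketch of how that metric can be computed for binary paraphrase labels, the snippet below assumes 1 means "paraphrase" and 0 means "non-paraphrase". The helper name `f1_score` and the toy label lists are hypothetical and are not taken from the PARADE data or code; the exact averaging used in the paper is not stated here.

```python
def f1_score(gold, pred, positive=1):
    """Binary F1 for the positive class (hypothetical helper for illustration)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)

    # With no true positives, precision and recall are zero or undefined.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, NOT drawn from PARADE: 1 = paraphrase, 0 = non-paraphrase.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")
```

Because F1 ignores true negatives, a model that labels most lexically similar pairs as paraphrases is penalized through precision, which is the failure mode the abstract describes for high-overlap non-paraphrases.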

Author:

tasksource

Dataset size:

2.33 MB