Dataset: sick
Task: text classification
Language: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: crowdsourced
License: cc-by-nc-sa-3.0

Shared and internationally recognized benchmarks are fundamental for the development of any computational system. We aim to help the research community working on compositional distributional semantic models (CDSMs) by providing SICK (Sentences Involving Compositional Knowledge), a large English benchmark tailored for them. SICK consists of about 10,000 English sentence pairs that include many examples of the lexical, syntactic and semantic phenomena that CDSMs are expected to account for, but that do not require dealing with other aspects of existing sentential data sets (idiomatic multiword expressions, named entities, telegraphic language) that are outside the scope of CDSMs. By means of crowdsourcing techniques, each pair was annotated for two crucial semantic tasks: relatedness in meaning (with a 5-point rating scale as gold score) and the entailment relation between the two elements (with three possible gold labels: entailment, contradiction, and neutral). The SICK data set was used in SemEval-2014 Task 1, and it is freely available for research purposes.
[Needs More Information]
The dataset is in English.
Example instance:
{
  "entailment_AB": "A_neutral_B",
  "entailment_BA": "B_neutral_A",
  "label": 1,
  "id": "1",
  "relatedness_score": 4.5,
  "sentence_A": "A group of kids is playing in a yard and an old man is standing in the background",
  "sentence_A_dataset": "FLICKR",
  "sentence_A_original": "A group of children playing in a yard, a man in the background.",
  "sentence_B": "A group of boys in a yard is playing and a man is standing in the background",
  "sentence_B_dataset": "FLICKR",
  "sentence_B_original": "A group of children playing in a yard, a man in the background."
}
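To make the fields of the example instance more concrete, here is a minimal Python sketch that reads a record and pulls out the entailment information. The `LABEL_NAMES` order and the helper names `label_name` and `direction_label` are assumptions introduced for illustration: the card itself only shows that `label` 1 appears together with `A_neutral_B`, so the positions of the other two classes should be checked against the actual data before relying on them.

```python
import json

# The example instance from this card, reduced to the fields used below.
record = json.loads("""
{
  "entailment_AB": "A_neutral_B",
  "entailment_BA": "B_neutral_A",
  "label": 1,
  "id": "1",
  "relatedness_score": 4.5,
  "sentence_A": "A group of kids is playing in a yard and an old man is standing in the background",
  "sentence_B": "A group of boys in a yard is playing and a man is standing in the background"
}
""")

# ASSUMPTION: integer labels follow this order. Only index 1 ("neutral") is
# supported by the example instance; indices 0 and 2 are not confirmed here.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def label_name(rec):
    """Map the integer `label` field to a class name (hypothetical helper)."""
    return LABEL_NAMES[rec["label"]]


def direction_label(tag):
    """Return the middle token of a directional tag such as 'A_neutral_B'."""
    return tag.split("_")[1]


name = label_name(record)
ab = direction_label(record["entailment_AB"])
ba = direction_label(record["entailment_BA"])
print(name, ab, ba, record["relatedness_score"])
```

The directional fields `entailment_AB` and `entailment_BA` are kept separate from `label` because entailment need not be symmetric between the two sentences, whereas `relatedness_score` is a single value on the 5-point scale for the pair as a whole.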
| Train | Trial | Test |
| 4439 | 495 | 4906 |
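As a quick sanity check on these split sizes, the short Python sketch below sums them and computes each split's share of the total. The variable names are illustrative only; the counts are the ones listed above.

```python
# Split sizes as listed in this card.
splits = {"train": 4439, "trial": 495, "test": 4906}

# Total number of sentence pairs across all splits.
total = sum(splits.values())

# Percentage share of each split, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)   # 9840
print(shares)  # {'train': 45.1, 'trial': 5.0, 'test': 49.9}
```

The total of 9,840 pairs matches the description of SICK as containing about 10,000 sentence pairs, and the test split is slightly larger than the train split.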
Who are the source language producers? [Needs More Information]
Who are the annotators? [Needs More Information]
@inproceedings{marelli-etal-2014-sick,
    title = "A {SICK} cure for the evaluation of compositional distributional semantic models",
    author = "Marelli, Marco and Menini, Stefano and Baroni, Marco and Bentivogli, Luisa and Bernardi, Raffaella and Zamparelli, Roberto",
    booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation ({LREC}'14)",
    month = may,
    year = "2014",
    address = "Reykjavik, Iceland",
    publisher = "European Language Resources Association (ELRA)",
    url = "http://www.lrec-conf.org/proceedings/lrec2014/pdf/363_Paper.pdf",
    pages = "216--223",
}
Thanks to @calpt for adding this dataset.