Dataset: assin2
Tasks: text classification
Languages: pt
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: expert-generated
Source datasets: original
License: unknown

The ASSIN 2 corpus is composed of rather simple sentences, following the procedures of SemEval 2014 Task 1. The training and validation data are composed, respectively, of 6,500 and 500 sentence pairs in Brazilian Portuguese, annotated for entailment and semantic similarity. Semantic similarity values range from 1 to 5, and text entailment classes are either entailment or none. The test data are composed of approximately 3,000 sentence pairs with the same annotation. All data were manually annotated.
The language supported is Portuguese.
An example from the ASSIN 2 dataset looks as follows:
{
  "entailment_judgment": 1,
  "hypothesis": "Uma criança está segurando uma pistola de água",
  "premise": "Uma criança risonha está segurando uma pistola de água e sendo espirrada com água",
  "relatedness_score": 4.5,
  "sentence_pair_id": 1
}
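As a rough illustration of how such a record might be consumed, the sketch below parses the example above and checks it against the constraints stated in the description (similarity between 1 and 5, a binary entailment class). The class name `Assin2Pair`, the helper `parse_pair`, and the mapping of `0`/`1` to `NONE`/`ENTAILMENT` are assumptions made for this example; the card itself only says that the classes are "entailment" or "none".

```python
import json
from dataclasses import dataclass

# Assumed mapping from the integer class to a readable name. The card only
# states that the entailment classes are "entailment" or "none".
ENTAILMENT_LABELS = {0: "NONE", 1: "ENTAILMENT"}


@dataclass(frozen=True)
class Assin2Pair:
    """One sentence pair, using the field names from the example record."""

    sentence_pair_id: int
    premise: str
    hypothesis: str
    relatedness_score: float
    entailment_judgment: int

    def __post_init__(self) -> None:
        # The description gives a similarity range of 1 to 5.
        if not 1.0 <= self.relatedness_score <= 5.0:
            raise ValueError(f"relatedness_score out of range: {self.relatedness_score}")
        if self.entailment_judgment not in ENTAILMENT_LABELS:
            raise ValueError(f"unknown entailment class: {self.entailment_judgment}")

    @property
    def entailment_label(self) -> str:
        return ENTAILMENT_LABELS[self.entailment_judgment]


def parse_pair(raw: str) -> Assin2Pair:
    """Build a validated pair from a JSON object string (hypothetical helper)."""
    return Assin2Pair(**json.loads(raw))


EXAMPLE = """
{
  "entailment_judgment": 1,
  "hypothesis": "Uma criança está segurando uma pistola de água",
  "premise": "Uma criança risonha está segurando uma pistola de água e sendo espirrada com água",
  "relatedness_score": 4.5,
  "sentence_pair_id": 1
}
"""

pair = parse_pair(EXAMPLE)
print(pair.sentence_pair_id, pair.entailment_label, pair.relatedness_score)
```

Validating at construction time means a malformed row fails loudly when it is read, not later during training or evaluation.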
The data is split into train, validation and test sets. The split sizes are as follows:
Train | Val | Test |
---|---|---|
6500 | 500 | 2448 |
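For a quick sanity check of these figures, the short sketch below totals the three splits from the table and prints each split's share. The variable names are illustrative only; the numbers are copied from the table above.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {"train": 6500, "validation": 500, "test": 2448}

total = sum(SPLIT_SIZES.values())
shares = {name: size / total for name, size in SPLIT_SIZES.items()}

for name, size in SPLIT_SIZES.items():
    print(f"{name:<10} {size:>5} {shares[name]:6.1%}")
print(f"{'total':<10} {total:>5}")

# The card's size category is 1K<n<10K; check that the total falls inside it.
in_size_bucket = 1_000 < total < 10_000
print("within 1K<n<10K:", in_size_bucket)
```

The total of 9,448 pairs is consistent with the 1K<n<10K size category listed in the metadata.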
Who are the source language producers? [More Information Needed]
Who are the annotators? [More Information Needed]
@inproceedings{real2020assin,
  title={The assin 2 shared task: a quick overview},
  author={Real, Livy and Fonseca, Erick and Oliveira, Hugo Goncalo},
  booktitle={International Conference on Computational Processing of the Portuguese Language},
  pages={406--412},
  year={2020},
  organization={Springer}
}
Thanks to @jonatasgrosman for adding this dataset.