Dataset: ruanchaves/rerelem
Tasks: text classification
Languages: pt
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: expert-generated
Source datasets: extended|harem

The ReRelEM dataset is designed for the detection and classification of relations between named entities in Portuguese text. It contains 2226 training, 701 validation, and 805 test instances. Each instance contains two sentences with two entities enclosed by the tags [E1] and [E2]. The dataset provides a fourfold relationship classification: identity, included-in, located-in, and other (which is subdivided into twenty different relations).
Although more than 99% of the original instances were retained, this is not a full representation of the original ReRelEM dataset. The data was first split into train, validation, and test sets; 21 instances with relation types not included in the training set were then dropped from the test set. In addition, 7 instances from the original dataset that had formatting errors and could not be resolved into post-processed records were also dropped.
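The test-set filtering described above can be sketched as follows. This is a minimal illustration, not the script used to build the dataset: the helper name `drop_unseen_labels` and the toy records are hypothetical, and the labels other than `relacao_profissional` are placeholders.

```python
def drop_unseen_labels(train, test):
    """Keep only test records whose label also occurs in the training set.

    Returns the filtered test records and the number of records dropped.
    """
    known_labels = {record["label"] for record in train}
    kept = [record for record in test if record["label"] in known_labels]
    return kept, len(test) - len(kept)


# Toy records (hypothetical); real records carry more fields than "label".
toy_train = [{"label": "relacao_profissional"}, {"label": "label_a"}]
toy_test = [
    {"label": "relacao_profissional"},
    {"label": "label_unseen"},  # absent from toy_train, so it is dropped
]

kept, dropped = drop_unseen_labels(toy_train, toy_test)
```

Filtering this way guarantees that every label seen at evaluation time was available during training.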
An example data instance from the dataset:
{
  "docid": "cver",
  "sentence1": "O PRESIDENTE Sarkozy abriu a Conferência de Dadores realizada em Paris com uma frase grandiloquente sobre a necessidade urgente de criar um Estado palestiniano no fim de 2008 . O Presidente ou é mentiroso ou finge-se ignorante, ou as duas coisas. Depois do falhanço esperado da cimeira de Annapolis , um modo de [E2]Condoleezza Rice[/E2] salvar a face e de a Administração | Administração americana e a Europa continuarem a fingir que estão interessadas em resolver o conflito israelo-palestiniano e de lavarem as mãos de tudo o resto, Sarkozy não pode ignorar que o momento para pronunciamentos débeis é o menos adequado. Tony Blair , depois de ter minado todo o processo de paz do Médio Oriente ao ordenar a invasão do Iraque de braço dado com [E1]Bush[/E1] , continua a emitir piedades deste género, e diz que está na altura de resolver o problema e que ele pode ser resolvido. Blair não sabe o que diz.",
  "sentence2": "nan",
  "label": "relacao_profissional",
  "same_text": true
}
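To work with the [E1] and [E2] markers shown in the instance above, the tagged entities can be pulled out with a regular expression. The sketch below is one possible approach, not part of the dataset tooling: the names `ENTITY_TAG`, `extract_entities`, and `strip_tags` are hypothetical, and `sample` is an abridged fragment of the example `sentence1`.

```python
import re

# Matches [E1]...[/E1] or [E2]...[/E2]; the backreference \1 keeps the
# opening and closing tag names paired.
ENTITY_TAG = re.compile(r"\[(E[12])\](.*?)\[/\1\]", re.DOTALL)


def extract_entities(text):
    """Return a dict mapping each tag name ("E1", "E2") to its entity text."""
    return {tag: span.strip() for tag, span in ENTITY_TAG.findall(text)}


def strip_tags(text):
    """Remove the entity markers, leaving the plain sentence text."""
    return ENTITY_TAG.sub(lambda match: match.group(2), text)


# Abridged fragment of the example instance, for illustration only.
sample = (
    "um modo de [E2]Condoleezza Rice[/E2] salvar a face , "
    "de braço dado com [E1]Bush[/E1] ,"
)

entities = extract_entities(sample)
plain = strip_tags(sample)
```

Note that the tags may appear in either order within the text, as in the example where [E2] precedes [E1], so the sketch keys the result by tag name rather than by position.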
| | train | validation | test |
|---|---|---|---|
| Instances | 2226 | 701 | 805 |
The dataset was divided in a manner that ensured sentences from the same document did not appear in more than one split.
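The document-level separation stated above can be checked with a few lines of code. This is a hedged sketch that assumes each record exposes the `docid` field shown in the example instance; the helper name `shared_docids` and the toy splits (including the docids other than `cver`) are hypothetical.

```python
def shared_docids(splits):
    """Return the docids that occur in more than one split.

    `splits` maps a split name to a list of records with a "docid" key.
    The result maps each offending docid to the set of splits containing it.
    """
    seen = {}
    for split_name, records in splits.items():
        for record in records:
            seen.setdefault(record["docid"], set()).add(split_name)
    return {docid: names for docid, names in seen.items() if len(names) > 1}


# Toy splits (hypothetical docids except "cver").
clean_splits = {
    "train": [{"docid": "cver"}, {"docid": "doc_a"}],
    "validation": [{"docid": "doc_b"}],
    "test": [{"docid": "doc_c"}],
}
leaky_splits = {
    "train": [{"docid": "cver"}],
    "test": [{"docid": "cver"}],
}

clean_overlap = shared_docids(clean_splits)
leaky_overlap = shared_docids(leaky_splits)
```

An empty result means no document contributes sentences to more than one split.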
@inproceedings{freitas2009relation,
  title={Relation detection between named entities: report of a shared task},
  author={Freitas, Cl{\'a}udia and Santos, Diana and Mota, Cristina and Oliveira, Hugo Gon{\c{c}}alo and Carvalho, Paula},
  booktitle={Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)},
  pages={129--137},
  year={2009}
}
Thanks to @ruanchaves for adding this dataset.