Datasets:

ruanchaves/rerelem

Languages:

pt

Multilinguality:

monolingual

Size Categories:

1K<n<10K

Language Creators:

found

Annotations Creators:

expert-generated

Source Datasets:

extended|harem

Dataset Card for ReRelEM

Dataset Summary

The ReRelEM dataset is designed for the detection and classification of relations between named entities in Portuguese text. It contains 2226 training, 701 validation, and 805 test instances. Each instance contains up to two sentences, in which the two related entities are enclosed by the tag pairs [E1]...[/E1] and [E2]...[/E2]. Relations fall into four top-level classes: identity, included-in, located-in, and other (which is subdivided into twenty different relations).

Although more than 99% of the original instances were retained, this is not a full representation of the original ReRelEM dataset. The dataset was first split into train, validation, and test sets; 21 instances whose relation types did not occur in the training set were then dropped from the test set. In addition, 7 instances from the original dataset had formatting errors that prevented them from being resolved into post-processed records, and these were also dropped.
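The label filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script that was used to build the dataset: the function name `drop_unseen_labels`, the toy records, and the label `hypothetical_label` are all invented for the example.

```python
def drop_unseen_labels(train, test):
    """Keep only the test records whose label also occurs in the training records."""
    seen = {record["label"] for record in train}
    kept = [record for record in test if record["label"] in seen]
    dropped = len(test) - len(kept)
    return kept, dropped


# Toy records (hypothetical): only the "label" field matters for this step.
train = [{"label": "relacao_profissional"}]
test = [
    {"label": "relacao_profissional"},
    {"label": "hypothetical_label"},  # never seen in training, so it is dropped
]

kept, dropped = drop_unseen_labels(train, test)
print(len(kept), dropped)
```

Filtering of this kind keeps the test label set a subset of the training label set, so a classifier is never evaluated on a relation type it had no chance to learn.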

Supported Tasks and Leaderboards

  • Relation extraction: The primary task of this dataset is to classify relations between named entities.

Languages

  • Portuguese

Dataset Structure

Data Instances

An example data instance from the dataset:

{
    "docid": "cver",
    "sentence1": "O PRESIDENTE Sarkozy abriu a Conferência de Dadores realizada em Paris com uma frase grandiloquente sobre a necessidade urgente de criar um Estado palestiniano no fim de 2008 . O Presidente ou é mentiroso ou finge-se ignorante, ou as duas coisas. Depois do falhanço esperado da cimeira de Annapolis , um modo de [E2]Condoleezza Rice[/E2] salvar a face e de a Administração | Administração americana e a Europa continuarem a fingir que estão interessadas em resolver o conflito israelo-palestiniano e de lavarem as mãos de tudo o resto, Sarkozy não pode ignorar que o momento para pronunciamentos débeis é o menos adequado. Tony Blair , depois de ter minado todo o processo de paz do Médio Oriente ao ordenar a invasão do Iraque de braço dado com [E1]Bush[/E1] , continua a emitir piedades deste género, e diz que está na altura de resolver o problema e que ele pode ser resolvido. Blair não sabe o que diz.",
    "sentence2": "nan",
    "label": "relacao_profissional",
    "same_text": true
}
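The entity spans in an instance like the one above can be recovered with a regular expression. The following is a minimal sketch; the names `ENTITY_PATTERN` and `extract_entities` are hypothetical helpers and are not part of the dataset or of any library.

```python
import re

# Matches [E1]...[/E1] or [E2]...[/E2]; the backreference \1 forces the
# closing tag to carry the same name as the opening tag.
ENTITY_PATTERN = re.compile(r"\[(E[12])\](.*?)\[/\1\]", re.DOTALL)


def extract_entities(text):
    """Return a dict mapping each tag name ("E1", "E2") to its entity string."""
    return {tag: span.strip() for tag, span in ENTITY_PATTERN.findall(text)}


# Two shortened fragments of the "sentence1" field from the instance above.
excerpt = (
    "um modo de [E2]Condoleezza Rice[/E2] salvar a face "
    "de braço dado com [E1]Bush[/E1] , continua a emitir"
)
entities = extract_entities(excerpt)
print(entities)
```

Note that the tags may appear in either order within the text, as they do here, so code should look them up by name rather than by position.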

Data Fields

  • docid: ID of the document that both sentences (sentence1 and sentence2) come from
  • sentence1: The first sentence, with an entity span enclosed by the tags [E1] and [/E1]. When same_text is true, it also contains the [E2] span.
  • sentence2: The second sentence, with an entity span enclosed by the tags [E2] and [/E2]
  • label: The type of relation between the two entities
  • same_text: True if both entity spans appear in the same sentence. In that case sentence2 carries no text; in the example instance it holds the placeholder string "nan".
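One way to turn these fields into a single input string is sketched below. It assumes, based only on the example instance, that sentence2 holds no usable text when same_text is true (the example shows the string "nan"). The function name `instance_text` and the toy records are hypothetical.

```python
def instance_text(instance, separator=" "):
    """Return the tagged text for one instance as a single string.

    Assumption: when same_text is true, sentence1 already contains both
    entity spans, and sentence2 is a placeholder that should be ignored.
    """
    if instance["same_text"]:
        return instance["sentence1"]
    return instance["sentence1"] + separator + instance["sentence2"]


# Toy instances (hypothetical text, same field names as the dataset).
single = {
    "sentence1": "[E1]A[/E1] trabalha com [E2]B[/E2] .",
    "sentence2": "nan",
    "same_text": True,
}
pair = {
    "sentence1": "[E1]A[/E1] chegou .",
    "sentence2": "[E2]B[/E2] saiu .",
    "same_text": False,
}

print(instance_text(single))
print(instance_text(pair))
```

Branching on same_text rather than on the content of sentence2 avoids misreading a sentence that genuinely consists of the word "nan".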

Data Splits

            train   validation   test
Instances   2226    701          805

The dataset was divided in a manner that ensured sentences from the same document did not appear in more than one split.
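The document-level separation stated above can be checked by comparing the docid values across splits. The sketch below is illustrative only: `shared_docids` is a hypothetical helper, and every docid in the toy records other than "cver" is invented.

```python
def shared_docids(*splits):
    """Return the set of docids that occur in more than one split."""
    seen = {}
    for index, split in enumerate(splits):
        for record in split:
            seen.setdefault(record["docid"], set()).add(index)
    return {docid for docid, indices in seen.items() if len(indices) > 1}


# Toy splits (hypothetical): each record only needs its "docid" field here.
train = [{"docid": "cver"}, {"docid": "doc_a"}, {"docid": "doc_a"}]
validation = [{"docid": "doc_b"}]
test = [{"docid": "doc_c"}]

overlap = shared_docids(train, validation, test)
print(overlap)
```

An empty result means no document contributes instances to more than one split, which is the property that prevents text from a training document leaking into evaluation.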

Citation Information

@inproceedings{freitas2009relation,
  title={Relation detection between named entities: report of a shared task},
  author={Freitas, Cl{\'a}udia and Santos, Diana and Mota, Cristina and Oliveira, Hugo Gon{\c{c}}alo and Carvalho, Paula},
  booktitle={Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)},
  pages={129--137},
  year={2009}
}

Contributions

Thanks to @ruanchaves for adding this dataset.