数据集:
gap
任务:
标记分类语言:
en计算机处理:
monolingual大小:
1K<n<10K语言创建人:
found批注创建人:
crowdsourced源数据集:
original预印本库:
arxiv:1810.05201许可:
license:unknownGAP is a gender-balanced dataset containing 8,908 coreference-labeled pairs of (ambiguous pronoun, antecedent name), sampled from Wikipedia and released by Google AI Language for the evaluation of coreference resolution in practical applications.
An example of 'validation' looks as follows.
{ "A": "aliquam ultrices sagittis", "A-coref": false, "A-offset": 208, "B": "elementum curabitur vitae", "B-coref": false, "B-offset": 435, "ID": "validation-1", "Pronoun": "condimentum mattis pellentesque", "Pronoun-offset": 948, "Text": "Lorem ipsum dolor", "URL": "sem fringilla ut" }
The data fields are the same among all splits.
defaultname | train | validation | test |
---|---|---|---|
default | 2000 | 454 | 2000 |
@article{webster-etal-2018-mind, title = "Mind the {GAP}: A Balanced Corpus of Gendered Ambiguous Pronouns", author = "Webster, Kellie and Recasens, Marta and Axelrod, Vera and Baldridge, Jason", journal = "Transactions of the Association for Computational Linguistics", volume = "6", year = "2018", address = "Cambridge, MA", publisher = "MIT Press", url = "https://aclanthology.org/Q18-1042", doi = "10.1162/tacl_a_00240", pages = "605--617", }
Thanks to @thomwolf , @patrickvonplaten , @otakumesi , @lewtun for adding this dataset.