Dataset: DFKI-SLT/kbp37
Task: text classification
Language: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: found
Annotation creators: other
Source datasets: extended|other
arXiv: 1508.01006
License: other

KBP37 is a revision of the MIML-RE annotation dataset provided by Gabor Angeli et al. (2014). They used both the 2010 and 2013 KBP official document collections, as well as a July 2013 dump of Wikipedia, as the text corpus for annotation. In total, 33,811 sentences were annotated. Zhang and Wang made several refinements:
KBP37 contains 18 relation types, each annotated in both directions, (e1,e2) and (e2,e1), plus an additional 'no_relation' class, resulting in 37 relation classes.
Note:
The language data in KBP37 is in English (BCP-47 en).
An example instance from the kbp37 configuration:
{ "id": "0", "sentence": "<e1> Thom Yorke </e1> of <e2> Radiohead </e2> has included the + for many of his signature distortion sounds using a variety of guitars to achieve various tonal options .", "relation": 27 }
An example instance from the kbp37_formatted configuration:
{ "id": "1", "token": ["Leland", "High", "School", "is", "a", "public", "high", "school", "located", "in", "the", "Almaden", "Valley", "in", "San", "Jose", "California", "USA", "in", "the", "San", "Jose", "Unified", "School", "District", "."], "e1_start": 0, "e1_end": 3, "e2_start": 14, "e2_end": 16, "relation": 3 }
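The two configurations encode entities differently: kbp37 marks them inline with <e1>...</e1> and <e2>...</e2> tags, while kbp37_formatted stores a token list plus start/end offsets. Judging from the example above, the end offsets are exclusive (e1_start 0 and e1_end 3 cover "Leland High School"). The following is a minimal sketch of converting the first format into the second, assuming plain whitespace tokenization with each tag as its own token; `tagged_to_offsets` is a hypothetical helper, and the preprocessing actually used to build kbp37_formatted may differ.

```python
import re
from typing import Dict, List


def tagged_to_offsets(sentence: str) -> Dict[str, object]:
    """Convert a kbp37-style tagged sentence into tokens plus end-exclusive
    entity offsets, using the kbp37_formatted field names.

    Assumption: tokens are whitespace-separated and every tag (<e1>, </e1>,
    <e2>, </e2>) is its own whitespace-delimited token.
    """
    tokens: List[str] = []
    offsets: Dict[str, int] = {}
    for piece in sentence.split():
        match = re.fullmatch(r"<(/?)(e[12])>", piece)
        if match:
            closing, entity = match.groups()
            # An opening tag records the index of the next real token;
            # a closing tag records one past the last entity token.
            key = f"{entity}_end" if closing else f"{entity}_start"
            offsets[key] = len(tokens)
        else:
            tokens.append(piece)
    return {"token": tokens, **offsets}


example = tagged_to_offsets(
    "<e1> Thom Yorke </e1> of <e2> Radiohead </e2> has included the + for many "
    "of his signature distortion sounds using a variety of guitars to achieve "
    "various tonal options ."
)
print(example["e1_start"], example["e1_end"])  # 0 2
print(example["e2_start"], example["e2_end"])  # 3 4
```

Recording offsets as "number of tokens seen so far" keeps the logic to a single pass and yields slice-style (end-exclusive) spans, so `tokens[e1_start:e1_end]` returns the entity tokens directly.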
In both configurations, the relation field holds an integer label. Labels map to ids as follows:
{"no_relation": 0, "org:alternate_names(e1,e2)": 1, "org:alternate_names(e2,e1)": 2, "org:city_of_headquarters(e1,e2)": 3, "org:city_of_headquarters(e2,e1)": 4, "org:country_of_headquarters(e1,e2)": 5, "org:country_of_headquarters(e2,e1)": 6, "org:founded(e1,e2)": 7, "org:founded(e2,e1)": 8, "org:founded_by(e1,e2)": 9, "org:founded_by(e2,e1)": 10, "org:members(e1,e2)": 11, "org:members(e2,e1)": 12, "org:stateorprovince_of_headquarters(e1,e2)": 13, "org:stateorprovince_of_headquarters(e2,e1)": 14, "org:subsidiaries(e1,e2)": 15, "org:subsidiaries(e2,e1)": 16, "org:top_members/employees(e1,e2)": 17, "org:top_members/employees(e2,e1)": 18, "per:alternate_names(e1,e2)": 19, "per:alternate_names(e2,e1)": 20, "per:cities_of_residence(e1,e2)": 21, "per:cities_of_residence(e2,e1)": 22, "per:countries_of_residence(e1,e2)": 23, "per:countries_of_residence(e2,e1)": 24, "per:country_of_birth(e1,e2)": 25, "per:country_of_birth(e2,e1)": 26, "per:employee_of(e1,e2)": 27, "per:employee_of(e2,e1)": 28, "per:origin(e1,e2)": 29, "per:origin(e2,e1)": 30, "per:spouse(e1,e2)": 31, "per:spouse(e2,e1)": 32, "per:stateorprovinces_of_residence(e1,e2)": 33, "per:stateorprovinces_of_residence(e2,e1)": 34, "per:title(e1,e2)": 35, "per:title(e2,e1)": 36}
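The label map follows a regular pattern. 'no_relation' is 0, and then each of the 18 relation types, in alphabetical order, contributes two consecutive ids: (e1,e2) first, then (e2,e1). The following sketch rebuilds the map from that pattern and adds a helper for swapping the argument order of a label. `BASE_RELATIONS`, `build_label_map`, and `reverse_label` are hypothetical names chosen for illustration. They are not part of the dataset's loading script.

```python
from typing import Dict, List

# The 18 base relation types, in the order they appear in the label map.
BASE_RELATIONS: List[str] = [
    "org:alternate_names", "org:city_of_headquarters",
    "org:country_of_headquarters", "org:founded", "org:founded_by",
    "org:members", "org:stateorprovince_of_headquarters",
    "org:subsidiaries", "org:top_members/employees",
    "per:alternate_names", "per:cities_of_residence",
    "per:countries_of_residence", "per:country_of_birth",
    "per:employee_of", "per:origin", "per:spouse",
    "per:stateorprovinces_of_residence", "per:title",
]

FORWARD, BACKWARD = "(e1,e2)", "(e2,e1)"


def build_label_map() -> Dict[str, int]:
    """Assign 0 to no_relation, then two consecutive ids per relation type."""
    label2id = {"no_relation": 0}
    for relation in BASE_RELATIONS:
        for direction in (FORWARD, BACKWARD):
            label2id[relation + direction] = len(label2id)
    return label2id


LABEL2ID = build_label_map()
ID2LABEL = {i: name for name, i in LABEL2ID.items()}


def reverse_label(label_id: int) -> int:
    """Return the id of the same relation with e1 and e2 swapped."""
    name = ID2LABEL[label_id]
    if name == "no_relation":
        return label_id  # no_relation has no direction
    if name.endswith(FORWARD):
        return LABEL2ID[name[: -len(FORWARD)] + BACKWARD]
    return LABEL2ID[name[: -len(BACKWARD)] + FORWARD]


print(len(LABEL2ID))       # 37
print(ID2LABEL[27])        # per:employee_of(e1,e2)
print(reverse_label(27))   # 28
```

Such a helper could be used, for example, to augment training data by swapping the entity markers and relabeling the example. That use is a suggestion, not something the dataset authors describe.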
Configuration | Train | Dev | Test
---|---|---|---
kbp37 | 15917 | 1724 | 3405
kbp37_formatted | 15807 | 1714 | 3379
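The split sizes show that kbp37_formatted has slightly fewer examples than kbp37 in every split. The short check below counts the difference and the total size of each configuration, using only the numbers from the table. It does not explain why the examples are missing, because the table does not say. `SPLIT_SIZES` and `dropped_per_split` are illustrative names only.

```python
from typing import Dict

# Split sizes copied from the table above.
SPLIT_SIZES: Dict[str, Dict[str, int]] = {
    "kbp37": {"train": 15917, "dev": 1724, "test": 3405},
    "kbp37_formatted": {"train": 15807, "dev": 1714, "test": 3379},
}


def dropped_per_split(sizes: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Number of kbp37 examples per split that are absent from kbp37_formatted."""
    original = sizes["kbp37"]
    formatted = sizes["kbp37_formatted"]
    return {split: original[split] - formatted[split] for split in original}


dropped = dropped_per_split(SPLIT_SIZES)
totals = {name: sum(splits.values()) for name, splits in SPLIT_SIZES.items()}
print(dropped)  # {'train': 110, 'dev': 10, 'test': 26}
print(totals)   # {'kbp37': 21046, 'kbp37_formatted': 20900}
```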
@article{DBLP:journals/corr/ZhangW15a, author = {Dongxu Zhang and Dong Wang}, title = {Relation Classification via Recurrent Neural Network}, journal = {CoRR}, volume = {abs/1508.01006}, year = {2015}, url = {http://arxiv.org/abs/1508.01006}, eprinttype = {arXiv}, eprint = {1508.01006}, timestamp = {Fri, 04 Nov 2022 18:37:50 +0100}, biburl = {https://dblp.org/rec/journals/corr/ZhangW15a.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} }
Thanks to @phucdev for adding this dataset.