Dataset: tne
Tasks: text-retrieval
Sub-tasks: document-retrieval
Languages: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: found
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:2109.12085
License: mit

Text-based NP Enrichment (TNE) is a natural language understanding (NLU) task that focuses on relations between noun phrases (NPs) that can be mediated via prepositions. The dataset contains 5,497 documents, annotated exhaustively with all possible links between the NPs in each document.
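To make "all possible links between the NPs" concrete, here is a minimal sketch of how the candidate space might be enumerated for one document. It assumes that links are directed, running from an anchor NP to a complement NP, and that each link carries either a preposition or no relation; the `Link` type, its field names, and the example phrases are hypothetical and are not taken from the dataset schema.

```python
from itertools import permutations
from typing import List, NamedTuple, Optional, Tuple


class Link(NamedTuple):
    """Hypothetical container for one NP-to-NP relation."""
    anchor: str
    complement: str
    preposition: Optional[str]  # None stands for "no relation" (an assumption)


def candidate_pairs(nps: List[str]) -> List[Tuple[str, str]]:
    """Return every ordered (anchor, complement) pair of distinct NPs."""
    return list(permutations(nps, 2))


# Made-up noun phrases, for illustration only.
nps = ["the mayor", "the city", "the budget"]
pairs = candidate_pairs(nps)

# One possible labeled link: "the mayor [of] the city".
example = Link(anchor="the mayor", complement="the city", preposition="of")
print(len(pairs), example)
```

Under this assumption, a document with n NPs has n(n-1) candidate pairs, which is why the number of labels grows quickly with the number of NPs.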
The main data comes from WikiNews and is used for the train, dev, and test splits. We also collected an additional set of 509 documents from the Book Corpus, IMDB reviews, and Reddit to serve as out-of-distribution (OOD) data points.
The data contains both the main annotations for the TNE task and coreference resolution data. There are two leaderboards for the TNE data: one for the standard test set and another for the OOD test set.
The text in the dataset is in English, as used in the different domains we include. The associated BCP-47 code is en.
The original files are in JSONL format, with each line containing a dictionary for a single document. Each document contains a different number of labels, because the number of NPs varies between documents. The test and OOD splits come without the annotated labels.
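As an illustration of the one-document-per-line layout, the following is a minimal sketch of reading such a file with the Python standard library. The helper name `iter_documents` and the keys in the sample records are hypothetical; the sample is an in-memory stand-in for a real split file.

```python
import io
import json
from typing import Iterator, TextIO


def iter_documents(stream: TextIO) -> Iterator[dict]:
    """Yield one document dictionary per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines, such as a trailing newline
            yield json.loads(line)


# Two made-up documents stand in for a real file; the keys are hypothetical.
sample = io.StringIO(
    '{"id": "doc-1", "text": "First example."}\n'
    '\n'
    '{"id": "doc-2", "text": "Second example."}\n'
)
documents = list(iter_documents(sample))
print(len(documents), sorted(documents[0]))
```

With a real file, the same generator can be fed an opened file object, for example `with open(path, encoding="utf-8") as f:` followed by a loop over `iter_documents(f)`, where `path` is a placeholder for wherever the split file is stored. Reading line by line avoids holding the whole file in memory.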
A document consists of:
The dataset is spread across four files, one for each split: train, dev, test, and test_ood. Additional details on the data statistics can be found in the paper.
TNE was built as a new task for language understanding, focusing on extracting relations between noun phrases that are mediated by prepositions.
Who are the source language producers? [Needs More Information]
Who are the annotators? [Needs More Information]
The dataset was created by Yanai Elazar, Victoria Basmov, Yoav Goldberg, and Reut Tsarfaty during work done at Bar-Ilan University and AI2.
The data is released under the MIT license.
@article{tne,
  author = {Elazar, Yanai and Basmov, Victoria and Goldberg, Yoav and Tsarfaty, Reut},
  title = "{Text-based NP Enrichment}",
  journal = {Transactions of the Association for Computational Linguistics},
  year = {2022},
}
Thanks to @yanaiela, who is also the first author of the paper, for adding this dataset.