Dataset: zest
Language: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:2011.08115
License: cc-by-4.0

ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity extraction and relationship extraction, and each task is paired with 20 different annotated (input, output) examples. ZEST's structure allows us to systematically test whether models can generalize in five different ways.
A leaderboard is included, reporting acceptability metrics for each of the four generalization types outlined in the paper. These metrics are novel and were also proposed by the authors.
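As a rough illustration of what a per-task metric of this kind could look like, the sketch below groups scored examples by task and reports the share of tasks whose mean score reaches a threshold. This is an assumption made for illustration, not the authors' definition: the function name, the `(task_id, score)` record layout, the thresholds, and the toy data are all hypothetical, and the paper should be consulted for the actual metrics.

```python
from collections import defaultdict
from statistics import mean


def fraction_of_tasks_above(records, threshold):
    """Return the share of tasks whose mean per-example score is >= threshold.

    `records` is an iterable of (task_id, score) pairs, with score in [0, 1].
    Hypothetical helper: not part of the ZEST release or its leaderboard code.
    """
    scores_by_task = defaultdict(list)
    for task_id, score in records:
        scores_by_task[task_id].append(score)
    if not scores_by_task:
        raise ValueError("no records to score")
    passed = sum(
        1 for scores in scores_by_task.values() if mean(scores) >= threshold
    )
    return passed / len(scores_by_task)


# Hypothetical toy data: two tasks with a handful of scored examples each.
# In ZEST each task has 20 annotated examples; fewer are used here for brevity.
toy_records = [
    ("task-a", 1.0), ("task-a", 1.0), ("task-a", 0.5),
    ("task-b", 0.0), ("task-b", 1.0),
]

print(fraction_of_tasks_above(toy_records, 0.75))
print(fraction_of_tasks_above(toy_records, 0.90))
```

Scoring per task rather than per example rewards a model only when it handles a task consistently across its inputs, which is in the spirit of evaluating whether a task description was actually understood.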
The dataset is in English.
The dataset was created to evaluate the ability of a model to generalize to unseen tasks in a zero-shot manner, based only on a task description.
Who are the source language producers?
Mechanical Turk crowdsource workers.
Who are the annotators?
Mechanical Turk crowdsource workers.
The dataset emphasizes a model's ability to generalize to unseen tasks with only a natural language description of the task. The long-term vision of this type of evaluation is to facilitate the creation of models which can perform arbitrary tasks with only a prompt from a non-technical user. This could broaden the frontier of what a user can ask something like a chatbot to do for them, but it is unclear how restrictions would be put in place to prevent users from prompting a system to perform unethical tasks.
This dataset is licensed under CC BY 4.0.
@inproceedings{weller-etal-2020-learning,
    title = "Learning from Task Descriptions",
    author = "Weller, Orion and Lourie, Nicholas and Gardner, Matt and Peters, Matthew",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.105",
    pages = "1361--1375",
    abstract = "Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this framework with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks. Formulating task descriptions as questions, we ensure each is general enough to apply to many possible inputs, thus comprehensively evaluating a model{'}s ability to solve each task. Moreover, the dataset{'}s structure tests specific types of systematic generalization. We find that the state-of-the-art T5 model achieves a score of 12% on ZEST, leaving a significant challenge for NLP researchers.",
}
Thanks to @joeddav for adding this dataset.