Dataset:

zest

Tasks:

question-answering

token-classification

Sub-tasks:

closed-domain-qa extractive-qa

Languages:

Multilinguality:

monolingual

Size:

10K<n<100K

Language creators:

crowdsourced

Annotation creators:

crowdsourced

Source datasets:

original

Preprint:

arxiv:2011.08115

Other:

output-structure yes-no-qa

License:

cc-by-4.0

Dataset Card for "ZEST: ZEroShot learning from Task descriptions"

Dataset Summary

ZEST tests whether NLP systems can perform unseen tasks in a zero-shot way, given a natural language description of the task. It is an instantiation of our proposed framework "learning from task descriptions". The tasks include classification, typed entity extraction and relationship extraction, and each task is paired with 20 different annotated (input, output) examples. ZEST's structure allows us to systematically test whether models can generalize in five different ways.
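As a rough illustration of the structure described above, the following Python sketch pairs one task description with several annotated (input, output) examples and builds zero-shot model inputs from them. The class names, field names, prompt template, and the sample task are all hypothetical and chosen only for illustration; they are not the dataset's actual schema, which this card does not document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One annotated (input, output) pair for a task (hypothetical schema)."""
    context: str
    answer: str


@dataclass
class Task:
    """A task description paired with its annotated examples (hypothetical schema)."""
    description: str
    examples: List[Example]


def build_zero_shot_inputs(task: Task) -> List[str]:
    """Combine the task description with each input text.

    The gold answers are deliberately left out: in the zero-shot setting
    the model sees only the description and the input.
    The "task: ... input: ..." template is an assumption for illustration.
    """
    return [
        f"task: {task.description} input: {ex.context}"
        for ex in task.examples
    ]


# Made-up task and examples, not taken from ZEST.
task = Task(
    description="Does this park allow dogs?",
    examples=[
        Example("Pets must be leashed on all trails.", "yes"),
        Example("No animals are permitted in the reserve.", "no"),
    ],
)
inputs = build_zero_shot_inputs(task)
```

Each element of `inputs` holds the description followed by one input text, so a model would be evaluated on the same unseen task across all of its paired examples.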

Supported Tasks and Leaderboards

A leaderboard is included with acceptability metrics for each of the four generalization types outlined in the paper. These metrics are novel and were also proposed by the authors.

Languages

The dataset is in English.

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

[More Information Needed]

Data Splits

[More Information Needed]

Dataset Creation

Curation Rationale

The dataset was created to evaluate a model's ability to generalize to unseen tasks in a zero-shot manner, based only on a task description.

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

Crowdworkers on Amazon Mechanical Turk.

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

Crowdworkers on Amazon Mechanical Turk.

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

The dataset emphasizes a model's ability to generalize to unseen tasks with only a natural language description of the task. The long-term vision of this type of evaluation is to facilitate the creation of models which can perform arbitrary tasks with only a prompt from a non-technical user. This could broaden the frontier of what a user can ask something like a chatbot to do for them, but it is unclear how restrictions would be put in place to prevent users from prompting a system to perform unethical tasks.

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

This dataset is licensed under CC BY 4.0.

Citation Information

@inproceedings{weller-etal-2020-learning,
    title = "Learning from Task Descriptions",
    author = "Weller, Orion  and
      Lourie, Nicholas  and
      Gardner, Matt  and
      Peters, Matthew",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.105",
    pages = "1361--1375",
    abstract = "Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this framework with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks. Formulating task descriptions as questions, we ensure each is general enough to apply to many possible inputs, thus comprehensively evaluating a model{'}s ability to solve each task. Moreover, the dataset{'}s structure tests specific types of systematic generalization. We find that the state-of-the-art T5 model achieves a score of 12% on ZEST, leaving a significant challenge for NLP researchers.",
}

Contributions

Thanks to @joeddav for adding this dataset.

Author:

Anonymous

Dataset size:

16.34 KB