Dataset:

snips_built_in_intents

Tasks:

text-classification

Sub-tasks:

intent-classification

Languages:

Multilinguality:

monolingual

Size:

n<1K

Language creators:

expert-generated

Annotation creators:

expert-generated

Source datasets:

original

Preprint:

arxiv:1805.10190

License:

cc0-1.0

Dataset Card for Snips Built In Intents

Dataset Summary

The Snips built-in intents dataset was initially used to compare different voice assistants and was released as a public dataset, hosted at https://github.com/sonos/nlu-benchmark in the folder 2016-12-built-in-intents. The dataset contains 328 utterances over 10 intent classes. A related Medium post is available at https://medium.com/snips-ai/benchmarking-natural-language-understanding-systems-d35be6ce568d

Supported Tasks and Leaderboards

There are no related shared tasks that we are aware of.

Languages

English

Dataset Structure

Data Instances

The dataset contains 328 utterances over 10 intent classes. Each sample looks like: {'label': 8, 'text': 'Transit directions to Barcelona Pizza.'}

Data Fields

text : The text utterance expressing some user intent.
label : The integer intent label assigned to the utterance.
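
As a rough illustration of the two fields above, the following sketch checks that a record has the documented shape. The `Utterance` type and the `is_valid` helper are hypothetical names introduced here, not part of the dataset or of any library. The check also assumes that labels are zero-based integers in the range 0 to 9, which the card does not state explicitly; it only says there are 10 intent classes.

```python
from typing import TypedDict

# The card states 10 intent classes. Zero-based ids (0..9) are an assumption.
NUM_INTENT_CLASSES = 10


class Utterance(TypedDict):
    text: str   # the user utterance
    label: int  # integer intent id


def is_valid(sample: dict) -> bool:
    """Return True if `sample` has both documented fields with plausible values."""
    text = sample.get("text")
    label = sample.get("label")
    return (
        isinstance(text, str)
        and bool(text.strip())           # reject empty or whitespace-only text
        and isinstance(label, int)
        and not isinstance(label, bool)  # bool is a subclass of int in Python
        and 0 <= label < NUM_INTENT_CLASSES
    )


# The example record quoted in the Data Instances section.
sample: Utterance = {"label": 8, "text": "Transit directions to Barcelona Pizza."}
print(is_valid(sample))
```

A check like this can be useful when merging the utterances with records from other sources, where missing or out-of-range labels are easy to overlook.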

Data Splits

The source data is not split.
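
Because no split is provided, anyone who wants a held-out set has to create one. The sketch below shows one possible approach, a stratified split using only the Python standard library. The `stratified_split` function, the 20% test fraction, and the fixed seed are illustrative assumptions rather than anything prescribed by the dataset authors, and the toy records are made up to mirror the `{'label', 'text'}` shape.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split samples into (train, test), keeping each label's share roughly equal."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample["label"]].append(sample)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # Hold out at least one example per class, unless the class has a single item.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy records in the card's {'label', 'text'} shape; the texts are invented.
toy = [{"label": i % 2, "text": f"utterance {i}"} for i in range(10)]
train, test = stratified_split(toy)
print(len(train), len(test))
```

With only 328 utterances over 10 classes, some classes may be small, so stratifying by label helps ensure every intent appears in both parts. Cross-validation may be a better fit than a single split at this size.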

Dataset Creation

Curation Rationale

The dataset was originally created to compare the performance of a number of voice assistants. However, the labelled utterances are useful for developing and benchmarking text chatbots as well.

Source Data

Initial Data Collection and Normalization

It is not clear how the data was collected. According to the Medium post: "The benchmark relies on a set of 328 queries built by the business team at Snips, and kept secret from data scientists and engineers throughout the development of the solution."

Who are the source language producers?

Originally prepared by snips.ai. The Snips team joined Sonos in November 2019. These open datasets remain available, and access to them is now managed by the Sonos Voice Experience Team. Please email sve-research@sonos.com with any questions.

Annotations

Annotation process

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Licensing Information

The source data is licensed under Creative Commons Zero v1.0 Universal.

Citation Information

Any publication based on these datasets must include a full citation to the following paper in which the results were published by the Snips Team:

Coucke A. et al., "Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces." CoRR 2018, https://arxiv.org/abs/1805.10190

Contributions

Thanks to @bduvenhage for adding this dataset.

Author:

Anonymous

Dataset size:

15.26 KB