Dataset: nlu_evaluation_data
Tasks: text classification
Language: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: expert-generated
Annotation creators: expert-generated
Source datasets: original
Preprint: arxiv:1903.05566
License: cc-by-4.0

Dataset with short utterances from the conversational domain, annotated with their corresponding intents and scenarios.
It has 25 715 non-zero examples (the original dataset has 25 716 examples) belonging to 18 scenarios and 68 intents. The dataset was originally crowd-sourced and annotated with both intents and named entities in order to evaluate commercial NLU systems such as RASA, IBM's Watson, Microsoft's LUIS and Google's Dialogflow. This version of the dataset includes only the intent annotations!
Contrary to the paper's claims, the released data contains 68 unique intents. This is because the NLU systems were evaluated on a more curated part of this dataset, which included only the 64 most important intents. Read more in the GitHub issue.
Intent classification, intent detection
English
An example of 'train' looks as follows:
    {
        'label': 2,  # integer label corresponding to "alarm_set" intent
        'scenario': 'alarm',
        'text': 'wake me up at five am this week'
    }

Intent names are mapped to `label` in the following way:
label | intent |
---|---|
0 | alarm_query |
1 | alarm_remove |
2 | alarm_set |
3 | audio_volume_down |
4 | audio_volume_mute |
5 | audio_volume_other |
6 | audio_volume_up |
7 | calendar_query |
8 | calendar_remove |
9 | calendar_set |
10 | cooking_query |
11 | cooking_recipe |
12 | datetime_convert |
13 | datetime_query |
14 | email_addcontact |
15 | email_query |
16 | email_querycontact |
17 | email_sendemail |
18 | general_affirm |
19 | general_commandstop |
20 | general_confirm |
21 | general_dontcare |
22 | general_explain |
23 | general_greet |
24 | general_joke |
25 | general_negate |
26 | general_praise |
27 | general_quirky |
28 | general_repeat |
29 | iot_cleaning |
30 | iot_coffee |
31 | iot_hue_lightchange |
32 | iot_hue_lightdim |
33 | iot_hue_lightoff |
34 | iot_hue_lighton |
35 | iot_hue_lightup |
36 | iot_wemo_off |
37 | iot_wemo_on |
38 | lists_createoradd |
39 | lists_query |
40 | lists_remove |
41 | music_dislikeness |
42 | music_likeness |
43 | music_query |
44 | music_settings |
45 | news_query |
46 | play_audiobook |
47 | play_game |
48 | play_music |
49 | play_podcasts |
50 | play_radio |
51 | qa_currency |
52 | qa_definition |
53 | qa_factoid |
54 | qa_maths |
55 | qa_stock |
56 | recommendation_events |
57 | recommendation_locations |
58 | recommendation_movies |
59 | social_post |
60 | social_query |
61 | takeaway_order |
62 | takeaway_query |
63 | transport_query |
64 | transport_taxi |
65 | transport_ticket |
66 | transport_traffic |
67 | weather_query |
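
The mapping above can be turned into lookup dictionaries. The following is a minimal sketch, not part of the dataset's own tooling: the names `INTENT_NAMES`, `ID2LABEL`, `LABEL2ID` and `scenario_prefix` are hypothetical, and the helper assumes that the scenario is the first underscore-separated token of the intent name. That assumption agrees with the `alarm_set` example and with the count of 18 scenarios, but the card does not state it explicitly.

```python
# Intent names in label order, copied from the table above (list index == label).
INTENT_NAMES = [
    "alarm_query", "alarm_remove", "alarm_set",
    "audio_volume_down", "audio_volume_mute", "audio_volume_other", "audio_volume_up",
    "calendar_query", "calendar_remove", "calendar_set",
    "cooking_query", "cooking_recipe",
    "datetime_convert", "datetime_query",
    "email_addcontact", "email_query", "email_querycontact", "email_sendemail",
    "general_affirm", "general_commandstop", "general_confirm", "general_dontcare",
    "general_explain", "general_greet", "general_joke", "general_negate",
    "general_praise", "general_quirky", "general_repeat",
    "iot_cleaning", "iot_coffee", "iot_hue_lightchange", "iot_hue_lightdim",
    "iot_hue_lightoff", "iot_hue_lighton", "iot_hue_lightup",
    "iot_wemo_off", "iot_wemo_on",
    "lists_createoradd", "lists_query", "lists_remove",
    "music_dislikeness", "music_likeness", "music_query", "music_settings",
    "news_query",
    "play_audiobook", "play_game", "play_music", "play_podcasts", "play_radio",
    "qa_currency", "qa_definition", "qa_factoid", "qa_maths", "qa_stock",
    "recommendation_events", "recommendation_locations", "recommendation_movies",
    "social_post", "social_query",
    "takeaway_order", "takeaway_query",
    "transport_query", "transport_taxi", "transport_ticket", "transport_traffic",
    "weather_query",
]

# Lookup tables in both directions.
ID2LABEL = dict(enumerate(INTENT_NAMES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def scenario_prefix(intent_name):
    """Return the text before the first underscore (assumed to be the scenario)."""
    return intent_name.split("_", 1)[0]


# The 'train' example shown earlier in this card.
example = {"label": 2, "scenario": "alarm", "text": "wake me up at five am this week"}
intent = ID2LABEL[example["label"]]
print(intent)                                             # alarm_set
print(scenario_prefix(intent) == example["scenario"])     # True
print(len({scenario_prefix(n) for n in INTENT_NAMES}))    # 18
```

Keeping the names in a list indexed by label means a typo in the order shows up immediately as a length or lookup mismatch.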
Dataset statistics | Train |
---|---|
Number of examples | 25 715 |
Average character length | 34.32 |
Number of intents | 68 |
Number of scenarios | 18 |
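
The figures in the statistics table could be recomputed along the following lines. This is a sketch under assumptions: `dataset_statistics` is a hypothetical helper, the records are plain dictionaries with the `label`, `scenario` and `text` fields shown earlier, and character length is taken to be Python's `len` of the `text` field, which the card does not confirm. Only the first sample record comes from the card; the other two are made up for illustration.

```python
from statistics import mean


def dataset_statistics(records):
    """Compute the four quantities from the statistics table for a set of records."""
    records = list(records)
    return {
        "num_examples": len(records),
        # Assumption: "character length" means len() of the raw text.
        "avg_char_length": round(mean(len(r["text"]) for r in records), 2),
        "num_intents": len({r["label"] for r in records}),
        "num_scenarios": len({r["scenario"] for r in records}),
    }


sample = [
    # Taken from the example in this card.
    {"label": 2, "scenario": "alarm", "text": "wake me up at five am this week"},
    # Hypothetical records, for illustration only.
    {"label": 0, "scenario": "alarm", "text": "what alarms do i have set"},
    {"label": 67, "scenario": "weather", "text": "will it rain tomorrow"},
]

stats = dataset_statistics(sample)
print(stats)
```

Run over the full train split, the same function should report the table's values if the assumptions hold. `mean` raises `StatisticsError` on an empty input, so an empty split needs separate handling.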
The dataset was prepared for a wide-coverage evaluation and comparison of some of the most popular NLU services. At that time, previous benchmarks used few intents and spanned a limited number of domains. This dataset contains 68 intents from 18 scenarios, which is much larger than any previous evaluation. For more discussion see the paper.
Who are the source language producers?
[More Information Needed]
To build the NLU component we collected real user data via Amazon Mechanical Turk (AMT). We designed tasks where the Turker’s goal was to answer questions about how people would interact with the home robot, in a wide range of scenarios designed in advance, namely: alarm, audio, audiobook, calendar, cooking, datetime, email, game, general, IoT, lists, music, news, podcasts, general Q&A, radio, recommendations, social, food takeaway, transport, and weather. The questions put to Turkers were designed to capture the different requests within each given scenario. In the ‘calendar’ scenario, for example, these pre-designed intents were included: ‘set event’, ‘delete event’ and ‘query event’. An example question for intent ‘set event’ is: “How would you ask your PDA to schedule a meeting with someone?” for which a user’s answer example was “Schedule a chat with Adam on Thursday afternoon”. The Turkers would then type in their answers to these questions and select possible entities from the pre-designed suggested entities list for each of their answers.

The Turkers didn’t always follow the instructions fully, e.g. for the specified ‘delete event’ Intent, an answer was: “PDA what is my next event?”; which clearly belongs to ‘query event’ Intent. We have manually corrected all such errors either during post-processing or the subsequent annotations.
Who are the annotators?
[More Information Needed]
The purpose of this dataset is to help develop better intent detection systems.
Creative Commons Attribution 4.0 International License (CC BY 4.0)
@InProceedings{XLiu.etal:IWSDS2019,
  author    = {Xingkun Liu, Arash Eshghi, Pawel Swietojanski and Verena Rieser},
  title     = {Benchmarking Natural Language Understanding Services for building Conversational Agents},
  booktitle = {Proceedings of the Tenth International Workshop on Spoken Dialogue Systems Technology (IWSDS)},
  month     = {April},
  year      = {2019},
  address   = {Ortigia, Siracusa (SR), Italy},
  publisher = {Springer},
  pages     = {xxx--xxx},
  url       = {http://www.xx.xx/xx/}
}
Thanks to @dkajtoch for adding this dataset.