Dataset: meta_woz
Subtasks: dialogue-modeling
Languages: en
Multilinguality: monolingual
Size: 10K<n<100K
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
Preprint: arxiv:2003.01680
License: other

MetaLWOz: A Dataset of Multi-Domain Dialogues for the Fast Adaptation of Conversation Models. We introduce the Meta-Learning Wizard of Oz (MetaLWOz) dialogue dataset for developing fast adaptation methods for conversation models. This data can be used to train task-oriented dialogue models, specifically to develop methods that quickly simulate user responses from a small amount of data. Such fast-adaptation models fall into the research areas of transfer learning and meta-learning. The dataset consists of 37,884 crowdsourced dialogues recorded between two human users in a Wizard of Oz setup, in which one was instructed to behave like a bot and the other like a true human user. The users are assigned a task belonging to a particular domain, for example booking a reservation at a particular restaurant, and work together to complete the task. Our dataset spans 47 domains with 227 tasks in total. Dialogues are a minimum of 10 turns long.
This dataset supports a range of tasks.
The text in the dataset is in English (en).
A data instance is a full multi-turn dialogue between two crowd-workers: one played the role of the bot, and the other played the user. Both were given a domain and a task. Each turn has a single utterance, e.g.:
Domain: Ski
User Task: You want to know if there are good ski hills an hour’s drive from your current location.
Bot Task: Tell the user that there are no ski hills in their immediate location.
Bot: Hello how may I help you?
User: Is there any good ski hills an hour’s drive from my current location?
Bot: I’m sorry to inform you that there are no ski hills in your immediate location
User: Can you help me find the nearest?
Bot: Absolutely! It looks like you’re about 3 hours away from Bear Mountain. That seems to be the closest.
User: Hmm.. sounds good
Bot: Alright! I can help you get your lift tickets now! When will you be going?
User: Awesome! please get me a ticket for 10pax
Bot: You’ve got it. Anything else I can help you with?
User: None. Thanks again!
Bot: No problem!
Example of input/output for this dialog:
Input: dialog history = Hello how may I help you?; Is there any good ski hills an hour’s drive from my current location?; I’m sorry to inform you that there are no ski hills in your immediate location
Output: user response = Can you help me find the nearest?
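The input/output framing above can be sketched in Python. This is a minimal illustration, not the authors' preprocessing: the function name `user_response_examples` is hypothetical, and it assumes the bot opens the dialogue and the speakers strictly alternate, so user utterances sit at odd positions.

```python
from typing import List, Tuple


def user_response_examples(turns: List[str]) -> List[Tuple[List[str], str]]:
    """Build (dialog history, user response) pairs from one dialogue.

    Assumption: the bot speaks first and speakers alternate, so every
    odd-indexed turn (1, 3, 5, ...) is a user utterance.
    """
    examples = []
    for i in range(1, len(turns), 2):
        # History is everything said before the user turn at index i.
        examples.append((turns[:i], turns[i]))
    return examples


turns = [
    "Hello how may I help you?",
    "Is there any good ski hills an hour’s drive from my current location?",
    "I’m sorry to inform you that there are no ski hills in your immediate location",
    "Can you help me find the nearest?",
]

examples = user_response_examples(turns)
history, response = examples[1]
print("Input: dialog history = " + "; ".join(history))
print("Output: user response = " + response)
```

Each dialogue yields one example per user turn, so a single dialogue of the minimum length of 10 turns would contribute five such pairs under these assumptions.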
Each dialogue instance has the following fields:
Each task instance has the following fields:
The dataset is split into a train and test split with the following sizes:
Statistic | Training MetaLWOz | Evaluation MetaLWOz | Combined |
---|---|---|---|
Total Domains | 47 | 4 | 51 |
Total Tasks | 226 | 14 | 240 |
Total Dialogs | 37884 | 2319 | 40203 |
Below are the various statistics of the dataset:
Statistic | Mean | Minimum | Maximum |
---|---|---|---|
Number of tasks per domain | 4.8 | 3 | 11 |
Number of dialogs per domain | 806.0 | 288 | 1990 |
Number of dialogs per task | 167.6 | 32 | 285 |
Number of turns per dialog | 11.4 | 10 | 46 |
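The per-domain and per-task means appear to follow from the training-split totals in the split table. The short check below reproduces them; it assumes the statistics describe the training split only, which the card does not state explicitly.

```python
# Totals copied from the split table.
train = {"domains": 47, "tasks": 226, "dialogs": 37884}
evaluation = {"domains": 4, "tasks": 14, "dialogs": 2319}

# The combined column is the sum of the two splits.
combined = {key: train[key] + evaluation[key] for key in train}

# Means, assuming they are computed over the training split.
tasks_per_domain = round(train["tasks"] / train["domains"], 1)
dialogs_per_domain = round(train["dialogs"] / train["domains"], 1)
dialogs_per_task = round(train["dialogs"] / train["tasks"], 1)

print(combined)
print(tasks_per_domain, dialogs_per_domain, dialogs_per_task)
```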
Version 1 of the dataset was created by a team of researchers from Microsoft Research (Montreal, Canada).
The dataset is released under the Microsoft Research Data License Agreement.
You can cite the following for the various versions of MetaLWOz:
Version 1.0
@InProceedings{shalyminov2020fast,
  author = {Shalyminov, Igor and Sordoni, Alessandro and Atkinson, Adam and Schulz, Hannes},
  title = {Fast Domain Adaptation For Goal-Oriented Dialogue Using A Hybrid Generative-Retrieval Transformer},
  booktitle = {2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2020},
  month = {April},
  url = {https://www.microsoft.com/en-us/research/publication/fast-domain-adaptation-for-goal-oriented-dialogue-using-a-hybrid-generative-retrieval-transformer/},
}
Thanks to @pacman100 for adding this dataset.