Dataset:

GEM/schema_guided_dialog

Tasks:

Conversational

Languages:

Multilinguality:

unknown

Size:

size_categories:unknown

Language creators:

unknown

Annotation creators:

crowd-sourced

Source datasets:

original

ArXiv:

arxiv:1909.05855 arxiv:2004.15006 arxiv:2002.01359

Other:

dialog-response-generation

License:

cc-by-sa-4.0

Dataset Card for GEM/schema_guided_dialog

Link to Main Data Card

You can find the main data card on the GEM Website.

Dataset Summary

The GEM version of this dataset functions as a response generation dataset. The input specifies dialog acts that a model needs to verbalize. The Schema-Guided Dialog dataset is challenging since it comprises multiple domains from hotel and travel to restaurants, and a wide range of dialog acts. The context of each conversation is provided as well.

You can load the dataset via:

import datasets
# Fetches the GEM version of the dataset from the Hugging Face Hub (needs network access).
data = datasets.load_dataset('GEM/schema_guided_dialog')

The data loader can be found here.

website

n/a

paper

Arxiv

authors

Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan, Amir Fayazi, Maria Wang, and Guan-Lin Chao

Dataset Overview

Where to find the Data and its Documentation

Download

Github (https://github.com/google-research-datasets/dstc8-schema-guided-dialogue)

Paper

Arxiv

BibTex

@inproceedings{rastogi2020towards,
  title={Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset},
  author={Rastogi, Abhinav and Zang, Xiaoxue and Sunkara, Srinivas and Gupta, Raghav and Khaitan, Pranav},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={34},
  number={05},
  pages={8689--8696},
  year={2020}
}

Contact Name

Abhinav Rastogi

Contact Email

schema-guided-dst@google.com

Has a Leaderboard?

Languages and Intended Use

Multilingual?

Covered Languages

English

Whose Language?

The language structure is machine-generated, and the language realizations are produced by crowd workers. The dataset paper does not provide demographic information for the crowd workers.

License

cc-by-sa-4.0: Creative Commons Attribution Share Alike 4.0 International

Intended Use

The Schema-Guided Dialogue (SGD) dataset contains 18K multi-domain task-oriented dialogues between a human and a virtual assistant, covering 17 domains ranging from banks and events to media, calendar, travel, and weather. The only language present in the dataset is English. The SGD dataset provides a challenging testbed for a number of tasks in task-oriented dialogue, including language understanding, slot filling, dialogue state tracking, and response generation. To create the SGD dataset, the authors developed a multi-domain dialogue simulator that generates dialogue outlines over an arbitrary combination of APIs, dialogue states, and system actions. They then used a crowd-sourcing procedure to paraphrase these outlines into natural language utterances. This crowd-sourcing procedure preserves all annotations obtained from the simulator and does not require any extra annotations after dialogue collection.

Primary Task

Dialog Response Generation

Communicative Goal

The goal of a speaker who generates the target utterance is to help users accomplish tasks including but not limited to finding flights, booking restaurants, searching for nearby events and movies.

Credit

Curation Organization Type(s)

industry

Curation Organization(s)

Google

Dataset Creators

Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan, Amir Fayazi, Maria Wang, and Guan-Lin Chao

Funding

Google

Who added the Dataset to GEM?

Wanyu Du wrote the initial data card and Yacine Jernite the data loader. Simon Mille updated the data card with the additional splits. Sebastian Gehrmann migrated the data card and loader to the v2 version and extended the missing information.

Dataset Structure

Data Fields

Each dialog instance has the following fields:

dialogue_id : A unique identifier for a dialogue.
services : A list of services present in the dialogue.
turns : A list of annotated system or user utterances. Each turn consists of the following fields:
- speaker : The speaker for the turn, either USER or SYSTEM .
- utterance : A string containing the natural language utterance.
- frames : A list of frames, each frame containing annotations for a single service and consists of the following fields:
  - service : The name of the service corresponding to the frame. The slots and intents used in the following fields are taken from the schema of this service.
  - slots : A list of slot spans in the utterance, only provided for non-categorical slots. Each slot span contains the following fields:
    - slot : The name of the slot.
    - start : The index of the starting character in the utterance corresponding to the slot value.
    - exclusive_end : The index of the character just after the last character corresponding to the slot value in the utterance.
  - actions : A list of actions corresponding to the system. Each action has the following fields:
    - act : The type of action.
    - slot : (optional) A slot argument for some of the actions.
    - values : (optional) A list of values assigned to the slot. If the values list is non-empty, then the slot must be present.
    - canonical_values : (optional) The values in their canonicalized form as used by the service. It is a list of strings of the same length as values.
  - service_call : (system turns only, optional) The request sent to the service. It consists of the following fields:
    - method : The name of the intent or function of the service or API being executed.
    - parameters : A pair of lists of the same lengths: parameter_slot_name contains slot names and parameter_canonical_value contains the corresponding values in their canonicalized form.
  - service_results : (system turns only, optional) A list of entities containing the results obtained from the service. It is only available for turns in which a service call is made. Each entity is represented as a pair of lists of the same length: service_slot_name contains slot names and service_canonical_value contains the corresponding canonical values.
  - state : (user turns only) The dialogue state corresponding to the service. It consists of the following fields:
    - active_intent : The intent corresponding to the service of the frame which is currently being fulfilled by the system. It takes the value "NONE" if none of the intents are active.
    - requested_slots : A list of slots requested by the user in the current turn.
    - slot_values : A pair of lists of the same length: slot_name contains slot names and slot_value_list contains the corresponding lists of strings. For categorical slots, this list contains a single value assigned to the slot. For non-categorical slots, all the values in this list are spoken variations of each other and are equivalent (e.g., "6 pm", "six in the evening", "evening at 6").
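The two indexing conventions above (character spans with an exclusive end, and parallel lists for slot_values) can be sketched as follows. This is a minimal illustration, assuming a single span represented as a plain dict with the field names listed above; the helper names span_text and slot_values_to_dict are hypothetical and not part of the data loader, and the loaded dataset may nest these fields differently.

```python
def span_text(utterance, span):
    """Return the substring covered by one slot span (end index is exclusive)."""
    return utterance[span["start"]:span["exclusive_end"]]


def slot_values_to_dict(slot_values):
    """Zip the parallel slot_name / slot_value_list lists into one dict."""
    return dict(zip(slot_values["slot_name"], slot_values["slot_value_list"]))


# Illustrative values only; the span is built here rather than read from the data.
utterance = "I would like for it to be in San Jose."
start = utterance.index("San Jose")
span = {"slot": "city", "start": start, "exclusive_end": start + len("San Jose")}

slot_values = {
    "slot_name": ["city", "time"],
    "slot_value_list": [["San Jose"], ["6 pm", "six in the evening"]],
}

print(span_text(utterance, span))
print(slot_values_to_dict(slot_values))
```

Because exclusive_end points just past the last character, an ordinary Python slice recovers the value without any off-by-one adjustment.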

Example Instance

{'dialogue_id': '1_00000',
 'services': ['Restaurants_1'],
 'turns':
 {'frames':
     [{'actions': [{'act': [6],
      'canonical_values': [['FindRestaurants']],
      'slot': ['intent'],
      'values': [['FindRestaurants']]}],
      'service': ['Restaurants_1'],
      'service_call': [{'method': '',
      'parameters': {'parameter_canonical_value': [],
       'parameter_slot_name': []}}],
      'service_results': [{'service_results_list': []}],
      'slots': [{'exclusive_end': [], 'slot': [], 'start': []}],
      'state': [{'active_intent': 'FindRestaurants',
                   'requested_slots': [],
                   'slot_values': {'slot_name': [], 'slot_value_list': []}}]},
     {'actions': [{'act': [13],
      'canonical_values': [[]],
      'slot': ['city'],
      'values': [[]]}],
      'service': ['Restaurants_1'],
      'service_call': [{'method': '',
      'parameters': {'parameter_canonical_value': [],
       'parameter_slot_name': []}}],
      'service_results': [{'service_results_list': []}],
      'slots': [{'exclusive_end': [], 'slot': [], 'start': []}],
      'state': [{'active_intent': '',
                 'requested_slots': [],
                 'slot_values': {'slot_name': [], 'slot_value_list': []}}]},
    ...,],
 'speaker': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
 'utterance': [
   'I am feeling hungry so I would like to find a place to eat.',
   'Do you have a specific which you want the eating place to be located at?',
   'I would like for it to be in San Jose.',
   'Is there a specific cuisine type you enjoy, such as Mexican, Italian or something else?',
   'I usually like eating the American type of food.',
   'I see that at 71 Saint Peter there is a good restaurant which is in San Jose.',
   'Can you give me the address of this restaurant.',
   'If you want to go to this restaurant you can find it at 71 North San Pedro Street.',
   'Can you give me the phone number that I can contact them with?',
   'If you want to phone them you can at 408-971-8523.',
   'Is there some other restaurant which you can suggest?',
   'How would you like Bazille restaurant which is situated in San Jose.',
   'Do you have another restaurant matching my needs? For example a restaurant which is economical and is located in Palo Alto.',
   'I see that 7 restaurants suit to what you requested. Bird Dog seems as a good restaurant and is located in Palo Alto.',
   'Alright, that seems good. I would like to make a booking at this restaurant.',
   'For which time do you want the booking to be?',
   'I will be eating there at 11:30 am so make it for then.',
   'Can you please confirm that you want to book a table for 2 at 11:30 am at the Bird Dog restaurant in Palo Alto for today.',
   'That suits me well. Can you tell me if they feature live music?',
   'Your booking has been made without errors, but unfortunately they do not have live music.',
   'Will I be able to find liquor there? Can you give me the address of their location?',
   'The restaurant is located at 420 Ramona Street. Unfortunately they do not serve alcohol at the restaurant.',
   'I appreciate it very much. That would be all.',
   'Have a good time!'
 ]}}

Data Splits

The dataset is split into a train, validation, and test set with the following sizes:

Statistic	Train	Validation	Test
# of dialogues	16142	2482	4201
# of turns	48426	7446	12603

Splitting Criteria

The data is generally split i.i.d., but some domains appear only in the training set and some only in the test set. For example, the domains Messaging, Payment, and Train are test-only.

Dataset in GEM

Rationale for Inclusion in GEM

Why is the Dataset in GEM?

This dataset comprises a wide range of dialog capabilities and thus enables the evaluation of many more generation capabilities than comparable datasets. Its collection methodology ensures both high diversity and high quality of the data.

Similar Datasets

yes

Unique Language Coverage

Difference from other GEM datasets

The domains are a lot more diverse than in other datasets.

Ability that the Dataset measures

surface realization, compositionality.

GEM-Specific Curation

Modified for GEM?

yes

GEM Modifications

data points modified

Modification Details

We focus on the response-generation part of the dataset and thus reformatted the dataset to treat the service agent utterances as the targets to be generated, with the previous customer utterance and the agent's dialog act as the input. We additionally reformatted the dialog acts to conform directly to the format described in this paper.
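The reformatting described above can be sketched roughly as follows. This is not the GEM loader's code: the function name, the output keys dialog_acts, prompt, and target, and the shape of system_acts are assumptions made for illustration, and the dialog acts are passed through unchanged rather than converted to the format from the referenced paper. The speaker encoding (0 for the user, 1 for the system) follows the example instance.

```python
def response_generation_pairs(speakers, utterances, system_acts):
    """Build one example per system turn.

    speakers: list of 0 (user) / 1 (system), aligned with utterances.
    utterances: list of utterance strings.
    system_acts: hypothetical list aligned with speakers; entry i holds the
        dialog acts of turn i (unused for user turns).
    """
    examples = []
    for i, (speaker, utterance) in enumerate(zip(speakers, utterances)):
        if speaker != 1:
            continue  # only system turns become generation targets
        # The input includes the immediately preceding user utterance, if any.
        prompt = utterances[i - 1] if i > 0 and speakers[i - 1] == 0 else ""
        examples.append(
            {"dialog_acts": system_acts[i], "prompt": prompt, "target": utterance}
        )
    return examples


speakers = [0, 1]
utterances = [
    "I am feeling hungry so I would like to find a place to eat.",
    "Do you have a specific which you want the eating place to be located at?",
]
system_acts = [None, [{"act": 13, "slot": "city", "values": []}]]

pairs = response_generation_pairs(speakers, utterances, system_acts)
print(pairs[0]["prompt"], "->", pairs[0]["target"])
```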

Additional Splits?

yes

Split Information

9 challenge sets for Schema-Guided Dialog were added to the GEM evaluation suite.

We created subsets of the training and development sets of 500 randomly selected inputs each.

We applied 5 transformations, each to its own set of 500 randomly selected inputs: (i) back-translation, (ii)-(iii) introduction of typographical errors using Butterfingers with two thresholds (0.02 and 0.05), resulting in two sets with different amounts of typos (the 0.05 threshold introduces more typos than the 0.02 one), (iv) removal of final punctuation (if any), and (v) input scrambling, in which the order of the dialogue acts was randomly reassigned.
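Two of the simpler transformations, (iv) and (v), can be illustrated with a short sketch. This is not the code used to build the GEM challenge sets; the function names are hypothetical, and details such as removing exactly one trailing punctuation mark or seeding the shuffle are assumptions.

```python
import random
import string


def remove_final_punctuation(text):
    """Drop a single trailing punctuation mark, if the text ends with one."""
    stripped = text.rstrip()
    if stripped and stripped[-1] in string.punctuation:
        return stripped[:-1].rstrip()
    return text


def scramble_dialog_acts(dialog_acts, seed=0):
    """Return a copy of the dialog acts in a randomly reassigned order."""
    shuffled = list(dialog_acts)
    random.Random(seed).shuffle(shuffled)
    return shuffled


acts = ["act_a", "act_b", "act_c", "act_d"]  # placeholder act strings
scrambled = scramble_dialog_acts(acts)

print(remove_final_punctuation("Have a good time!"))
print(scrambled)
```

Copying the list before shuffling leaves the original input untouched, so the perturbed and unperturbed versions can be evaluated side by side.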

For the input size, we created subpopulations based on the number of dialogue acts in the input.

DA number	Frequency English
1	5049
2	2517
3	1328
4	469
5	335
6	256
7	46

We also split the test data according to the type of dialogue act, represented by cardinal numbers in the dataset.

DA type	Frequency English
2	1397
3	983
4	1027
5	958
9	72
10	1024
11	1246
12	500
13	2078
15	715
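Subpopulations like the two tables above could be derived along these lines. The sketch assumes each example carries a dialog_acts list of dicts with an integer act field, which is a hypothetical layout. Keying the type split on the first dialog act is also an assumption, since the card does not say how an input with several act types is assigned.

```python
from collections import defaultdict


def split_by_act_count(examples):
    """Group examples by the number of dialogue acts in the input."""
    buckets = defaultdict(list)
    for example in examples:
        buckets[len(example["dialog_acts"])].append(example)
    return dict(buckets)


def split_by_first_act_type(examples):
    """Group examples by the type of their first dialogue act (assumed rule)."""
    buckets = defaultdict(list)
    for example in examples:
        acts = example["dialog_acts"]
        if acts:
            buckets[acts[0]["act"]].append(example)
    return dict(buckets)


examples = [
    {"dialog_acts": [{"act": 13}]},
    {"dialog_acts": [{"act": 10}, {"act": 13}]},
    {"dialog_acts": [{"act": 13}]},
]

by_count = split_by_act_count(examples)
by_type = split_by_first_act_type(examples)
print({k: len(v) for k, v in by_count.items()})
print({k: len(v) for k, v in by_type.items()})
```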

Split Motivation

Generalization and Robustness.

Getting Started with the Task

Pointers to Resources

Previous Results

Measured Model Abilities

Surface realization and compositionality.

Metrics

BLEURT, BLEU, ROUGE

Proposed Evaluation

The original paper focused on the task of dialog state prediction instead of response generation and thus did not suggest any set of metrics.

Previous results available?

Dataset Curation

Original Curation

Original Curation Rationale

Previous multi-domain task-oriented dialogue datasets do not sufficiently capture the real-world challenges in virtual assistants, since they cover few domains and assume a single static ontology per domain. The SGD dataset was created to cover 17 domains with over 16K dialogues, and contains multiple different APIs in most domains, many of which have overlapping functionalities but different interfaces, which reflects common real-world scenarios. The wide range of available annotations can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, and user simulation learning, among other tasks in large-scale virtual assistants.

Communicative Goal

The goal of a speaker who generates the target utterance is to help users accomplish tasks including but not limited to finding flights, booking restaurants, searching for nearby events and movies.

Sourced from Different Sources

Language Data

How was Language Data Obtained?

Machine-generated

Generation Method Link

Github

Language Producers

The dialogue outlines are first generated by a simulator. The dialogue simulator interacts with the services to generate dialogue outlines. It consists of two agents playing the roles of the user and the system, interacting with each other using a finite set of actions specified through dialogue acts over a probabilistic automaton designed to capture varied dialogue trajectories. It is worth noting that the simulation automaton does not include any domain-specific constraints: all domain-specific constraints are encoded in the schema and scenario.

The dialogue paraphrasing framework then converts the outlines generated by the simulator into a natural conversation. Users may refer to the slot values in the dialogue acts in various different ways during the conversation, e.g., “los angeles” may be referred to as “LA” or “LAX”. To introduce these natural variations in the slot values, different slot values are replaced with a randomly selected variation while being kept consistent across user turns in a dialogue. The actions are then converted to pseudo-natural language utterances using a set of manually defined action-to-text templates, and the resulting utterances for the different actions in a turn are concatenated together.
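The template step described above can be sketched as follows. The templates, the variant table, and the function names are hypothetical illustrations rather than the authors' actual resources; only the overall flow (pick one surface variant per slot value for the whole dialogue, render each action with a template, concatenate the results) follows the description.

```python
import random

# Hypothetical action-to-text templates and slot-value variants.
TEMPLATES = {
    "INFORM": "The {slot} is {value}.",
    "REQUEST": "What is the {slot}?",
}
VARIANTS = {"los angeles": ["los angeles", "LA", "LAX"]}


def choose_variants(variants, seed=0):
    """Pick one surface form per canonical value, reused for the whole dialogue."""
    rng = random.Random(seed)
    return {canonical: rng.choice(options) for canonical, options in variants.items()}


def render_turn(actions, chosen):
    """Render each action with its template and concatenate the results."""
    parts = []
    for action in actions:
        value = action.get("value", "")
        value = chosen.get(value, value)  # substitute the chosen variant, if any
        parts.append(TEMPLATES[action["act"]].format(slot=action["slot"], value=value))
    return " ".join(parts)


chosen = choose_variants(VARIANTS)
outline = render_turn(
    [
        {"act": "INFORM", "slot": "city", "value": "los angeles"},
        {"act": "REQUEST", "slot": "date"},
    ],
    chosen,
)
print(outline)
```

Choosing the variants once per dialogue, rather than once per turn, is what keeps a value's surface form consistent across turns.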

Topics Covered

The dataset covers the following domains: Alarm, Banks, Buses, Calendar, Events, Flights, Homes, Hotels, Media, Messaging, Movies, Music, Payment, RentalCars, Restaurants, RideSharing, Services, Train, Travel, and Weather. The domain ‘Services’ includes salons, dentists, doctors, etc. The ‘Alarm’, ‘Messaging’, ‘Payment’ and ‘Train’ domains are only present in the dev or test sets, to test generalization to new domains.

Data Validation

not validated

Was Data Filtered?

not filtered

Structured Annotations

Additional Annotations?

crowd-sourced

Number of Raters

unknown

Raters per Training Example

Raters per Test Example

Annotation Service?

unknown

Annotation Values

The dialogue transformed by these steps is sent to the crowd workers to be reformulated into more natural language. One crowd worker is tasked with paraphrasing all utterances of a dialogue to ensure naturalness and coherence. The crowd workers are asked to exactly repeat the slot values in their paraphrases so that the span indices for the slots can be recovered via string matching.
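Because the slot values are repeated exactly, the span indices can be recovered with plain string matching. A minimal sketch, assuming the first occurrence of the value is the intended one; the function name recover_span is hypothetical:

```python
def recover_span(utterance, slot, value):
    """Locate a slot value in a paraphrase; return None if it is not repeated exactly."""
    start = utterance.find(value)
    if start == -1:
        return None
    return {"slot": slot, "start": start, "exclusive_end": start + len(value)}


paraphrase = "I will be eating there at 11:30 am so make it for then."
found = recover_span(paraphrase, "time", "11:30 am")
missing = recover_span(paraphrase, "city", "Palo Alto")
print(found, missing)
```

A None result flags a paraphrase in which the worker altered the slot value, which is the case the exact-repetition instruction is meant to prevent.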

Any Quality Control?

none

Consent

Any Consent Policy?

Justification for Using the Data

While no policy is reported, we assume that one was in place for the collection.

Private Identifying Information (PII)

Contains PII?

no PII

Justification for no PII

The SGD dataset does not use identity categories and does not contain sensitive data.

Maintenance

Any Maintenance Plan?

Broader Social Context

Previous Work on the Social Impact of the Dataset

Usage of Models based on the Data

Impact on Under-Served Communities

Addresses needs of underserved Communities?

Discussion of Biases

Any Documented Social Biases?

Are the Language Producers Representative of the Language?

Due to the combination of automatic generation and crowd-rater paraphrasing, the language can be very formulaic. While this may be acceptable for the model part (i.e., we may actually want an automated agent to produce formulaic responses), the input utterances of the simulated customers likely do not cover the entire spectrum of the English language.

Considerations for Using the Data

PII Risks and Liability

Licenses

open license - commercial use allowed

Known Technical Limitations

Technical Limitations

The dialogues are distributed unevenly across domains: the flights domain has 3644 dialogues while the payment domain contains only 222. In addition, all dialogues are paraphrased by crowd workers, and it is possible that crowd workers with different cultural backgrounds will exhibit biased opinions.

Unsuited Applications

Since the initial data was automatically generated, the coverage of entity names is necessarily biased. An agent thus needs to be evaluated in a more realistic environment.

Author:

GEM

Dataset size:

68.74 KB