数据集:
GEM/schema_guided_dialog
You can find the main data card on the GEM Website .
The GEM version of this dataset functions as a response generation dataset. The input specifies dialog acts that a model needs to verbalize. The Schema-Guided Dialog dataset is challenging since it comprises multiple domains from hotel and travel to restaurants, and a wide range of dialog acts. The context of each conversation is provided as well.
You can load the dataset via:
import datasets data = datasets.load_dataset('GEM/schema_guided_dialog')
The data loader can be found here .
websiten/a
paper authorsAbhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan, Amir Fayazi, Maria Wang, and Guan-Lin Chao
[Github[( https://github.com/google-research-datasets/dstc8-schema-guided-dialogue )
Paper BibTex{ @inproceedings{rastogi2020towards, title={Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset}, author={Rastogi, Abhinav and Zang, Xiaoxue and Sunkara, Srinivas and Gupta, Raghav and Khaitan, Pranav}, booktitle={Proceedings of the AAAI Conference on Artificial Intelligence}, volume={34}, number={05}, pages={8689--8696}, year={2020} }Contact Name
Abhinav Rastogi
Contact Emailschema-guided-dst@google.com
Has a Leaderboard?no
no
Covered LanguagesEnglish
Whose Language?The language structure is machine-generated, and the language realizations are produced by crowd workers. The dataset paper does not provide demographic information for the crowd workers.
Licensecc-by-sa-4.0: Creative Commons Attribution Share Alike 4.0 International
Intended UseThe Schema-Guided Dialogue (SGD) dataset contains 18K multi-domain task-oriented dialogues between a human and a virtual assistant, which covers 17 domains ranging from banks and events to media, calendar, travel, and weather. The language presents in the datset is only English. The SGD dataset provides a challenging testbed for a number of tasks in task-oriented dialogue, including language understanding, slot filling, dialogue state tracking and response generation. For the creation of the SGD dataset, they developed a multi-domain dialogue simulator that generates dialogue outlines over an arbitrary combination of APIs, dialogue states and system actions. Then, they used a crowd-sourcing procedure to paraphrase these outlines to natural language utterances. This novel crowd-sourcing procedure preserves all annotations obtained from the simulator and does not require any extra annotations after dialogue collection.
Primary TaskDialog Response Generation
Communicative GoalThe goal of a speaker who generates the target utterance is to help users accomplish tasks including but not limited to finding flights, booking restaurants, searching for nearby events and movies.
industry
Curation Organization(s)Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan, Amir Fayazi, Maria Wang, and Guan-Lin Chao
FundingWanyu Du wrote the initial data card and Yacine Jernite the data loader. Simon Mille updated the data card with the additional splits. Sebastian Gehrmann migrated the data card and loader to the v2 version and extended the missing information.
Each dialog instance has the following fields:
{'dialogue_id': '1_00000', 'services': ['Restaurants_1'], 'turns': {'frames': [{'actions': [{'act': [6], 'canonical_values': [['FindRestaurants']], 'slot': ['intent'], 'values': [['FindRestaurants']]}], 'service': ['Restaurants_1'], 'service_call': [{'method': '', 'parameters': {'parameter_canonical_value': [], 'parameter_slot_name': []}}], 'service_results': [{'service_results_list': []}], 'slots': [{'exclusive_end': [], 'slot': [], 'start': []}], 'state': [{'active_intent': 'FindRestaurants', 'requested_slots': [], 'slot_values': {'slot_name': [], 'slot_value_list': []}}]}, {'actions': [{'act': [13], 'canonical_values': [[]], 'slot': ['city'], 'values': [[]]}], 'service': ['Restaurants_1'], 'service_call': [{'method': '', 'parameters': {'parameter_canonical_value': [], 'parameter_slot_name': []}}], 'service_results': [{'service_results_list': []}], 'slots': [{'exclusive_end': [], 'slot': [], 'start': []}], 'state': [{'active_intent': '', 'requested_slots': [], 'slot_values': {'slot_name': [], 'slot_value_list': []}}]}, ...,]} 'speaker': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1], 'utterance': [ 'I am feeling hungry so I would like to find a place to eat.', 'Do you have a specific which you want the eating place to be located at?', 'I would like for it to be in San Jose.', 'Is there a specific cuisine type you enjoy, such as Mexican, Italian or something else?', 'I usually like eating the American type of food.', 'I see that at 71 Saint Peter there is a good restaurant which is in San Jose.', 'Can you give me the address of this restaurant.', 'If you want to go to this restaurant you can find it at 71 North San Pedro Street.', 'Can you give me the phone number that I can contact them with?', 'If you want to phone them you can at 408-971-8523.', 'Is there some other restaurant which you can suggest?', 'How would you like Bazille restaurant which is situated in San Jose.', 'Do you have another restaurant matching my needs? For example a restaurant which is economical and is located in Palo Alto.', 'I see that 7 restaurants suit to what you requested. Bird Dog seems as a good restaurant and is located in Palo Alto.', 'Alright, that seems good. I would like to make a booking at this restaurant.', 'For which time do you want the booking to be?', 'I will be eating there at 11:30 am so make it for then.', 'Can you please confirm that you want to book a table for 2 at 11:30 am at the Bird Dog restaurant in Palo Alto for today.', 'That suits me well. Can you tell me if they feature live music?', 'Your booking has been made without errors, but unfortunately they do not have live music.', 'Will I be able to find liquor there? Can you give me the address of their location?', 'The restaurant is located at 420 Ramona Street. Unfortunately they do not serve alcohol at the restaurant.', 'I appreciate it very much. That would be all.', 'Have a good time!' ]}Data Splits
The dataset is split into a train, validation, and test set with the following sizes:
Train | Validation | Test | |
---|---|---|---|
# of dialogues | 16142 | 2482 | 4201 |
# of turns | 48426 | 7446 | 12603 |
The data is generally split i.i.d, but some topics only appear in training and some only for testing. For example, the domains Messaging, Payment, and Train are test-only.
This dataset comprises a wide range of dialog capabilities and thus enables the evaluation of many more generation capabilities of comparable datasets. Its collection methodology ensures a high diversity but also high quality of the data.
Similar Datasetsyes
Unique Language Coverageno
Difference from other GEM datasetsThe domains a lot more diverse than other datasets.
Ability that the Dataset measuressurface realization, compositionality.
yes
GEM Modificationsdata points modified
Modification DetailsWe are focusing on the response-generation part of the dataset and thus reformatted the dataset to treat the service agent utterances as the targets to be generated and the previous customer utterance and the agent's dialog act as the input. We additionally reformat the dialog acts to directly conform to the format described in this paper .
Additional Splits?yes
Split Information9 challenge sets for Schema-Guided Dialog were added to the GEM evaluation suite.
DA number | Frequency English |
---|---|
1 | 5049 |
2 | 2517 |
3 | 1328 |
4 | 469 |
5 | 335 |
6 | 256 |
7 | 46 |
We also split the test data according to the type of dialogue act, represented by cardinal numbers in the dataset.
DA type | Frequency English |
---|---|
2 | 1397 |
3 | 983 |
4 | 1027 |
5 | 958 |
9 | 72 |
10 | 1024 |
11 | 1246 |
12 | 500 |
13 | 2078 |
15 | 715 |
Generalization and Robustness.
Surface realization and compositionally.
MetricsBLEURT , BLEU , ROUGE
Proposed EvaluationThe original paper focused on the task of dialog state prediction instead of response generation and thus did not suggest any set of metrics.
Previous results available?no
Previous multi-domain task-oriented dialogue datsets do not sufficiently capture the real-world challenges in virtual assistants, since they cover few domains and assume a single static ontology per domain. The SGD datset is created to cover 17 domains with over 16K dialogues, and contain multiple different APIs in most domains, many of which have overlapping functionalities but different interfaces, which reflects common real-world scenarios. The wide range of available annotations can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, user simulation learning, among other tasks in large-scale virtual assistants.
Communicative GoalThe goal of a speaker who generates the target utterance is to help users accomplish tasks including but not limited to finding flights, booking restaurants, searching for nearby events and movies.
Sourced from Different Sourcesno
Machine-generated
Generation Method Link Language ProducersThe dialogue outlines are first generated by a simulator. The dialogue simulator interacts with the services to generate dialogue outlines. It consists of two agents playing the roles of the user and the system, interacting with each other using a finite set of actions specified through dialogue acts over a probabilistic automaton designed to capture varied dialogue trajectories. It is worth noting that the simulation automaton does not include any domain-specific constraints: all domain-specific constraints are encoded in the schema and scenario.
The dialogue paraphrasing framework then converts the outlines generated by the simulator into a natural conversation. Users may refer to the slot values in the dialogue acts in various different ways during the conversation, e.g., “los angeles” may be referred to as “LA” or “LAX”. To introduce these natural variations in the slot values, different slot values are replaced with a randomly selected variation while being kept consistent across user turns in a dialogue. The actions are then converted to pseudo-natural language utterances using a set of manually defined action-to-text templates, and the resulting utterances for the different actions in a turn are concatenated together.
Topics CoveredThe dataset covers the following domains: Alarm, Banks, Buses, Calendar, Events, Flights, Homes, Hotels, Media, Messaging, Movies, Music, Payment, RentalCars, Restaurants, RideSharing, Services, Train, Travel, and Weather. The domain ‘Service’ includes salons, dentists, doctors etc. The ‘Alarm’, ‘Messaging’, ‘Payment’ and ‘Train’ domains are only present in the dev or test sets. to test generalization to new domains.
Data Validationnot validated
Was Data Filtered?not filtered
crowd-sourced
Number of Ratersunknown
Raters per Training Example0
Raters per Test Example0
Annotation Service?unknown
Annotation ValuesThe dialogue transformed by these steps is sent to the crowd workers to be reformulated into more natural language. One crowd worker is tasked with paraphrasing all utterances of a dialogue to ensure naturalness and coherence. The crowd workers are asked to exactly repeat the slot values in their paraphrases so that the span indices for the slots can be recovered via string matching.
Any Quality Control?none
no
Justification for Using the DataWhile no policy is reported, we assume that one was in place for the collection.
no PII
Justification for no PIIThe SGD dataset does not use identity categories and does not contain sensitive data.
no
no
no
no
Are the Language Producers Representative of the Language?Due to the combination of the automatic generation and crowd rater paraphasing, the language can be very formulaic. While this may be acceptable for the model part (i.e., we may actually desire an automated agent to form formulaic responses), the input utterances of the simulated customers likely do not cover the entire spectrum of the English language.
open license - commercial use allowed
Copyright Restrictions on the Language Dataopen license - commercial use allowed
The dialogues under each domain distributed unevenly, where the flights domain has 3644 dialogues while the payment domain only contains 222 dialogues. Besides, all dialogues are paraphrased by crowd-workers, and it is possible that crow-workers with different culture backgrounds will exhibit biased opinions.
Unsuited ApplicationsSince the initial data was automatically generated, the coverage of entity names is necessarily biased. An agent thus needs to be evaluated in a more realistic environment.