You can find the main data card on the GEM Website.
DSTC10 Track 2 Task 2 follows the DSTC9 Track 1 task, in which participants implement knowledge-grounded dialog systems. The training dataset is inherited from the DSTC9 challenge and is in the written domain, while the test set is newly collected and consists of noisy ASR transcripts. Hence, the dataset facilitates building models for grounded dialog response generation.
You can load the dataset via:
import datasets
data = datasets.load_dataset('GEM/dstc10_track2_task2')
The data loader can be found here .
Website: https://github.com/alexa/alexa-with-dstc10-track2-dataset
Paper authors: Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Behnam Hedayatnia, Dilek Hakkani-Tur (Amazon Alexa AI)
Download: https://github.com/alexa/alexa-with-dstc10-track2-dataset
Paper BibTex:
@inproceedings{kim2021robust,
  title={"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations},
  author={Kim, Seokhwan and Liu, Yang and Jin, Di and Papangelis, Alexandros and Gopalakrishnan, Karthik and Hedayatnia, Behnam and Hakkani-Tur, Dilek},
  booktitle={IEEE Automatic Speech Recognition and Understanding Workshop},
  year={2021}
}
Contact Name: Seokhwan Kim
Contact Email: seokhwk@amazon.com
Has a Leaderboard? yes
Leaderboard Link: https://eval.ai/challenge/1663/overview
Leaderboard Details: It evaluates the models based on the automatic metrics defined in the task paper for the three tasks of detection, selection, and generation.
no
Covered Languages: English
License: apache-2.0: Apache License 2.0
Intended Use: To conduct research on dialogue state tracking and knowledge-grounded response generation.
Primary Task: Dialog Response Generation
Communicative Goal: This dataset aims to explore the robustness of conversational models when evaluated on spoken data. It has two aspects: multi-domain dialogue state tracking and conversation modeling with access to unstructured knowledge.
industry
Curation Organization(s): Amazon
Dataset Creators: Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Behnam Hedayatnia, Dilek Hakkani-Tur (Amazon Alexa AI)
Funding: Amazon
Who added the Dataset to GEM? Alexandros Papangelis (Amazon Alexa AI), Di Jin (Amazon Alexa AI), Nico Daheim (RWTH Aachen University)
features = datasets.Features(
    {
        "id": datasets.Value("string"),
        "gem_id": datasets.Value("string"),
        "turns": [
            {
                "speaker": datasets.Value("string"),
                "text": datasets.Value("string"),
                "nbest": [
                    {
                        "hyp": datasets.Value("string"),
                        "score": datasets.Value("float"),
                    }
                ],
            }
        ],
        "knowledge": {
            "domain": datasets.Value("string"),
            "entity_name": datasets.Value("string"),
            "title": datasets.Value("string"),
            "body": datasets.Value("string"),
        },
        "response": datasets.Value("string"),
        "source": datasets.Value("string"),
        "linearized_input": datasets.Value("string"),
        "target": datasets.Value("string"),
        "references": [datasets.Value("string")],
    }
)
nbest contains an n-best list of hypotheses generated by an ASR system, along with their scores.
knowledge defines the annotated grounding as well as its metadata.
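To illustrate how the nbest field might be consumed, the sketch below picks the highest-scoring ASR hypothesis for a turn and falls back to the turn's text when the list is empty. The helper name best_hypothesis is hypothetical and not part of the data loader, and the sketch assumes that a larger (less negative) score indicates a more likely hypothesis.

```python
def best_hypothesis(turn: dict) -> str:
    """Return the top-scoring ASR hypothesis, or the turn text if nbest is empty."""
    nbest = turn.get("nbest") or []
    if not nbest:
        return turn["text"]
    # Assumption: scores behave like log-likelihoods, so larger means more likely.
    return max(nbest, key=lambda hypothesis: hypothesis["score"])["hyp"]


# A user turn with two hypotheses and a turn with an empty nbest list.
user_turn = {
    "speaker": "U",
    "text": "hi uh i'm looking for restaurant in lower ha",
    "nbest": [
        {"hyp": "hi uh i'm looking for restaurant in lower hai", "score": -25.969446182250977},
        {"hyp": "hi uh i'm looking for restaurant in lower ha", "score": -25.625450134277344},
    ],
}
system_turn = {"speaker": "S", "text": "what kind of restaurant are you looking for", "nbest": []}

print(best_hypothesis(user_turn))
print(best_hypothesis(system_turn))
```

Falling back to text keeps the helper usable on turns that carry no ASR output.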
Reason for Structure: It was kept compatible with MultiWOZ 2.X data.
Example Instance{'id': '0', 'gem_id': 'GEM-dstc10_track2_task2-test-0', 'turns': [{'speaker': 'U', 'text': "hi uh i'm looking for restaurant in lower ha", 'nbest': [{'hyp': "hi uh i'm looking for restaurant in lower ha", 'score': -25.625450134277344}, {'hyp': "hi uh i'm looking for restaurant in lower hai", 'score': -25.969446182250977}, {'hyp': "hi uh i'm looking for restaurant in lower haig", 'score': -32.816890716552734}, {'hyp': "hi uh i'm looking for restaurant in lower haigh", 'score': -32.84316635131836}, {'hyp': "hi uh i'm looking for restaurant in lower hag", 'score': -32.8637580871582}, {'hyp': "hi uh i'm looking for restaurant in lower hah", 'score': -33.1048698425293}, {'hyp': "hi uh i'm looking for restaurant in lower hait", 'score': -33.96509552001953}, {'hyp': "hi um i'm looking for restaurant in lower hai", 'score': -33.97885513305664}, {'hyp': "hi um i'm looking for restaurant in lower haig", 'score': -34.56083679199219}, {'hyp': "hi um i'm looking for restaurant in lower haigh", 'score': -34.58711242675781}]}, {'speaker': 'S', 'text': 'yeah definitely i can go ahead and help you with that ummm what kind of option in a restaurant are you looking for', 'nbest': []}, {'speaker': 'U', 'text': 'yeah umm am looking for an expensive restaurant', 'nbest': [{'hyp': 'yeah umm am looking for an expensive restaurant', 'score': -21.272899627685547}, {'hyp': 'yeah umm m looking for an expensive restaurant', 'score': -21.444047927856445}, {'hyp': 'yeah umm a m looking for an expensive restaurant', 'score': -21.565458297729492}, {'hyp': 'yeah ummm am looking for an expensive restaurant', 'score': -21.68832778930664}, {'hyp': 'yeah ummm m looking for an expensive restaurant', 'score': -21.85947608947754}, {'hyp': 'yeah ummm a m looking for an expensive restaurant', 'score': -21.980886459350586}, {'hyp': "yeah umm a'm looking for an expensive restaurant", 'score': -22.613924026489258}, {'hyp': "yeah ummm a'm looking for an expensive restaurant", 'score': 
-23.02935218811035}, {'hyp': 'yeah um am looking for an expensive restaurant', 'score': -23.11180305480957}, {'hyp': 'yeah um m looking for an expensive restaurant', 'score': -23.28295135498047}]}, {'speaker': 'S', 'text': "lemme go ahead and see what i can find for you ok great so i do ummm actually no i'm sorry is there something else i can help you find i don't see anything expensive", 'nbest': []}, {'speaker': 'U', 'text': "sure ummm maybe if you don't have anything expensive how about something in the moderate price range", 'nbest': [{'hyp': "sure ummm maybe if you don't have anything expensive how about something in the moderate price range", 'score': -27.492507934570312}, {'hyp': "sure umm maybe if you don't have anything expensive how about something in the moderate price range", 'score': -27.75853729248047}, {'hyp': "sure ummm maybe if you don't have anything expensive how about something in the moderate price rang", 'score': -29.44410514831543}, {'hyp': "sure umm maybe if you don't have anything expensive how about something in the moderate price rang", 'score': -29.710134506225586}, {'hyp': "sure um maybe if you don't have anything expensive how about something in the moderate price range", 'score': -31.136560440063477}, {'hyp': "sure um maybe if you don't have anything expensive how about something in the moderate price rang", 'score': -33.088157653808594}, {'hyp': "sure ummm maybe i you don't have anything expensive how about something in the moderate price range", 'score': -36.127620697021484}, {'hyp': "sure umm maybe i you don't have anything expensive how about something in the moderate price range", 'score': -36.39365005493164}, {'hyp': "sure ummm maybe if yo don't have anything expensive how about something in the moderate price range", 'score': -36.43605041503906}, {'hyp': "sure umm maybe if yo don't have anything expensive how about something in the moderate price range", 'score': -36.70207977294922}]}, {'speaker': 'S', 'text': 'ok moderate 
lemme go ahead and check to see what i can find for moderate ok great i do have several options coming up how does the view lounge sound', 'nbest': []}, {'speaker': 'U', 'text': 'that sounds good ummm do they have any sort of happy hour special', 'nbest': [{'hyp': 'that sounds good ummm do they have any sort of happy hour special', 'score': -30.316478729248047}, {'hyp': 'that sounds good umm do they have any sort of happy hour special', 'score': -30.958009719848633}, {'hyp': 'that sounds good um do they have any sort of happy hour special', 'score': -34.463165283203125}, {'hyp': 'that sounds good ummm do they have any sirt of happy hour special', 'score': -34.48350143432617}, {'hyp': 'that sounds good umm do they have any sirt of happy hour special', 'score': -35.12503433227539}, {'hyp': 'that sounds good ummm do they have any sord of happy hour special', 'score': -35.61939239501953}, {'hyp': 'that sounds good umm do they have any sord of happy hour special', 'score': -36.26092529296875}, {'hyp': 'that sounds good ummm do they have any sont of happy hour special', 'score': -37.697105407714844}, {'hyp': 'that sounds good umm do they have any sont of happy hour special', 'score': -38.33863830566406}, {'hyp': 'that sounds good um do they have any sirt of happy hour special', 'score': -38.630191802978516}]}], 'knowledge': {'domain': 'restaurant', 'entity_name': 'The View Lounge', 'title': 'Does The View Lounge offer happy hour?', 'body': 'The View Lounge offers happy hour.'}, 'response': 'uhhh great question lemme go ahead and check that out for you ok fantastic so it looks like they do offer happy hour', 'source': 'sf_spoken', 'linearized_input': " hi uh i'm looking for restaurant in lower ha yeah definitely i can go ahead and help you with that ummm what kind of option in a restaurant are you looking for yeah umm am looking for an expensive restaurant lemme go ahead and see what i can find for you ok great so i do ummm actually no i'm sorry is there something else i 
can help you find i don't see anything expensive sure ummm maybe if you don't have anything expensive how about something in the moderate price range ok moderate lemme go ahead and check to see what i can find for moderate ok great i do have several options coming up how does the view lounge sound that sounds good ummm do they have any sort of happy hour special || knowledge domain: restaurant, entity: The View Lounge, title: Does The View Lounge offer happy hour?, information: The View Lounge offers happy hour.", 'target': 'uhhh great question lemme go ahead and check that out for you ok fantastic so it looks like they do offer happy hour', 'references': ['uhhh great question lemme go ahead and check that out for you ok fantastic so it looks like they do offer happy hour']}
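Judging from the example instance above, linearized_input appears to be the space-joined turn texts followed by " || " and a flattened knowledge snippet. A minimal sketch of that construction follows. The function name linearize is hypothetical, the template is inferred from this single example rather than taken from the data loader, and the dialogue is abbreviated to its final user turn.

```python
def linearize(turns: list, knowledge: dict) -> str:
    """Join turn texts and append the knowledge snippet after a ' || ' separator."""
    history = " ".join(turn["text"] for turn in turns)
    snippet = (
        f"knowledge domain: {knowledge['domain']}, "
        f"entity: {knowledge['entity_name']}, "
        f"title: {knowledge['title']}, "
        f"information: {knowledge['body']}"
    )
    # The example instance's linearized_input starts with a space; mirrored here.
    return f" {history} || {snippet}"


# Abbreviated input: only the last user turn of the example instance.
turns = [
    {"speaker": "U", "text": "that sounds good ummm do they have any sort of happy hour special"},
]
knowledge = {
    "domain": "restaurant",
    "entity_name": "The View Lounge",
    "title": "Does The View Lounge offer happy hour?",
    "body": "The View Lounge offers happy hour.",
}

print(linearize(turns, knowledge))
```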
Data Splits: train: training set, val: validation set, test: test set
Splitting Criteria: The track dataset originally consists only of a validation set and a test set in the spoken domain with noisy ASR transcripts. The training set is taken from the predecessor task DSTC9 Track 1 and contains written conversations.
This dataset can be used to evaluate conversational models on spoken inputs (using ASR hypotheses). In particular, we can evaluate the models’ ability to understand language by tracking the dialogue state, and their ability to generate knowledge-grounded responses.
Similar Datasets: yes
Unique Language Coverage: no
Difference from other GEM datasets: This dataset contains transcribed spoken interactions.
Ability that the Dataset measures: We can measure the model’s ability to understand language and to generate knowledge-grounded responses.
no
Additional Splits? no
This dataset can be used to evaluate conversational models on spoken inputs (using ASR hypotheses). In particular, we can evaluate the models’ ability to generate knowledge-grounded responses.
Metrics: Other: Other Metrics
Other Metrics: BLEU-1, BLEU-2, BLEU-3, BLEU-4, METEOR, ROUGE-1, ROUGE-2, ROUGE-L
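As a rough illustration of one of the listed metrics, the sketch below computes a simplified ROUGE-L F1 score from the longest common subsequence of whitespace-separated tokens. It is not the official evaluation script: the real scorer may tokenize, stem, and weight precision and recall differently, and the function names here are hypothetical.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-L: harmonic mean of LCS-based precision and recall."""
    pred, ref = prediction.split(), reference.split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "they do offer happy hour"
prediction = "yes they offer happy hour"
print(round(rouge_l_f1(prediction, reference), 3))
```

When an instance has several references, one common choice is to take the maximum score over them.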
Previous results available? no
We want to explore how conversational models perform on spoken data.
Communicative Goal: This dataset aims to explore the robustness of conversational models when evaluated on spoken data. It has two aspects: multi-domain dialogue state tracking and conversation modeling with access to unstructured knowledge.
Sourced from Different Sources: no
Other
Topics Covered: The conversations revolve around 5 domains (or topics): hotels, restaurants, attractions, taxi, train.
Data Validation: not validated
Was Data Filtered? not filtered
none
Annotation Service? no
yes
no PII
Justification for no PII: The subjects were instructed to conduct fictional conversations about booking restaurants or requesting fictional information.
no
no
no
unsure
There should be no risk related to PII as the subjects conduct fictional conversations.
open license - commercial use allowed
Copyright Restrictions on the Language Data: open license - commercial use allowed