Dataset:

GEM/dstc10_track2_task2

Tasks:

Conversational

Languages:

Multilinguality:

unknown

Size:

size_categories:unknown

Language creators:

unknown

Annotation creators:

none

Source datasets:

original

Other:

dialog-response-generation

License:

apache-2.0

Dataset Card for GEM/dstc10_track2_task2

Link to Main Data Card

You can find the main data card on the GEM Website.

Dataset Summary

The DSTC10 Track2 Task 2 follows the DSTC9 Track1 task, where participants have to implement knowledge-grounded dialog systems. The training dataset is inherited from the DSTC9 challenge and is in the written domain, while the test set is newly collected and consists of noisy ASR transcripts. Hence, the dataset facilitates building models for grounded dialog response generation.

You can load the dataset via:

import datasets
data = datasets.load_dataset('GEM/dstc10_track2_task2')

The data loader can be found here.

website

https://github.com/alexa/alexa-with-dstc10-track2-dataset

paper

https://assets.amazon.science/54/a1/5282d47044179737b4289622c824/how-robust-are-you-evaluating-task-oriented-dialogue-systems-on-spoken-conversations.pdf

authors

Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Behnam Hedayatnia, Dilek Hakkani-Tur (Amazon Alexa AI)

Dataset Overview

Where to find the Data and its Documentation

Webpage

https://github.com/alexa/alexa-with-dstc10-track2-dataset

Download

https://github.com/alexa/alexa-with-dstc10-track2-dataset

Paper

https://assets.amazon.science/54/a1/5282d47044179737b4289622c824/how-robust-are-you-evaluating-task-oriented-dialogue-systems-on-spoken-conversations.pdf

BibTeX

@inproceedings{kim2021robust,
  title={"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations},
  author={Kim, Seokhwan and Liu, Yang and Jin, Di and Papangelis, Alexandros and Gopalakrishnan, Karthik and Hedayatnia, Behnam and Hakkani-Tur, Dilek},
  booktitle={IEEE Automatic Speech Recognition and Understanding Workshop},
  year={2021}
}

Contact Name

Seokhwan Kim

Contact Email

seokhwk@amazon.com

Has a Leaderboard?

yes

Leaderboard Link

https://eval.ai/challenge/1663/overview

Leaderboard Details

The leaderboard evaluates models using the automatic metrics defined in the task paper for the three tasks of detection, selection, and generation.

Languages and Intended Use

Multilingual?

Covered Languages

License

apache-2.0: Apache License 2.0

Intended Use

To conduct research on dialogue state tracking and knowledge-grounded response generation.

Primary Task

Dialog Response Generation

Communicative Goal

This dataset aims to explore the robustness of conversational models when evaluated on spoken data. It has two aspects: multi-domain dialogue state tracking and conversation modeling with access to unstructured knowledge.

Credit

Curation Organization Type(s)

industry

Curation Organization(s)

Amazon

Dataset Creators

Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Behnam Hedayatnia, Dilek Hakkani-Tur (Amazon Alexa AI)

Funding

Amazon

Who added the Dataset to GEM?

Alexandros Papangelis (Amazon Alexa AI), Di Jin (Amazon Alexa AI), Nico Daheim (RWTH Aachen University)

Dataset Structure

Data Fields

import datasets

features = datasets.Features(
    {
        "id": datasets.Value("string"),
        "gem_id": datasets.Value("string"),
        # Dialog history: one entry per turn, in order.
        "turns": [
            {
                "speaker": datasets.Value("string"),
                "text": datasets.Value("string"),
                # ASR n-best hypotheses for the turn, each with a score.
                "nbest": [
                    {
                        "hyp": datasets.Value("string"),
                        "score": datasets.Value("float"),
                    }
                ],
            }
        ],
        # Annotated grounding knowledge and its metadata.
        "knowledge": {
            "domain": datasets.Value("string"),
            "entity_name": datasets.Value("string"),
            "title": datasets.Value("string"),
            "body": datasets.Value("string"),
        },
        "response": datasets.Value("string"),
        "source": datasets.Value("string"),
        "linearized_input": datasets.Value("string"),
        "target": datasets.Value("string"),
        "references": [datasets.Value("string")],
    }
)

nbest contains an n-best list of hypotheses generated by an ASR system, along with their scores.

knowledge defines the annotated grounding as well as its metadata.
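As a minimal sketch of how the nbest field might be consumed, the helper below picks the highest-scoring ASR hypothesis for each turn and falls back to text when the list is empty, as it is for the system turns in the example instance on this card. The function name best_hypotheses and the toy instance are hypothetical. Treating a higher score as better is an assumption, based only on the example instance, where the top-scoring hypothesis matches text.

```python
from typing import Any, Dict, List


def best_hypotheses(instance: Dict[str, Any]) -> List[str]:
    """Return one transcript per turn, preferring the top-scoring ASR hypothesis."""
    transcripts = []
    for turn in instance["turns"]:
        nbest = turn.get("nbest") or []
        if nbest:
            # Assumption: scores behave like log-likelihoods, so higher is better.
            top = max(nbest, key=lambda h: h["score"])
            transcripts.append(top["hyp"])
        else:
            # Turns without ASR output keep their "text" value.
            transcripts.append(turn["text"])
    return transcripts


# Hand-made toy instance that mimics the schema (not real data).
toy = {
    "turns": [
        {
            "speaker": "U",
            "text": "do they have happy hour",
            "nbest": [
                {"hyp": "do they have happy our", "score": -31.5},
                {"hyp": "do they have happy hour", "score": -30.2},
            ],
        },
        {"speaker": "S", "text": "let me check", "nbest": []},
    ]
}

print(best_hypotheses(toy))
```

Keeping the whole n-best list rather than only the top hypothesis is also an option when a model should be exposed to ASR uncertainty.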

Reason for Structure

It was kept compatible with MultiWOZ 2.X data.

Example Instance

{'id': '0', 'gem_id': 'GEM-dstc10_track2_task2-test-0',
 'turns': [
  {'speaker': 'U', 'text': "hi uh i'm looking for restaurant in lower ha", 'nbest': [{'hyp': "hi uh i'm looking for restaurant in lower ha", 'score': -25.625450134277344}, {'hyp': "hi uh i'm looking for restaurant in lower hai", 'score': -25.969446182250977}, {'hyp': "hi uh i'm looking for restaurant in lower haig", 'score': -32.816890716552734}, {'hyp': "hi uh i'm looking for restaurant in lower haigh", 'score': -32.84316635131836}, {'hyp': "hi uh i'm looking for restaurant in lower hag", 'score': -32.8637580871582}, {'hyp': "hi uh i'm looking for restaurant in lower hah", 'score': -33.1048698425293}, {'hyp': "hi uh i'm looking for restaurant in lower hait", 'score': -33.96509552001953}, {'hyp': "hi um i'm looking for restaurant in lower hai", 'score': -33.97885513305664}, {'hyp': "hi um i'm looking for restaurant in lower haig", 'score': -34.56083679199219}, {'hyp': "hi um i'm looking for restaurant in lower haigh", 'score': -34.58711242675781}]},
  {'speaker': 'S', 'text': 'yeah definitely i can go ahead and help you with that ummm what kind of option in a restaurant are you looking for', 'nbest': []},
  {'speaker': 'U', 'text': 'yeah umm am looking for an expensive restaurant', 'nbest': [{'hyp': 'yeah umm am looking for an expensive restaurant', 'score': -21.272899627685547}, {'hyp': 'yeah umm m looking for an expensive restaurant', 'score': -21.444047927856445}, {'hyp': 'yeah umm a m looking for an expensive restaurant', 'score': -21.565458297729492}, {'hyp': 'yeah ummm am looking for an expensive restaurant', 'score': -21.68832778930664}, {'hyp': 'yeah ummm m looking for an expensive restaurant', 'score': -21.85947608947754}, {'hyp': 'yeah ummm a m looking for an expensive restaurant', 'score': -21.980886459350586}, {'hyp': "yeah umm a'm looking for an expensive restaurant", 'score': -22.613924026489258}, {'hyp': "yeah ummm a'm looking for an expensive restaurant", 'score': -23.02935218811035}, {'hyp': 'yeah um am looking for an expensive restaurant', 'score': -23.11180305480957}, {'hyp': 'yeah um m looking for an expensive restaurant', 'score': -23.28295135498047}]},
  {'speaker': 'S', 'text': "lemme go ahead and see what i can find for you ok great so i do ummm actually no i'm sorry is there something else i can help you find i don't see anything expensive", 'nbest': []},
  {'speaker': 'U', 'text': "sure ummm maybe if you don't have anything expensive how about something in the moderate price range", 'nbest': [{'hyp': "sure ummm maybe if you don't have anything expensive how about something in the moderate price range", 'score': -27.492507934570312}, {'hyp': "sure umm maybe if you don't have anything expensive how about something in the moderate price range", 'score': -27.75853729248047}, {'hyp': "sure ummm maybe if you don't have anything expensive how about something in the moderate price rang", 'score': -29.44410514831543}, {'hyp': "sure umm maybe if you don't have anything expensive how about something in the moderate price rang", 'score': -29.710134506225586}, {'hyp': "sure um maybe if you don't have anything expensive how about something in the moderate price range", 'score': -31.136560440063477}, {'hyp': "sure um maybe if you don't have anything expensive how about something in the moderate price rang", 'score': -33.088157653808594}, {'hyp': "sure ummm maybe i you don't have anything expensive how about something in the moderate price range", 'score': -36.127620697021484}, {'hyp': "sure umm maybe i you don't have anything expensive how about something in the moderate price range", 'score': -36.39365005493164}, {'hyp': "sure ummm maybe if yo don't have anything expensive how about something in the moderate price range", 'score': -36.43605041503906}, {'hyp': "sure umm maybe if yo don't have anything expensive how about something in the moderate price range", 'score': -36.70207977294922}]},
  {'speaker': 'S', 'text': 'ok moderate lemme go ahead and check to see what i can find for moderate ok great i do have several options coming up how does the view lounge sound', 'nbest': []},
  {'speaker': 'U', 'text': 'that sounds good ummm do they have any sort of happy hour special', 'nbest': [{'hyp': 'that sounds good ummm do they have any sort of happy hour special', 'score': -30.316478729248047}, {'hyp': 'that sounds good umm do they have any sort of happy hour special', 'score': -30.958009719848633}, {'hyp': 'that sounds good um do they have any sort of happy hour special', 'score': -34.463165283203125}, {'hyp': 'that sounds good ummm do they have any sirt of happy hour special', 'score': -34.48350143432617}, {'hyp': 'that sounds good umm do they have any sirt of happy hour special', 'score': -35.12503433227539}, {'hyp': 'that sounds good ummm do they have any sord of happy hour special', 'score': -35.61939239501953}, {'hyp': 'that sounds good umm do they have any sord of happy hour special', 'score': -36.26092529296875}, {'hyp': 'that sounds good ummm do they have any sont of happy hour special', 'score': -37.697105407714844}, {'hyp': 'that sounds good umm do they have any sont of happy hour special', 'score': -38.33863830566406}, {'hyp': 'that sounds good um do they have any sirt of happy hour special', 'score': -38.630191802978516}]}
 ],
 'knowledge': {'domain': 'restaurant', 'entity_name': 'The View Lounge', 'title': 'Does The View Lounge offer happy hour?', 'body': 'The View Lounge offers happy hour.'},
 'response': 'uhhh great question lemme go ahead and check that out for you ok fantastic so it looks like they do offer happy hour',
 'source': 'sf_spoken',
 'linearized_input': " hi uh i'm looking for restaurant in lower ha yeah definitely i can go ahead and help you with that ummm what kind of option in a restaurant are you looking for yeah umm am looking for an expensive restaurant lemme go ahead and see what i can find for you ok great so i do ummm actually no i'm sorry is there something else i can help you find i don't see anything expensive sure ummm maybe if you don't have anything expensive how about something in the moderate price range ok moderate lemme go ahead and check to see what i can find for moderate ok great i do have several options coming up how does the view lounge sound that sounds good ummm do they have any sort of happy hour special || knowledge domain: restaurant, entity: The View Lounge, title: Does The View Lounge offer happy hour?, information: The View Lounge offers happy hour.",
 'target': 'uhhh great question lemme go ahead and check that out for you ok fantastic so it looks like they do offer happy hour',
 'references': ['uhhh great question lemme go ahead and check that out for you ok fantastic so it looks like they do offer happy hour']}
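In the example instance, linearized_input appears to be the turn texts joined by spaces, followed by " || " and a flattened view of the knowledge entry. The sketch below reproduces that layout. It is inferred from this single example only, so the actual data loader may differ. The function name linearize and the toy instance are hypothetical.

```python
from typing import Any, Dict


def linearize(instance: Dict[str, Any]) -> str:
    """Rebuild a linearized input in the layout seen in the example instance."""
    # Dialog history: all turn texts in order, separated by single spaces.
    history = " ".join(turn["text"] for turn in instance["turns"])
    k = instance["knowledge"]
    knowledge = (
        f"knowledge domain: {k['domain']}, entity: {k['entity_name']}, "
        f"title: {k['title']}, information: {k['body']}"
    )
    # The example's linearized_input starts with a single space; it is kept here.
    return f" {history} || {knowledge}"


# Hand-made toy instance that mimics the schema (not real data).
toy_instance = {
    "turns": [
        {"speaker": "S", "text": "how does the view lounge sound"},
        {"speaker": "U", "text": "do they have happy hour"},
    ],
    "knowledge": {
        "domain": "restaurant",
        "entity_name": "The View Lounge",
        "title": "Does The View Lounge offer happy hour?",
        "body": "The View Lounge offers happy hour.",
    },
}

print(linearize(toy_instance))
```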

Data Splits

train: training set, val: validation set, test: test set

Splitting Criteria

The track dataset originally only consists of a validation and test set in the spoken domain with noisy ASR transcripts. The training set is taken from the predecessor task DSTC9 Track 1 and contains written conversations.

Dataset in GEM

Rationale for Inclusion in GEM

Why is the Dataset in GEM?

This dataset can be used to evaluate conversational models on spoken inputs (using ASR hypotheses). In particular, we can evaluate the models’ ability to understand language by tracking the dialogue state, and their ability to generate knowledge-grounded responses.

Similar Datasets

yes

Unique Language Coverage

Difference from other GEM datasets

This dataset contains transcribed spoken interactions.

Ability that the Dataset measures

We can measure the model’s ability to understand language and to generate knowledge-grounded responses.

GEM-Specific Curation

Modified for GEM?

Additional Splits?

Getting Started with the Task

Previous Results

Measured Model Abilities

This dataset can be used to evaluate conversational models on spoken inputs (using ASR hypotheses). In particular, we can evaluate the models’ ability to generate knowledge-grounded responses.

Metrics

Other: Other Metrics

Other Metrics

BLEU-1, BLEU-2, BLEU-3, BLEU-4, METEOR, ROUGE-1, ROUGE-2, ROUGE-L
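To illustrate what one of these overlap metrics measures, here is a minimal sketch of a ROUGE-L F1 score based on the longest common subsequence of whitespace tokens. It is not the official scorer used by the challenge, whose tokenization and weighting may differ. The names lcs_length and rouge_l_f1 are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, hypothesis: str) -> float:
    """ROUGE-L as the harmonic mean of LCS-based precision and recall."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy sentences, not taken from the dataset.
score = rouge_l_f1("the cat sat on the mat", "the cat on the mat")
print(round(score, 4))
```

For comparable numbers, an established scoring package should be preferred over a hand-rolled implementation like this one.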

Previous results available?

Dataset Curation

Original Curation

Original Curation Rationale

We want to explore how conversational models perform on spoken data.

Communicative Goal

This dataset aims to explore the robustness of conversational models when evaluated on spoken data. It has two aspects: multi-domain dialogue state tracking and conversation modeling with access to unstructured knowledge.

Sourced from Different Sources

Language Data

How was Language Data Obtained?

Other

Topics Covered

The conversations revolve around 5 domains (or topics): hotels, restaurants, attractions, taxi, train.

Data Validation

not validated

Was Data Filtered?

not filtered

Structured Annotations

Additional Annotations?

none

Annotation Service?

Consent

Any Consent Policy?

yes

Private Identifying Information (PII)

Contains PII?

no PII

Justification for no PII

The subjects were instructed to conduct fictional conversations about booking restaurants or requesting fictional information.

Maintenance

Any Maintenance Plan?

Broader Social Context

Previous Work on the Social Impact of the Dataset

Usage of Models based on the Data

Impact on Under-Served Communities

Addresses needs of underserved Communities?

Discussion of Biases

Any Documented Social Biases?

unsure

Considerations for Using the Data

PII Risks and Liability

Potential PII Risk

There should be no risk related to PII as the subjects conduct fictional conversations.

Licenses

open license - commercial use allowed

Known Technical Limitations

Author:

GEM

Dataset size:

49.26 KB