Datasets: taskmaster1
Sub-tasks: dialogue-modeling
Languages: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotations creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1909.05358
License: cc-by-4.0

Taskmaster-1 is a goal-oriented conversational dataset. It includes 13,215 task-based dialogs spanning six domains. Two procedures were used to create the collection, each with its own advantages. The first is a two-person, spoken "Wizard of Oz" (WOz) approach in which trained agents and crowdsourced workers interact to complete the task. The second is "self-dialog", in which crowdsourced workers write the entire dialog themselves.
The dataset is in English.
A typical example looks like this:
    {
      "conversation_id": "dlg-336c8165-068e-4b4b-803d-18ef0676f668",
      "instruction_id": "restaurant-table-2",
      "utterances": [
        {
          "index": 0,
          "segments": [],
          "speaker": "USER",
          "text": "Hi, I'm looking for a place that sells spicy wet hotdogs, can you think of any?"
        },
        {
          "index": 1,
          "segments": [
            {
              "annotations": [
                { "name": "restaurant_reservation.name.restaurant.reject" }
              ],
              "end_index": 37,
              "start_index": 16,
              "text": "Spicy Wet Hotdogs LLC"
            }
          ],
          "speaker": "ASSISTANT",
          "text": "You might enjoy Spicy Wet Hotdogs LLC."
        },
        {
          "index": 2,
          "segments": [],
          "speaker": "USER",
          "text": "That sounds really good, can you make me a reservation?"
        },
        {
          "index": 3,
          "segments": [],
          "speaker": "ASSISTANT",
          "text": "Certainly, when would you like a reservation?"
        },
        {
          "index": 4,
          "segments": [
            {
              "annotations": [
                { "name": "restaurant_reservation.num.guests" },
                { "name": "restaurant_reservation.num.guests" }
              ],
              "end_index": 20,
              "start_index": 18,
              "text": "50"
            }
          ],
          "speaker": "USER",
          "text": "I have a party of 50 who want a really sloppy dog on Saturday at noon."
        }
      ]
    }
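Each conversation in the data file has the following structure: conversation_id (a unique identifier, prefixed with dlg- in the example above), instruction_id (a reference to the task instructions for the conversation, e.g. restaurant-table-2), and utterances (the list of turns that make up the conversation).
Each utterance has the following fields: index (the 0-based position of the utterance in the conversation), speaker (USER or ASSISTANT), text (the raw text of the utterance), and segments (a list of annotated text spans, possibly empty).
Each segment has the following fields: start_index and end_index (character offsets of the span within the utterance text; in the example above, end_index is exclusive), text (the text of the span), and annotations (a list of annotations attached to the span).
Each annotation has a single field: name (the annotation label, e.g. restaurant_reservation.num.guests).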
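The nesting described above (conversation, utterance, segment, annotation) can be walked with a few lines of Python. The following is a minimal sketch rather than part of the dataset's tooling: the helper name `extract_slots` is hypothetical, and the sample conversation is an abbreviated copy of the example, keeping only the two annotated utterances.

```python
# Abbreviated copy of the example conversation (utterances 1 and 4 only).
conversation = {
    "conversation_id": "dlg-336c8165-068e-4b4b-803d-18ef0676f668",
    "instruction_id": "restaurant-table-2",
    "utterances": [
        {
            "index": 1,
            "speaker": "ASSISTANT",
            "text": "You might enjoy Spicy Wet Hotdogs LLC.",
            "segments": [
                {
                    "start_index": 16,
                    "end_index": 37,
                    "text": "Spicy Wet Hotdogs LLC",
                    "annotations": [
                        {"name": "restaurant_reservation.name.restaurant.reject"}
                    ],
                }
            ],
        },
        {
            "index": 4,
            "speaker": "USER",
            "text": "I have a party of 50 who want a really sloppy dog on Saturday at noon.",
            "segments": [
                {
                    "start_index": 18,
                    "end_index": 20,
                    "text": "50",
                    "annotations": [
                        {"name": "restaurant_reservation.num.guests"},
                        {"name": "restaurant_reservation.num.guests"},
                    ],
                }
            ],
        },
    ],
}


def extract_slots(conversation):
    """Hypothetical helper: list (utterance index, speaker, annotation name, span)."""
    slots = []
    for utterance in conversation["utterances"]:
        text = utterance["text"]
        for segment in utterance["segments"]:
            # Slice the utterance text with the segment's character offsets.
            span = text[segment["start_index"]:segment["end_index"]]
            # In the example, the slice matches the segment's own "text" field.
            assert span == segment["text"]
            for annotation in segment["annotations"]:
                slots.append(
                    (utterance["index"], utterance["speaker"], annotation["name"], span)
                )
    return slots


slots = extract_slots(conversation)
for slot in slots:
    print(slot)
```

Because the example repeats the `restaurant_reservation.num.guests` annotation on the same segment, the sketch yields that slot twice; deduplicating is left to the caller.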
The data in the one_person_dialogs config is split into train, validation and test splits.
| | train | validation | test |
|---|---|---|---|
| N. Instances | 6168 | 770 | 770 |
The data in the woz_dialogs config has no default splits.
| | train |
|---|---|
| N. Instances | 5507 |
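Since woz_dialogs ships as a single train split, anyone who needs held-out data has to create it. One possible approach, sketched below, is to assign each conversation to a split by hashing its conversation_id, so the assignment is reproducible across runs and machines. The function name `assign_split`, the 80/10/10 proportions, and the sample IDs are all assumptions made for illustration; the dataset authors do not prescribe any split for this config.

```python
import hashlib


def assign_split(conversation_id, dev_fraction=0.1, test_fraction=0.1):
    """Hypothetical helper: map an ID to 'train', 'validation' or 'test'."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + dev_fraction:
        return "validation"
    return "train"


# Made-up IDs for demonstration; real IDs come from the conversation_id field.
sample_ids = [f"dlg-{i:08d}" for i in range(1000)]
counts = {"train": 0, "validation": 0, "test": 0}
for conversation_id in sample_ids:
    counts[assign_split(conversation_id)] += 1
print(counts)
```

Hashing the ID rather than shuffling with a random seed means a conversation keeps its split even if the file order changes or more data is added later.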
Initial Data Collection and Normalization: [More Information Needed]
Who are the source language producers? [More Information Needed]
Annotation process: [More Information Needed]
Who are the annotators? [More Information Needed]
The dataset is licensed under the Creative Commons Attribution 4.0 License (CC BY 4.0).
@inproceedings{48484,
  title = {Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset},
  author = {Bill Byrne and Karthik Krishnamoorthi and Chinnadhurai Sankar and Arvind Neelakantan and Daniel Duckworth and Semih Yavuz and Ben Goodrich and Amit Dubey and Kyu-Young Kim and Andy Cedilnik},
  year = {2019}
}
Thanks to @patil-suraj for adding this dataset.