Dataset: coached_conv_pref
Languages: en
Multilinguality: monolingual
Size categories: n<1K
Language creators: found
Annotations creators: expert-generated
Source datasets: original
License: cc-by-sa-4.0
A dataset consisting of 502 English dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected using a Wizard-of-Oz methodology between two paid crowd-workers, where one worker plays the role of an 'assistant' and the other plays the role of a 'user'. The 'assistant' elicits the 'user's' preferences about movies following a Coached Conversational Preference Elicitation (CCPE) method. The assistant asks questions designed to minimize, as much as possible, bias in the terminology the 'user' employs to convey his or her preferences, and to obtain these preferences in natural language. Each dialog is annotated with entity mentions, preferences expressed about entities, descriptions of entities provided, and other statements about entities.
The text in the dataset is in English. The associated BCP-47 code is en.
A typical data point consists of a series of utterances between the 'assistant' and the 'user'. Each utterance is annotated with the categories described in the data fields below.
An example from the Coached Conversational Preference Elicitation dataset looks as follows:
{'conversationId': 'CCPE-6faee', 'utterances': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'segments': [{'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [0], 'entityType': [0]}, {'annotationType': [1], 'entityType': [0]}], 'endIndex': [20, 27], 'startIndex': [14, 0], 'text': ['comedy', 'I really like comedy movies']}, {'annotations': [{'annotationType': [0], 'entityType': [0]}], 'endIndex': [24], 'startIndex': [16], 'text': ['comedies']}, {'annotations': [{'annotationType': [1], 'entityType': [0]}], 'endIndex': [15], 'startIndex': [0], 'text': ['I love to laugh']}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [0], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}], 'endIndex': [21, 21], 'startIndex': [8, 0], 'text': ['Step Brothers', 'I liked Step Brothers']}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [1], 'entityType': [1]}], 'endIndex': [32], 'startIndex': [0], 'text': ['Had some amazing one-liners that']}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [0], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}], 'endIndex': [15, 15], 'startIndex': [13, 0], 'text': ['RV', "I don't like RV"]}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [1], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}], 'endIndex': [48, 66], 'startIndex': [18, 50], 'text': ['It was just so slow and boring', "I didn't like it"]}, {'annotations': [{'annotationType': [0], 'entityType': [1]}], 'endIndex': [63], 'startIndex': [33], 'text': ['Jurassic World: Fallen Kingdom']}, {'annotations': [{'annotationType': [0], 'entityType': [1]}, {'annotationType': [3], 'entityType': [1]}], 'endIndex': [52, 52], 'startIndex': [22, 0], 'text': ['Jurassic World: Fallen Kingdom', 'I have seen the movie Jurassic World: Fallen Kingdom']}, {'annotations': [{'annotationType': [], 'entityType': []}], 'endIndex': [0], 'startIndex': [0], 'text': ['']}, {'annotations': [{'annotationType': [1], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}, {'annotationType': [1], 'entityType': [1]}], 'endIndex': [24, 125, 161], 'startIndex': [0, 95, 135], 'text': ['I really like the actors', 'I just really like the scenery', 'the dinosaurs were awesome']}], 'speaker': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0], 'text': ['What kinds of movies do you like?', 'I really like comedy movies.', 'Why do you like comedies?', "I love to laugh and comedy movies, that's their whole purpose. Make you laugh.", 'Alright, how about a movie you liked?', 'I liked Step Brothers.', 'Why did you like that movie?', 'Had some amazing one-liners that still get used today even though the movie was made awhile ago.', 'Well, is there a movie you did not like?', "I don't like RV.", 'Why not?', "And I just didn't It was just so slow and boring. I didn't like it.", 'Ok, then have you seen the movie Jurassic World: Fallen Kingdom', 'I have seen the movie Jurassic World: Fallen Kingdom.', 'What is it about these kinds of movies that you like or dislike?', 'I really like the actors. I feel like they were doing their best to make the movie better. And I just really like the scenery, and the the dinosaurs were awesome.']}}
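As the example shows, a conversation is stored column-wise: utterances holds parallel lists (index, speaker, text, segments) rather than a list of turn objects. A minimal sketch for reading a record back turn by turn; iter_turns and SPEAKER_NAMES are hypothetical names, and the mapping 1 = assistant, 0 = user is inferred from the example (the speaker with ID 1 asks the questions), not taken from an official label list.

```python
from typing import Any, Iterator

# Inferred from the example record, not from an official label list:
# the assistant's questions carry speaker ID 1 and the user's replies carry 0.
SPEAKER_NAMES = {0: "USER", 1: "ASSISTANT"}


def iter_turns(conversation: dict[str, Any]) -> Iterator[tuple[int, str, str]]:
    """Yield (index, speaker name, text) for each utterance, in order.

    'utterances' is column-wise: 'index', 'speaker' and 'text' are parallel
    lists, so the i-th turn is the i-th element of each list.
    """
    utts = conversation["utterances"]
    for idx, speaker, text in zip(utts["index"], utts["speaker"], utts["text"]):
        # Fall back to the raw ID if it is not in the inferred mapping.
        yield idx, SPEAKER_NAMES.get(speaker, str(speaker)), text


if __name__ == "__main__":
    # The first two turns of the example above (segments omitted).
    excerpt = {
        "conversationId": "CCPE-6faee",
        "utterances": {
            "index": [0, 1],
            "speaker": [1, 0],
            "text": ["What kinds of movies do you like?",
                     "I really like comedy movies."],
        },
    }
    for idx, who, text in iter_turns(excerpt):
        print(f"{idx} {who}: {text}")
```

Zipping the parallel lists keeps the sketch independent of how many turns a conversation has, and it reads only the three columns it needs.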
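Each conversation has the following fields:
- conversationId: a unique identifier for the conversation (e.g. 'CCPE-6faee').
- utterances: the turns of the conversation, stored as parallel lists named index, speaker, text, and segments.
Each utterance has the following fields:
- index: the 0-based position of the utterance in the conversation.
- speaker: the role that produced the utterance; in the example above, 1 marks the 'assistant' and 0 the 'user'.
- text: the raw text of the utterance.
- segments: the semantic annotations of spans in the text.
Each semantic annotation segment has the following fields:
- startIndex: the character position in the utterance text where the annotated span starts.
- endIndex: the character position in the utterance text where the annotated span ends (exclusive).
- text: the annotated text itself.
- annotations: the annotation details for the span.
Unannotated utterances carry a single empty placeholder segment (startIndex and endIndex 0, text '').
Each annotation has two fields:
- annotationType: the class of annotation (see the ontology below).
- entityType: the class of entity that the text refers to (see the ontology below).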
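A minimal sketch that walks the nested segments and recovers each annotated span with its label IDs. Span and iter_spans are hypothetical names. The sketch assumes startIndex and endIndex are character offsets into the utterance text with an exclusive end, which is how the offsets in the example behave: characters 14 to 20 of 'I really like comedy movies.' are 'comedy'. It raises an error if a span does not match that slice.

```python
from typing import Any, Iterator, NamedTuple


class Span(NamedTuple):
    """One annotated span of an utterance (hypothetical helper type)."""
    utterance_index: int
    start: int
    end: int
    text: str
    annotation_types: list[int]
    entity_types: list[int]


def iter_spans(conversation: dict[str, Any]) -> Iterator[Span]:
    """Yield one Span per annotated segment entry, skipping placeholders.

    Assumes startIndex/endIndex are character offsets with an exclusive end.
    """
    utts = conversation["utterances"]
    for idx, text, seg in zip(utts["index"], utts["text"], utts["segments"]):
        # A segment is column-wise too: the k-th span is the k-th entry of
        # startIndex, endIndex, text and annotations.
        for start, end, span_text, ann in zip(
            seg["startIndex"], seg["endIndex"], seg["text"], seg["annotations"]
        ):
            if start == end and span_text == "":
                continue  # empty placeholder on an unannotated utterance
            if text[start:end] != span_text:
                raise ValueError(f"offset mismatch in utterance {idx}: {span_text!r}")
            yield Span(idx, start, end, span_text,
                       list(ann["annotationType"]), list(ann["entityType"]))


if __name__ == "__main__":
    # The first two turns of the example above.
    excerpt = {"utterances": {
        "index": [0, 1],
        "text": ["What kinds of movies do you like?", "I really like comedy movies."],
        "segments": [
            {"annotations": [{"annotationType": [], "entityType": []}],
             "endIndex": [0], "startIndex": [0], "text": [""]},
            {"annotations": [{"annotationType": [0], "entityType": [0]},
                             {"annotationType": [1], "entityType": [0]}],
             "endIndex": [20, 27], "startIndex": [14, 0],
             "text": ["comedy", "I really like comedy movies"]},
        ],
    }}
    for span in iter_spans(excerpt):
        print(span)
```

Checking each slice against the stored span text is a cheap way to catch an off-by-one assumption about the offsets before building anything on top of them.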
EXPLANATION OF ONTOLOGY
In the corpus, preferences and the entities that these preferences refer to are annotated with an annotation type as well as an entity type.
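Annotation types fall into four categories:
- ENTITY_NAME (0): the span names an entity, e.g. 'comedy' or 'Step Brothers'.
- ENTITY_PREFERENCE (1): the span expresses a preference about an entity, e.g. 'I really like comedy movies'.
- ENTITY_DESCRIPTION (2): the span describes an entity without expressing a liking or disliking.
- ENTITY_OTHER (3): any other relevant statement about an entity, e.g. 'I have seen the movie Jurassic World: Fallen Kingdom'.
Entity types are marked as belonging to one of four categories:
- MOVIE_GENRE_OR_CATEGORY (0): a genre or general category of movie, e.g. 'comedy'.
- MOVIE_OR_SERIES (1): a specific movie or series, e.g. 'RV'.
- PERSON (2): a person, such as an actor or director.
- SOMETHING_ELSE (3): any other entity.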
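In the raw records both type fields are integer IDs. A minimal sketch that decodes them and tallies (annotation type, entity type) pairs per conversation; count_labels is a hypothetical name, and the label order below is an assumption. It is consistent with the example ('comedy' is tagged 0/0, 'I have seen the movie Jurassic World: Fallen Kingdom' is tagged 3/1), but it should be checked against the loaded dataset's features before being relied on.

```python
from collections import Counter
from typing import Any

# ASSUMED label order; verify against the loaded dataset's features.
ANNOTATION_TYPES = ["ENTITY_NAME", "ENTITY_PREFERENCE",
                    "ENTITY_DESCRIPTION", "ENTITY_OTHER"]
ENTITY_TYPES = ["MOVIE_GENRE_OR_CATEGORY", "MOVIE_OR_SERIES",
                "PERSON", "SOMETHING_ELSE"]


def count_labels(conversation: dict[str, Any]) -> Counter:
    """Tally (annotation type name, entity type name) pairs in a conversation."""
    counts: Counter = Counter()
    for seg in conversation["utterances"]["segments"]:
        for ann in seg["annotations"]:
            # annotationType and entityType are parallel lists; the empty
            # placeholder on unannotated utterances contributes nothing.
            for a, e in zip(ann["annotationType"], ann["entityType"]):
                counts[(ANNOTATION_TYPES[a], ENTITY_TYPES[e])] += 1
    return counts


if __name__ == "__main__":
    # Annotations of two user turns from the example above
    # ('I liked Step Brothers.' and 'I have seen the movie ...').
    excerpt = {"utterances": {"segments": [
        {"annotations": [{"annotationType": [0], "entityType": [1]},
                         {"annotationType": [1], "entityType": [1]}]},
        {"annotations": [{"annotationType": [0], "entityType": [1]},
                         {"annotationType": [3], "entityType": [1]}]},
    ]}}
    for (ann_type, ent_type), n in sorted(count_labels(excerpt).items()):
        print(ann_type, ent_type, n)
```

Keying the counter by name pairs makes it easy to compare, for example, how many preference statements target specific movies versus genres.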
The dataset has a single split, named 'train', which contains all of the data.
Split | Input Conversations
---|---
train | 502
Creative Commons Attribution 4.0 License
@inproceedings{radlinski-etal-2019-ccpe,
  title = {Coached Conversational Preference Elicitation: A Case Study in Understanding Movie Preferences},
  author = {Filip Radlinski and Krisztian Balog and Bill Byrne and Karthik Krishnamoorthi},
  booktitle = {Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue ({SIGDIAL})},
  year = 2019
}
Thanks to @vineeths96 for adding this dataset.