Datasets:
craigslist_bargains
Sub-tasks:
dialogue-modeling
Languages:
en
Multilinguality:
monolingual
Size:
1K<n<10K
Language creators:
crowdsourced
Annotations creators:
machine-generated
Source datasets:
original
ArXiv:
arxiv:1808.09637
License:
license:unknown

We study negotiation dialogues where two agents, a buyer and a seller, negotiate over the price of an item for sale. We collected a dataset of more than 6K negotiation dialogues over multiple categories of products scraped from Craigslist. Our goal is to develop an agent that negotiates with humans through such conversations. The challenge is to handle both the negotiation strategy and the rich language for bargaining. To this end, we develop a modular framework which separates strategy learning from language generation. Specifically, we learn strategies in a coarse dialogue act space and instantiate that into utterances conditioned on dialogue history.
This dataset is in English.
{
  'agent_info': {
    'Bottomline': [ 'None', 'None' ],
    'Role': [ 'buyer', 'seller' ],
    'Target': [ 7.0, 10.0 ]
  },
  'agent_turn': [ 0, 1, ... ],
  'dialogue_acts': {
    'intent': [ 'init-price', 'unknown', ... ],
    'price': [ 5.0, -1.0, ... ]
  },
  'items': {
    'Category': [ 'phone', 'phone' ],
    'Description': [ 'Charge two devices simultaneously on the go...', ... ],
    'Images': [ 'phone/6149527852_0.jpg', 'phone/6149527852_0.jpg' ],
    'Price': [ 10.0, 10.0 ],
    'Title': [ 'Verizon Car Charger with Dual Output Micro USB and ...', ... ]
  },
  'utterance': [
    'Hi, not sure if the charger would work for my car...',
    'It will work...',
    ...
  ]
}
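The instance above stores each dialogue as parallel per-turn lists. A minimal sketch of reading them turn by turn follows. The helper name `iter_turns` is hypothetical, and two things are assumptions rather than documented behavior: that each `agent_turn` value indexes into the `agent_info` lists, and that a price of -1.0 means no price is attached to the turn.

```python
from typing import Any, Dict, List


def iter_turns(example: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one example's parallel per-turn lists into one record per turn."""
    roles = example["agent_info"]["Role"]
    acts = example["dialogue_acts"]
    turns = []
    for agent, text, intent, price in zip(
        example["agent_turn"],
        example["utterance"],
        acts["intent"],
        acts["price"],
    ):
        turns.append(
            {
                # Assumption: agent_turn is an index into agent_info["Role"].
                "role": roles[agent],
                "utterance": text,
                "intent": intent,
                # Assumption: -1.0 marks a turn with no price attached.
                "price": None if price == -1.0 else price,
            }
        )
    return turns


# Toy example built from the first two turns of the instance shown above
# (only the fields this helper reads; text kept as truncated in the sample).
example = {
    "agent_info": {"Role": ["buyer", "seller"], "Target": [7.0, 10.0]},
    "agent_turn": [0, 1],
    "dialogue_acts": {"intent": ["init-price", "unknown"], "price": [5.0, -1.0]},
    "utterance": [
        "Hi, not sure if the charger would work for my car...",
        "It will work...",
    ],
}

for turn in iter_turns(example):
    print(turn["role"], turn["intent"], turn["price"])
```

Flattening into one record per turn keeps the speaker, text, and dialogue act together, which tends to be easier to filter or tokenize than four separate lists.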
This dataset contains three splits: train, validation, and test. Note that the test split is not provided with the dialogue_acts information described above. To ensure schema consistency across dataset splits, the dialogue_acts field in the test split is populated with the default values: {"price": -1.0, "intent": ""}
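Because the test split carries placeholder values rather than real annotations, code that consumes dialogue_acts may want to detect that case first. The sketch below is one way to do so. The function name `has_dialogue_acts` is hypothetical, and since the exact shape of the placeholder (scalars or per-turn lists) is not stated here, the code accepts both as an assumption.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """Wrap a scalar in a list so both layouts can be handled uniformly."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def has_dialogue_acts(example: Dict[str, Any]) -> bool:
    """Return True if any turn has a non-default intent or price."""
    acts = example["dialogue_acts"]
    intents = _as_list(acts["intent"])
    prices = _as_list(acts["price"])
    # Defaults described above: intent == "" and price == -1.0.
    return any(i != "" for i in intents) or any(p != -1.0 for p in prices)


# Illustrative inputs, not real records from the dataset.
annotated = {"dialogue_acts": {"intent": ["init-price", "unknown"], "price": [5.0, -1.0]}}
placeholder_lists = {"dialogue_acts": {"intent": ["", ""], "price": [-1.0, -1.0]}}
placeholder_scalars = {"dialogue_acts": {"intent": "", "price": -1.0}}

print(has_dialogue_acts(annotated))
print(has_dialogue_acts(placeholder_lists))
print(has_dialogue_acts(placeholder_scalars))
```

A check like this avoids silently training or evaluating on the empty intents and -1.0 prices as if they were labels.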
The counts of examples in each split are as follows:
| | Train | Valid | Test |
| --- | --- | --- | --- |
| Input Examples | 5247 | 597 | 838 |
| Average Dialogue Length | 9.14 | 9.17 | 9.24 |
From the source paper for this dataset:
To generate the negotiation scenarios, we scraped postings on sfbay.craigslist.org from the 6 most popular categories (housing, furniture, cars, bikes, phones, and electronics). Each posting produces three scenarios with the buyer’s target prices at 0.5x, 0.7x and 0.9x of the listing price. Statistics of the scenarios are shown in Table 2. We collected 6682 human-human dialogues on AMT using the interface shown in Appendix A Figure 2. The dataset statistics in Table 3 show that CRAIGSLISTBARGAIN has longer dialogues and more diverse utterances compared to prior datasets. Furthermore, workers were encouraged to embellish the item and negotiate side offers such as free delivery or pick-up. This highly relatable scenario leads to richer dialogues such as the one shown in Table 1. We also observed various persuasion techniques listed in Table 4 such as embellishment,
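The scenario construction quoted above sets three buyer targets per posting. As a small worked example, the sketch below computes them from a listing price. The function name `buyer_targets` and the rounding to two decimals are assumptions for illustration, not taken from the paper.

```python
from typing import List, Tuple


def buyer_targets(
    listing_price: float,
    ratios: Tuple[float, ...] = (0.5, 0.7, 0.9),
) -> List[float]:
    """Buyer target prices as fixed fractions of the listing price."""
    # Rounding to cents is an assumption made to avoid float noise.
    return [round(listing_price * r, 2) for r in ratios]


# A 10.0 listing price, as in the sample data instance, yields three scenarios.
print(buyer_targets(10.0))
```

For a listing price of 10.0 the middle scenario gives a target of 7.0, which matches the buyer's Target in the sample data instance.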
Initial Data Collection and Normalization
See Dataset Creation
Who are the source language producers?
See Dataset Creation
Annotation process
Annotations for the dialogue_acts in train and validation were generated via a rule-based system, which can be found in this script.
Who are the annotators?
[More Information Needed]
He He, Derek Chen, Anusha Balakrishnan, and Percy Liang
Computer Science Department, Stanford University
{hehe,derekchen14,anusha,pliang}@cs.stanford.edu
The work through which this data was produced was supported by the DARPA Communicating with Computers (CwC) program under ARO prime contract no. W911NF15-1-0462.
@misc{he2018decoupling,
  title={Decoupling Strategy and Generation in Negotiation Dialogues},
  author={He He and Derek Chen and Anusha Balakrishnan and Percy Liang},
  year={2018},
  eprint={1808.09637},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Thanks to @ZacharySBrown for adding this dataset.