Dataset:
allenai/prosocial-dialog
ProsocialDialog is the first large-scale multi-turn English dialogue dataset to teach conversational agents to respond to problematic content following social norms. Covering diverse unethical, problematic, biased, and toxic situations, ProsocialDialog contains responses that encourage prosocial behavior, grounded in commonsense social rules (i.e., rules-of-thumb, RoTs). Created via a human-AI collaborative framework, ProsocialDialog consists of 58K dialogues, with 331K utterances, 160K unique RoTs, and 497K dialogue safety labels accompanied by free-form rationales.
Language: English
| attribute | type | description |
|---|---|---|
| `context` | str | the potentially unsafe utterance |
| `response` | str | the guiding utterance grounded in rules-of-thumb (RoTs) |
| `rots` | list of str \| null | the relevant rules-of-thumb for text not labeled as `__casual__` |
| `safety_label` | str | the final verdict on the context according to `safety_annotations`: {`__casual__`, `__possibly_needs_caution__`, `__probably_needs_caution__`, `__needs_caution__`, `__needs_intervention__`} |
| `safety_annotations` | list of str | raw annotations from three workers: {`casual`, `needs caution`, `needs intervention`} |
| `safety_annotation_reasons` | list of str | the reasons behind the safety annotations, in free-form text from each worker |
| `source` | str | the source of the seed text that was used to craft the first utterance of the dialogue: {`socialchemistry`, `sbic`, `ethics_amt`, `ethics_reddit`} |
| `etc` | str \| null | other information |
| `dialogue_id` | int | the dialogue index |
| `response_id` | int | the response index |
| `episode_done` | bool | an indicator of whether it is the end of the dialogue |
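Since each row holds a single context/response pair, downstream use typically starts by regrouping rows into whole dialogues. The following is a minimal sketch, assuming the Hugging Face `datasets` library and a standard `train` split; the dialogue id `0` is used purely as an illustration. It loads the data and reassembles one conversation using the indexing fields from the table above.

```python
from collections import defaultdict

from datasets import load_dataset

# Load the dataset from the Hugging Face Hub.
dataset = load_dataset("allenai/prosocial-dialog")
train = dataset["train"]  # assuming a standard "train" split is present

# Each row is one (context, response) turn; `dialogue_id` groups the turns of
# a conversation, `response_id` orders them, and `episode_done` flags the last.
dialogues = defaultdict(list)
for row in train:
    dialogues[row["dialogue_id"]].append(row)

# Reassemble one dialogue (id 0 is only an illustration) and inspect it.
turns = sorted(dialogues[0], key=lambda r: r["response_id"])
for turn in turns:
    print(f"[{turn['safety_label']}] {turn['context']}")
    print(f"  -> {turn['response']} (RoTs: {turn['rots']})")
assert turns[-1]["episode_done"]  # the final turn should close the episode
```

Sorting on `response_id` keeps the turns in conversational order, and the `episode_done` flag on the last row is a cheap sanity check that the grouping recovered a complete episode.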
To create ProsocialDialog, we set up a human-AI collaborative data creation framework, where GPT-3 generates the potentially unsafe utterances and crowdworkers provide prosocial responses to them. This approach allows us to circumvent two substantial challenges: (1) there are no available large-scale corpora of multi-turn prosocial conversations between humans, and (2) asking humans to write unethical, toxic, or problematic utterances could result in psychological harm (Roberts, 2017; Steiger et al., 2021).
For further details, please refer to our paper.
Please cite our work if you find the resources in this repository useful:
@inproceedings{kim2022prosocialdialog,
    title={ProsocialDialog: A Prosocial Backbone for Conversational Agents},
    author={Hyunwoo Kim and Youngjae Yu and Liwei Jiang and Ximing Lu and Daniel Khashabi and Gunhee Kim and Yejin Choi and Maarten Sap},
    booktitle={EMNLP},
    year=2022
}