Dataset:
HuggingFaceH4/instruct_me
Instruct Me is a dataset of prompts and instruction dialogues between a human user and an AI assistant. The prompts are derived from (prompt, completion) pairs in the Helpful Instructions dataset. The goal is to train a language model that is "chatty" and can answer the kinds of questions or perform the kinds of tasks a human user might give an AI assistant.
We provide 3 configs that can be used for training RLHF models:
instruction_tuning: Single-turn user/bot dialogues for instruction tuning.
reward_modeling: Prompts to generate model completions and collect human preference data.
ppo: Prompts to generate model completions for optimization of the instruction-tuned model with techniques like PPO.
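As a minimal sketch, one of the three configs listed above could be loaded with the `datasets` library by passing the config name as the second argument to `load_dataset`. The helper `load_config` and the constants `DATASET_ID` and `CONFIGS` are hypothetical names introduced here for illustration; the splits and columns of each config are not described in this card, so the sketch makes no assumptions about them.

```python
# Config names as listed in this dataset card.
CONFIGS = ("instruction_tuning", "reward_modeling", "ppo")
DATASET_ID = "HuggingFaceH4/instruct_me"


def load_config(name):
    """Load one config of the dataset.

    Hypothetical convenience wrapper, not part of any library.
    """
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    # Imported lazily so the name check above works even if `datasets`
    # is not installed.
    from datasets import load_dataset

    # Downloads from the Hugging Face Hub, so this needs network access.
    return load_dataset(DATASET_ID, name)
```

Calling `load_config("instruction_tuning")` without a `split` argument should return a `DatasetDict`; printing it is a quick way to see which splits and columns that config actually provides.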