Datasets:
Anthropic/hh-rlhf Dahoas/instruct-synthetic-prompt-responses databricks/databricks-dolly-15k openai/summarize_from_feedback openai/webgpt_comparisons OpenAssistant/oasst1
Preprint:
arxiv:2304.07327
Other:
custom_code deberta-v2 electra gpt_neox gpt_neox_reward_model human-feedback opt RefinedWeb RefinedWebModel reward-model reward_model RLHF sft text-generation-inference
Size:
100K<n<1M