Dataset:
vicgalle/alpaca-gpt4
This dataset contains English instruction-following data generated by GPT-4 using Alpaca prompts.
The dataset was originally shared in this repository: https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM . This version is only a wrapper that makes it compatible with Hugging Face's `datasets` library.
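Since the dataset is packaged for the `datasets` library, loading it might look like the minimal sketch below. The helper name `load_alpaca_gpt4` is hypothetical, and the `train` split name is an assumption that is not stated in this card; the call needs the `datasets` package and network access.

```python
# Hub identifier taken from this card.
DATASET_ID = "vicgalle/alpaca-gpt4"


def load_alpaca_gpt4(split: str = "train"):
    """Load the dataset from the Hugging Face Hub.

    The default split name "train" is an assumption, not confirmed by this card.
    """
    # Imported lazily so that defining this helper does not require the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_alpaca_gpt4()` should return a dataset object whose rows can be indexed like dictionaries, for example `load_alpaca_gpt4()[0]["instruction"]`.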
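It contains 52K instruction-following examples generated by GPT-4 using the same prompts as Alpaca. The format is identical to the Alpaca data, except that the output is generated by GPT-4: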
- `instruction`: `str`, describes the task the model should perform. Each of the 52K instructions is unique.
- `input`: `str`, optional context or input for the task.
- `output`: `str`, the answer to the instruction as generated by `GPT-4`.
- `text`: `str`, all the previous fields concatenated together, plus the same prompt used in Alpaca at the beginning.
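The way the `text` field combines the other fields can be sketched as follows. The with-input template matches the example record in this card; the no-input template is an assumption taken from the Stanford Alpaca repository and does not appear in this card. The function name `build_text` and the parameter name `context` are hypothetical.

```python
# Template used when a record has a non-empty `input` (matches the card's example).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n"
)

# Template for records with an empty `input` (assumed from the Alpaca repository).
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_text(instruction: str, output: str, context: str = "") -> str:
    """Rebuild the `text` field from `instruction`, `input` (context) and `output`."""
    template = PROMPT_WITH_INPUT if context else PROMPT_NO_INPUT
    # The output is appended rather than formatted in, so braces in it are kept as-is.
    return template.format(instruction=instruction, context=context) + output
```

Applied to a record's `instruction`, `input`, and `output`, this should reproduce its `text` value, provided the dataset follows these templates exactly.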
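The original Alpaca dataset used text-davinci-003 to complete the prompts. This dataset uses the same prompts but generates the completions with GPT-4, so the responses are generally longer and of higher quality. Here is an example: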
Example from Alpaca-GPT4:
{'instruction': 'Identify the odd one out.', 'input': 'Twitter, Instagram, Telegram', 'output': 'The odd one out is Telegram. Twitter and Instagram are social media platforms mainly for sharing information, images and videos while Telegram is a cloud-based instant messaging and voice-over-IP service.', 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nIdentify the odd one out.\n\n### Input:\nTwitter, Instagram, Telegram\n\n### Response:\nThe odd one out is Telegram. Twitter and Instagram are social media platforms mainly for sharing information, images and videos while Telegram is a cloud-based instant messaging and voice-over-IP service.'}
The same example from the original Alpaca:
{'instruction': 'Identify the odd one out.', 'input': 'Twitter, Instagram, Telegram', 'output': 'Telegram', 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nIdentify the odd one out.\n\n### Input:\nTwitter, Instagram, Telegram\n\n### Response:\nTelegram'}
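As a rough illustration of the length difference, the two `output` values above can be compared by word count. This is a minimal sketch on a single pair of records, not a measurement over the whole dataset; the helper `word_count` is hypothetical.

```python
# Outputs copied from the two example records above.
gpt4_output = (
    "The odd one out is Telegram. Twitter and Instagram are social media "
    "platforms mainly for sharing information, images and videos while "
    "Telegram is a cloud-based instant messaging and voice-over-IP service."
)
alpaca_output = "Telegram"


def word_count(text: str) -> int:
    """Count whitespace-separated tokens as a crude length measure."""
    return len(text.split())


# Print both counts side by side for comparison.
print(word_count(gpt4_output), word_count(alpaca_output))
```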