Dataset:

vicgalle/alpaca-gpt4


Dataset Card for "alpaca-gpt4"

This dataset contains English instruction-following data generated by GPT-4 from the Alpaca prompts, intended for fine-tuning LLMs.

The dataset was originally shared in this repository: https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM . This is just a wrapper for compatibility with Hugging Face's datasets library.
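Because this card describes a wrapper for the datasets library, a minimal loading sketch may be useful. It assumes the `datasets` package is installed and that the Hub is reachable. The helper names `load_alpaca_gpt4` and `summarize_splits` are hypothetical and not part of the dataset or the library.

```python
# Dataset identifier as given at the top of this card.
DATASET_ID = "vicgalle/alpaca-gpt4"


def load_alpaca_gpt4():
    """Download (or read from the local cache) the dataset from the Hub.

    Hypothetical helper. The import is done lazily so that this module
    can be imported even when `datasets` is not installed.
    """
    from datasets import load_dataset

    return load_dataset(DATASET_ID)


def summarize_splits(dataset):
    """Return {split_name: (row_count, column_names)} for each split.

    Hypothetical helper. Works on any mapping whose values expose
    `num_rows` and `column_names`, as `datasets.Dataset` objects do.
    """
    return {
        name: (split.num_rows, list(split.column_names))
        for name, split in dataset.items()
    }
```

Calling `summarize_splits(load_alpaca_gpt4())` should then list the available splits together with their row counts and columns. The split names are not stated in this card, so the sketch iterates over whatever is returned instead of assuming one.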

Dataset structure

It contains 52K instruction-following examples generated by GPT-4 using the same prompts as in Alpaca. The dataset has the same format as the Alpaca data, except that the output is generated by GPT-4:

- `instruction`: `str`, describes the task the model should perform. Each of the 52K instructions is unique.
- `input`: `str`, optional context or input for the task. 
- `output`: `str`, the answer to the instruction as generated by `GPT-4`.
- `text`: `str`, all the previous fields concatenated together, preceded by the same prompt used in Alpaca.
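The way the `text` field combines the other fields can be sketched as follows. The template with an input is copied from the `text` value of the example in this card. The template for rows with an empty `input` does not appear in this card; it is an assumption based on the prompt used in the original Alpaca project. The function name `build_text` and its parameter names are hypothetical.

```python
# Template copied from the `text` field of the example shown in this card.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{context}\n\n"
    "### Response:\n{output}"
)

# Assumption: this card does not show a row without input; this template
# follows the no-input prompt of the original Alpaca project.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)


def build_text(instruction: str, output: str, context: str = "") -> str:
    """Rebuild a `text`-style string from the other three fields.

    `context` corresponds to the dataset's `input` column; it is renamed
    here only to avoid shadowing Python's built-in `input`.
    """
    if context:
        return PROMPT_WITH_INPUT.format(
            instruction=instruction, context=context, output=output
        )
    return PROMPT_NO_INPUT.format(instruction=instruction, output=output)
```

Comparing `build_text(row["instruction"], row["output"], row["input"])` against `row["text"]` for a few rows is a quick way to check whether these templates match the stored data.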

Difference with the original Alpaca dataset

The original Alpaca dataset used text-davinci-003 to complete the prompts. This dataset uses the same prompts but generates the completions with GPT-4. As a result, the responses are generally longer and of higher quality. Here is an example:

Example from Alpaca-GPT4:
{'instruction': 'Identify the odd one out.',
 'input': 'Twitter, Instagram, Telegram',
 'output': 'The odd one out is Telegram. Twitter and Instagram are social media platforms mainly for sharing information, images and videos while Telegram is a cloud-based instant messaging and voice-over-IP service.',
 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nIdentify the odd one out.\n\n### Input:\nTwitter, Instagram, Telegram\n\n### Response:\nThe odd one out is Telegram. Twitter and Instagram are social media platforms mainly for sharing information, images and videos while Telegram is a cloud-based instant messaging and voice-over-IP service.'}
Same example from original Alpaca:
{'instruction': 'Identify the odd one out.',
 'input': 'Twitter, Instagram, Telegram',
 'output': 'Telegram',
 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nIdentify the odd one out.\n\n### Input:\nTwitter, Instagram, Telegram\n\n### Response:\nTelegram'}
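The length difference in this one example can be checked with a short sketch. The two strings are the `output` values copied from the records above, and `word_count` is a hypothetical helper that counts whitespace-separated tokens. A single pair illustrates the claim and says nothing about the dataset as a whole.

```python
# `output` values copied verbatim from the two example records above.
alpaca_gpt4_output = (
    "The odd one out is Telegram. Twitter and Instagram are social media "
    "platforms mainly for sharing information, images and videos while "
    "Telegram is a cloud-based instant messaging and voice-over-IP service."
)
alpaca_output = "Telegram"


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (a rough proxy for length)."""
    return len(text.split())


gpt4_words = word_count(alpaca_gpt4_output)
alpaca_words = word_count(alpaca_output)
print(f"Alpaca-GPT4: {gpt4_words} words, original Alpaca: {alpaca_words} words")
```

The same helper could be mapped over the `output` column of both datasets to compare average response lengths, assuming both are loaded locally.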

Licensing Information

The dataset is available under the Creative Commons Attribution-NonCommercial 4.0 license (CC BY-NC 4.0).