Dataset:

shibing624/alpaca-zh

Chinese

Dataset Card for "alpaca-zh"

This dataset contains about 50,000 self-instruct examples generated with GPT-4, following the Alpaca method.

The data comes from https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM

Specifically, it is the Chinese dataset at https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/alpaca_gpt4_data_zh.json
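
A minimal sketch of reading records in the layout of that file, assuming it is a JSON array of objects with `instruction`, `input`, and `output` keys. This is the usual Alpaca layout, but the field names are an assumption and are not stated in this card. The helper `parse_alpaca_records` and the sample record are hypothetical, for illustration only.

```python
import json

# Assumed Alpaca-style field names; not confirmed by this card.
REQUIRED_KEYS = ("instruction", "input", "output")


def parse_alpaca_records(json_text):
    """Parse a JSON array of Alpaca-style records and check the assumed keys."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    for i, rec in enumerate(records):
        missing = [k for k in REQUIRED_KEYS if k not in rec]
        if missing:
            raise ValueError(f"record {i} is missing keys: {missing}")
    return records


# Hypothetical sample in the assumed layout (not taken from the dataset).
SAMPLE = '[{"instruction": "把下面的句子翻译成英文。", "input": "你好", "output": "Hello"}]'
records = parse_alpaca_records(SAMPLE)
print(len(records), records[0]["output"])
```

With the Hugging Face `datasets` library installed, `load_dataset("shibing624/alpaca-zh")` should fetch the hosted copy instead of the raw JSON file. Check the returned column names before relying on the keys assumed here.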

Usage and License Notices

The data is intended and licensed for research use only. The dataset is licensed under CC BY-NC 4.0 (non-commercial use only), and models trained on it should not be used outside of research purposes.

To train a model with the alpaca-zh dataset, see https://github.com/shibing624/textgen
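
Before training, each record is typically turned into a single text string. The sketch below uses the prompt template published with the original Stanford Alpaca project. Whether textgen uses this exact template is an assumption, as are the `instruction`, `input`, and `output` field names. The function `build_training_text` and the sample record are hypothetical.

```python
# Stanford Alpaca prompt templates; textgen's own template may differ.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_training_text(record):
    """Join prompt and target output into one training string."""
    # Use the shorter template when the optional input field is empty.
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    prompt = template.format(
        instruction=record["instruction"], input=record.get("input", "")
    )
    return prompt + record["output"]


# Hypothetical record in the assumed layout (not taken from the dataset).
example = {"instruction": "用一句话介绍北京。", "input": "", "output": "北京是中国的首都。"}
text = build_training_text(example)
print(text)
```

Keeping the template in one place makes it easier to use the same prompt format at inference time as the one used during fine-tuning.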

English Dataset

Found here

Citation

@article{peng2023gpt4llm,
    title={Instruction Tuning with GPT-4},
    author={Baolin Peng and Chunyuan Li and Pengcheng He and Michel Galley and Jianfeng Gao},
    journal={arXiv preprint arXiv:2304.03277},
    year={2023}
}