Dataset:
shibing624/alpaca-zh
This dataset contains about 50,000 self-instruct examples generated with GPT-4, following the Alpaca method.
The data comes from https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM
It is the Chinese dataset from that repository: https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/alpaca_gpt4_data_zh.json
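A minimal sketch of reading a local copy of that JSON file and turning a record into a prompt string. It assumes the file is a top-level JSON list whose records use the usual Alpaca field names `instruction`, `input` and `output`; those field names, the local file name and the helper names `load_records` and `build_prompt` are illustrative assumptions, not something this card specifies.

```python
import json
from pathlib import Path


def load_records(path):
    """Read a JSON file that holds a top-level list of record dicts."""
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list of records")
    return records


def build_prompt(record):
    """Join the assumed `instruction` and optional `input` fields into one prompt."""
    instruction = record["instruction"].strip()
    context = record.get("input", "").strip()
    if context:
        # Records with extra context put it after the instruction.
        return f"{instruction}\n\n{context}"
    return instruction


# Hypothetical usage with a locally downloaded copy of the file:
# records = load_records("alpaca_gpt4_data_zh.json")
# prompt, target = build_prompt(records[0]), records[0]["output"]
```

Keeping the prompt construction in one small function makes it easy to swap in whatever template a given training script expects.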
The data is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes.
To train a model with the alpaca-zh dataset, see https://github.com/shibing624/textgen
@article{peng2023gpt4llm,
  title={Instruction Tuning with GPT-4},
  author={Baolin Peng and Chunyuan Li and Pengcheng He and Michel Galley and Jianfeng Gao},
  journal={arXiv preprint arXiv:2304.03277},
  year={2023}
}