Dataset:

nomic-ai/gpt4all-j-prompt-generations

Language:

en

Size:

100K<n<1M

License:

apache-2.0

Dataset Card for GPT4All-J Prompt Generations

Dataset Description

This dataset was used to train GPT4All-J and GPT4All-J-LoRA.

We release several versions of the dataset:

  • v1.0: The original dataset we used to fine-tune GPT-J.
  • v1.1-breezy: A filtered version from which we removed all instances of "AI language model".
  • v1.2-jazzy: A filtered version from which we also removed instances such as "I'm sorry, I can't answer..." in addition to "AI language model".
  • v1.3-groovy: The v1.2 dataset with ShareGPT and Dolly data added, after which ~8% of the data (semantic duplicates) was removed using Atlas.
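To make the filtering steps above more concrete, here is a minimal sketch of the two kinds of cleanup the versions describe: dropping records whose response contains a blocked phrase (as in v1.1 and v1.2), and greedily removing semantic near-duplicates by cosine similarity of embeddings (in the spirit of v1.3). This is not the authors' pipeline, and it does not reproduce what Atlas does internally. The record fields "prompt" and "response", the phrase list, the precomputed embeddings, and the 0.95 threshold are all illustrative assumptions.

```python
from math import sqrt

# Hypothetical phrase list for illustration only; not the exact filter used for v1.1/v1.2.
BLOCKED_PHRASES = ("ai language model", "i'm sorry, i can't answer")


def drop_phrases(records, phrases=BLOCKED_PHRASES):
    """Keep only records whose response contains none of the phrases (case-insensitive)."""
    kept = []
    for rec in records:
        text = rec["response"].lower()  # assumes each record has a "response" field
        if not any(p in text for p in phrases):
            kept.append(rec)
    return kept


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def drop_semantic_duplicates(records, embeddings, threshold=0.95):
    """Greedy dedup: keep a record unless it is too similar to one already kept.

    `embeddings` is assumed to be precomputed, one vector per record, in the same order.
    """
    kept, kept_vecs = [], []
    for rec, vec in zip(records, embeddings):
        if all(cosine(vec, v) < threshold for v in kept_vecs):
            kept.append(rec)
            kept_vecs.append(vec)
    return kept


data = [
    {"prompt": "Hi", "response": "Hello there!"},
    {"prompt": "Who are you?", "response": "As an AI language model, I ..."},
    {"prompt": "Hey", "response": "Hello!"},
    {"prompt": "Capital of France?", "response": "Paris."},
]
filtered = drop_phrases(data)  # the "Who are you?" record is dropped

# Toy 2-D embeddings for the three remaining records (a real pipeline would use a text encoder).
vecs = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
deduped = drop_semantic_duplicates(filtered, vecs)  # "Hey" is a near-duplicate of "Hi"
```

The greedy pass keeps the first record of each near-duplicate group, so the result depends on input order; a production pipeline would more likely use an approximate nearest-neighbor index than this quadratic comparison.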

The dataset defaults to the main branch, which is v1.0. To download a specific version, pass its name as the revision argument to load_dataset:

from datasets import load_dataset

# Load the v1.2-jazzy version instead of the default main branch (v1.0)
jazzy = load_dataset("nomic-ai/gpt4all-j-prompt-generations", revision="v1.2-jazzy")