Dataset Card for GPT4All-J Prompt Generations
Dataset Description
Dataset used to train GPT4All-J and GPT4All-J-LoRA.
We release several versions of the dataset:
- v1.0: The original dataset we used to finetune GPT-J on
- v1.1-breezy: A filtered dataset where we removed all instances of "AI language model"
- v1.2-jazzy: A filtered dataset where we also removed instances like "I'm sorry, I can't answer..." and "AI language model"
- v1.3-groovy: The v1.2 dataset with ShareGPT and Dolly added, with ~8% of semantic duplicates removed from the dataset using Atlas
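The phrase-based filtering described for v1.1-breezy and v1.2-jazzy can be illustrated with a small sketch. This is not the script used to build the released versions: the `response` field name, the case-insensitive substring match, and the toy records are all assumptions made for illustration.

```python
# Phrases named in the version list above. Matching them as
# case-insensitive substrings is an assumption, not the documented procedure.
BREEZY_PHRASES = ("AI language model",)
JAZZY_PHRASES = BREEZY_PHRASES + ("I'm sorry, I can't answer",)


def drop_matching(records, phrases, field="response"):
    """Return the records whose `field` contains none of `phrases`.

    `field="response"` is a hypothetical column name.
    """
    lowered = [p.lower() for p in phrases]
    return [
        r for r in records
        if not any(p in r.get(field, "").lower() for p in lowered)
    ]


# Toy records, invented for this example only.
examples = [
    {"prompt": "Write a haiku.", "response": "Leaves fall in silence."},
    {"prompt": "Who are you?", "response": "As an AI language model, I cannot say."},
    {"prompt": "Tell me a secret.", "response": "I'm sorry, I can't answer that."},
]

breezy = drop_matching(examples, BREEZY_PHRASES)  # drops the second record
jazzy = drop_matching(examples, JAZZY_PHRASES)    # drops the second and third
```

Keeping the phrase lists as data makes it easy to see that the v1.2 filter is a superset of the v1.1 filter, which matches the "also removed" wording in the list.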
The dataset defaults to main, which is v1.0. To download a specific version, pass its name to the revision keyword argument of load_dataset:
from datasets import load_dataset
jazzy = load_dataset("nomic-ai/gpt4all-j-prompt-generations", revision='v1.2-jazzy')
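If you switch between versions often, a thin wrapper can validate the version name before any download starts. The sketch below is a convenience helper, not part of the dataset or of the `datasets` library. `resolve_revision` and `load_version` are hypothetical names. Only `v1.2-jazzy` appears as a revision in the example above, so the `v1.1-breezy` and `v1.3-groovy` revision names are assumed to match the version names in this card.

```python
# Version names from this card mapped to the revision to request.
# v1.0 maps to "main" because the card says main is v1.0; the breezy and
# groovy revision names are assumptions (see the paragraph above).
REVISIONS = {
    "v1.0": "main",
    "v1.1-breezy": "v1.1-breezy",
    "v1.2-jazzy": "v1.2-jazzy",
    "v1.3-groovy": "v1.3-groovy",
}


def resolve_revision(version=None):
    """Return the revision string for a version name (None means the default)."""
    if version is None:
        return "main"
    if version not in REVISIONS:
        known = ", ".join(sorted(REVISIONS))
        raise ValueError(f"Unknown version {version!r}; expected one of: {known}")
    return REVISIONS[version]


def load_version(version=None):
    """Load the requested version of the dataset from the Hugging Face Hub."""
    # Imported here so the name lookup above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "nomic-ai/gpt4all-j-prompt-generations",
        revision=resolve_revision(version),
    )
```

Calling `load_version("v1.3-groovy")` would then request that revision, while a misspelled name fails immediately with a `ValueError` instead of a failed network request.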