Model:
nlpcloud/instruct-gpt-j-fp16
This model demonstrates that GPT-J can work perfectly well as an "instruct" model when properly fine-tuned. It is an fp16 version that makes it easy to deploy the model on an entry-level GPU like an NVIDIA Tesla T4. Want to know more about NLP Cloud? Have a look at our platform here.
We fine-tuned GPT-J on an instruction dataset created by the Stanford Alpaca team. You can find the original dataset here.
The dataset was slightly reworked in order to match the GPT-J fine-tuning format with Mesh Transformer JAX on TPUs. Here is the final dataset we used.
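For illustration only, here is a hypothetical sketch of how Alpaca-style records (fields: instruction, input, output) could be flattened into plain-text examples for causal language model fine-tuning. The file name alpaca_data.json and the exact concatenation format are assumptions; refer to the final dataset linked above for the format we actually used.

# Purely illustrative sketch of flattening Alpaca records into plain-text
# training examples. This is NOT the exact script or format we used.
import json

def to_text(example):
    # Concatenate the instruction, the optional input, and the output into one
    # document, ending with GPT-J's end-of-text marker.
    parts = [example["instruction"]]
    if example.get("input"):
        parts.append(example["input"])
    parts.append(example["output"])
    return "\n".join(parts) + "<|endoftext|>"

with open("alpaca_data.json") as f:
    records = json.load(f)

texts = [to_text(r) for r in records]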
The base GPT-J model needs few-shot learning in order to properly understand what you want. See more details here about how to properly use few-shot learning. For example, let's say that you want to correct spelling with GPT-J. Here is an example of a prompt you would have to use:
I love goin to the beach.
Correction: I love going to the beach.
###
Let me hav it!
Correction: Let me have it!
###
It have too many drawbacks.
Correction: It has too many drawbacks.
###
I do not wan to go
Correction:
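If you want to try this few-shot prompt with the base model, here is a minimal sketch using the transformers text generation pipeline; the EleutherAI/gpt-j-6B checkpoint and the generation parameters are assumptions for illustration:

# Minimal sketch: few-shot spelling correction with the base GPT-J model.
# The checkpoint name and generation parameters are illustrative assumptions.
from transformers import pipeline
import torch

generator = pipeline("text-generation", model="EleutherAI/gpt-j-6B",
                     torch_dtype=torch.float16, device=0)

few_shot_prompt = (
    "I love goin to the beach.\n"
    "Correction: I love going to the beach.\n"
    "###\n"
    "Let me hav it!\n"
    "Correction: Let me have it!\n"
    "###\n"
    "It have too many drawbacks.\n"
    "Correction: It has too many drawbacks.\n"
    "###\n"
    "I do not wan to go\n"
    "Correction:"
)

# The text generated after "Correction:" is the answer.
print(generator(few_shot_prompt, max_new_tokens=20, do_sample=False))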
Now, with Instruct GPT-J, you can ask things in natural language "like a human":
Correct spelling and grammar from the following text.
I do not wan to go
Which returns the following:
I do not want to go.
You can also still use few-shot learning on this model for very advanced use cases.
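For example, here is a hypothetical sketch of a prompt that combines a natural-language instruction with a couple of few-shot examples; the instruction wording and the examples are made up for illustration:

# Hypothetical sketch: combining a natural-language instruction with few-shot
# examples to constrain the output format. The wording and examples are
# illustrative, not a prescribed template.
instruction = "Extract the city mentioned in each sentence.\n"
examples = (
    "Sentence: I landed in Paris yesterday.\n"
    "City: Paris\n"
    "###\n"
    "Sentence: The meeting will be held in Tokyo.\n"
    "City: Tokyo\n"
    "###\n"
)
query = "Sentence: She moved to Lisbon last spring.\nCity:"

# Keep the trailing new line the model expects (see below), then pass the
# prompt to the pipeline shown in the next section.
prompt = instruction + examples + query + "\n"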
Here is how to use the model in fp16 with the text generation pipeline:
from transformers import pipeline
import torch

# Load the fp16 weights on the first GPU (device=0).
generator = pipeline(model="nlpcloud/instruct-gpt-j-fp16", torch_dtype=torch.float16, device=0)

# Note the trailing "\n" at the end of the instruction (see below).
prompt = "Correct spelling and grammar from the following text.\nI do not wan to go\n"

print(generator(prompt))
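The pipeline forwards generation keyword arguments to generate(), so you can control decoding if needed; the parameter values below are just illustrative:

# Illustrative decoding parameters; adjust them to your use case.
print(generator(prompt, max_new_tokens=64, do_sample=True, top_p=0.95, temperature=0.7))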
You can also use the generate() function. Here is what you can do:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("nlpcloud/instruct-gpt-j-fp16")
# Load the fp16 weights and move the model to the GPU.
generator = AutoModelForCausalLM.from_pretrained("nlpcloud/instruct-gpt-j-fp16", torch_dtype=torch.float16).cuda()

# Note the trailing "\n" at the end of the instruction (see below).
prompt = "Correct spelling and grammar from the following text.\nI do not wan to go\n"

inputs = tokenizer(prompt, return_tensors='pt')
outputs = generator.generate(inputs.input_ids.cuda())
print(tokenizer.decode(outputs[0]))
Due to the way this model was fine-tuned, you should always end your instructions with a new line.
For example, the following instruction might not always work:
Correct spelling and grammar from the following text.\nI do not wan to go
But this one would:
Correct spelling and grammar from the following text.\nI do not wan to go\n
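If you build prompts programmatically, a tiny helper like the one below can make sure the trailing new line is never forgotten; the helper function is ours, added purely for illustration:

# Hypothetical helper: make sure a prompt ends with the trailing new line the
# model expects before sending it to the pipeline.
def with_trailing_newline(prompt: str) -> str:
    return prompt if prompt.endswith("\n") else prompt + "\n"

print(generator(with_trailing_newline("Correct spelling and grammar from the following text.\nI do not wan to go")))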
This model is an fp16 version of our fine-tuned model, which works very well on a GPU with 16GB of VRAM like an NVIDIA Tesla T4.
We did not notice any difference between the fp32 and fp16 versions in terms of quality.
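If you want to double-check that the fp16 weights fit in your GPU's memory, you can inspect the model's footprint after loading it; this sketch assumes a recent transformers version that provides get_memory_footprint(), and the 12 GB figure is only a rough approximation:

# Rough check of the model's memory footprint once loaded in fp16.
# About 6B parameters * 2 bytes per parameter is roughly 12 GB, which fits in 16GB of VRAM.
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained("nlpcloud/instruct-gpt-j-fp16", torch_dtype=torch.float16)
print(f"{model.get_memory_footprint() / 1e9:.1f} GB")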