TRL Model

This is a TRL language model that has been fine-tuned with reinforcement learning so that its outputs are guided by a value function or human feedback. The model can be used for text generation.

Training logs

The training logs can be found here

Usage

To use this model for inference, first install the TRL library:

python -m pip install trl

You can then generate text as follows:

from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub as a text-generation pipeline
generator = pipeline("text-generation", model="ybelkada/gpt-neo-125m-detoxified-small-context")
# Returns a list of dicts, one per generated sequence
outputs = generator("Hello, my llama is cute")
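The pipeline call returns a list with one dict per generated sequence, and each dict holds the text under the `generated_text` key. A minimal sketch of pulling the strings out follows. The helper name `generated_texts` and the sample data are hypothetical, chosen so the snippet runs without downloading the model.

```python
def generated_texts(outputs):
    """Return the generated strings from text-generation pipeline output."""
    return [item["generated_text"] for item in outputs]


# Hypothetical stand-in with the same shape as the pipeline's return value
# (a list of dicts); real output would come from generator(...).
sample_outputs = [{"generated_text": "Hello, my llama is cute and friendly"}]

for text in generated_texts(sample_outputs):
    print(text)
```

With the real pipeline, passing `outputs` from the call above in place of `sample_outputs` should work the same way.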

If you want to use the model for training or to obtain the outputs from the value head, load the model as follows:

from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

# Load the tokenizer and the policy model with its value head attached
tokenizer = AutoTokenizer.from_pretrained("ybelkada/gpt-neo-125m-detoxified-small-context")
model = AutoModelForCausalLMWithValueHead.from_pretrained("ybelkada/gpt-neo-125m-detoxified-small-context")

# Passing labels makes the underlying language model compute a loss as well
inputs = tokenizer("Hello, my llama is cute", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
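In the TRL releases that provide `AutoModelForCausalLMWithValueHead`, the forward pass is understood to return a plain tuple laid out as `(lm_logits, loss, value)` rather than an object with named fields. The sketch below gives those positions names. `ValueHeadOutputs` and `name_outputs` are hypothetical helpers, the placeholder strings stand in for tensors, and the tuple layout is an assumption worth checking against the installed TRL version.

```python
from typing import Any, NamedTuple


class ValueHeadOutputs(NamedTuple):
    """Named view over the assumed (lm_logits, loss, value) tuple."""
    lm_logits: Any  # language-model logits
    loss: Any       # language-modeling loss (present when labels are passed)
    value: Any      # value-head estimates


def name_outputs(outputs):
    """Unpack a 3-tuple of model outputs into named fields."""
    lm_logits, loss, value = outputs
    return ValueHeadOutputs(lm_logits, loss, value)


# Placeholder strings stand in for the tensors a real forward pass returns.
named = name_outputs(("logits", "loss", "values"))
print(named.value)
```

With the real model, `name_outputs(outputs).value` would then expose the value-head output from the forward pass above.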

Author:

Younes Belkada

Dataset size:

528.87 MB