Model:
ITG/DialoGPT-medium-spanish-chitchat
This is a transformer-decoder GPT-2 model, adapted for the single-turn dialogue task in Spanish. We fine-tuned Microsoft's 345M-parameter DialoGPT-medium model with the CLM (Causal Language Modelling) objective.
We used one of the datasets available in the Bot Framework Tools repository. We processed the professional-styled personality chat dataset in Spanish; the file is available to download here.
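As a minimal sketch of how single-turn pairs from a file like this can be formatted for the CLM objective, the snippet below joins each question and answer with the tokenizer's end-of-sequence token. The file name and the tab-separated question/answer layout are assumptions, not the actual layout of the download; adapt the parsing to the real file. The example that follows then shows how to chat with the resulting model for a few turns.

```python
# Sketch only: build "question <|endoftext|> answer <|endoftext|>" training strings for CLM.
# "personality_chat_spanish.tsv" and the two-column tab-separated layout are assumptions
# about the downloaded file; adjust the parsing to the real format.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")

def build_training_texts(path="personality_chat_spanish.tsv"):
    texts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 2:
                continue  # skip headers or malformed rows
            question, answer = parts[0], parts[1]
            # Single-turn sample: prompt and response both terminated by the EOS token
            texts.append(question + tokenizer.eos_token + answer + tokenizer.eos_token)
    return texts
```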
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHAT_TURNS = 5
MAX_LENGTH = 1000

model = AutoModelForCausalLM.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')
tokenizer = AutoTokenizer.from_pretrained('ITG/DialoGPT-medium-spanish-chitchat')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for i in range(CHAT_TURNS):
    user_input = input(f"Step - {i} >> user prompt ->")
    with torch.no_grad():
        # User turn, where "user_input" is the question (single-turn dialogue task)
        user_inputs_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
        user_inputs_ids = user_inputs_ids.to(device)
        # The chat history adds the generated tokens for the answer
        chat_history = model.generate(user_inputs_ids, max_length=MAX_LENGTH, pad_token_id=tokenizer.eos_token_id)
        # Decode only the newly generated tokens (do not include the user prompt again)
        step_model_answer = tokenizer.decode(chat_history[:, user_inputs_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"Step - {i} >> DialoGPT-spanish model answer -> {step_model_answer}")
```
| Role | Message |
|---|---|
| User | Hola, buenos dias |
| Bot | Buenos días |
| User | ¿Qué tal estás? |
| Bot | Fenomenal, gracias. |
| User | ¡Tengo hambre! |
| Bot | Quizá picar algo te ayude. |
| User | Estoy triste |
| Bot | Siento escuchar eso. |
| User | ¿Tienes personalidad? |
| Bot | Soy un bot. |
| User | ¡Cuenta un chiste! |
| Bot | En realidad no tengo tanta gracia. |
| User | Jajajajajajajaja |
| Bot | ¡Me alegro de que te rías! |
| Hyper-parameter | Value |
|---|---|
| Validation partition (%) | 20% |
| Training batch size | 8 |
| Learning rate | 5e-4 |
| Max training epochs | 20 |
| Warmup training steps (%) | 6% |
| Weight decay | 0.01 |
| Optimiser (beta1, beta2, epsilon) | AdamW (0.9, 0.999, 1e-08) |
| Monitoring metric (delta, patience) | Validation loss (0.1, 3) |
If you want to fine-tune your own dialogue model, we recommend starting from the DialoGPT model. You can check the original GitHub repository.
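As a rough starting point, the hyper-parameters in the table above map onto Hugging Face `TrainingArguments` as sketched below. The toy pairs, output directory, and dataset handling are placeholders rather than our actual training script, and argument names can vary slightly across `transformers` versions.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, EarlyStoppingCallback,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 models define no pad token by default
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Toy stand-in for the processed single-turn pairs (replace with the full dataset)
pairs = [("Hola, buenos dias", "Buenos días"),
         ("¿Qué tal estás?", "Fenomenal, gracias."),
         ("¡Tengo hambre!", "Quizá picar algo te ayude."),
         ("Estoy triste", "Siento escuchar eso."),
         ("¿Tienes personalidad?", "Soy un bot.")]
texts = [q + tokenizer.eos_token + a + tokenizer.eos_token for q, a in pairs]

dataset = Dataset.from_dict({"text": texts})
dataset = dataset.map(lambda ex: tokenizer(ex["text"]), remove_columns=["text"])
splits = dataset.train_test_split(test_size=0.2)           # 20% validation partition

training_args = TrainingArguments(
    output_dir="dialogpt-medium-spanish-chitchat",
    per_device_train_batch_size=8,                          # training batch size
    learning_rate=5e-4,                                     # learning rate
    num_train_epochs=20,                                    # max training epochs
    warmup_ratio=0.06,                                      # warmup training steps (%)
    weight_decay=0.01,                                      # weight decay
    adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8,    # AdamW optimiser
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,                            # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    # mlm=False gives the causal language modelling (CLM) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    # monitor validation loss with delta 0.1 and patience 3
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3,
                                     early_stopping_threshold=0.1)],
)
trainer.train()
```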
Since the model can assign a probability to any Unicode string, it can be evaluated on any dataset regardless of pre-processing, tokenization, or vocabulary size.
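For example, a quick way to score an arbitrary string under this model is to pass it back as its own labels and read off the causal LM loss (the sentence below is arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ITG/DialoGPT-medium-spanish-chitchat")
model = AutoModelForCausalLM.from_pretrained("ITG/DialoGPT-medium-spanish-chitchat")

text = "Hola, ¿qué tal estás?"  # any Unicode string can be scored
input_ids = tokenizer.encode(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean negative log-likelihood per token
    loss = model(input_ids, labels=input_ids).loss

print(f"Average negative log-likelihood: {loss.item():.3f}")
print(f"Perplexity: {torch.exp(loss).item():.2f}")
```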