Model:
s-nlp/roberta_toxicity_classifier
This model is trained for the toxicity classification task. The training data is a merge of the English parts of three Jigsaw datasets (Jigsaw 2018, Jigsaw 2019, Jigsaw 2020), containing around 2 million examples. We split the data into two parts and fine-tune a RoBERTa model (RoBERTa: A Robustly Optimized BERT Pretraining Approach) on each. The resulting classifiers perform similarly on the test set of the first Jigsaw competition, reaching an AUC-ROC of 0.98 and an F1-score of 0.76.
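For reference, metrics like these can be computed with scikit-learn's roc_auc_score and f1_score given gold labels and predicted toxic-class probabilities; a minimal sketch with hypothetical placeholder arrays (not the actual Jigsaw data):

import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

# hypothetical placeholders: gold toxicity labels (0/1) and predicted toxic-class probabilities
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.9, 0.6, 0.3])
print('AUC-ROC:', roc_auc_score(y_true, y_prob))                 # ranking quality over scores
print('F1:', f1_score(y_true, (y_prob > 0.5).astype(int)))       # hard labels at a 0.5 threshold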
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

# load tokenizer and model weights (repo id matches the model name above)
tokenizer = RobertaTokenizer.from_pretrained('s-nlp/roberta_toxicity_classifier')
model = RobertaForSequenceClassification.from_pretrained('s-nlp/roberta_toxicity_classifier')

# prepare the input
batch = tokenizer.encode('you are amazing', return_tensors='pt')

# inference (no gradient tracking needed)
with torch.no_grad():
    logits = model(batch).logits
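The forward pass returns raw logits; to get a readable prediction, apply a softmax and map the higher-probability index to a label. The label order below (index 0 = neutral, index 1 = toxic) is assumed to follow the model's config; a minimal sketch:

# convert logits to class probabilities
probs = torch.softmax(logits, dim=-1)
# assumed label order from the model config: 0 = neutral, 1 = toxic
label = 'toxic' if probs[0, 1] > probs[0, 0] else 'neutral'
print(label, probs.tolist())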
Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.