Model:
s-nlp/roberta_toxicity_classifier
This model is trained for the toxicity classification task. The training data is the merge of the English parts of three Jigsaw datasets (Jigsaw 2018, Jigsaw 2019, Jigsaw 2020), containing around 2 million examples. We split it into two parts and fine-tune a RoBERTa model ([RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692)) on it. The classifiers perform closely on the test set of the first Jigsaw competition, reaching an AUC-ROC of 0.98 and an F1-score of 0.76.
```python
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# load tokenizer and model weights
tokenizer = RobertaTokenizer.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')
model = RobertaForSequenceClassification.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')

# prepare the input
batch = tokenizer.encode('you are amazing', return_tensors='pt')

# inference
model(batch)
```
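The forward pass above returns raw logits rather than probabilities. Below is a minimal sketch of turning them into class probabilities with a softmax; the mapping of index 0 to `neutral` and index 1 to `toxic` is an assumption about this checkpoint's label order, so verify it against `model.config.id2label` before relying on it.

```python
import torch

# run inference without tracking gradients
with torch.no_grad():
    logits = model(batch).logits

# convert logits to class probabilities
# assumption: index 0 = neutral, index 1 = toxic (check model.config.id2label)
probs = torch.softmax(logits, dim=-1).squeeze()
print(f"neutral: {probs[0]:.3f}, toxic: {probs[1]:.3f}")
```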
This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).