
Toxicity Classification Model

This model is trained for the toxicity classification task. The training data is a merge of the English parts of three datasets by Jigsaw (Jigsaw 2018, Jigsaw 2019, Jigsaw 2020), containing around 2 million examples. We split it into two parts and fine-tune a RoBERTa model (RoBERTa: A Robustly Optimized BERT Pretraining Approach) on each part. The resulting classifiers perform closely on the test set of the first Jigsaw competition, reaching an AUC-ROC of 0.98 and an F1-score of 0.76.
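For reference, both metrics can be computed with scikit-learn. Below is a minimal sketch, where y_true and y_prob are hypothetical placeholders for held-out binary toxicity labels and the model's predicted toxic-class probabilities:

import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

# hypothetical held-out labels and predicted toxic-class probabilities
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.9, 0.7, 0.3, 0.8])

auc = roc_auc_score(y_true, y_prob)                  # threshold-free ranking quality
f1 = f1_score(y_true, (y_prob >= 0.5).astype(int))   # F1 at a 0.5 decision threshold
print(f'AUC-ROC: {auc:.2f}, F1: {f1:.2f}')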

How to use

import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# load tokenizer and model weights
tokenizer = RobertaTokenizer.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')
model = RobertaForSequenceClassification.from_pretrained('SkolkovoInstitute/roberta_toxicity_classifier')

# prepare the input
batch = tokenizer.encode('you are amazing', return_tensors='pt')

# inference: raw logits over the two classes
with torch.no_grad():
    logits = model(batch).logits

# convert logits to class probabilities
probs = torch.softmax(logits, dim=-1)
print(probs)
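To turn the probabilities into a human-readable label, you can use the class-to-label mapping stored in the model config; a short follow-up to the snippet above, assuming the config's id2label field is populated (as it is for standard transformers sequence-classification checkpoints):

# map the most likely class index to its label from the model config
pred = probs.argmax(dim=-1).item()
print(model.config.id2label[pred])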

Licensing Information

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.