
A BERT-based classifier (fine-tuned from Conversational RuBERT) trained on a merged dataset of toxic Russian-language comments collected from 2ch.hk and toxic Russian comments collected from ok.ru.

The two datasets were merged, shuffled, and split into train, validation, and test sets in an 80-10-10 proportion. The metrics obtained on the test set are as follows:
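The shuffle-and-split step described above can be sketched in plain Python. This is a hypothetical illustration, not the authors' actual preprocessing code; the function name and seed are assumptions:

```python
import random

def train_dev_test_split(samples, seed=42):
    """Shuffle and split samples into 80/10/10 train/validation/test sets."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)  # fixed seed for reproducibility
    n = len(samples)
    n_train = int(n * 0.8)
    n_dev = int(n * 0.1)
    train = samples[:n_train]
    dev = samples[n_train:n_train + n_dev]
    test = samples[n_train + n_dev:]
    return train, dev, test
```

Splitting by index after a single shuffle guarantees the three sets are disjoint and together cover the whole merged dataset.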

|              | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.98      | 0.99   | 0.98     | 21384   |
| 1            | 0.94      | 0.92   | 0.93     | 4886    |
| accuracy     |           |        | 0.97     | 26270   |
| macro avg    | 0.96      | 0.96   | 0.96     | 26270   |
| weighted avg | 0.97      | 0.97   | 0.97     | 26270   |
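For reference, the macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support. Recomputing the f1 averages from the table's per-class values confirms the reported rows:

```python
# per-class f1 and support taken from the test-set table above
f1 = {0: 0.98, 1: 0.93}
support = {0: 21384, 1: 4886}
total = sum(support.values())  # 26270 test examples

# macro avg: unweighted mean over the two classes
macro_f1 = sum(f1.values()) / len(f1)

# weighted avg: per-class f1 weighted by class support
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total
```

The class imbalance (about 4.4 non-toxic examples per toxic one) is why the weighted average sits closer to the majority class's score than the macro average does.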

How to use

```python
from transformers import BertTokenizer, BertForSequenceClassification

# load tokenizer and model weights
tokenizer = BertTokenizer.from_pretrained('SkolkovoInstitute/russian_toxicity_classifier')
model = BertForSequenceClassification.from_pretrained('SkolkovoInstitute/russian_toxicity_classifier')

# prepare the input
batch = tokenizer.encode('ты супер', return_tensors='pt')

# inference
model(batch)
```
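The call above returns raw logits for the two classes (0 = non-toxic, 1 = toxic, matching the metrics table). A minimal sketch of converting logits into probabilities and a predicted label, shown here with a plain-Python softmax over hypothetical logit values rather than a real model output:

```python
import math

def softmax(logits):
    """Convert a list of raw logits to probabilities summing to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical logits, e.g. obtained from model(batch).logits[0].tolist()
logits = [2.1, -1.3]
probs = softmax(logits)
label = max(range(len(probs)), key=probs.__getitem__)  # 0 = non-toxic, 1 = toxic
```

In practice the same conversion is usually done with `torch.softmax` and `argmax` directly on the returned logits tensor.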

Licensing information

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License