s-nlp/russian_toxicity_classifier | ATYUN.COM official site: a full-service platform for AI tutorials and news

Model:

s-nlp/russian_toxicity_classifier

Task:

Text classification

Libraries:

PyTorch, TensorFlow, Safetensors, Transformers

Language:

Other:

bert, toxic comments classification

BERT-based classifier (fine-tuned from Conversational RuBERT) trained on a merge of the Russian Language Toxic Comments dataset, collected from 2ch.hk, and the Toxic Russian Comments dataset, collected from ok.ru.

The datasets were merged, shuffled, and split into train, dev, and test sets in an 80-10-10 proportion. The metrics obtained on the test set are as follows:

class         precision  recall  f1-score  support
0             0.98       0.99    0.98      21384
1             0.94       0.92    0.93      4886
accuracy                         0.97      26270
macro avg     0.96       0.96    0.96      26270
weighted avg  0.97       0.97    0.97      26270
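
The shuffled 80-10-10 split and the two average rows of the table above can be illustrated with a short sketch. This is not the authors' preprocessing code: the `shuffle_and_split` helper, the fixed seed, and the `merged` placeholder data are all hypothetical, and only the per-class F1 scores and support counts are taken from the table. Because the table values are rounded to two decimals, the recomputed averages match the reported ones only approximately.

```python
import random

def shuffle_and_split(examples, seed=0):
    """Shuffle a merged list and cut it into 80/10/10 train/dev/test parts."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seed is an arbitrary, hypothetical choice
    n_train = len(items) * 8 // 10
    n_dev = len(items) // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test

# Hypothetical stand-in for the merged corpora: (text, label) pairs.
merged = [(f"comment {i}", i % 2) for i in range(1000)]
train, dev, test = shuffle_and_split(merged)
print(len(train), len(dev), len(test))

# Per-class F1 and support, copied from the table above.
f1 = {0: 0.98, 1: 0.93}
support = {0: 21384, 1: 4886}
total = sum(support.values())

# Macro average: plain mean over classes, ignoring class sizes.
macro_f1 = sum(f1.values()) / len(f1)
# Weighted average: mean over classes, weighted by support.
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total
print(total, round(macro_f1, 3), round(weighted_f1, 3))
```

The gap between the two averages reflects class imbalance: class 0 accounts for most of the test examples, so the weighted average sits closer to its score.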

How to use

from transformers import BertTokenizer, BertForSequenceClassification

# load tokenizer and model weights
tokenizer = BertTokenizer.from_pretrained('SkolkovoInstitute/russian_toxicity_classifier')
model = BertForSequenceClassification.from_pretrained('SkolkovoInstitute/russian_toxicity_classifier')

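# prepare the input ('ты супер' is Russian for "you are great")
batch = tokenizer.encode('ты супер', return_tensors='pt')

# inference: the returned object carries the raw class scores in .logits
outputs = model(batch)
print(outputs.logits)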
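
The model call above returns raw scores (logits) rather than probabilities. A minimal sketch of turning them into class probabilities follows, written in plain Python so it runs without downloading the model. The logit values are hypothetical; in practice they would come from something like `model(batch).logits[0].tolist()`. The mapping of index 1 to the toxic class is an assumption based on class "1" in the metrics table and is not confirmed here.

```python
import math

def softmax(scores):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a single input, one score per class.
example_logits = [2.0, -1.0]

probs = softmax(example_logits)
# Index of the highest-probability class.
# Assumption: 0 = non-toxic, 1 = toxic, mirroring the metrics table.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

Taking the argmax is equivalent to thresholding the toxic probability at 0.5 in the two-class case; a different threshold could be chosen to trade precision against recall.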

Licensing Information

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Author:

s-nlp

Dataset size:

1.99 GB