This model is a fine-tuned version of bert-base-uncased for classifying toxic comments.
You can use the model with the following code.
    from transformers import BertForSequenceClassification, BertTokenizer, TextClassificationPipeline

    model_path = "JungleLee/bert-toxic-comment-classification"
    tokenizer = BertTokenizer.from_pretrained(model_path)
    model = BertForSequenceClassification.from_pretrained(model_path, num_labels=2)

    pipeline = TextClassificationPipeline(model=model, tokenizer=tokenizer)
    print(pipeline("You're a fucking nerd."))
The training data comes from this Kaggle competition. We use 90% of the train.csv data to train the model.
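A 90% training split like the one described here could be produced roughly as follows. This is a minimal sketch and not the author's actual procedure: the function name `split_train_holdout`, the fixed seed, and the use of a random shuffle are all assumptions, and the toy `rows` list stands in for the rows of train.csv.

```python
import random


def split_train_holdout(rows, train_fraction=0.9, seed=0):
    """Shuffle rows reproducibly and split them into (train, holdout).

    Hypothetical helper: the model card does not say how the split was made.
    """
    shuffled = list(rows)                      # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for the rows of train.csv.
rows = list(range(100))
train_rows, holdout_rows = split_train_holdout(rows)
print(len(train_rows), len(holdout_rows))
```

Fixing the seed keeps the split stable across runs, which matters if the remaining rows are later used for evaluation.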
The model achieves an AUC of 0.95 on a held-out test set of 1,500 rows.
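For readers unfamiliar with the metric, AUC (area under the ROC curve) is the probability that a randomly chosen toxic comment receives a higher score than a randomly chosen non-toxic one. The sketch below computes it directly from that definition on made-up toy data. The function name `roc_auc` is hypothetical, and how the reported figure was actually computed is not stated here; in practice a library routine would normally be used.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Illustrative pairwise implementation, quadratic in the number of rows.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (not the model's real predictions): 1 = toxic, 0 = non-toxic.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the metric does not depend on the classification threshold.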