Fine-tuned-Indonesian-Sentiment-Classifier

This model is a fine-tuned version of indobenchmark/indobert-base-p1, trained on the SmSA dataset from IndoNLU. It achieves the following results on the evaluation set:

Loss: 0.3233
Accuracy: 0.9317
F1: 0.9034

Results on the test set:

Accuracy: 0.928
F1 macro: 0.9113470780757361
F1 micro: 0.928
F1 weighted: 0.9261959965604815
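
The three F1 figures above differ only in how per-class scores are combined. As a minimal sketch of that relationship, the pure-Python snippet below computes all three on a handful of made-up labels (toy data for illustration, not the SmSA test set). The function name `f1_averages` is hypothetical. It also shows why micro F1 and accuracy coincide for single-label classification, as they do in the results above (both 0.928).

```python
from collections import Counter

def f1_averages(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label multiclass data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)
    # Macro: unweighted mean of per-class F1.
    macro = sum(per_class.values()) / len(labels)
    # Micro: F1 computed from counts pooled over all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Weighted: per-class F1 weighted by the number of true examples per class.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, micro, weighted

# Toy labels for illustration only (not taken from the SmSA test set).
y_true = ["positive", "positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "positive", "negative", "negative", "negative", "positive"]

macro, micro, weighted = f1_averages(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(macro, micro, weighted, accuracy)
```

Macro F1 treats every class equally, so it drops the most when a small class (here, neutral) is predicted poorly. That is consistent with the macro score above being lower than the weighted one.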

Model description

This model classifies the sentiment of an Indonesian text as one of three labels: positive, negative, or neutral.

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Model identifier on the Hugging Face Hub
pretrained_name = "hanifnoerr/Fine-tuned-Indonesian-Sentiment-Classifier"
tokenizer = AutoTokenizer.from_pretrained(pretrained_name)
model = AutoModelForSequenceClassification.from_pretrained(pretrained_name)

Make a classification

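from transformers import pipeline

pretrained_name = "hanifnoerr/Fine-tuned-Indonesian-Sentiment-Classifier"
# Build a text-classification pipeline from the fine-tuned model and its tokenizer
sentimen = pipeline("text-classification", tokenizer=pretrained_name, model=pretrained_name)

kalimat = "buku ini jelek sekali"  # "this book is very bad"
sentimen(kalimat)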

output: [{'label': 'negative', 'score': 0.9996247291564941}]
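
The pipeline call above returns only the top label. If scores for all three classes are needed, one option is to run the model directly and apply a softmax to its logits. The sketch below assumes PyTorch is installed and reads the label names from the model's own `config.id2label`. The helper names `softmax` and `score_all_labels` are hypothetical and not part of the original card.

```python
import math

MODEL_NAME = "hanifnoerr/Fine-tuned-Indonesian-Sentiment-Classifier"

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score_all_labels(text):
    """Return {label: probability} for every class in the model config."""
    # Imported lazily so softmax() stays usable without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits[0].tolist()
    probs = softmax(logits)
    # Label names come from the model config rather than being hard-coded.
    return {model.config.id2label[i]: p for i, p in enumerate(probs)}

# Example call (downloads the model on first use):
# score_all_labels("buku ini jelek sekali")
```

Loading the model inside the function keeps the sketch short. For repeated calls, it would be more efficient to load the tokenizer and model once and reuse them.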

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1
0.08	1.0	688	0.3532	0.9310	0.9053
0.0523	2.0	1376	0.3233	0.9317	0.9034
0.045	3.0	2064	0.3949	0.9286	0.8995
0.0252	4.0	2752	0.4662	0.9310	0.9049
0.0149	5.0	3440	0.6251	0.9246	0.8899
0.0091	6.0	4128	0.6148	0.9254	0.8928
0.0111	7.0	4816	0.6259	0.9222	0.8902
0.0106	8.0	5504	0.6123	0.9238	0.8882
0.0092	9.0	6192	0.6353	0.9230	0.8928
0.0085	10.0	6880	0.6733	0.9254	0.8989
0.0062	11.0	7568	0.6666	0.9302	0.9027
0.0036	12.0	8256	0.7578	0.9230	0.8962
0.0055	13.0	8944	0.7378	0.9270	0.8947
0.0023	14.0	9632	0.7758	0.9230	0.8978
0.0009	15.0	10320	0.7051	0.9278	0.9006
0.0033	16.0	11008	0.7442	0.9214	0.8902
0.0	17.0	11696	0.7513	0.9254	0.8974
0.0	18.0	12384	0.7554	0.9270	0.8999

Although training ran for 18 epochs, the published model uses the weights from epoch 2, the checkpoint with the lowest validation loss (0.3233).
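
This card does not state which criterion was used to pick the best weights. As a sketch, the table rows can be ranked by each metric to see which epoch each one favors. The `history` list is transcribed from the table above, and the variable names are illustrative.

```python
# (epoch, validation_loss, accuracy, f1), transcribed from the training table.
history = [
    (1, 0.3532, 0.9310, 0.9053),
    (2, 0.3233, 0.9317, 0.9034),
    (3, 0.3949, 0.9286, 0.8995),
    (4, 0.4662, 0.9310, 0.9049),
    (5, 0.6251, 0.9246, 0.8899),
    (6, 0.6148, 0.9254, 0.8928),
    (7, 0.6259, 0.9222, 0.8902),
    (8, 0.6123, 0.9238, 0.8882),
    (9, 0.6353, 0.9230, 0.8928),
    (10, 0.6733, 0.9254, 0.8989),
    (11, 0.6666, 0.9302, 0.9027),
    (12, 0.7578, 0.9230, 0.8962),
    (13, 0.7378, 0.9270, 0.8947),
    (14, 0.7758, 0.9230, 0.8978),
    (15, 0.7051, 0.9278, 0.9006),
    (16, 0.7442, 0.9214, 0.8902),
    (17, 0.7513, 0.9254, 0.8974),
    (18, 0.7554, 0.9270, 0.8999),
]

best_by_loss = min(history, key=lambda row: row[1])  # lower is better
best_by_acc = max(history, key=lambda row: row[2])   # higher is better
best_by_f1 = max(history, key=lambda row: row[3])    # higher is better

print("lowest validation loss: epoch", best_by_loss[0])
print("highest accuracy:       epoch", best_by_acc[0])
print("highest F1:             epoch", best_by_f1[0])
```

On these numbers, validation loss and accuracy both point to epoch 2, while F1 is marginally higher at epoch 1. Validation loss rises steadily after epoch 2 while training loss keeps falling, which suggests that the later epochs overfit.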

Framework versions

Transformers 4.27.4
PyTorch 2.0.0+cu118
Datasets 2.11.0
Tokenizers 0.13.3

Author:

Hanif Noer Rofiq

Dataset size:

475.78 MB