w11wo/indonesian-roberta-base-posp-tagger

Model:

w11wo/indonesian-roberta-base-posp-tagger

Task:

Token classification

Libraries:

PyTorch TensorFlow Safetensors Transformers

Datasets:

indonlu

Language:

Other:

roberta indonesian-roberta-base-posp-tagger AutoTrain Compatible

Preprint:

arxiv:1907.11692

License:

mit

Indonesian RoBERTa Base POSP Tagger

Indonesian RoBERTa Base POSP Tagger is a part-of-speech token-classification model based on the RoBERTa model. It starts from the pre-trained Indonesian RoBERTa Base model, which was then fine-tuned on IndoNLU's POSP dataset, a collection of news text labelled with part-of-speech tags.

After training, the model achieved an evaluation F1-macro of 95.34%. On the benchmark test set, the model achieved an accuracy of 93.99% and F1-macro of 88.93%.
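
The gap between accuracy and F1-macro on the test set is typical when some tags are rare: accuracy counts every token equally, while a macro average counts every tag equally. The sketch below illustrates the difference on a toy example. The function name `accuracy_and_macro_f1` and the `NOUN`/`VERB` tags are hypothetical and are not taken from the model or the POSP dataset, and this is not the evaluation script used by the author.

```python
from collections import Counter
from typing import List, Tuple


def accuracy_and_macro_f1(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Return (accuracy, macro-averaged F1) for two equal-length tag sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # tag p was predicted where it did not belong
            fn[g] += 1  # an occurrence of gold tag g was missed
    f1_scores = []
    for tag in sorted(set(gold) | set(pred)):
        # Per-tag F1 written as 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        f1_scores.append(2 * tp[tag] / denom if denom else 0.0)
    accuracy = sum(tp.values()) / len(gold)
    return accuracy, sum(f1_scores) / len(f1_scores)


# Toy data (illustrative only): one frequent tag and one rare tag.
gold = ["NOUN"] * 8 + ["VERB"] * 2
pred = ["NOUN"] * 8 + ["NOUN", "VERB"]
accuracy, macro_f1 = accuracy_and_macro_f1(gold, pred)
print(f"accuracy={accuracy:.3f}, macro-F1={macro_f1:.3f}")
```

A single mistake on the rare tag costs one token in accuracy but halves that tag's recall, so the macro average drops further than accuracy does.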

The model was trained with Hugging Face's Trainer class from the Transformers library. PyTorch was the backend framework during training, but the model remains compatible with other frameworks.

Model

Model	#params	Arch.	Training/Validation data (text)
indonesian-roberta-base-posp-tagger	124M	RoBERTa Base	POSP

Evaluation Results

The model was trained for 10 epochs and the best model was loaded at the end.

Epoch	Training Loss	Validation Loss	Precision	Recall	F1	Accuracy
1	0.898400	0.343731	0.894324	0.894324	0.894324	0.894324
2	0.294700	0.236619	0.929620	0.929620	0.929620	0.929620
3	0.214100	0.202723	0.938349	0.938349	0.938349	0.938349
4	0.171100	0.183630	0.945264	0.945264	0.945264	0.945264
5	0.143300	0.169744	0.948469	0.948469	0.948469	0.948469
6	0.124700	0.174946	0.947963	0.947963	0.947963	0.947963
7	0.109800	0.167450	0.951590	0.951590	0.951590	0.951590
8	0.101300	0.163191	0.952475	0.952475	0.952475	0.952475
9	0.093500	0.163255	0.953361	0.953361	0.953361	0.953361
10	0.089000	0.164673	0.953445	0.953445	0.953445	0.953445
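
The card does not say which metric decided the "best" model. As a small sketch, the snippet below copies the validation loss and F1 columns from the table and picks the best epoch under each criterion. The variable names are hypothetical. The two criteria disagree, and the evaluation score of 95.34% reported earlier matches the epoch-10 F1 value, which suggests (but does not confirm) that F1 was the selection metric.

```python
# (epoch, validation_loss, f1) copied from the evaluation table
history = [
    (1, 0.343731, 0.894324),
    (2, 0.236619, 0.929620),
    (3, 0.202723, 0.938349),
    (4, 0.183630, 0.945264),
    (5, 0.169744, 0.948469),
    (6, 0.174946, 0.947963),
    (7, 0.167450, 0.951590),
    (8, 0.163191, 0.952475),
    (9, 0.163255, 0.953361),
    (10, 0.164673, 0.953445),
]

# Highest F1 versus lowest validation loss
best_by_f1 = max(history, key=lambda row: row[2])
best_by_loss = min(history, key=lambda row: row[1])

print(f"best by F1:   epoch {best_by_f1[0]} (F1 = {best_by_f1[2]:.6f})")
print(f"best by loss: epoch {best_by_loss[0]} (loss = {best_by_loss[1]:.6f})")
```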

How to Use

As Token Classifier

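from transformers import pipeline

# Model ID on the Hugging Face Hub; the tokenizer is loaded from the same repository.
pretrained_name = "w11wo/indonesian-roberta-base-posp-tagger"

# Build a token-classification pipeline for part-of-speech tagging.
nlp = pipeline(
    "token-classification",
    model=pretrained_name,
    tokenizer=pretrained_name,
)

# Each returned item describes one token and its predicted tag.
print(nlp("Budi sedang pergi ke pasar."))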
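
Because RoBERTa tokenizers split text into subword pieces, the pipeline may return several items for a single word. The sketch below shows one way to merge such pieces into `(text, tag)` pairs. It assumes each prediction is a dict with `entity`, `start`, and `end` keys, which is the shape the token-classification pipeline usually returns with a fast tokenizer. The helper name `merge_subwords` is hypothetical, and the sample predictions and their tags are hand-written placeholders, not real output from this model.

```python
from typing import Dict, List, Tuple


def merge_subwords(text: str, predictions: List[Dict]) -> List[Tuple[str, str]]:
    """Merge character-adjacent pieces that share a tag into (span, tag) pairs."""
    merged: List[List] = []  # each item is [start, end, tag]
    for pred in predictions:
        start, end, tag = pred["start"], pred["end"], pred["entity"]
        continues_previous = bool(
            merged
            and merged[-1][1] == start     # no gap between the two pieces
            and not text[start].isspace()  # the piece does not open a new word
            and merged[-1][2] == tag       # both pieces carry the same tag
        )
        if continues_previous:
            merged[-1][1] = end  # extend the previous span
        else:
            merged.append([start, end, tag])
    return [(text[s:e].strip(), tag) for s, e, tag in merged]


# Hand-written stand-in for pipeline output; the tags are illustrative only.
text = "Budi sedang pergi ke pasar."
sample = [
    {"entity": "NNP", "start": 0, "end": 3},   # "Bud"
    {"entity": "NNP", "start": 3, "end": 4},   # "i"
    {"entity": "ADV", "start": 5, "end": 11},  # "sedang"
    {"entity": "VBI", "start": 12, "end": 17}, # "pergi"
    {"entity": "IN", "start": 18, "end": 20},  # "ke"
    {"entity": "NNO", "start": 21, "end": 26}, # "pasar"
    {"entity": "Z", "start": 26, "end": 27},   # "."
]
pairs = merge_subwords(text, sample)
print(pairs)
```

With a real pipeline, `merge_subwords(text, nlp(text))` would be the equivalent call, provided the returned dicts include character offsets. Merging only pieces that touch and share a tag keeps punctuation such as the final period separate from the preceding word.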

Disclaimer

Consider the biases that come from both the pre-trained RoBERTa model and the POSP dataset, which may carry over into the results of this model.

Author

Indonesian RoBERTa Base POSP Tagger was trained and evaluated by Wilson Wongso. All computation and development were done on Google Colaboratory using its free GPU access.

Author:

Wilson Wongso

Dataset size:

1.39 GB