pierreguillou/bert-base-cased-pt-lenerbr | ATYUN.COM

Model:

pierreguillou/bert-base-cased-pt-lenerbr

Task:

Fill-Mask

Libraries:

PyTorch Transformers

Datasets:

pierreguillou/lener_br_finetuning_language_model

Language:

Other:

bert generated_from_trainer Eval Results AutoTrain Compatible


(BERT base) Language modeling in the legal domain in Portuguese (LeNER-Br)

bert-base-cased-pt-lenerbr is a Portuguese language model for the legal domain. It was fine-tuned on 20/12/2021 in Google Colab from the BERTimbau base model on the LeNER-Br language modeling dataset, using a masked language modeling (MASK) objective.

A large version of this model is also available.

Blog post

This language model serves as the base for a NER model in the Portuguese judicial domain. The fine-tuned NER model is available at pierreguillou/ner-bert-base-cased-pt-lenerbr.

All information and links are in this blog post: NLP | Modelos e Web App para Reconhecimento de Entidade Nomeada (NER) no domínio jurídico brasileiro (29/12/2021)

Widget & APP

You can test this model in the widget on this page.

Using the model for inference in production

# install pytorch: check https://pytorch.org/
# !pip install transformers 
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("pierreguillou/bert-base-cased-pt-lenerbr")
model = AutoModelForMaskedLM.from_pretrained("pierreguillou/bert-base-cased-pt-lenerbr")
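Once the model loads, a quick way to try it is the Transformers `fill-mask` pipeline. The following is a minimal sketch rather than the author's own usage: the helper names `fill_mask` and `top_predictions` are hypothetical, and it assumes the pipeline returns a list of dictionaries with `token_str` and `score` keys, as it does for an input containing a single `[MASK]` token.

```python
def top_predictions(results, k=3):
    """Keep the k highest-scoring fill-mask candidates as (token, score) pairs."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"], round(r["score"], 4)) for r in ranked[:k]]


def fill_mask(text, model_id="pierreguillou/bert-base-cased-pt-lenerbr"):
    """Run the fill-mask pipeline on a sentence containing one [MASK] token."""
    # Imported lazily so top_predictions can be used without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    return unmasker(text)
```

Calling `top_predictions(fill_mask("O réu foi condenado pelo [MASK]."))` (an illustrative sentence, not one from the model card) downloads the model on first use and returns the three most likely tokens with their scores.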

Training procedure

Notebook

The fine-tuning notebook (Finetuning_language_model_BERtimbau_LeNER_Br.ipynb) is on GitHub.

Training results

Num examples = 3227
Num Epochs = 5
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 1
Total optimization steps = 2020
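
The total of 2020 optimization steps follows from the other logged values. The sketch below reproduces that arithmetic under the assumption that the last partial batch of each epoch is kept (not dropped) and that a single device was used; the function name `total_optimization_steps` is hypothetical.

```python
import math


def total_optimization_steps(num_examples, per_device_batch, num_epochs,
                             grad_accum=1, num_devices=1):
    """Estimate the number of optimizer updates over a full training run."""
    # Batches per epoch, keeping the final partial batch (assumption).
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    # One optimizer update every grad_accum batches, at least one per epoch.
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * num_epochs


# Values taken from the training log above: ceil(3227 / 8) = 404 steps per epoch.
print(total_optimization_steps(3227, 8, 5))  # 2020
```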

Step    Training Loss    Validation Loss
100     1.988700         1.616412
200     1.724900         1.561100
300     1.713400         1.499991
400     1.687400         1.451414
500     1.579700         1.433665
600     1.556900         1.407338
700     1.591400         1.421942
800     1.546000         1.406395
900     1.510100         1.352389
1000    1.507100         1.394799
1100    1.462200         1.36809373471
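
Validation loss for a masked language model is often easier to read as a perplexity, computed as the exponential of the loss. This is a minimal sketch and not a metric reported by the author. It assumes the logged loss is the mean cross-entropy in nats over masked tokens, and it copies the validation values from the table above as they appear there.

```python
import math

# Validation loss per logged step, copied from the table above.
validation_loss = {
    100: 1.616412, 200: 1.561100, 300: 1.499991, 400: 1.451414,
    500: 1.433665, 600: 1.407338, 700: 1.421942, 800: 1.406395,
    900: 1.352389, 1000: 1.394799, 1100: 1.36809373471,
}


def perplexity(loss):
    """Convert a mean cross-entropy loss (in nats) into a perplexity."""
    return math.exp(loss)


# Step with the lowest validation loss among the logged checkpoints.
best_step = min(validation_loss, key=validation_loss.get)
print(best_step, round(perplexity(validation_loss[best_step]), 2))
```

The lowest logged validation loss is at step 900, so the last logged checkpoint is not necessarily the best one by this measure.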

Author:

Pierre Guillou

Dataset size:

416.33 MB