Model:
pierreguillou/bert-base-cased-pt-lenerbr
bert-base-cased-pt-lenerbr is a Portuguese language model for the legal domain. It was fine-tuned on 20/12/2021 in Google Colab from the BERTimbau base model on the LeNER-Br language modeling dataset, using a masked language modeling (MASK) objective.
A large version of this model is also available.
This language model serves as the starting point for a NER model in the Portuguese judicial domain. The fine-tuned NER model is available at pierreguillou/ner-bert-base-cased-pt-lenerbr.
All information and links are in this blog post: NLP | Modelos e Web App para Reconhecimento de Entidade Nomeada (NER) no domínio jurídico brasileiro (29/12/2021)
You can test this model in the widget on this page.
# install pytorch: check https://pytorch.org/
# !pip install transformers
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("pierreguillou/bert-base-cased-pt-lenerbr")
model = AutoModelForMaskedLM.from_pretrained("pierreguillou/bert-base-cased-pt-lenerbr")
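Because the model was trained with a MASK objective, it can also be queried directly for masked-token predictions. The following is a minimal sketch, assuming the `fill-mask` pipeline from `transformers`. The helper names `load_fill_mask` and `top_predictions` and the example sentence are illustrative and do not come from the original card.

```python
from typing import Callable, List, Tuple

MODEL_ID = "pierreguillou/bert-base-cased-pt-lenerbr"


def load_fill_mask():
    """Build a fill-mask pipeline for the model (downloads weights on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=MODEL_ID)


def top_predictions(fill_mask: Callable, text: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most likely (token, score) pairs for the single [MASK] in `text`."""
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


# Example usage (hypothetical sentence; requires network access to fetch the model):
# fill_mask = load_fill_mask()
# for token, score in top_predictions(fill_mask, "O réu foi condenado pelo [MASK]."):
#     print(f"{token}\t{score:.3f}")
```

The helper accepts any callable with the same call shape as the pipeline, so it can be exercised with a stub before downloading the model.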
The fine-tuning notebook (Finetuning_language_model_BERtimbau_LeNER_Br.ipynb) is available on GitHub.
Num examples = 3227
Num Epochs = 5
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 1
Total optimization steps = 2020

Step    Training Loss    Validation Loss
100     1.988700         1.616412
200     1.724900         1.561100
300     1.713400         1.499991
400     1.687400         1.451414
500     1.579700         1.433665
600     1.556900         1.407338
700     1.591400         1.421942
800     1.546000         1.406395
900     1.510100         1.352389
1000    1.507100         1.394799
1100    1.462200         1.368093
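The reported total of 2020 optimization steps can be checked from the other logged values. This is a small sketch of the arithmetic, assuming a single device (which the matching per-device and total batch sizes suggest) and that the last partial batch of each epoch is kept rather than dropped.

```python
import math

# Values taken from the training log above.
num_examples = 3227
num_epochs = 5
per_device_batch_size = 8
gradient_accumulation_steps = 1

# Assumption: one device, since the total train batch size equals the per-device size.
num_devices = 1

total_batch_size = per_device_batch_size * num_devices * gradient_accumulation_steps
# Assumption: the final, smaller batch of each epoch still counts as one step.
steps_per_epoch = math.ceil(num_examples / total_batch_size)
total_steps = steps_per_epoch * num_epochs

print(total_batch_size, steps_per_epoch, total_steps)  # 8 404 2020
```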