Dataset:
pierreguillou/lener_br_finetuning_language_model
The LeNER-Br language modeling dataset is a collection of legal texts in Portuguese taken from the LeNER-Br dataset (official site).
The legal texts were downloaded from this link (93.6 MB) and processed into a DatasetDict with a train split and a validation split (20% of the rows).
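The preprocessing script itself is not shown here. As a minimal sketch of how a 20% validation split could be held out from a list of texts, assuming a simple seeded shuffle (the helper name `split_train_validation`, the seed, and the placeholder strings are hypothetical and not the author's code):

```python
import random


def split_train_validation(texts, validation_fraction=0.2, seed=42):
    """Shuffle texts and hold out a fraction as a validation split.

    Hypothetical helper for illustration; not the original processing code.
    """
    shuffled = list(texts)                 # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_validation = round(len(shuffled) * validation_fraction)
    return {
        "train": shuffled[n_validation:],
        "validation": shuffled[:n_validation],
    }


# Placeholder strings standing in for the legal texts.
example_texts = [f"text {i}" for i in range(10)]
splits = split_train_validation(example_texts)
print(len(splits["train"]), len(splits["validation"]))  # 8 2
```

With the `datasets` library, a comparable split is usually obtained with `Dataset.train_test_split(test_size=0.2)`; whether the author used that method is not stated.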
The LeNER-Br language modeling dataset can be used to fine-tune language models such as BERTimbau base and large.
The texts are in Brazilian Portuguese.
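Blog post: "NLP | Modelos e Web App para Reconhecimento de Entidade Nomeada (NER) no domínio jurídico brasileiro" (NLP | Models and Web App for Named Entity Recognition (NER) in the Brazilian legal domain), 29/12/2021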
DatasetDict({
    validation: Dataset({
        features: ['text'],
        num_rows: 3813
    })
    train: Dataset({
        features: ['text'],
        num_rows: 15252
    })
})
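The row counts reported above can be checked against the stated 20% validation share. The short calculation below uses only the two `num_rows` values from the DatasetDict; the variable names are illustrative:

```python
# Split sizes as reported in the DatasetDict above.
num_rows = {"train": 15252, "validation": 3813}

total = sum(num_rows.values())
validation_share = num_rows["validation"] / total

print(total)                      # 19065
print(f"{validation_share:.1%}")  # 20.0%
```

So the 20% refers to the share of all rows, not to a fraction of the train split.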
!pip install datasets

from datasets import load_dataset

dataset = load_dataset("pierreguillou/lener_br_finetuning_language_model")