Model:
dbmdz/bert-base-french-europeana-cased
In this repository, the MDZ Digital Library team (dbmdz) at the Bavarian State Library open sources French Europeana BERT models.
We extracted all French texts using the language metadata attribute from the Europeana corpus.
The resulting corpus has a size of 63GB and consists of 11,052,528,456 tokens.
Based on the metadata, the training corpus mainly contains texts from the 18th to the 20th century.
Detailed information about the data and pretraining steps can be found in this repository.
BERT model weights for PyTorch and TensorFlow are available.
For results on Historic NER, please refer to this repository.
With Transformers >= 2.3, our French Europeana BERT model can be loaded like this:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-french-europeana-cased")
model = AutoModel.from_pretrained("dbmdz/bert-base-french-europeana-cased")
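Once loaded, the model can be used to encode text. The following is a minimal sketch rather than an official recipe: it assumes PyTorch is installed and a recent Transformers release in which tokenizers are callable (newer than the 2.3 minimum above). The helper name `embed_sentence` is hypothetical and not part of the repository.

```python
MODEL_NAME = "dbmdz/bert-base-french-europeana-cased"


def embed_sentence(text: str, model_name: str = MODEL_NAME):
    """Return the last hidden states for `text`.

    The result is a tensor of shape (1, sequence_length, hidden_size).
    `embed_sentence` is an illustrative helper, not an API of the model repo.
    """
    # Heavy dependencies are imported lazily so that importing this
    # module does not require them until the function is called.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Both calls download the files from the model hub on first use.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # switch off dropout for inference

    # Assumption: callable tokenizers (Transformers 3.0 or later).
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # The first output element holds the last-layer hidden states.
    return outputs[0]
```

Calling `embed_sentence("Le roi est arrivé à Paris au mois de mai.")` returns one vector per subword token. The function reloads the model on every call, which keeps the sketch short. For repeated use, load the tokenizer and model once and reuse them.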
All models are available on the Hugging Face model hub.
For questions about our BERT model, just open an issue here.
Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Thanks for providing access to the TFRC ❤️
Thanks to the generous support from the Hugging Face team, it is possible to download our model from their S3 storage.