Model:
onlplab/alephbert-base
State-of-the-art language model for Hebrew. Based on Google's BERT architecture (Devlin et al., 2018).
How to use

from transformers import BertModel, BertTokenizerFast

alephbert_tokenizer = BertTokenizerFast.from_pretrained('onlplab/alephbert-base')
alephbert = BertModel.from_pretrained('onlplab/alephbert-base')

# if not finetuning - disable dropout
alephbert.eval()
Trained on a DGX machine (8 V100 GPUs) using the standard Hugging Face training procedure.
Since most of our training data is based on tweets, we decided to start by optimizing with the Masked Language Model (MLM) loss only.
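As a rough illustration of what an MLM-only objective trains on, the following sketch corrupts a token sequence in the way described in the BERT paper: about 15% of positions are selected, and of those 80% become [MASK], 10% become a random token, and 10% are left unchanged. Whether AlephBERT used exactly these ratios is an assumption here; the function name `mask_tokens` and the toy vocabulary are hypothetical and not part of the model's code.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for a masked-language-model objective.

    Assumed ratios (BERT paper defaults, not confirmed for AlephBERT):
    each position is selected with probability mask_prob; a selected
    position becomes [MASK] 80% of the time, a random vocabulary token
    10% of the time, and stays unchanged 10% of the time.
    labels[i] holds the original token at selected positions and None
    elsewhere, so the loss is computed on selected positions only.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected: no loss contribution
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_TOKEN
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise keep the original token in the input
    return inputs, labels


# Toy usage with a hypothetical vocabulary
toy_vocab = ["shalom", "olam", "sefer", "bayit"]
toy_inputs, toy_labels = mask_tokens(["shalom", "olam"] * 10, toy_vocab, seed=0)
```

Training with this loss alone means there is no additional sentence-pair term such as next sentence prediction, which is a plausible fit for short, standalone texts like tweets.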
To reduce training time, we split the data into 4 sections based on the maximum number of tokens.
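A minimal sketch of length-based splitting, assuming three cut points that yield four sections. The cut points in `BOUNDARIES` are placeholders chosen for illustration, not the thresholds used for AlephBERT, and the helper names are hypothetical. The likely benefit is that short examples can be padded to a small maximum length instead of the longest sequence in the whole corpus.

```python
from bisect import bisect_right

# Placeholder cut points (assumption): three boundaries give four sections.
BOUNDARIES = [32, 64, 128]


def section_index(num_tokens, boundaries=BOUNDARIES):
    """Index of the section that holds an example with num_tokens tokens.

    With the placeholder boundaries: <32 -> 0, 32..63 -> 1,
    64..127 -> 2, >=128 -> 3.
    """
    return bisect_right(boundaries, num_tokens)


def split_by_length(examples, boundaries=BOUNDARIES):
    """Group tokenized examples (lists of tokens) into length sections."""
    sections = [[] for _ in range(len(boundaries) + 1)]
    for tokens in examples:
        sections[section_index(len(tokens), boundaries)].append(tokens)
    return sections


# Toy usage: four examples of different lengths land in four sections
toy_examples = [["t"] * 5, ["t"] * 40, ["t"] * 100, ["t"] * 300]
toy_sections = split_by_length(toy_examples)
```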
Each section was first trained for 5 epochs with an initial learning rate set to 1e-4. Then each section was trained for another 5 epochs with an initial learning rate set to 1e-5, for a total of 10 epochs.
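The two-phase schedule can be written down as a simple training plan. This sketch takes the learning rates and epoch counts from the text; the ordering (all sections at 1e-4 first, then all sections at 1e-5) is one reading of the description and is an assumption, as are the names `build_plan` and `epochs_per_section`.

```python
# (initial learning rate, epochs) per phase, as stated in the text
PHASES = [(1e-4, 5), (1e-5, 5)]
NUM_SECTIONS = 4


def build_plan(num_sections=NUM_SECTIONS, phases=PHASES):
    """List training runs in order: one run per (phase, section) pair.

    Assumed ordering: every section is trained in phase 1 before any
    section starts phase 2.
    """
    plan = []
    for initial_lr, epochs in phases:
        for section in range(num_sections):
            plan.append(
                {"section": section, "initial_lr": initial_lr, "epochs": epochs}
            )
    return plan


def epochs_per_section(plan):
    """Total number of epochs each section receives across all phases."""
    totals = {}
    for run in plan:
        totals[run["section"]] = totals.get(run["section"], 0) + run["epochs"]
    return totals


plan = build_plan()
totals = epochs_per_section(plan)  # each section should total 10 epochs
```

Each entry gives only the initial learning rate; any decay within a run is left to the training loop.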
Total training time was 8 days.