# n3wtou/mt5-small-finedtuned-4-swahili
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the csebuetnlp/xlsum dataset.
It achieves the following results on the evaluation set:

- Train Loss: 2.4419
- Validation Loss: 2.4809
- Epoch: 9
## Model description

More information needed

## Intended uses & limitations

More information needed
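Although the card leaves this section open, the checkpoint loads like any other mT5 summarization model. A minimal inference sketch, assuming the Hub ID above, a TensorFlow backend, and illustrative generation settings (the input text and decoding parameters are placeholders, not values from this card):

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

model_id = "n3wtou/mt5-small-finedtuned-4-swahili"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = TFAutoModelForSeq2SeqLM.from_pretrained(model_id)

article = "..."  # a Swahili news article goes here

# Tokenize, generate a summary with beam search, and decode it.
inputs = tokenizer(article, return_tensors="tf", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```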
## Training and evaluation data

More information needed
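The card does not say which portion of csebuetnlp/xlsum was used. XL-Sum ships one configuration per language, so a plausible way to load the Swahili data with 🤗 Datasets looks like the sketch below; the `"swahili"` config name and the `text`/`summary` field names follow the public xlsum schema and are assumptions, not details from this card:

```python
from datasets import load_dataset

# Assumed config name: xlsum exposes one configuration per language.
dataset = load_dataset("csebuetnlp/xlsum", "swahili")

print(dataset)                            # DatasetDict with train/validation/test splits
print(dataset["train"][0]["text"][:200])  # article body
print(dataset["train"][0]["summary"])     # reference summary
```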
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch for rebuilding the optimizer follows this list):

- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0003, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0003, 'decay_steps': 19900, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 100, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
- training_precision: mixed_float16
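The serialized optimizer above matches what the `create_optimizer` helper in transformers emits for TensorFlow: AdamWeightDecay driven by 100 linear warmup steps and a linear (power 1.0) polynomial decay to 0 over 19,900 steps. A minimal sketch of rebuilding it, with the mixed-precision policy set to match `training_precision` (the total step count is warmup + decay steps from the config):

```python
import tensorflow as tf
from transformers import create_optimizer

# Mixed-precision policy matching training_precision above.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Rebuild the optimizer from the serialized config: 100 warmup steps,
# then PolynomialDecay (power=1.0, i.e. linear) over 19,900 steps to 0.
# AdamWeightDecay's beta_1/beta_2/epsilon defaults already match the config.
optimizer, lr_schedule = create_optimizer(
    init_lr=3e-4,
    num_train_steps=20_000,   # warmup_steps (100) + decay_steps (19,900)
    num_warmup_steps=100,
    weight_decay_rate=0.001,
)
```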
### Training results

| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 5.6636     | 2.9818          | 0     |
| 3.7789     | 2.7822          | 1     |
| 3.3841     | 2.6840          | 2     |
| 3.1496     | 2.6238          | 3     |
| 2.9656     | 2.5816          | 4     |
| 2.8134     | 2.5522          | 5     |
| 2.6914     | 2.5315          | 6     |
| 2.5935     | 2.4980          | 7     |
| 2.5056     | 2.4764          | 8     |
| 2.4419     | 2.4809          | 9     |
### Framework versions

- Transformers 4.30.2
- TensorFlow 2.12.0
- Datasets 2.12.0
- Tokenizers 0.13.3