Model: pszemraj/long-t5-tglobal-base-16384-booksci-summary-v1
This is an experiment investigating transfer-learning capabilities by fine-tuning models on different datasets, starting from the booksum checkpoint.
This model is a fine-tuned version of pszemraj/long-t5-tglobal-base-16384-book-summary on the pszemraj/scientific_lay_summarisation-elife-norm dataset for two epochs.
It is recommended to use this model with beam search decoding. If interested, you can also use the textsum utility package, which abstracts most of this away for you:
```bash
pip install -U textsum
```
```python
from textsum.summarize import Summarizer

model_name = "pszemraj/long-t5-tglobal-base-16384-booksci-summary-v1"
summarizer = Summarizer(model_name)  # GPU auto-detected

text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
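For reference, below is a minimal sketch of equivalent direct usage with the transformers library and beam search decoding, as recommended above. The generation parameters shown (num_beams, max_length, no_repeat_ngram_size) are illustrative assumptions, not the card's tuned defaults.

```python
# Sketch: direct usage with transformers and beam search decoding.
# Generation parameters below are illustrative assumptions, not official settings.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pszemraj/long-t5-tglobal-base-16384-booksci-summary-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "put the text you don't want to read here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        num_beams=4,              # beam search, as recommended above
        max_length=512,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )

summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)
```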
Note: this model was trained at a lower learning rate and not to "absolute convergence", with the intention of retaining some of the properties learned from the initial fine-tuning on booksum.
It achieves the following results on the evaluation set; the per-epoch validation metrics recorded during training are shown in the table below.

The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 2.7492 | 0.99 | 67 | 2.4272 | 34.6436 | 4.4536 | 12.4985 | 30.916 | 300.7635 |
| 2.6689 | 1.97 | 134 | 2.3994 | 34.2428 | 4.3644 | 12.5332 | 30.6965 | 294.0249 |
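For context on the ROUGE columns, here is a minimal sketch (not part of the original card) of how such scores can be computed with the Hugging Face evaluate library. The placeholder prediction and reference strings are assumptions for illustration; the values in the table appear to be these scores scaled by 100, as is common in Hugging Face training scripts.

```python
# Sketch: computing ROUGE scores with the Hugging Face `evaluate` library.
# Placeholder strings are illustrative assumptions, not data from the eval split.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["generated summary text ..."]      # model outputs
references = ["reference lay summary text ..."]   # gold summaries

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```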