long-t5-tglobal-base-sci-simplify: eLife subset

Exploring how well long-document models trained on "lay summaries" of scientific papers generalize.

A lay summary is a summary of a research paper or scientific study that is written in plain language, without the use of technical jargon, and is designed to be easily understood by non-experts.

Model description

This model is a fine-tuned version of google/long-t5-tglobal-base on the pszemraj/scientific_lay_summarisation-elife-norm dataset.

The variant trained on the PLOS subset can be found here.

Usage

It's recommended to use this model with beam search decoding. If you are interested, you can also use the textsum utility package, which abstracts most of this away for you:

pip install -U textsum

from textsum.summarize import Summarizer

model_name = "pszemraj/long-t5-tglobal-base-sci-simplify-elife"
summarizer = Summarizer(model_name) # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
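If you would rather call transformers directly, the following is a minimal sketch of beam-search decoding. It assumes the checkpoint loads with `AutoModelForSeq2SeqLM` and that transformers and PyTorch are installed. The helper names, `num_beams=4`, `no_repeat_ngram_size=3`, the 512-token output cap and the 16384-token input limit are illustrative assumptions, not settings published for this model.

```python
# Hypothetical helpers: the decoding values below are illustrative assumptions,
# not settings published for this model.
MODEL_NAME = "pszemraj/long-t5-tglobal-base-sci-simplify-elife"


def beam_search_kwargs(num_beams: int = 4, max_new_tokens: int = 512) -> dict:
    """Build keyword arguments for `model.generate` that enable beam search."""
    if num_beams < 2:
        raise ValueError("beam search needs num_beams >= 2")
    return {
        "num_beams": num_beams,            # candidate sequences kept per step
        "max_new_tokens": max_new_tokens,  # cap on generated summary length
        "no_repeat_ngram_size": 3,         # block repeated trigrams
        "early_stopping": True,            # stop once every beam has finished
    }


def summarize(text: str, max_input_tokens: int = 16384, **gen_overrides) -> str:
    """Summarize `text` with plain transformers.

    Imports are deferred so `beam_search_kwargs` works without transformers installed.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    # Truncate very long papers to the assumed input limit.
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    )
    kwargs = beam_search_kwargs()
    kwargs.update(gen_overrides)  # e.g. num_beams=8 for a wider search
    output_ids = model.generate(**inputs, **kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(text)` downloads the checkpoint on first use; keyword overrides such as `num_beams=8` trade speed for a broader search.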

Intended uses & limitations

The model's ability to generalize beyond the dataset domain (PubMed/bioscience-style papers) has yet to be evaluated.

Training and evaluation data

The eLife subset of the lay summaries dataset; refer to pszemraj/scientific_lay_summarisation-elife-norm for details.

Training procedure

Eval results

It achieves the following results on the evaluation set:

Loss: 1.9990
Rouge1: 38.5587
Rouge2: 9.7336
RougeL: 21.1974
RougeLsum: 35.9333
Gen Len: 392.7095
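Rouge1 and Rouge2 measure unigram and bigram overlap between generated and reference summaries; they are presumably F-measures scaled to 0 to 100, as in the Hugging Face summarization examples. As a simplified sketch (lowercased whitespace tokens, no stemming, so not identical to the `rouge_score` package), ROUGE-N F1 can be computed like this:

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 on lowercased whitespace tokens, scaled to 0-100."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)
```

RougeL and RougeLsum are instead based on the longest common subsequence (RougeLsum computed per sentence over newline-split summaries), which this sketch does not cover.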

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0004
train_batch_size: 4
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.01
num_epochs: 3.0
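To make the schedule concrete, here is a sketch of linear warmup followed by cosine decay, written to follow the shape of `get_cosine_schedule_with_warmup` in transformers with its default half cycle. The helper names are hypothetical, and the warmup length assumes the 0.01 ratio is applied to the total number of optimizer steps and rounded up. The sketch also checks that the listed total batch size equals train_batch_size × gradient_accumulation_steps; the card does not state the device count.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer step."""
    return per_device * grad_accum * num_devices


def warmup_steps(total_steps: int, warmup_ratio: float = 0.01) -> int:
    """Linear-warmup steps implied by a warmup ratio (rounded up, assumed)."""
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int, total_steps: int, base_lr: float = 4e-4,
              warmup_ratio: float = 0.01) -> float:
    """Learning rate at `step`: linear ramp from 0, then half-cosine decay to 0."""
    n_warmup = warmup_steps(total_steps, warmup_ratio)
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    progress = (step - n_warmup) / max(1, total_steps - n_warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

With roughly 204 optimizer steps (about 68 per epoch over 3 epochs, an estimate extrapolated from the training-results table), a 0.01 warmup ratio amounts to only 3 warmup steps, after which the rate decays from 4e-4 toward zero.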

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	RougeL	RougeLsum	Gen Len
2.2995	1.47	100	2.0175	35.2501	8.2121	20.4587	32.4494	439.7552
2.2171	2.94	200	1.9990	38.5587	9.7336	21.1974	35.9333	392.7095
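The logged (step, epoch) pairs also let you back out the approximate size of a training epoch. Since the epoch values are rounded to two decimals and the final batch of an epoch may be partial, treat these as rough estimates rather than reported figures:

```latex
\frac{100}{1.47} \approx \frac{200}{2.94} \approx 68 \ \text{steps/epoch}, \qquad
68 \times 64 \approx 4352 \ \text{examples/epoch}, \qquad
3 \times 68 = 204 \ \text{optimizer steps in total}
```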

Author:

Peter Szemraj

Dataset size:

2.77 GB