Model:
pszemraj/long-t5-tglobal-base-sci-simplify
Exploring how well long-document models trained on "lay summaries" of scientific papers generalize.
A lay summary restates a research paper or scientific study in plain language, without technical jargon, so that it can be easily understood by non-experts.
This model is a fine-tuned version of google/long-t5-tglobal-base on the pszemraj/scientific_lay_summarisation-plos-norm dataset for two epochs.
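If you want to inspect the training data yourself, the dataset can be pulled from the Hub with the `datasets` library. A minimal sketch; it only loads and prints the dataset, without assuming anything about its schema:

```python
from datasets import load_dataset

# load the lay-summarisation dataset this model was fine-tuned on
ds = load_dataset("pszemraj/scientific_lay_summarisation-plos-norm")
print(ds)                     # splits and row counts
print(ds["train"][0].keys())  # column names of one example
```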
It's recommended to use this model with beam search decoding (a direct transformers example appears after the textsum snippet below). If you are interested, you can also use the textsum util repo, which abstracts most of this for you:
Install with pip:

```bash
pip install -U textsum
```
Use in Python:

```python
from textsum.summarize import Summarizer

summarizer = Summarizer('pszemraj/long-t5-tglobal-base-sci-simplify')

text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
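Alternatively, the model can be called through the transformers summarization pipeline with beam search enabled directly. A minimal sketch; `num_beams=4` and the length settings here are illustrative assumptions, not the parameters used to produce the reported scores:

```python
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-sci-simplify",
)

long_text = "put the text you don't want to read here"

result = summarizer(
    long_text,
    num_beams=4,             # beam search, as recommended above
    max_length=512,          # illustrative cap on summary length
    no_repeat_ngram_size=3,  # illustrative repetition penalty
)
print(result[0]["summary_text"])
```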
It achieves the following results on the evaluation set (the final checkpoint in the table below, step 600):

- Loss: 1.6778
- Rouge1: 49.1475
- Rouge2: 18.9281
- Rougel: 26.9893
- Rougelsum: 45.0973
- Gen Len: 399.4125

Training results:
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 1.966 | 0.52 | 200 | 1.7171 | 48.6521 | 18.427 | 26.7726 | 44.3947 | 376.335 |
| 1.877 | 1.03 | 400 | 1.6909 | 49.3263 | 18.7945 | 27.0741 | 45.1737 | 382.205 |
| 1.9007 | 1.55 | 600 | 1.6778 | 49.1475 | 18.9281 | 26.9893 | 45.0973 | 399.4125 |
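For reference, the ROUGE columns above can be recomputed with the `evaluate` library. A sketch only; note that `evaluate` returns scores in [0, 1], so the table values appear to be scaled by 100:

```python
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["a model-generated lay summary"],
    references=["the reference lay summary"],
)
# keys: rouge1, rouge2, rougeL, rougeLsum; multiply by 100
# to compare against the table above
print({k: round(v * 100, 4) for k, v in scores.items()})
```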