pszemraj/pegasus-large-summary-explain
This model is a fine-tuned version of google/pegasus-large on the booksum dataset for four total epochs.
It achieves the following results on the evaluation set:
- eval_loss: 1.1193
- eval_runtime: 6.6754
- eval_samples_per_second: 27.714
- eval_steps_per_second: 1.798
- epoch: 3.0
- step: 900
A 1-epoch checkpoint can be found at pszemraj/pegasus-large-book-summary, which is where the second training session started from.
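
For a quick sanity check, the model can be loaded through the standard transformers summarization pipeline. This is a minimal sketch; the input text is a placeholder and no particular generation settings are assumed beyond the defaults:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint named in this card.
summarizer = pipeline(
    "summarization",
    model="pszemraj/pegasus-large-summary-explain",
)

text = "Put the chapter or passage you want summarized here."  # placeholder input
result = summarizer(text, truncation=True)  # truncate inputs longer than the model limit
print(result[0]["summary_text"])
```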
Model description
- After some initial tests, it was found that models trained on the booksum dataset seem to inherit the SparkNotes-style explanatory tone of its summaries, so the user gets a shorter and easier-to-understand version of the text rather than one that is merely more compact.
- This quality is (anecdotally) favourable for learning/comprehension, because summarization datasets that simply make the information more compact (*cough* arXiv) can be so dense that the time spent trying to comprehend the summary ends up matching the time it would take to just read the original material.
Intended uses & limitations
- Standard PEGASUS has a maximum input length of 1024 tokens, so during training the model only saw the first 1024 tokens of each chapter and learned to produce the chapter's summary from that. Keep this in mind when using the model: for inputs longer than 1024 tokens, information towards the end of the text may be excluded from the final summary, and the model will be biased towards information presented first (see the sketch below).
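
The following is a minimal sketch of applying the model to a long text with the truncation made explicit at 1024 tokens. The `long_text` variable and the generation parameters (`num_beams`, `max_length`) are illustrative assumptions, not values from the training run:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pszemraj/pegasus-large-summary-explain"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

long_text = "The full chapter text goes here ..."  # placeholder input

# Only the first 1024 tokens reach the model; anything after that
# cannot influence the generated summary.
inputs = tokenizer(long_text, truncation=True, max_length=1024, return_tensors="pt")

# num_beams and max_length below are illustrative assumptions.
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```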
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 4
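
For reference, these values map roughly onto Seq2SeqTrainingArguments as sketched below. The output_dir is a hypothetical placeholder, multi-GPU distribution is handled by the launcher rather than by these arguments, and anything not listed above is left at its default:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="pegasus-large-summary-explain",  # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,  # effective train batch size of 32
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
    # The Adam betas/epsilon below are the Trainer defaults and match the values listed above.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```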
Framework versions
- Transformers 4.16.2
- Pytorch 1.10.2+cu113
- Datasets 1.18.3
- Tokenizers 0.11.0