Model:
philschmid/flan-ul2-20b-fp16
This is a fork of google/flan-ul2 20B implementing a custom handler.py for deploying the model to inference-endpoints on a 4x NVIDIA T4.
You can deploy flan-ul2 to Inference Endpoints with a single click.
Note: Creating the endpoint can take up to 2 hours due to the very long build process, so please be patient. We are working on improving this!
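Once the endpoint is running, you can query it over HTTP like any Hugging Face Inference Endpoint. Below is a minimal sketch using only the standard library; the endpoint URL and token are placeholders you need to replace with your own values.

```python
import json
import urllib.request

def build_payload(prompt, max_new_tokens=256, temperature=0.7):
    """Build the JSON payload for a text2text-generation endpoint."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }

def query_endpoint(endpoint_url, token, prompt, **params):
    """POST a prompt to the deployed endpoint and return the parsed JSON reply."""
    req = urllib.request.Request(
        endpoint_url,  # placeholder: your Inference Endpoint URL
        data=json.dumps(build_payload(prompt, **params)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # placeholder: your HF token
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The generation parameters shown (`max_new_tokens`, `temperature`) are common defaults, not values prescribed by this repository's `handler.py`; adjust them to your use case.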
Flan-UL2 is an encoder-decoder model based on the T5 architecture. It uses the same configuration as the UL2 model released earlier last year, and was fine-tuned using the "Flan" prompt tuning and dataset collection.
According to the original blog post, here are the notable improvements:
Important: For more details, please see sections 5.2.1 and 5.2.2 of the paper.
This model was originally contributed by Yi Tay, and added to the Hugging Face ecosystem by Younes Belkada & Arthur Zucker.
If you want to cite this work, please consider citing the blog post announcing the release of Flan-UL2.