Model:

tiiuae/falcon-40b-instruct

Task:

Text generation

Libraries:

PyTorch Transformers

Datasets:

tiiuae/falcon-refinedweb

Languages:

Other:

RefinedWeb custom_code text-generation-inference

Preprints:

arxiv:2205.14135 arxiv:1911.02150 arxiv:2005.14165 arxiv:2104.09864 arxiv:2306.01116

License:

apache-2.0

✨ Falcon-40B-Instruct

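Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII on top of Falcon-40B and finetuned on a mixture of Baize data. It is made available under the Apache 2.0 license.

Paper coming soon 😊.

🤗 To get started with Falcon (inference, finetuning, quantization, etc.), we recommend reading this great blog post from HF!

Why use Falcon-40B-Instruct?

You are looking for a ready-to-use chat/instruct model based on Falcon-40B.
At the time of writing, Falcon-40B is the best open-source model available. It outperforms LLaMA, StableLM, RedPajama, MPT, etc. See the OpenLLM Leaderboard.
It features an architecture optimized for inference, with FlashAttention (Dao et al., 2022) and multiquery attention (Shazeer et al., 2019).

💬 This is an instruct model, which may not be ideal for further finetuning. If you are interested in building your own instruct/chat model, we recommend starting from Falcon-40B.

💸 Looking for a smaller, less expensive model? Falcon-7B-Instruct is Falcon-40B-Instruct's little brother!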

import torch
import transformers
from transformers import AutoTokenizer

model = "tiiuae/falcon-40b-instruct"

# Load the tokenizer that ships with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline. bfloat16 halves weight memory relative to
# float32, and device_map="auto" places the weights on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # the checkpoint ships custom modelling code
    device_map="auto",
)

# Sample one continuation, restricted to the 10 most likely tokens at each
# step and capped at 200 tokens (prompt included).
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

For fast inference with Falcon, check out Text Generation Inference! Read more in this blogpost.

You will need at least 85-100GB of memory to swiftly run inference with Falcon-40B.
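The memory figure above is consistent with simple arithmetic on the weights alone. The sketch below is a rough estimate, not a measurement: it assumes a nominal 40e9 parameters and ignores activations and the key/value cache, which presumably account for the headroom above the bfloat16 total. The helper name `weight_memory_gb` is illustrative only.

```python
# Back-of-the-envelope weight memory for a 40B-parameter model.
# Assumption: exactly 40e9 parameters (the nominal "40B"); the real count may differ.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in decimal GB."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(40e9, dtype):.0f} GB")
```

In bfloat16, the dtype used in the snippet above, the weights alone come to about 80 GB under this assumption.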

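Model Card for Falcon-40B-Instruct

Model Details

Model Description

Developed by: https://www.tii.ae;
Model type: Causal decoder-only;
Language(s) (NLP): English and French;
License: Apache 2.0;
Finetuned from model: Falcon-40B.

Model Source

Paper: coming soon.

Uses

Direct Use

Falcon-40B-Instruct has been finetuned on a chat dataset.

Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

Bias, Risks, and Limitations

Falcon-40B-Instruct is mostly trained on English data, and will not generalize appropriately to other languages. Furthermore, as it is trained on a large-scale corpus representative of the web, it will carry the stereotypes and biases commonly encountered online.

Recommendations

We recommend that users of Falcon-40B-Instruct develop guardrails and take appropriate precautions for any production use.

How to Get Started with the Model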

import torch
import transformers
from transformers import AutoTokenizer

model = "tiiuae/falcon-40b-instruct"

# Load the tokenizer that ships with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline. bfloat16 halves weight memory relative to
# float32, and device_map="auto" places the weights on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # the checkpoint ships custom modelling code
    device_map="auto",
)

# Sample one continuation, restricted to the 10 most likely tokens at each
# step and capped at 200 tokens (prompt included).
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Girafatron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
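The generation call above sets do_sample=True and top_k=10, which means each new token is drawn at random from the ten highest-scoring candidates. As a rough illustration of that idea (a minimal sketch in plain Python, not the Transformers implementation; `sample_top_k` and its arguments are hypothetical names):

```python
import math
import random


def sample_top_k(logits, k, rng):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights restricted to the surviving candidates
    # (rng.choices normalises the weights, so no explicit division is needed).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [0.1, 5.0, 4.0, -2.0, 3.0]  # toy scores over a 5-token vocabulary
print(sample_top_k(toy_logits, k=2, rng=rng))  # always token 1 or token 2
```

A smaller k makes the output more predictable; k=1 reduces to greedy decoding.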

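Training Details

Training Data

Falcon-40B-Instruct was finetuned on 150M tokens from Baize, mixed with 5% of RefinedWeb data.

The data was tokenized with the Falcon-7B/40B tokenizer.

Evaluation

Paper coming soon.

See the OpenLLM Leaderboard for early results.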

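Technical Specifications

For more information about pretraining, see Falcon-40B.

Model Architecture and Objective

Falcon-40B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).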
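The next-token objective can be written as an average cross-entropy in which the scores at position t are compared against the token at position t + 1. The following is a minimal sketch in plain Python, assuming per-position logits are already available; `next_token_loss` is an illustrative name, not part of the Falcon codebase.

```python
import math


def next_token_loss(logits, token_ids):
    """Average cross-entropy of predicting token t + 1 from position t.

    logits: one list of vocabulary scores per position.
    token_ids: the input sequence; position t is scored against token_ids[t + 1].
    """
    total = 0.0
    # The last position has no next token, so it contributes no loss.
    for t in range(len(token_ids) - 1):
        scores = logits[t]
        # Numerically stable log of the softmax normaliser.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[token_ids[t + 1]]
    return total / (len(token_ids) - 1)


# With uniform scores over a 4-token vocabulary the loss equals log(4).
uniform = [[0.0] * 4 for _ in range(3)]
print(next_token_loss(uniform, [0, 1, 2]))
```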

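The architecture is broadly adapted from the GPT-3 paper (Brown et al., 2020), with the following differences:

Positional embeddings: rotary (Su et al., 2021);
Attention: multiquery (Shazeer et al., 2019) and FlashAttention (Dao et al., 2022);
Decoder block: parallel attention/MLP with a single layer norm.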
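To make the rotary positional embedding concrete, here is a minimal sketch following the formulation in Su et al. (2021): adjacent pairs of dimensions are rotated by an angle proportional to the token position. It assumes the paper's default base of 10000 and the adjacent-pair layout; the actual Falcon implementation may arrange dimensions differently. The function names are illustrative.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to one head vector of even length."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        # Each adjacent pair of dimensions is rotated by a position-dependent angle.
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
# The attention score depends only on the relative offset (2 in both cases).
print(dot(rotate(q, 5), rotate(k, 3)))
print(dot(rotate(q, 12), rotate(k, 10)))
```

The two printed scores agree up to floating-point error, which is the property that lets rotary embeddings encode relative position directly in the query-key dot product.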

For multiquery, we are using an internal variant which uses independent keys and values per tensor parallel degree.
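As an illustration of plain multiquery attention in the sense of Shazeer et al. (2019), the sketch below gives every head its own queries while all heads share a single set of keys and values. It is a toy pure-Python version with a causal mask. It does not reproduce the internal variant described above, which keeps independent keys and values per tensor-parallel degree, and the function names are illustrative.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def multiquery_attention(queries, keys, values):
    """Causal multiquery attention.

    queries: [n_heads][seq_len][head_dim], one set of queries per head.
    keys, values: [seq_len][head_dim], shared by every head.
    Returns [n_heads][seq_len][head_dim].
    """
    head_dim = len(keys[0])
    scale = 1.0 / math.sqrt(head_dim)
    outputs = []
    for head_queries in queries:
        head_out = []
        for t, q in enumerate(head_queries):
            # Causal mask: position t only attends to positions 0..t.
            scores = [scale * sum(a * b for a, b in zip(q, keys[s])) for s in range(t + 1)]
            weights = softmax(scores)
            head_out.append([
                sum(w * values[s][j] for s, w in enumerate(weights))
                for j in range(head_dim)
            ])
        outputs.append(head_out)
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
queries = [
    [[1.0, 0.0], [1.0, 0.0]],  # head 0
    [[0.0, 1.0], [0.0, 1.0]],  # head 1
]
out = multiquery_attention(queries, keys, values)
print(len(out), len(out[0]), len(out[0][0]))
```

Because keys and values are stored once rather than once per head, the cache kept during generation shrinks accordingly.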

Hyperparameter	Value	Comment
Layers	60
d_model	8192
head_dim	64	Reduced to optimise for FlashAttention
Vocabulary	65024
Sequence length	2048
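A few quantities follow from the table by arithmetic. The sketch below assumes that the number of query heads equals d_model / head_dim and that cache entries are stored in bfloat16 (2 bytes); neither is stated in the table. It compares the key/value cache for one full-length sequence under conventional multi-head attention with the single shared key/value head of plain multiquery attention. The internal variant used by Falcon sits somewhere between the two.

```python
# Values taken from the hyperparameter table.
LAYERS = 60
D_MODEL = 8192
HEAD_DIM = 64
SEQ_LEN = 2048
BYTES_PER_VALUE = 2  # assumption: bfloat16 cache entries

# Assumption: query heads tile d_model exactly.
n_heads = D_MODEL // HEAD_DIM


def kv_cache_bytes(n_kv_heads: int) -> int:
    """Bytes needed to cache keys and values for one full-length sequence."""
    per_token_per_layer = 2 * n_kv_heads * HEAD_DIM * BYTES_PER_VALUE  # keys + values
    return per_token_per_layer * LAYERS * SEQ_LEN


print(f"query heads: {n_heads}")
print(f"multi-head cache: {kv_cache_bytes(n_heads) / 2**30:.2f} GiB")
print(f"multiquery cache: {kv_cache_bytes(1) / 2**20:.2f} MiB")
```

Under these assumptions the cache shrinks by a factor equal to the number of query heads, which illustrates why multiquery attention helps at inference time.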

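Compute Infrastructure

Hardware

Falcon-40B-Instruct was trained on AWS SageMaker, on 64 A100 40GB GPUs in P4d instances.

Software

Falcon-40B-Instruct was trained using a custom distributed training codebase, Gigatron. It uses a 3D parallelism approach combined with ZeRO and high-performance Triton kernels (FlashAttention, etc.).

Citation

Paper coming soon 😊. In the meantime, you can use the following information to cite: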

@article{falcon40b,
  title={{Falcon-40B}: an open large language model with state-of-the-art performance},
  author={Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Hesslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme},
  year={2023}
}

To learn more about the pretraining dataset, see the 📓 RefinedWeb paper.

@article{refinedweb,
  title={The {R}efined{W}eb dataset for {F}alcon {LLM}: outperforming curated corpora with web data, and web data only},
  author={Guilherme Penedo and Quentin Malartic and Daniel Hesslow and Ruxandra Cojocaru and Alessandro Cappelli and Hamza Alobeidli and Baptiste Pannier and Ebtesam Almazrouei and Julien Launay},
  journal={arXiv preprint arXiv:2306.01116},
  eprint={2306.01116},
  eprinttype = {arXiv},
  url={https://arxiv.org/abs/2306.01116},
  year={2023}
}

To cite the Baize instruction dataset used for this model:

@article{xu2023baize,
  title={Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data},
  author={Xu, Canwen and Guo, Daya and Duan, Nan and McAuley, Julian},
  journal={arXiv preprint arXiv:2304.01196},
  year={2023}
}

License

Falcon-40B-Instruct is made available under the Apache 2.0 license.

Contact

falconllm@tii.ae

Author:

Technology Innovation Institute

Dataset size:

77.93 GB