LayoutLM

Multimodal (text + layout/format + image) pre-training for Document AI

Microsoft Document AI | GitHub

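Model description

LayoutLM is a simple but effective method for jointly pre-training text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding. LayoutLM achieved state-of-the-art results on multiple datasets. For more details, please refer to our paper: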

LayoutLM: Pre-training of Text and Layout for Document Image Understanding. Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou. KDD 2020.
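To make "text and layout" concrete, the sketch below pairs each word with a bounding box and rescales the box to a fixed grid. This card does not describe the input format; the 0 to 1000 grid is an assumption based on how common LayoutLM implementations prepare inputs, and `normalize_bbox` and the sample words are hypothetical names and values chosen for illustration.

```python
from typing import List, Sequence


def normalize_bbox(
    bbox: Sequence[int],
    page_width: int,
    page_height: int,
    scale: int = 1000,  # assumption: layout coordinates live on a 0..1000 grid
) -> List[int]:
    """Rescale a pixel box (x0, y0, x1, y1) to a page-size-independent grid."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")

    def clamp(value: int) -> int:
        # Keep coordinates inside the grid even if OCR boxes overshoot the page.
        return max(0, min(scale, value))

    x0, y0, x1, y1 = bbox
    # Integer arithmetic avoids floating-point rounding surprises.
    return [
        clamp(scale * x0 // page_width),
        clamp(scale * y0 // page_height),
        clamp(scale * x1 // page_width),
        clamp(scale * y1 // page_height),
    ]


if __name__ == "__main__":
    # Hypothetical OCR output for a 1000 x 2000 pixel page.
    words = ["Invoice", "Total"]
    pixel_boxes = [(100, 200, 300, 400), (100, 1800, 260, 1900)]
    for word, box in zip(words, pixel_boxes):
        print(word, normalize_bbox(box, page_width=1000, page_height=2000))
```

Each word then carries both its token and its normalized box, which is the kind of paired text and layout signal the description above refers to.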

Training data

We pre-train LayoutLM on the IIT-CDIP Test Collection 1.0 dataset with two settings.

LayoutLM-Base, Uncased (11M documents, 2 epochs): 12 layers, 768 hidden units, 12 heads, 113M parameters (this model)
LayoutLM-Large, Uncased (11M documents, 2 epochs): 24 layers, 1024 hidden units, 16 heads, 343M parameters
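As a rough sanity check on the sizes listed above, the following sketch estimates parameter counts from the layer count and hidden size. It assumes a BERT-style encoder with details this card does not state: a 30522-token uncased vocabulary, 512 sequence positions, a feed-forward width of 4x the hidden size, a pooler layer, and four 2-D position tables of 1024 entries each. `Arch` and `estimate_parameters` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arch:
    layers: int
    hidden: int
    heads: int                    # listed for completeness; does not affect the count
    vocab_size: int = 30522       # assumption: BERT uncased WordPiece vocabulary
    max_positions: int = 512      # assumption: 1-D position embedding table
    max_2d_positions: int = 1024  # assumption: size of each 2-D position table
    type_vocab_size: int = 2      # assumption: BERT-style segment embeddings


def estimate_parameters(a: Arch) -> int:
    h = a.hidden
    ffn = 4 * h         # assumption: feed-forward width is 4x the hidden size
    layer_norm = 2 * h  # scale and bias vectors

    embeddings = (
        a.vocab_size * h
        + a.max_positions * h
        + a.type_vocab_size * h
        + 4 * a.max_2d_positions * h  # assumed x, y, width, height tables
        + layer_norm
    )
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (h * h + h) + layer_norm
    # Two dense layers (up and down projection), each with a bias.
    feed_forward = (h * ffn + ffn) + (ffn * h + h) + layer_norm
    pooler = h * h + h
    return embeddings + a.layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    base = Arch(layers=12, hidden=768, heads=12)
    large = Arch(layers=24, hidden=1024, heads=16)
    for name, arch in [("LayoutLM-Base", base), ("LayoutLM-Large", large)]:
        print(f"{name}: ~{estimate_parameters(arch) / 1e6:.1f}M parameters")
```

Under these assumptions the estimate lands close to the 113M listed for Base and slightly below the 343M listed for Large; the remaining gap depends on architectural details that this card does not specify.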

Citation

If you find LayoutLM useful in your research, please cite the following paper:

@misc{xu2019layoutlm,
    title={LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
    author={Yiheng Xu and Minghao Li and Lei Cui and Shaohan Huang and Furu Wei and Ming Zhou},
    year={2019},
    eprint={1912.13318},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Author:

Microsoft

Dataset size:

862.67 MB