Model:
microsoft/layoutlmv3-base-chinese
Microsoft Document AI | GitHub
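LayoutLMv3 is a pre-trained multimodal Transformer for Document AI with unified text and image masking. Its simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model. For example, LayoutLMv3 can be fine-tuned for text-centric tasks, including form understanding, receipt understanding, and document visual question answering, as well as for image-centric tasks such as document image classification and document layout analysis.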
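
Fine-tuning on text-centric tasks such as form or receipt understanding needs a bounding box for each OCR word alongside the text. The sketch below assumes the usual LayoutLM-family convention, as used in the Hugging Face `transformers` implementation, of integer box coordinates on a 0-1000 grid. The helper name `normalize_box` and the example page size are hypothetical and do not come from this model card.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels, as an OCR engine would typically report it.
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, page_width: float, page_height: float) -> List[int]:
    """Scale a pixel box to the 0-1000 integer grid (assumed convention)."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / page_width,
        1000 * y0 / page_height,
        1000 * x1 / page_width,
        1000 * y1 / page_height,
    ]
    # Clamp so boxes that spill past the page edge stay inside the grid.
    return [min(1000, max(0, int(v))) for v in scaled]


if __name__ == "__main__":
    # Hypothetical word box on a 1000 x 2000 pixel page.
    print(normalize_box((50, 100, 200, 150), 1000, 2000))
```

Clamping is a design choice made here for robustness against noisy OCR output; a stricter pipeline might reject out-of-range boxes instead.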
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei, Preprint 2022.
Dataset | Language | Precision | Recall | F1 |
---|---|---|---|---|
1234321 | ZH | 0.8980 | 0.9435 | 0.9202 |
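
As a quick consistency check on the row above, the reported F1 can be recomputed from precision and recall. This is a minimal sketch assuming F1 here is the standard harmonic mean of the two; the model card does not state how the score was aggregated, and the function name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall copied from the table above.
    print(round(f1_score(0.8980, 0.9435), 4))  # prints 0.9202
```
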
Dataset | Subject | Test Time | Name | School | Examination Number | Seat Number | Class | Student Number | Grade | Score | Mean |
---|---|---|---|---|---|---|---|---|---|---|---|
1235321 | 98.99 | 100.0 | 99.77 | 99.2 | 100.0 | 100.0 | 98.82 | 99.78 | 98.31 | 97.27 | 99.21 |
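
The Mean column appears to be the unweighted arithmetic mean of the ten per-field scores. The sketch below checks that reading against the row above; the equal-weight interpretation is an assumption, since the model card does not define the column or name the per-field metric.

```python
from statistics import fmean

# Per-field scores copied from the table above, in column order.
field_scores = {
    "Subject": 98.99,
    "Test Time": 100.0,
    "Name": 99.77,
    "School": 99.2,
    "Examination Number": 100.0,
    "Seat Number": 100.0,
    "Class": 98.82,
    "Student Number": 99.78,
    "Grade": 98.31,
    "Score": 97.27,
}

# Equal weight per field (assumed).
mean_score = round(fmean(field_scores.values()), 2)

if __name__ == "__main__":
    print(mean_score)  # prints 99.21
```
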
If you find LayoutLM useful in your research, please cite the following paper:
@inproceedings{huang2022layoutlmv3,
  author={Yupan Huang and Tengchao Lv and Lei Cui and Yutong Lu and Furu Wei},
  title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}
The content of this project itself is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Portions of the source code are based on the transformers project. Microsoft Open Source Code of Conduct