priyank-m/SROIE_2019_text_recognition

Dataset:

priyank-m/SROIE_2019_text_recognition

Task:

Image-to-text

Subtask:

image-captioning

Language:

Multilinguality:

monolingual

Size:

10K<n<100K

Other tags:

text-recognition recognition

This dataset was prepared from the Scanned Receipts OCR and Information Extraction (SROIE) dataset, which contains 973 scanned receipts in English. Cropping the bounding boxes from each receipt to build this text-recognition dataset yielded 33,626 images for the train set and 18,704 images for the test set. The text annotations for all images in a split are stored in that split's metadata.jsonl file.
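The card does not document the schema of metadata.jsonl. The sketch below assumes, hypothetically, that each line is a JSON object with a "file_name" key (the convention used by Hugging Face image-folder datasets) and a "text" key holding the transcription; the function name read_metadata is also illustrative. It shows how a split's annotations could be read into a file-name-to-label mapping using only the standard library.

```python
import json
from pathlib import Path


def read_metadata(split_dir):
    """Map each image file name in a split to its text label.

    Hypothetical schema: every non-blank line of metadata.jsonl is a JSON
    object with a "file_name" key and a "text" key.
    """
    labels = {}
    with open(Path(split_dir) / "metadata.jsonl", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines between records
                continue
            record = json.loads(line)
            labels[record["file_name"]] = record["text"]
    return labels
```

If the actual key names differ, only the two dictionary lookups need to change.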

Usage:

# Requires the Hugging Face datasets library (pip install datasets)
from datasets import load_dataset

data = load_dataset("priyank-m/SROIE_2019_text_recognition")

Source of the raw SROIE dataset: https://www.kaggle.com/datasets/urbikn/sroie-datasetv2
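The card does not say exactly how the crops were produced from the raw receipts. A minimal sketch, assuming the raw per-receipt annotation files use the SROIE line format "x1,y1,x2,y2,x3,y3,x4,y4,TEXT" (four corner points followed by a transcript that may itself contain commas), converts one annotation line into an axis-aligned crop box plus its label. The function name parse_box_line is hypothetical, and taking the min/max of the four corners is an illustrative choice, not necessarily the author's method.

```python
def parse_box_line(line):
    """Parse one annotation line into ((left, upper, right, lower), text).

    Assumed format: "x1,y1,x2,y2,x3,y3,x4,y4,TEXT"; only the first eight
    commas separate coordinates, so commas inside TEXT are preserved.
    """
    parts = line.rstrip("\r\n").split(",", 8)  # split off exactly 8 coordinates
    if len(parts) != 9:
        raise ValueError(f"malformed annotation line: {line!r}")
    coords = [int(v) for v in parts[:8]]
    xs, ys = coords[0::2], coords[1::2]  # even indices are x, odd are y
    # Axis-aligned box enclosing all four corners, in (left, upper, right, lower)
    # order, which is the tuple order PIL's Image.crop accepts
    box = (min(xs), min(ys), max(xs), max(ys))
    return box, parts[8]
```

Each returned box could then be cut out of the receipt image (for example with Pillow's Image.crop) and saved alongside its text label.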

Author:

priyank-m

Dataset size:

174.31 MB