Dataset Card for CORD (Consolidated Receipt Dataset)

Dataset Summary

[More Information Needed]

Supported Tasks and Leaderboards

[More Information Needed]

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

{
  "id": datasets.Value("string"),
  "words": datasets.Sequence(datasets.Value("string")),
  "bboxes": datasets.Sequence(datasets.Sequence(datasets.Value("int64"))),
  "labels": datasets.Sequence(datasets.features.ClassLabel(names=_LABELS)),
  "images": datasets.features.Image(),
}
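
The schema above can be read as three parallel sequences, one entry per OCR token, plus an image. The following is a minimal sketch of how a record might be decoded, assuming that `words`, `bboxes`, and `labels` are aligned by index and that each box holds four integer coordinates (the card states neither). The label names and the sample record are hypothetical; the real names come from the `_LABELS` constant in the loading script, which is not reproduced here.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical label names for illustration only; the real list is the
# _LABELS constant in the dataset's loading script, not shown in this card.
EXAMPLE_LABELS: List[str] = ["O", "menu.nm", "menu.price"]


def decode_example(
    example: Dict[str, Any], label_names: Sequence[str]
) -> List[Tuple[str, List[int], str]]:
    """Pair each word with its bounding box and human-readable label name."""
    words = example["words"]
    bboxes = example["bboxes"]
    labels = example["labels"]
    # Assumption: the three sequences are aligned token by token.
    if not (len(words) == len(bboxes) == len(labels)):
        raise ValueError("words, bboxes and labels must have the same length")
    return [
        (word, list(bbox), label_names[label_id])
        for word, bbox, label_id in zip(words, bboxes, labels)
    ]


# A made-up record shaped like the schema above (the image field is omitted).
sample = {
    "id": "0",
    "words": ["Latte", "4.50"],
    "bboxes": [[10, 20, 110, 40], [200, 20, 260, 40]],
    "labels": [1, 2],
}

for word, bbox, label in decode_example(sample, EXAMPLE_LABELS):
    print(word, bbox, label)
```

When the dataset is loaded through the `datasets` library, the `ClassLabel` feature stores integer ids, so a lookup of this kind is what turns them back into readable tag names.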

Data Splits

train (800 rows)
validation (100 rows)
test (100 rows)

Dataset Creation

Licensing Information

Creative Commons Attribution 4.0 International License

Citation Information

@article{park2019cord,
  title={CORD: A Consolidated Receipt Dataset for Post-OCR Parsing},
  author={Park, Seunghyun and Shin, Seung and Lee, Bado and Lee, Junyeop and Surh, Jaeheung and Seo, Minjoon and Lee, Hwalsuk},
  booktitle={Document Intelligence Workshop at Neural Information Processing Systems},
  year={2019}
}

Contributions

Thanks to @clovaai for adding this dataset.

Author:

wkrl

Dataset size:

11.92 KB