Dataset:
severo/embellishments
This small dataset contains the thumbnails of the first 100 entries of Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG. It has been uploaded to the Hub to reproduce the tutorial by Daniel van Strien: Using 🤗 datasets for image search.
A typical row contains an image thumbnail, its filename, and the year of publication of the book it was extracted from.
An example looks as follows:
{ 'fname': '000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg', 'year': '1855', 'path': 'embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg', 'img': ... }
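As a rough illustration of how the fields in this example relate to one another, the sketch below checks two patterns that are only inferred from the single row shown above: that the year is the four digits just before the `.jpg` extension, and that `path` has the form `embellishments/<year>/<fname>`. The helper names `year_from_fname` and `expected_path` are hypothetical, and other rows may not follow the same pattern.

```python
import re

# Example row copied from this card; the 'img' field is omitted here.
row = {
    "fname": "000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg",
    "year": "1855",
    "path": "embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg",
}


def year_from_fname(fname):
    """Return the trailing four-digit year of a filename, or None.

    Assumption: filenames end in '_<YYYY>.jpg', as in the example row.
    """
    match = re.search(r"_(\d{4})\.jpg$", fname)
    return match.group(1) if match else None


def expected_path(fname, year):
    """Build the path layout seen in the example row (assumed, not documented)."""
    return f"embellishments/{year}/{fname}"


# Both inferred patterns hold for the example row.
print(year_from_fname(row["fname"]) == row["year"])
print(expected_path(row["fname"], row["year"]) == row["path"])
```

Note that `year` is stored as a string in the example, so it would need converting with `int(...)` before any numeric comparison.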
The dataset only contains 100 rows, in a single 'train' split.
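A minimal sketch of loading that split with the `datasets` library follows. It assumes the package is installed, network access is available, and the repository is still loadable by the installed library version. The helper name `load_embellishments` and the row-count check are illustrative and not part of the original card.

```python
DATASET_ID = "severo/embellishments"
EXPECTED_ROWS = 100  # row count stated in this card


def load_embellishments():
    """Download the single 'train' split and sanity-check its size.

    Needs network access and the `datasets` package.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    ds = load_dataset(DATASET_ID, split="train")
    if len(ds) != EXPECTED_ROWS:
        raise ValueError(f"expected {EXPECTED_ROWS} rows, got {len(ds)}")
    return ds
```

Calling `load_embellishments()` should return a dataset whose rows expose the `fname`, `year`, `path` and `img` fields described above.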
This dataset was chosen by Daniel van Strien for his tutorial Using 🤗 datasets for image search, which includes the Python code to build the image search.
As stated on the British Library webpage:
The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The images are in .JPEG format.
The BCP-47 code is en.
Who are the source data producers? British Library, British Library Labs, Adrian Edwards (Curator), Neil Fitzgerald (Contributor ORCID)
The dataset does not contain any additional annotations.
Annotation process: [N/A]
Who are the annotators? [N/A]
[N/A]
[N/A]
[N/A]
This is a toy dataset that aims at:
The dataset was created by Sylvain Lesage at Hugging Face, to replicate the tutorial Using 🤗 datasets for image search by Daniel van Strien.
CC0 1.0 Universal Public Domain