Dataset:

severo/embellishments

Size category:

n<1K

Annotation creators:

no-annotation

Source datasets:

original

License:

cc0-1.0


Dataset Card for severo/embellishments

Dataset Summary

This small dataset contains the thumbnails of the first 100 entries of the British Library dataset "Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG". It was uploaded to the Hub to reproduce Daniel van Strien's tutorial, Using 🤗 datasets for image search.

Dataset Structure

Data Instances

A typical row contains an image thumbnail, its filename, and the year of publication of the book it was extracted from.

An example looks as follows:

{
 'fname': '000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg',
 'year': '1855',
 'path': 'embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg',
 'img': ...
}
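The example row suggests two patterns. First, the path field seems to be built as embellishments/<year>/<fname>. Second, fname seems to end with the publication year just before the .jpg extension. Both patterns are inferred from this single row and are not documented, so treat them as assumptions. The minimal Python sketch below checks them against the example. The helper names year_from_fname and expected_path are hypothetical.

```python
import posixpath


def year_from_fname(fname: str) -> str:
    """Return the last underscore-separated token of the filename stem.

    Assumption (inferred from one example row): that token is the year.
    """
    stem, _ext = posixpath.splitext(fname)
    return stem.rsplit("_", 1)[-1]


def expected_path(fname: str, year: str) -> str:
    """Rebuild 'path', assuming the layout embellishments/<year>/<fname>."""
    return posixpath.join("embellishments", year, fname)


# The example row from the card (the 'img' field is omitted).
row = {
    "fname": "000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg",
    "year": "1855",
    "path": "embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg",
}

# Both checks hold for this row; other rows may not follow the same pattern.
assert year_from_fname(row["fname"]) == row["year"]
assert expected_path(row["fname"], row["year"]) == row["path"]
```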

Data Fields

fname : the image filename.
year : the year of publication of the book from which the image was extracted, stored as a string.
path : the local path to the image.
img : a thumbnail of the image, with a maximum height and width of 224 pixels.
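The card does not say how the thumbnails were produced. The hypothetical sketch below computes target dimensions that keep both sides at or below 224 pixels. It preserves the aspect ratio and never enlarges a smaller image. Pillow's Image.thumbnail behaves in a similar way, but its exact rounding may differ from this function.

```python
def thumbnail_size(width: int, height: int, max_side: int = 224) -> tuple[int, int]:
    """Scale (width, height) so neither side exceeds max_side, keeping the aspect ratio.

    Images already within the limit are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    # The most restrictive side determines the scale factor; cap at 1.0 to avoid upscaling.
    scale = min(max_side / width, max_side / height, 1.0)
    # Round to the nearest pixel, never going below 1.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(thumbnail_size(1000, 500))  # (224, 112)
print(thumbnail_size(300, 900))   # (75, 224)
print(thumbnail_size(100, 80))    # (100, 80)
```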

Data Splits

The dataset contains only 100 rows, in a single 'train' split.
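Below is a minimal sketch that loads the split with the datasets library's load_dataset function and counts rows per publication year. The download needs the datasets package and network access. For that reason the import is deferred until load_train_split is called. Both helper names are hypothetical.

```python
from collections import Counter


def load_train_split():
    """Download the single 'train' split from the Hub (needs `datasets` and network access)."""
    from datasets import load_dataset  # deferred import so the helper below works offline
    return load_dataset("severo/embellishments", split="train")


def rows_per_year(years) -> Counter:
    """Count how many thumbnails come from books published in each year."""
    return Counter(years)


# Usage (network required):
# ds = load_train_split()
# print(len(ds))  # the card states 100 rows
# print(rows_per_year(ds["year"]).most_common(5))
```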

Dataset Creation

Curation Rationale

This dataset was chosen by Daniel van Strien for his tutorial Using 🤗 datasets for image search, which includes the Python code to reproduce it.
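The tutorial's code is not reproduced here. Image search of this kind usually works in two steps. First, each image is embedded as a vector. Then a query is answered by ranking images by vector similarity. The pure-Python sketch below shows only the ranking step, using cosine similarity. The three-dimensional vectors are made-up stand-ins for real image embeddings, and the helper names are hypothetical. For larger collections, the datasets library provides Dataset.add_faiss_index and Dataset.get_nearest_examples.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, k=3):
    """Return the k (fname, score) pairs most similar to query_vec, best first."""
    scored = [(fname, cosine(query_vec, vec)) for fname, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; a real pipeline would compute these with an image model.
index = {
    "ship.jpg": [0.9, 0.1, 0.0],
    "portrait.jpg": [0.1, 0.9, 0.1],
    "map.jpg": [0.2, 0.2, 0.9],
}
print(search([1.0, 0.0, 0.0], index, k=2))  # "ship.jpg" ranks first
```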

Source Data

Initial Data Collection and Normalization

As stated on the British Library webpage:

The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The images are in .JPEG format.

The associated BCP-47 language code is en.

Who are the source data producers?

British Library, British Library Labs, Adrian Edwards (Curator), Neil Fitzgerald (Contributor ORCID)

Annotations

The dataset does not contain any additional annotations.

Annotation process

[N/A]

Who are the annotators?

[N/A]

Personal and Sensitive Information

[N/A]

Considerations for Using the Data

Social Impact of Dataset

[N/A]

Discussion of Biases

[N/A]

Other Known Limitations

This is a toy dataset intended to:

validate the process described in the tutorial Using 🤗 datasets for image search by Daniel van Strien;
demonstrate the dataset viewer on an image dataset.

Additional Information

Dataset Curators

The dataset was created by Sylvain Lesage at Hugging Face, to replicate the tutorial Using 🤗 datasets for image search by Daniel van Strien.

Licensing Information

CC0 1.0 Universal Public Domain Dedication

Author:

severo

Dataset size:

638.46 KB