Datasets:

svhn

Tasks:

Image classification

Object detection

Languages:

Multilinguality:

monolingual

Size:

100K<n<1M

Language creators:

machine-generated

Annotation creators:

machine-generated, expert-generated

Source datasets:

original

License:

other


Dataset Card for Street View House Numbers

Dataset Summary

SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirements for data preprocessing and formatting. It is similar in flavor to MNIST (e.g., the images are of small cropped digits), but it incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real-world problem: recognizing digits and numbers in natural scene images. SVHN is obtained from house numbers in Google Street View images. The dataset comes in two formats:

Original images with character level bounding boxes.

MNIST-like 32-by-32 images centered around a single character (many of the images do contain some distractors at the sides).

Supported Tasks and Leaderboards

object-detection : The dataset can be used to train a model for digit detection.
image-classification : The dataset can be used to train a model for Image Classification where the task is to predict a correct digit on the image. The leaderboard for this task is available at: https://paperswithcode.com/sota/image-classification-on-svhn

Languages

English

Dataset Structure

Data Instances

full_numbers

The original, variable-resolution, color house-number images with character level bounding boxes.

{
  'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=98x48 at 0x259E3F01780>,
  'digits': {
    'bbox': [
      [36, 7, 13, 32],
      [50, 7, 12, 32]
    ], 
    'label': [6, 9]
  }
}

cropped_digits

Character level ground truth in an MNIST-like format. All digits have been resized to a fixed resolution of 32-by-32 pixels. The original character bounding boxes are extended in the appropriate dimension to become square windows, so that resizing them to 32-by-32 pixels does not introduce aspect ratio distortions. Nevertheless this preprocessing introduces some distracting digits to the sides of the digit of interest.

{
  'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=32x32 at 0x25A89494780>,
  'label': 1
}
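
The square-window step described above can be sketched as follows. This is only an illustration, not the dataset authors' preprocessing code: the helper name `square_window` is hypothetical, the box is assumed to be in [x, y, width, height] form, the shorter side is assumed to grow symmetrically about the box center, and clamping to the image borders is left out.

```python
def square_window(bbox):
    """Extend an [x, y, width, height] box into a square window.

    Assumption: the shorter side grows symmetrically about the box center.
    Clamping to the image borders is intentionally omitted.
    """
    x, y, w, h = bbox
    side = max(w, h)
    # Shift the origin by half of the added length in each dimension.
    return [x - (side - w) / 2, y - (side - h) / 2, side, side]


# First digit of the full_numbers instance shown earlier.
print(square_window([36, 7, 13, 32]))  # → [26.5, 7.0, 32, 32]
```

For that sample, the widened window spans x = 26.5 to 58.5, which overlaps the neighboring digit starting at x = 50. This is the kind of distractor the description mentions.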

Data Fields

full_numbers

image : A PIL.Image.Image object containing the image. Accessing the image column through a row, as in dataset[0]["image"], automatically decodes that image file. Decoding a large number of image files can take a significant amount of time, so query the sample index before the "image" column: dataset[0]["image"] should always be preferred over dataset["image"][0], which can decode the whole column before selecting the first item.
digits : a dictionary containing the digits' bounding boxes and labels
- bbox : a list of bounding boxes (in the COCO format, [x_min, y_min, width, height]) corresponding to the digits present in the image
- label : a list of integers between 0 and 9, one per bounding box, representing the digits.
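
As a minimal sketch of how the bbox and label lists fit together, the helpers below convert a COCO-style box to corner coordinates and crop each digit. The function names are hypothetical; the only library behavior assumed is that the image exposes a PIL-style `crop((left, upper, right, lower))` method.

```python
def coco_to_corners(bbox):
    """Convert [x_min, y_min, width, height] to (left, upper, right, lower)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def crop_digits(image, digits):
    """Return (label, crop) pairs for one full_numbers example.

    `image` is assumed to provide a PIL-style crop(box) method.
    """
    return [
        (label, image.crop(coco_to_corners(bbox)))
        for bbox, label in zip(digits["bbox"], digits["label"])
    ]


# Boxes from the full_numbers instance shown earlier.
print(coco_to_corners([36, 7, 13, 32]))  # → (36, 7, 49, 39)
print(coco_to_corners([50, 7, 12, 32]))  # → (50, 7, 62, 39)
```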

cropped_digits

image : A PIL.Image.Image object containing the image. Accessing the image column through a row, as in dataset[0]["image"], automatically decodes that image file. Decoding a large number of image files can take a significant amount of time, so query the sample index before the "image" column: dataset[0]["image"] should always be preferred over dataset["image"][0], which can decode the whole column before selecting the first item.
label : an integer between 0 and 9 representing the digit.
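
The access pattern recommended above might look like the sketch below when used with the Hugging Face `datasets` library. Several things here are assumptions rather than facts from this card: the hub identifier "svhn" is taken from the dataset tag and may need a namespaced form on current hubs, the split names "train", "test" and "extra" are inferred from the split description, and the helper names are hypothetical. The import is deferred so that the argument checks work without the library or a network connection.

```python
SVHN_CONFIGS = ("full_numbers", "cropped_digits")
SVHN_SPLITS = ("train", "test", "extra")  # assumed split names


def load_svhn_split(config, split="train"):
    """Load one SVHN split after validating the arguments."""
    if config not in SVHN_CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {SVHN_CONFIGS}")
    if split not in SVHN_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SVHN_SPLITS}")
    # Deferred import: requires the Hugging Face `datasets` package and network access.
    from datasets import load_dataset
    return load_dataset("svhn", config, split=split)  # hub id is an assumption


def first_image(dataset):
    """Index the row first, then the column, so only one image is decoded."""
    return dataset[0]["image"]
```

A typical call would be `first_image(load_svhn_split("cropped_digits", "test"))`, which downloads the data on first use.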

Data Splits

full_numbers

The data is split into training, test, and extra sets. The training set contains 33,402 images, the test set 13,068 images, and the extra set 202,353 images.

cropped_digits

The data is split into training, test, and extra sets. The training set contains 73,257 images, the test set 26,032 images, and the extra set 531,131 images.

The extra set can be used as additional training data. It was obtained in a similar manner to the training and test sets, but with an increased detection threshold in order to generate this large amount of labeled data. The extra subset is therefore somewhat biased toward less difficult detections and is easier than SVHN train/SVHN test.
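
A quick arithmetic check of the split sizes listed above, as a small sketch. The dictionary keys "train", "test" and "extra" are labels chosen for this example, and the helper names are hypothetical.

```python
# Split sizes as stated in the Data Splits section.
SPLIT_SIZES = {
    "full_numbers": {"train": 33402, "test": 13068, "extra": 202353},
    "cropped_digits": {"train": 73257, "test": 26032, "extra": 531131},
}


def total_images(config):
    """Total number of images across all three splits."""
    return sum(SPLIT_SIZES[config].values())


def train_plus_extra(config):
    """Training pool size when the extra set is added to the training set."""
    sizes = SPLIT_SIZES[config]
    return sizes["train"] + sizes["extra"]


print(total_images("cropped_digits"))      # → 630420
print(total_images("full_numbers"))        # → 248823
print(train_plus_extra("cropped_digits"))  # → 604388
```

The cropped_digits total of 630,420 is consistent with the "over 600,000 digit images" figure in the summary.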

Dataset Creation

Curation Rationale

From the paper:

As mentioned above, the venerable MNIST dataset has been a valuable goal post for researchers seeking to build better learning systems whose benchmark performance could be expected to translate into improved performance on realistic applications. However, computers have now reached essentially human levels of performance on this problem—a testament to progress in machine learning and computer vision. The Street View House Numbers (SVHN) digit database that we provide can be seen as similar in flavor to MNIST (e.g., the images are of small cropped characters), but the SVHN dataset incorporates an order of magnitude more labeled data and comes from a significantly harder, unsolved, real world problem. Here the gap between human performance and state of the art feature representations is significant. Going forward, we expect that this dataset may fulfill a similar role for modern feature learning algorithms: it provides a new and difficult benchmark where increased performance can be expected to translate into tangible gains on a realistic application.

Source Data

Initial Data Collection and Normalization

From the paper:

The SVHN dataset was obtained from a large number of Street View images using a combination of automated algorithms and the Amazon Mechanical Turk (AMT) framework, which was used to localize and transcribe the single digits. We downloaded a very large set of images from urban areas in various countries.

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

From the paper:

From these randomly selected images, the house-number patches were extracted using a dedicated sliding window house-numbers detector using a low threshold on the detector’s confidence score in order to get a varied, unbiased dataset of house-number signs. These low precision detections were screened and transcribed by AMT workers.

Who are the annotators?

Amazon Mechanical Turk (AMT) workers.

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu and Andrew Y. Ng

Licensing Information

Non-commercial use only.

Citation Information

@article{netzer2011reading,
  title={Reading digits in natural images with unsupervised feature learning},
  author={Netzer, Yuval and Wang, Tao and Coates, Adam and Bissacco, Alessandro and Wu, Bo and Ng, Andrew Y},
  year={2011}
}

Contributions

Thanks to @mariosasko for adding this dataset.

Author:

Anonymous

Dataset size:

23.86 KB