Dataset:

lewtun/dog_food

Task:

Image classification

Subtask:

multi-class-image-classification

Languages:

Multilinguality:

monolingual

Size category:

1K<n<10K

Language creators:

found

Annotation creators:

found

Source datasets:

original

License:

license:unknown


Dataset Card for the Dog 🐶 vs. Food 🍔 (a.k.a. Dog Food) Dataset

Dataset Summary

This is a dataset for multiclass image classification with three classes: 'dog', 'chicken', and 'muffin'.

The 'dog' class contains images of dogs that look like fried chicken or like muffins, while the 'chicken' and 'muffin' classes contain images of (you guessed it) fried chicken and muffins 😋

Supported Tasks and Leaderboards

TBC

Languages

The labels are in English (['dog', 'chicken', 'muffin'])

Dataset Structure

Data Instances

A sample from the training set is provided below:

{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=300x470 at 0x7F176094EF28>,
 'label': 0}

Data Fields

image: A PIL.JpegImagePlugin.JpegImageFile object containing the image (300x470 in the sample above). Note that when accessing the image column, e.g. dataset[0]["image"], the image file is automatically decoded. Decoding a large number of image files can take a significant amount of time, so always query the sample index before the "image" column: dataset[0]["image"] should be preferred over dataset["image"][0].
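label: an integer class label with the correspondence 0 → dog, 1 → food. This two-class mapping does not match the three classes ('dog', 'chicken', 'muffin') described above.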
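The advice to index the row before the column can be illustrated with a toy model. `LazyImageTable` below is a hypothetical class, not part of the `datasets` library; it only counts how many images each access pattern decodes, assuming (as described above) that images are decoded when they are read.

```python
class LazyImageTable:
    """Toy stand-in (hypothetical, not the `datasets` implementation) for a
    table whose "image" column stores encoded bytes decoded on access."""

    def __init__(self, encoded_images, labels):
        self._encoded = list(encoded_images)
        self._labels = list(labels)
        self.decode_calls = 0  # number of images decoded so far

    def _decode(self, raw):
        # Stand-in for an expensive JPEG decode.
        self.decode_calls += 1
        return f"decoded<{raw}>"

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row access: only the image in this one row is decoded.
            return {"image": self._decode(self._encoded[key]), "label": self._labels[key]}
        if key == "image":
            # Column access: every image in the column is decoded.
            return [self._decode(raw) for raw in self._encoded]
        if key == "label":
            return list(self._labels)
        raise KeyError(key)


# 1,875 placeholder "encoded" images, the size of the train split.
table = LazyImageTable([f"jpeg{i}" for i in range(1875)], [0] * 1875)

row_first = table[0]["image"]      # decodes a single image
print(table.decode_calls)          # 1

table.decode_calls = 0
column_first = table["image"][0]   # decodes all 1,875 images, then indexes
print(table.decode_calls)          # 1875
```

Both expressions return the same image; only the amount of decoding work differs.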

Data Splits

Train (1875 images) and Test (625 images)

Dataset Creation

Curation Rationale

N/A

Source Data

Initial Data Collection and Normalization

This dataset was taken from the qw2243c/Image-Recognition-Dogs-Fried-Chicken-or-Blueberry-Muffins? GitHub repository, and 25% of the data was randomly split off for validation.
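The split sizes listed under Data Splits are consistent with this procedure: 1,875 + 625 = 2,500 images, and 625 is 25% of 2,500. A minimal sketch of a seeded 75/25 random split follows; `split_train_test`, the seed, and the placeholder filenames are all assumptions, since the exact procedure used for this dataset is not documented.

```python
import random


def split_train_test(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of `items` and split off `test_fraction` of it.

    Hypothetical helper: the real seed and procedure are not documented.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 2,500 made-up (filename, label) pairs standing in for the real images.
examples = [(f"img_{i:04d}.jpg", i % 3) for i in range(2500)]
train, test = split_train_test(examples)
print(len(train), len(test))  # 1875 625
```

Fixing the seed means the same call always yields the same split, which matters if results are to be compared across runs.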

Annotations

Annotation process

The images were scraped from the internet and labelled according to the query words used to find them.

Personal and Sensitive Information

N/A

Considerations for Using the Data

Social Impact of Dataset

N/A

Discussion of Biases

This dataset is balanced: it has an equal number of images of dogs (1,000), chicken (1,000), and muffins (1,000). This should be taken into account when evaluating models.
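With balanced classes, a trivial classifier that always predicts the same class scores about 1/3 accuracy, which is a useful floor when reading evaluation results. A minimal sketch of that majority-class baseline, using the per-class counts stated above; the label list is generated here as a placeholder rather than loaded from the dataset, and `majority_baseline_accuracy` is a hypothetical helper.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Placeholder labels matching the stated counts: 1,000 images per class.
labels = ["dog"] * 1000 + ["chicken"] * 1000 + ["muffin"] * 1000
print(Counter(labels))
print(majority_baseline_accuracy(labels))  # 0.3333333333333333
```

On an imbalanced dataset the same function would return a higher floor, which is why class balance affects how accuracy numbers should be interpreted.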

Other Known Limitations

N/A

Additional Information

Dataset Curators

This dataset was created by @lanceyjt, @yl3829, @wesleytao, @qw2243c and @asyouhaveknown

Licensing Information

No license information is given in the original GitHub repository.

Citation Information

N/A

Contributions

Thanks to @lewtun for adding this dataset.

Author:

lewtun

Dataset size:

271.79 MB