Dataset:

huggan/smithsonian_butterflies_subset


This is a subset of the "ceyda/smithsonian_butterflies" dataset, with additional processing applied to train the "ceyda/butterfly_gan" model.

The preprocessing includes:

  • Adding a "sim_score" to each image with a CLIP model, using the text prompts "pretty butterfly", "one butterfly", "butterfly with open wings", and "colorful butterfly"
  • Removing butterflies with the same name (species)
  • Keeping only the top 1000 images
  • Removing the background (repeating the similarity scoring after background removal gave visually worse results, so it was not done)
  • Detecting contours
  • Cropping to the bounding box of the contour with the largest area
  • Converting back to RGB
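
The selection and cropping steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used to build the dataset: the record field "name" and the helper names are hypothetical, the sketch assumes that one image per species is kept (the highest-scoring one) and that "top" means ranked by "sim_score", and it uses a largest connected foreground region as a stand-in for contour detection. It works on plain nested lists so that it runs with the standard library alone.

```python
from collections import deque


def select_top_unique(records, k=1000):
    """Keep the best-scoring record per species, then the k best overall.

    Assumption: each record is a dict with a hypothetical "name" key
    (species) and the "sim_score" key mentioned in the card.
    """
    best = {}
    for rec in records:
        name = rec["name"]
        if name not in best or rec["sim_score"] > best[name]["sim_score"]:
            best[name] = rec
    ranked = sorted(best.values(), key=lambda r: r["sim_score"], reverse=True)
    return ranked[:k]


def largest_region_bbox(mask):
    """Bounding box (top, left, bottom, right), inclusive, of the largest
    4-connected foreground region in a 0/1 mask, or None if the mask is empty.

    Stand-in for "contour with the largest area": size is counted in pixels.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    best_size, best_box = 0, None
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one foreground region.
            queue = deque([(r, c)])
            seen[r][c] = True
            size, top, bottom, left, right = 0, r, r, c, c
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size > best_size:
                best_size, best_box = size, (top, left, bottom, right)
    return best_box


def crop(image, box):
    """Crop a row-major nested-list image to an inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

In practice the mask would come from the background-removal step, and an image library would typically replace the nested lists; the sketch only shows the order of operations.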