Dataset:

visheratin/laion-coco-nllb

LAION COCO translated into 201 languages

This dataset contains samples from the LAION-COCO dataset, translated into 201 languages using the largest NLLB-200 model (3.3B parameters).

Fields description

  • id - unique ID of the image.
  • url - original URL of the image from the LAION-COCO dataset.
  • eng_caption - original English caption from the LAION-COCO dataset.
  • captions - a list of captions translated into the languages of the Flores 200 dataset. Each item in the list is itself a two-element list: the first element is a BCP-47 language code, and the second is the caption in that language. The full list of language codes for the Flores 200 dataset can be found here.
  • score - aesthetic score generated using the LAION aesthetic predictor. All images in the dataset have a score of 4.5 or higher.
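As a minimal sketch of how the captions field could be read, the helper below looks up the caption for one language in a list of [language code, caption] pairs. The function name `caption_for`, the sample record, and the language codes in it are made up for illustration; the exact code strings should be checked against the Flores 200 list.

```python
from typing import Optional, Sequence


def caption_for(captions: Sequence[Sequence[str]], lang_code: str) -> Optional[str]:
    """Return the caption for lang_code, or None if that language is absent.

    Assumes each item is a two-element sequence: [language_code, caption].
    """
    for code, text in captions:
        if code == lang_code:
            return text
    return None


# Hypothetical record shaped like the fields described above (not real data).
sample = {
    "id": 1,
    "url": "https://example.com/cat.jpg",
    "eng_caption": "a cat on a sofa",
    "captions": [
        ["fra_Latn", "un chat sur un canapé"],
        ["deu_Latn", "eine Katze auf einem Sofa"],
    ],
    "score": 5.1,
}

french = caption_for(sample["captions"], "fra_Latn")
missing = caption_for(sample["captions"], "xxx_Xxxx")
```

If many lookups are needed per record, converting the pairs to a dictionary once with `dict(map(tuple, captions))` may be more convenient.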
Images

The dataset was filtered to contain only working image URLs. However, their availability may change over time, so all images from this dataset are available at https://nllb-data.com/. To get an image, use the following URL format:

    https://nllb-data.com/{id}.jpg
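The URL format above can be sketched as a small helper. The function name `image_url` and the example ID are illustrative assumptions; only the base URL and the `{id}.jpg` pattern come from the description, and the type of the id field (integer or string) is not stated, so both are accepted here.

```python
from typing import Union

# Base URL taken from the dataset description.
IMAGE_BASE_URL = "https://nllb-data.com"


def image_url(sample_id: Union[int, str]) -> str:
    """Build the mirror URL for a sample's image from its id field."""
    return f"{IMAGE_BASE_URL}/{sample_id}.jpg"


# Hypothetical ID, used only to show the resulting shape of the URL.
example = image_url(12345)
```

The resulting string can be passed to any HTTP client to download the image; no request is made by the helper itself.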