LAION COCO translated into 201 languages
This dataset contains the samples of the LAION-COCO dataset translated to 201 languages using the largest NLLB-200 model (3.3B parameters).
Fields description
id - unique ID of the image.
url - original URL of the image from the LAION-COCO dataset.
eng_caption - original English caption from the LAION-COCO dataset.
captions - a list of captions translated into the languages of the Flores 200 dataset. Each item is a two-element list: the first element is a BCP-47 language code, and the second is the caption in that language. The list of all language codes for the Flores 200 dataset can be found here.
score - aesthetic score generated using the LAION aesthetic predictor. All images in the dataset have a score of 4.5 or higher.
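Since captions is a list of [language code, caption] pairs, it can be convenient to turn it into a lookup table keyed by language code. The following is a minimal sketch, not part of the dataset tooling: the helper name captions_to_dict and all values in the sample record are invented for illustration, and the type of id is an assumption.

```python
def captions_to_dict(captions):
    """Turn [[lang_code, caption], ...] into {lang_code: caption}."""
    return {lang: text for lang, text in captions}


# Hypothetical record that follows the field layout described above.
# Every value here is made up; only the structure mirrors the dataset.
sample = {
    "id": 0,  # assumed type; the real ID format may differ
    "url": "https://example.com/image.jpg",
    "eng_caption": "A dog on a beach.",
    "captions": [
        ["fra_Latn", "Un chien sur une plage."],
        ["deu_Latn", "Ein Hund an einem Strand."],
    ],
    "score": 5.1,
}

by_lang = captions_to_dict(sample["captions"])
print(by_lang["fra_Latn"])
```

A dictionary gives constant-time access to one language; if a record ever listed the same code twice, the later caption would overwrite the earlier one.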
Images
The dataset was filtered to contain only working image URLs. However, their availability may change over time, so all images from this dataset are also available at https://nllb-data.com/.
To get the image, use the following format:
https://nllb-data.com/{id}.jpg
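As a small illustration of the URL format above, the sketch below builds the image address from a record's id. The function name image_url and the ID value 12345 are hypothetical; only the URL template comes from this description.

```python
# URL template taken from the format given above.
IMAGE_URL_TEMPLATE = "https://nllb-data.com/{id}.jpg"


def image_url(image_id):
    """Build the mirror URL for one image from its dataset id."""
    return IMAGE_URL_TEMPLATE.format(id=image_id)


# 12345 is a made-up ID used only to show the resulting shape.
print(image_url(12345))
```

The sketch only constructs the address and performs no network request; the resulting URL can be passed to any HTTP client to fetch the image.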