You must download the dataset files manually: either visit this page or run download.sh to fetch them.
Afterwards, you can load the dataset by referencing the directory that holds the downloaded files:

    import datasets

    ds = datasets.load_dataset("atasoglu/flickr8k-dataset", data_dir="data")
    print(ds)

    DatasetDict({
        train: Dataset({
            features: ['image_id', 'image_path', 'captions'],
            num_rows: 6000
        })
        test: Dataset({
            features: ['image_id', 'image_path', 'captions'],
            num_rows: 1000
        })
        validation: Dataset({
            features: ['image_id', 'image_path', 'captions'],
            num_rows: 1000
        })
    })

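Captioning models are usually trained on one image/caption pair at a time, so it can help to flatten each record into such pairs. The sketch below is only an illustration: `flatten_captions` is a hypothetical helper (not part of this repository), the sample record is made up, and it assumes that `captions` holds a list of strings for each image.

```python
def flatten_captions(records):
    """Yield one (image_path, caption) pair per caption in each record.

    Assumes every record is a dict with an 'image_path' string and a
    'captions' list of strings (an assumption about the schema).
    """
    for record in records:
        for caption in record["captions"]:
            yield record["image_path"], caption


# Made-up record that only mirrors the feature names shown above.
sample = [
    {
        "image_id": "example_0",
        "image_path": "data/images/example_0.jpg",
        "captions": ["A dog runs on grass.", "A brown dog is running."],
    }
]

pairs = list(flatten_captions(sample))
for path, caption in pairs:
    print(path, "|", caption)
```

If the assumption about `captions` holds, the same helper should also accept a loaded split, for example `flatten_captions(ds["train"])`, since iterating over a split yields one dict per row.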
I don't own the copyright to the images. Please visit for more.