Dataset: HuggingFaceM4/LocalizedNarratives
Paper: arxiv:1912.03098
License: cc-by-4.0

Localized Narratives is a new form of multimodal image annotations connecting vision and language. We ask annotators to describe an image with their voice while simultaneously hovering their mouse over the region they are describing. Since the voice and the mouse pointer are synchronized, we can localize every single word in the description. This dense visual grounding takes the form of a mouse trace segment per word and is unique to our data. We annotated 849k images with Localized Narratives: the whole COCO, Flickr30k, and ADE20K datasets, and 671k images of Open Images, all of which we make publicly available.
As of now, only the OpenImages subset is available, but feel free to contribute the other subsets of Localized Narratives!
OpenImages_captions is similar to the OpenImages subset. The difference is that captions are grouped per image (an image can have multiple captions). For this subset, `timed_caption`, `traces` and `voice_recording` are not available.
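If the subsets above are exposed as configurations of this Hub dataset, they can be loaded with the `datasets` library. The sketch below assumes the configuration names `OpenImages` and `OpenImages_captions` and a `train` split; check the repository for the exact names before relying on them.

```python
from datasets import get_dataset_config_names, load_dataset

# List the configurations actually available on the Hub
# (the names used below are assumptions based on the description above).
print(get_dataset_config_names("HuggingFaceM4/LocalizedNarratives"))

# Full OpenImages subset: one annotation per (image, annotator) pair,
# including timed_caption, traces and voice_recording.
narratives = load_dataset(
    "HuggingFaceM4/LocalizedNarratives", "OpenImages", split="train"
)

# Captions-only variant: captions grouped per image, without
# timed_caption, traces or voice_recording.
captions_only = load_dataset(
    "HuggingFaceM4/LocalizedNarratives", "OpenImages_captions", split="train"
)

print(narratives[0]["caption"])
```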
[More Information Needed]
[More Information Needed]
Each instance has the following structure:
    {
      dataset_id: 'mscoco_val2017',
      image_id: '137576',
      annotator_id: 93,
      caption: 'In this image there are group of cows standing and eating th...',
      timed_caption: [{'utterance': 'In this', 'start_time': 0.0, 'end_time': 0.4}, ...],
      traces: [[{'x': 0.2086, 'y': -0.0533, 't': 0.022}, ...], ...],
      voice_recording: 'coco_val/coco_val_137576_93.ogg'
    }
Each line represents one Localized Narrative annotation on one image by one annotator and has the following fields:
- `dataset_id`: identifier of the source dataset and split the image belongs to (e.g. `mscoco_val2017`).
- `image_id`: identifier of the image within that dataset.
- `annotator_id`: integer identifying the annotator who produced this narrative.
- `caption`: the full image description as a single string.
- `timed_caption`: list of timed utterances `{'utterance', 'start_time', 'end_time'}`, where each utterance is a word (or short group of words) and the times, in seconds from the start of the recording, mark when it was spoken.
- `traces`: list of mouse trace segments, each a list of timed points `{'x', 'y', 't'}`, where `x` and `y` are normalized image coordinates (they can fall slightly outside [0, 1], as in the example above) and `t` is the time in seconds since the start of the recording.
- `voice_recording`: relative path to the OGG voice recording for this annotation.
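Because every trace point and every timed utterance carry timestamps relative to the same recording, words can be grounded to image regions by intersecting their time spans. The sketch below illustrates this on an instance shaped like the example above; it is illustrative only and assumes the dataset has been loaded as in the earlier snippet.

```python
def points_for_utterance(instance, utterance_index):
    """Collect the mouse-trace points recorded while a given utterance was spoken.

    `instance` is one annotation with the structure shown above; the alignment
    keeps every trace point whose timestamp `t` falls inside the utterance's
    [start_time, end_time] window.
    """
    utt = instance["timed_caption"][utterance_index]
    start, end = utt["start_time"], utt["end_time"]
    points = []
    for segment in instance["traces"]:   # traces is a list of trace segments
        for point in segment:            # each point is {'x', 'y', 't'}
            if start <= point["t"] <= end:
                points.append((point["x"], point["y"]))
    return utt["utterance"], points


# Hypothetical usage: ground the first utterance of the first annotation.
# word, pts = points_for_utterance(narratives[0], 0)
# print(word, len(pts), "trace points")
```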
[More Information Needed]
[More Information Needed]
Who are the source language producers?

[More Information Needed]
[More Information Needed]
Who are the annotators?

[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
[More Information Needed]
Thanks to @VictorSanh for adding this dataset.