Datasets:

HuggingFaceM4/ActivitiyNet_Captions

Languages:

en

Multilinguality:

monolingual

Size categories:

10K<n<100K

Language creators:

crowdsourced

Annotations creators:

expert-generated

Source datasets:

original

ArXiv:

arxiv:1705.00754

License:

other

Dataset Card for ActivityNet Captions

Dataset Summary

The ActivityNet Captions dataset connects videos to a series of temporally annotated sentence descriptions. Each sentence covers a unique segment of the video, describing multiple events that occur. These events may occur over very long or short periods of time and are not limited in any capacity, allowing them to co-occur. On average, each of the 20k videos contains 3.65 temporally localized sentences, resulting in a total of 100k sentences. We find that the number of sentences per video follows a relatively normal distribution. Furthermore, as the video duration increases, the number of sentences also increases. Each sentence has an average length of 13.48 words, which is also normally distributed. You can find more details of the dataset under the ActivityNet Captions Dataset section, and under supplementary materials in the paper.
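Because events are allowed to co-occur, the temporal segments annotated for one video can overlap. The following is a minimal sketch of how such overlaps could be detected, assuming each segment is represented as a (start, end) pair of timestamps. The function name overlapping_pairs and the sample timestamps are hypothetical and are not taken from the dataset.

```python
from itertools import combinations
from typing import List, Tuple

# Assumed representation: one (start, end) pair of timestamps per sentence.
Segment = Tuple[float, float]


def overlapping_pairs(segments: List[Segment]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with i < j whose time intervals intersect."""
    pairs = []
    for (i, (s1, e1)), (j, (s2, e2)) in combinations(enumerate(segments), 2):
        # Two intervals overlap when each one starts before the other ends.
        if s1 < e2 and s2 < e1:
            pairs.append((i, j))
    return pairs


# Invented timestamps, for illustration only.
example_segments = [(0.0, 12.5), (10.0, 30.0), (31.0, 40.0)]
print(overlapping_pairs(example_segments))
```

Segments that merely touch at an endpoint are not counted as overlapping here; that is a design choice of this sketch, not something the paper specifies.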

Languages

The captions in the dataset are in English.

Dataset Structure

Data Fields

  • video_id : str Unique identifier for the video
  • video_path : str Path to the video file
  • duration : float32 Duration of the video
  • captions_starts : List_float32 List of timestamps denoting the time at which each caption starts
  • captions_ends : List_float32 List of timestamps denoting the time at which each caption ends
  • en_captions : list_str List of English captions describing parts of the video
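The three caption fields appear to be parallel lists, so one annotated sentence can be recovered by reading the same index from each. Here is a minimal sketch of that pairing, assuming the lists are aligned by index. The helper caption_segments and every value in the sample record are hypothetical; a real record would come from loading the dataset.

```python
from typing import Dict, List, Tuple


def caption_segments(record: Dict) -> List[Tuple[float, float, str]]:
    """Zip the three parallel lists into (start, end, caption) triples."""
    starts = record["captions_starts"]
    ends = record["captions_ends"]
    captions = record["en_captions"]
    # Assumption: the lists are index-aligned; fail loudly if they are not.
    if not (len(starts) == len(ends) == len(captions)):
        raise ValueError("caption lists have mismatched lengths")
    return list(zip(starts, ends, captions))


# Hypothetical record with invented values, shaped like the fields listed above.
record = {
    "video_id": "v_example",
    "video_path": "videos/v_example.mp4",
    "duration": 42.0,
    "captions_starts": [0.0, 15.5],
    "captions_ends": [14.0, 42.0],
    "en_captions": ["A person enters the room.", "They sit down and read."],
}

for start, end, text in caption_segments(record):
    print(f"[{start:6.1f} - {end:6.1f}] {text}")
```

Raising on mismatched lengths, rather than letting zip truncate silently, makes a malformed record visible early.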

Data Splits

              train    validation   test    Overall
# of videos   10,009   4,917        4,885   19,811
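As a quick consistency check, the three split sizes can be summed and compared with the overall count, and each split's share of the videos computed. This sketch uses only the numbers reported above; the variable names are illustrative.

```python
# Split sizes as reported in the Data Splits section.
split_sizes = {"train": 10_009, "validation": 4_917, "test": 4_885}
reported_total = 19_811

total = sum(split_sizes.values())
# The splits should account for every video in the overall count.
assert total == reported_total

# Share of the videos that falls in each split.
fractions = {name: count / total for name, count in split_sizes.items()}
for name, count in split_sizes.items():
    print(f"{name:<10} {count:>6,} videos  {fractions[name]:.1%}")
```

Roughly half of the videos are in the training split, with the remainder divided almost evenly between validation and test.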

Annotations

Quoting the ActivityNet Captions paper: "Each annotation task was divided into two steps: (1) Writing a paragraph describing all major events happening in the videos in a paragraph, with each sentence of the paragraph describing one event, and (2) Labeling the start and end time in the video in which each sentence in the paragraph event occurred."

Who annotated the dataset?

Amazon Mechanical Turk annotators

Personal and Sensitive Information

The paper does not specifically mention any personal or sensitive information.

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Licensing Information

[More Information Needed]

Citation Information

@inproceedings{krishna2017dense,
    title={Dense-Captioning Events in Videos},
    author={Krishna, Ranjay and Hata, Kenji and Ren, Frederic and Fei-Fei, Li and Niebles, Juan Carlos},
    booktitle={International Conference on Computer Vision (ICCV)},
    year={2017}
}

Contributions

Thanks to @leot13 for adding this dataset.