数据集:

HuggingFaceM4/charades

任务:

task_categories:other

语言:

计算机处理:

monolingual

大小:

1K<n<10K

语言创建人:

crowdsourced

批注创建人:

crowdsourced

源数据集:

original

预印本库:

arxiv:1604.01753

许可:

other

数据集介绍文件清单

中文

Dataset Card for Charades

Dataset Summary

Charades is dataset composed of 9848 videos of daily indoors activities collected through Amazon Mechanical Turk. 267 different users were presented with a sentence, that includes objects and actions from a fixed vocabulary, and they recorded a video acting out the sentence (like in a game of Charades). The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos

Supported Tasks and Leaderboards

multilabel-action-classification : The goal of this task is to classify actions happening in a video. This is a multilabel classification. The leaderboard is available here

Languages

The annotations in the dataset are in English.

Dataset Structure

Data Instances

{
  "video_id": "46GP8",
  "video": "/home/amanpreet_huggingface_co/.cache/huggingface/datasets/downloads/extracted/3f022da5305aaa189f09476dbf7d5e02f6fe12766b927c076707360d00deb44d/46GP8.mp4",
  "subject": "HR43",
  "scene": "Kitchen",
  "quality": 6,
  "relevance": 7,
  "verified": "Yes",
  "script": "A person cooking on a stove while watching something out a window.",
  "objects": ["food", "stove", "window"],
  "descriptions": [
    "A person cooks food on a stove before looking out of a window."
  ],
  "labels": [92, 147],
  "action_timings": [
    [11.899999618530273, 21.200000762939453],
    [0.0, 12.600000381469727]
  ],
  "length": 24.829999923706055
}

Data Fields

video_id : str Unique identifier for each video.
video : str Path to the video file
subject : str Unique identifier for each subject in the dataset
scene : str One of 15 indoor scenes in the dataset, such as Kitchen
quality : int The quality of the video judged by an annotator (7-point scale, 7=high quality), -100 if missing
relevance : int The relevance of the video to the script judged by an annotated (7-point scale, 7=very relevant), -100 if missing
verified : str 'Yes' if an annotator successfully verified that the video matches the script, else 'No'
script : str The human-generated script used to generate the video
descriptions : List[str] List of descriptions by annotators watching the video
labels : List[int] Multi-label actions found in the video. Indices from 0 to 156.
action_timings : List[Tuple[int, int]] Timing where each of the above actions happened.
length : float The length of the video in seconds

Click here to see the full list of Charades class labels mapping:

id	Class
c000	Holding some clothes
c001	Putting clothes somewhere
c002	Taking some clothes from somewhere
c003	Throwing clothes somewhere
c004	Tidying some clothes
c005	Washing some clothes
c006	Closing a door
c007	Fixing a door
c008	Opening a door
c009	Putting something on a table
c010	Sitting on a table
c011	Sitting at a table
c012	Tidying up a table
c013	Washing a table
c014	Working at a table
c015	Holding a phone/camera
c016	Playing with a phone/camera
c017	Putting a phone/camera somewhere
c018	Taking a phone/camera from somewhere
c019	Talking on a phone/camera
c020	Holding a bag
c021	Opening a bag
c022	Putting a bag somewhere
c023	Taking a bag from somewhere
c024	Throwing a bag somewhere
c025	Closing a book
c026	Holding a book
c027	Opening a book
c028	Putting a book somewhere
c029	Smiling at a book
c030	Taking a book from somewhere
c031	Throwing a book somewhere
c032	Watching/Reading/Looking at a book
c033	Holding a towel/s
c034	Putting a towel/s somewhere
c035	Taking a towel/s from somewhere
c036	Throwing a towel/s somewhere
c037	Tidying up a towel/s
c038	Washing something with a towel
c039	Closing a box
c040	Holding a box
c041	Opening a box
c042	Putting a box somewhere
c043	Taking a box from somewhere
c044	Taking something from a box
c045	Throwing a box somewhere
c046	Closing a laptop
c047	Holding a laptop
c048	Opening a laptop
c049	Putting a laptop somewhere
c050	Taking a laptop from somewhere
c051	Watching a laptop or something on a laptop
c052	Working/Playing on a laptop
c053	Holding a shoe/shoes
c054	Putting shoes somewhere
c055	Putting on shoe/shoes
c056	Taking shoes from somewhere
c057	Taking off some shoes
c058	Throwing shoes somewhere
c059	Sitting in a chair
c060	Standing on a chair
c061	Holding some food
c062	Putting some food somewhere
c063	Taking food from somewhere
c064	Throwing food somewhere
c065	Eating a sandwich
c066	Making a sandwich
c067	Holding a sandwich
c068	Putting a sandwich somewhere
c069	Taking a sandwich from somewhere
c070	Holding a blanket
c071	Putting a blanket somewhere
c072	Snuggling with a blanket
c073	Taking a blanket from somewhere
c074	Throwing a blanket somewhere
c075	Tidying up a blanket/s
c076	Holding a pillow
c077	Putting a pillow somewhere
c078	Snuggling with a pillow
c079	Taking a pillow from somewhere
c080	Throwing a pillow somewhere
c081	Putting something on a shelf
c082	Tidying a shelf or something on a shelf
c083	Reaching for and grabbing a picture
c084	Holding a picture
c085	Laughing at a picture
c086	Putting a picture somewhere
c087	Taking a picture of something
c088	Watching/looking at a picture
c089	Closing a window
c090	Opening a window
c091	Washing a window
c092	Watching/Looking outside of a window
c093	Holding a mirror
c094	Smiling in a mirror
c095	Washing a mirror
c096	Watching something/someone/themselves in a mirror
c097	Walking through a doorway
c098	Holding a broom
c099	Putting a broom somewhere
c100	Taking a broom from somewhere
c101	Throwing a broom somewhere
c102	Tidying up with a broom
c103	Fixing a light
c104	Turning on a light
c105	Turning off a light
c106	Drinking from a cup/glass/bottle
c107	Holding a cup/glass/bottle of something
c108	Pouring something into a cup/glass/bottle
c109	Putting a cup/glass/bottle somewhere
c110	Taking a cup/glass/bottle from somewhere
c111	Washing a cup/glass/bottle
c112	Closing a closet/cabinet
c113	Opening a closet/cabinet
c114	Tidying up a closet/cabinet
c115	Someone is holding a paper/notebook
c116	Putting their paper/notebook somewhere
c117	Taking paper/notebook from somewhere
c118	Holding a dish
c119	Putting a dish/es somewhere
c120	Taking a dish/es from somewhere
c121	Wash a dish/dishes
c122	Lying on a sofa/couch
c123	Sitting on sofa/couch
c124	Lying on the floor
c125	Sitting on the floor
c126	Throwing something on the floor
c127	Tidying something on the floor
c128	Holding some medicine
c129	Taking/consuming some medicine
c130	Putting groceries somewhere
c131	Laughing at television
c132	Watching television
c133	Someone is awakening in bed
c134	Lying on a bed
c135	Sitting in a bed
c136	Fixing a vacuum
c137	Holding a vacuum
c138	Taking a vacuum from somewhere
c139	Washing their hands
c140	Fixing a doorknob
c141	Grasping onto a doorknob
c142	Closing a refrigerator
c143	Opening a refrigerator
c144	Fixing their hair
c145	Working on paper/notebook
c146	Someone is awakening somewhere
c147	Someone is cooking something
c148	Someone is dressing
c149	Someone is laughing
c150	Someone is running somewhere
c151	Someone is going from standing to sitting
c152	Someone is smiling
c153	Someone is sneezing
c154	Someone is standing up from somewhere
c155	Someone is undressing
c156	Someone is eating something

Data Splits

train	validation	test
# of examples	1281167	50000	100000

Dataset Creation

Curation Rationale

Computer vision has a great potential to help our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. While most of such scenes are not particularly exciting, they typically do not appear on YouTube, in movies or TV broadcasts. So how do we collect sufficiently many diverse but boring samples representing our lives? We propose a novel Hollywood in Homes approach to collect such data. Instead of shooting videos in the lab, we ensure diversity by distributing and crowdsourcing the whole process of video creation from script writing to video recording and annotation.

Source Data

Initial Data Collection and Normalization

Similar to filming, we have a three-step process for generating a video. The first step is generating the script of the indoor video. The key here is to allow workers to generate diverse scripts yet ensure that we have enough data for each category. The second step in the process is to use the script and ask workers to record a video of that sentence being acted out. In the final step, we ask the workers to verify if the recorded video corresponds to script, followed by an annotation procedure.

Who are the source language producers?

Amazon Mechnical Turk annotators

Annotations

Annotation process

Who are the annotators?

Amazon Mechnical Turk annotators

Personal and Sensitive Information

Nothing specifically mentioned in the paper.

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

AMT annotators

Licensing Information

License for Non-Commercial Use

If this software is redistributed, this license must be included. The term software includes any source files, documentation, executables, models, and data.

This software and data is available for general use by academic or non-profit, or government-sponsored researchers. It may also be used for evaluation purposes elsewhere. This license does not grant the right to use this software or any derivation of it in a for-profit enterprise. For commercial use, please contact The Allen Institute for Artificial Intelligence.

This license does not grant the right to modify and publicly release the data in any form.

This license does not grant the right to distribute the data to a third party in any form.

The subjects in this data should be treated with respect and dignity. This license only grants the right to publish short segments or still images in an academic publication where necessary to present examples, experimental results, or observations.

This software comes with no warranty or guarantee of any kind. By using this software, the user accepts full liability.

Citation Information

@article{sigurdsson2016hollywood,
    author = {Gunnar A. Sigurdsson and G{\"u}l Varol and Xiaolong Wang and Ivan Laptev and Ali Farhadi and Abhinav Gupta},
    title = {Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding},
    journal = {ArXiv e-prints},
    eprint = {1604.01753}, 
    year = {2016},
    url = {http://arxiv.org/abs/1604.01753},
}

Contributions

Thanks to @apsdehal for adding this dataset.

作者:

HuggingFaceM4

数据集大小:

43.45 KB