Dataset: HuggingFaceM4/charades
Language: en
Multilinguality: monolingual
Size: 1K<n<10K
Language creators: crowdsourced
Annotation creators: crowdsourced
Source datasets: original
ArXiv: arxiv:1604.01753
License: other

Charades is a dataset composed of 9,848 videos of daily indoor activities collected through Amazon Mechanical Turk. 267 different users were presented with a sentence that includes objects and actions from a fixed vocabulary, and they recorded a video acting out the sentence (as in a game of charades). The dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos.
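To put the totals above in proportion, the following minimal sketch derives per-video averages from the counts quoted in the description. The variable and function names are illustrative only and are not part of the dataset or any library.

```python
# Counts taken from the dataset description; the names are hypothetical.
NUM_VIDEOS = 9848
NUM_USERS = 267
NUM_TEMPORAL_ANNOTATIONS = 66500
NUM_OBJECT_LABELS = 41104
NUM_DESCRIPTIONS = 27847


def per_video(total: int, videos: int = NUM_VIDEOS) -> float:
    """Return the average number of items per video."""
    return total / videos


averages = {
    "temporal annotations per video": per_video(NUM_TEMPORAL_ANNOTATIONS),
    "object labels per video": per_video(NUM_OBJECT_LABELS),
    "descriptions per video": per_video(NUM_DESCRIPTIONS),
    "videos per user": NUM_VIDEOS / NUM_USERS,
}

for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

These are simple ratios of the published totals, so they say nothing about how unevenly annotations are spread across individual videos.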
The annotations in the dataset are in English.
{
  "video_id": "46GP8",
  "video": "/home/amanpreet_huggingface_co/.cache/huggingface/datasets/downloads/extracted/3f022da5305aaa189f09476dbf7d5e02f6fe12766b927c076707360d00deb44d/46GP8.mp4",
  "subject": "HR43",
  "scene": "Kitchen",
  "quality": 6,
  "relevance": 7,
  "verified": "Yes",
  "script": "A person cooking on a stove while watching something out a window.",
  "objects": ["food", "stove", "window"],
  "descriptions": [
    "A person cooks food on a stove before looking out of a window."
  ],
  "labels": [92, 147],
  "action_timings": [
    [11.899999618530273, 21.200000762939453],
    [0.0, 12.600000381469727]
  ],
  "length": 24.829999923706055
}
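The instance above can be turned into readable action segments. The sketch below rests on two assumptions that the card does not state explicitly: that the i-th entry of `labels` corresponds to the i-th entry of `action_timings`, and that an integer label `n` maps to the class id `c` followed by `n` zero-padded to three digits (so 92 becomes `c092`). Both are consistent with the example script, which describes cooking and looking out of a window. The helper name `describe_actions` is hypothetical.

```python
# Relevant fields copied from the example instance.
example = {
    "video_id": "46GP8",
    "labels": [92, 147],
    "action_timings": [
        [11.899999618530273, 21.200000762939453],
        [0.0, 12.600000381469727],
    ],
    "length": 24.829999923706055,
}

# Only the two classes needed here, copied from the class table.
CLASS_NAMES = {
    92: "Watching/Looking outside of a window",
    147: "Someone is cooking something",
}


def describe_actions(instance, class_names):
    """Pair each label with its (start, end) timing, sorted by start time.

    Assumes labels[i] belongs to action_timings[i].
    """
    segments = []
    for label, (start, end) in zip(instance["labels"], instance["action_timings"]):
        segments.append(
            {
                "class_id": f"c{label:03d}",  # assumed id format, e.g. 92 -> "c092"
                "name": class_names.get(label, "unknown"),
                "start": round(start, 2),
                "end": round(end, 2),
                "duration": round(end - start, 2),
            }
        )
    return sorted(segments, key=lambda segment: segment["start"])


segments = describe_actions(example, CLASS_NAMES)
for segment in segments:
    print(segment)
```

Under these assumptions the two actions overlap briefly, which matches a script where the person cooks "while" watching out of the window.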
id | Class |
---|---|
c000 | Holding some clothes |
c001 | Putting clothes somewhere |
c002 | Taking some clothes from somewhere |
c003 | Throwing clothes somewhere |
c004 | Tidying some clothes |
c005 | Washing some clothes |
c006 | Closing a door |
c007 | Fixing a door |
c008 | Opening a door |
c009 | Putting something on a table |
c010 | Sitting on a table |
c011 | Sitting at a table |
c012 | Tidying up a table |
c013 | Washing a table |
c014 | Working at a table |
c015 | Holding a phone/camera |
c016 | Playing with a phone/camera |
c017 | Putting a phone/camera somewhere |
c018 | Taking a phone/camera from somewhere |
c019 | Talking on a phone/camera |
c020 | Holding a bag |
c021 | Opening a bag |
c022 | Putting a bag somewhere |
c023 | Taking a bag from somewhere |
c024 | Throwing a bag somewhere |
c025 | Closing a book |
c026 | Holding a book |
c027 | Opening a book |
c028 | Putting a book somewhere |
c029 | Smiling at a book |
c030 | Taking a book from somewhere |
c031 | Throwing a book somewhere |
c032 | Watching/Reading/Looking at a book |
c033 | Holding a towel/s |
c034 | Putting a towel/s somewhere |
c035 | Taking a towel/s from somewhere |
c036 | Throwing a towel/s somewhere |
c037 | Tidying up a towel/s |
c038 | Washing something with a towel |
c039 | Closing a box |
c040 | Holding a box |
c041 | Opening a box |
c042 | Putting a box somewhere |
c043 | Taking a box from somewhere |
c044 | Taking something from a box |
c045 | Throwing a box somewhere |
c046 | Closing a laptop |
c047 | Holding a laptop |
c048 | Opening a laptop |
c049 | Putting a laptop somewhere |
c050 | Taking a laptop from somewhere |
c051 | Watching a laptop or something on a laptop |
c052 | Working/Playing on a laptop |
c053 | Holding a shoe/shoes |
c054 | Putting shoes somewhere |
c055 | Putting on shoe/shoes |
c056 | Taking shoes from somewhere |
c057 | Taking off some shoes |
c058 | Throwing shoes somewhere |
c059 | Sitting in a chair |
c060 | Standing on a chair |
c061 | Holding some food |
c062 | Putting some food somewhere |
c063 | Taking food from somewhere |
c064 | Throwing food somewhere |
c065 | Eating a sandwich |
c066 | Making a sandwich |
c067 | Holding a sandwich |
c068 | Putting a sandwich somewhere |
c069 | Taking a sandwich from somewhere |
c070 | Holding a blanket |
c071 | Putting a blanket somewhere |
c072 | Snuggling with a blanket |
c073 | Taking a blanket from somewhere |
c074 | Throwing a blanket somewhere |
c075 | Tidying up a blanket/s |
c076 | Holding a pillow |
c077 | Putting a pillow somewhere |
c078 | Snuggling with a pillow |
c079 | Taking a pillow from somewhere |
c080 | Throwing a pillow somewhere |
c081 | Putting something on a shelf |
c082 | Tidying a shelf or something on a shelf |
c083 | Reaching for and grabbing a picture |
c084 | Holding a picture |
c085 | Laughing at a picture |
c086 | Putting a picture somewhere |
c087 | Taking a picture of something |
c088 | Watching/looking at a picture |
c089 | Closing a window |
c090 | Opening a window |
c091 | Washing a window |
c092 | Watching/Looking outside of a window |
c093 | Holding a mirror |
c094 | Smiling in a mirror |
c095 | Washing a mirror |
c096 | Watching something/someone/themselves in a mirror |
c097 | Walking through a doorway |
c098 | Holding a broom |
c099 | Putting a broom somewhere |
c100 | Taking a broom from somewhere |
c101 | Throwing a broom somewhere |
c102 | Tidying up with a broom |
c103 | Fixing a light |
c104 | Turning on a light |
c105 | Turning off a light |
c106 | Drinking from a cup/glass/bottle |
c107 | Holding a cup/glass/bottle of something |
c108 | Pouring something into a cup/glass/bottle |
c109 | Putting a cup/glass/bottle somewhere |
c110 | Taking a cup/glass/bottle from somewhere |
c111 | Washing a cup/glass/bottle |
c112 | Closing a closet/cabinet |
c113 | Opening a closet/cabinet |
c114 | Tidying up a closet/cabinet |
c115 | Someone is holding a paper/notebook |
c116 | Putting their paper/notebook somewhere |
c117 | Taking paper/notebook from somewhere |
c118 | Holding a dish |
c119 | Putting a dish/es somewhere |
c120 | Taking a dish/es from somewhere |
c121 | Wash a dish/dishes |
c122 | Lying on a sofa/couch |
c123 | Sitting on sofa/couch |
c124 | Lying on the floor |
c125 | Sitting on the floor |
c126 | Throwing something on the floor |
c127 | Tidying something on the floor |
c128 | Holding some medicine |
c129 | Taking/consuming some medicine |
c130 | Putting groceries somewhere |
c131 | Laughing at television |
c132 | Watching television |
c133 | Someone is awakening in bed |
c134 | Lying on a bed |
c135 | Sitting in a bed |
c136 | Fixing a vacuum |
c137 | Holding a vacuum |
c138 | Taking a vacuum from somewhere |
c139 | Washing their hands |
c140 | Fixing a doorknob |
c141 | Grasping onto a doorknob |
c142 | Closing a refrigerator |
c143 | Opening a refrigerator |
c144 | Fixing their hair |
c145 | Working on paper/notebook |
c146 | Someone is awakening somewhere |
c147 | Someone is cooking something |
c148 | Someone is dressing |
c149 | Someone is laughing |
c150 | Someone is running somewhere |
c151 | Someone is going from standing to sitting |
c152 | Someone is smiling |
c153 | Someone is sneezing |
c154 | Someone is standing up from somewhere |
c155 | Someone is undressing |
c156 | Someone is eating something |
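For programmatic use, the class table can be parsed into a lookup from integer label to class name. This is a minimal sketch that assumes the rows keep the `cNNN | Name |` layout shown above; `parse_class_table` is a hypothetical helper, and the sample text contains only a few rows from the table.

```python
def parse_class_table(table_text):
    """Map integer class index to class name from 'cNNN | Name |' rows."""
    classes = {}
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 2:
            continue
        class_id, name = cells[0], cells[1]
        # Skip the header and separator rows; keep only ids like "c092".
        if len(class_id) == 4 and class_id[0] == "c" and class_id[1:].isdigit():
            classes[int(class_id[1:])] = name
    return classes


sample_table = """id | Class |
---|---|
c000 | Holding some clothes |
c092 | Watching/Looking outside of a window |
c147 | Someone is cooking something |
"""

class_names = parse_class_table(sample_table)
print(class_names)
```

Keying the lookup by integer matches the integer values stored in the `labels` field of each instance, assuming those integers index this table.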
Split | train | test |
---|---|---|
Number of examples | 7,985 | 1,863 |
Computer vision has a great potential to help our daily lives by searching for lost keys, watering flowers or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. While most of such scenes are not particularly exciting, they typically do not appear on YouTube, in movies or TV broadcasts. So how do we collect sufficiently many diverse but boring samples representing our lives? We propose a novel Hollywood in Homes approach to collect such data. Instead of shooting videos in the lab, we ensure diversity by distributing and crowdsourcing the whole process of video creation from script writing to video recording and annotation.
Similar to filming, we have a three-step process for generating a video. The first step is generating the script of the indoor video. The key here is to allow workers to generate diverse scripts yet ensure that we have enough data for each category. The second step is to give workers the script and ask them to record a video of that sentence being acted out. In the final step, we ask workers to verify that the recorded video corresponds to the script, followed by an annotation procedure.
Who are the source language producers?
Amazon Mechanical Turk annotators
Who are the annotators?
Amazon Mechanical Turk annotators
Nothing specifically mentioned in the paper.
AMT annotators
License for Non-Commercial Use
If this software is redistributed, this license must be included. The term software includes any source files, documentation, executables, models, and data.
This software and data is available for general use by academic or non-profit, or government-sponsored researchers. It may also be used for evaluation purposes elsewhere. This license does not grant the right to use this software or any derivation of it in a for-profit enterprise. For commercial use, please contact The Allen Institute for Artificial Intelligence.
This license does not grant the right to modify and publicly release the data in any form.
This license does not grant the right to distribute the data to a third party in any form.
The subjects in this data should be treated with respect and dignity. This license only grants the right to publish short segments or still images in an academic publication where necessary to present examples, experimental results, or observations.
This software comes with no warranty or guarantee of any kind. By using this software, the user accepts full liability.
The Allen Institute for Artificial Intelligence (C) 2016.
@article{sigurdsson2016hollywood,
  author = {Gunnar A. Sigurdsson and G{\"u}l Varol and Xiaolong Wang and Ivan Laptev and Ali Farhadi and Abhinav Gupta},
  title = {Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding},
  journal = {ArXiv e-prints},
  eprint = {1604.01753},
  year = {2016},
  url = {http://arxiv.org/abs/1604.01753},
}
Thanks to @apsdehal for adding this dataset.