Visual Attributes in Video
Marielle Morris
May 26, 2017
Project Goals
Attributes: descriptive labels, e.g., a trotting horse, a man with a pointy nose
Identify and track attributes in videos
Focus on time-dependent traits
Distinct from action localization
Project Outline
Create a large-scale video attribute dataset
  Ideally build on an existing, pre-annotated set
  Use Amazon Mechanical Turk to tag attributes
Localize the primary object in each frame
  Maintain it in a “spatio-temporal tube” across the clip
Train an attribute prediction model
  Predict attribute dynamics in relation to the object tube
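One way to read "predict attribute dynamics in relation to the object tube" is to score an attribute on every frame of the tube and then turn those scores into the time intervals where the attribute holds. The sketch below assumes a per-frame probability is already available from some model; `active_intervals`, the 0.5 threshold and the minimum run length are illustrative assumptions, not part of the project plan.

```python
from typing import List, Tuple


def active_intervals(scores: List[float], threshold: float = 0.5,
                     min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where score >= threshold.

    Assumption: `scores` holds one attribute probability per frame of a tube.
    Runs shorter than `min_len` frames are dropped as likely noise.
    """
    intervals = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a run of "attribute present" frames begins
        elif s < threshold and start is not None:
            if i - start >= min_len:
                intervals.append((start, i))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(scores) - start >= min_len:
        intervals.append((start, len(scores)))
    return intervals


# Hypothetical per-frame scores for "preparing to jump" along one object tube.
scores = [0.1, 0.2, 0.7, 0.8, 0.9, 0.3, 0.6, 0.1]
print(active_intervals(scores))  # [(2, 5)]
```

The single-frame spike at index 6 is discarded by `min_len`; raising or lowering that value trades missed short events against flicker.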
Dataset Survey
Name | Clips | Annotation | Attributes | Notes
DAVIS | 150 | Object segmentation | - | Humans, animals, objects, vehicles
USAA | 100 | “Weak” | Per-video | Attributes are not per-frame
SegTrack v2 | 14 | | | Small dataset
FBMS-59 | 59 | Background sub | | Not very diverse, small dataset
BVSD | | | | Fragmented ground-truth annotations
CMU | 30 | Outlines | | Few frames annotated, limited motion
UMASS | 38 | | | Bad annotations
VSB100 | | | | Every 20th frame annotated
McGillFaces | 60 | Pose | | All faces
Gesture | 16 | Gestures | | All hand gestures
Youtube-Faces | 3,425 | Bounding box | |
CMU Panoptic | 65 | Skeleton/Pose | | All action sequences
WWW Crowd | 10,000 | Action/scene | | Poor attribute selections
Youtube-Objects | 126 | | | Object is not in every frame; every 10th frame annotated
CRP | 7 | | | Action sequences, high-res; every 10th frame annotated
Youtube-BB | 380,000 | | | Single object, 23 object types
Current Choice: Youtube-BB
380,000 video segments of 15-20 s
Single-object bounding boxes at 1 fps
23 object types
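Because Youtube-BB boxes are given at 1 fps, building a per-frame object tube needs some way to fill in the frames between annotations. A minimal sketch, assuming straight-line, constant-speed motion between annotated timestamps (converted to seconds) and boxes stored as (xmin, ymin, xmax, ymax); `interpolate_boxes` is a hypothetical helper, and linear interpolation is a simple stand-in, not a tracker chosen by the project.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Assumed box layout: (xmin, ymin, xmax, ymax), one box per annotated timestamp.
Box = Tuple[float, ...]


def interpolate_boxes(keyframes: Dict[float, Box], fps: float) -> List[Tuple[float, Box]]:
    """Linearly interpolate sparse (e.g. 1 fps) boxes to every frame at `fps`.

    Assumption: the object moves in a straight line at constant speed between
    annotated timestamps. Returns (time, box) pairs from the first to the last keyframe.
    """
    times = sorted(keyframes)
    if not times:
        return []
    if len(times) == 1:
        return [(times[0], keyframes[times[0]])]
    n_frames = int(round((times[-1] - times[0]) * fps)) + 1
    tube = []
    for k in range(n_frames):
        t = times[0] + k / fps
        # Pick the keyframe interval [times[j], times[j + 1]] that contains t.
        j = min(bisect_right(times, t) - 1, len(times) - 2)
        t0, t1 = times[j], times[j + 1]
        # Interpolation weight, clamped to guard against floating-point drift.
        w = min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        b0, b1 = keyframes[t0], keyframes[t1]
        box = tuple((1 - w) * a + w * b for a, b in zip(b0, b1))
        tube.append((t, box))
    return tube


# Hypothetical example: two 1 fps annotations upsampled to 4 frames per second.
tube = interpolate_boxes({0.0: (0.1, 0.1, 0.5, 0.5), 1.0: (0.3, 0.1, 0.7, 0.5)}, fps=4)
print(len(tube))  # 5
```

Linear interpolation can drift for fast or erratic motion over a full second; a per-frame detector or tracker seeded by the 1 fps boxes would be the natural next step.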
Attribute Labeling: Single Frame
Choose three of the following attributes:
  White
  Sniffing
  Preparing to jump
  Small
  Crouching
Attribute Annotation: Reel
Choose three of the following attributes:
  White
  Sniffing
  Preparing to jump
  Small
  Crouching
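The single-frame and reel tasks ask the same question of different inputs, so their answers can be pooled the same way. A sketch, assuming several workers answer each task and an attribute is kept when at least half of them select it; `aggregate_votes` and the threshold are assumptions, and the answers below are made up for illustration, not experimental results. This plain vote count is not the Economic Labeling Algorithm from COCO Attributes.

```python
from collections import Counter
from typing import List, Set


def aggregate_votes(responses: List[Set[str]], min_fraction: float = 0.5) -> Set[str]:
    """Keep attributes chosen by at least `min_fraction` of the workers for one clip."""
    if not responses:
        return set()
    # Count how many workers picked each attribute.
    counts = Counter(attr for chosen in responses for attr in chosen)
    return {attr for attr, c in counts.items() if c / len(responses) >= min_fraction}


# Made-up answers from three workers per condition (each picks three of five attributes).
single_frame = [
    {"White", "Small", "Crouching"},
    {"White", "Small", "Sniffing"},
    {"White", "Crouching", "Small"},
]
reel = [
    {"White", "Sniffing", "Preparing to jump"},
    {"White", "Preparing to jump", "Crouching"},
    {"Sniffing", "Preparing to jump", "White"},
]
print(sorted(aggregate_votes(single_frame)))  # ['Crouching', 'Small', 'White']
print(sorted(aggregate_votes(reel)))  # ['Preparing to jump', 'Sniffing', 'White']
```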
Timeline: Week
Proof of concept with Amazon Mechanical Turk
  Compare crowd-sourced results for single frame vs. reel
  Method: COCO Attributes (Patterson & Hays), using its Economic Labeling Algorithm for annotation
Formulate a plan for attribute generation
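To compare crowd-sourced results for single frame vs. reel, one simple measure (an assumption, not a method named in the plan) is the Jaccard overlap between the attribute sets each condition produces for the same clip, averaged over clips. `jaccard` and `mean_agreement` are hypothetical helpers, and the clip labels are invented.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection over union of two attribute sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_agreement(frame_labels: Dict[str, Set[str]],
                   reel_labels: Dict[str, Set[str]]) -> float:
    """Average Jaccard overlap over clips labeled in both conditions."""
    clips = frame_labels.keys() & reel_labels.keys()
    if not clips:
        raise ValueError("no clips labeled in both conditions")
    return sum(jaccard(frame_labels[c], reel_labels[c]) for c in clips) / len(clips)


# Invented per-clip attribute sets, e.g. after pooling worker votes.
frame_labels = {"clip_a": {"White", "Small", "Crouching"}, "clip_b": {"White", "Small"}}
reel_labels = {"clip_a": {"White", "Sniffing", "Preparing to jump"}, "clip_b": {"White", "Small"}}
print(round(mean_agreement(frame_labels, reel_labels), 2))  # 0.6
```

Low overlap on a clip flags attributes whose labels change once motion is visible, which is where the reel condition would matter most.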
Timeline: Month
Prepare the YT-BB (Youtube-BB) dataset
Create a Human Intelligence Task (HIT) for labeling
  Build on the COCO Attributes GitHub repository
  Minor modification: add an image reel to the GUI
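Adding an image reel to the HIT mostly means supplying several frame URLs per task instead of one. Mechanical Turk batch projects accept a CSV whose column headers become `${...}` placeholders in the HIT layout; the sketch below writes one row per clip with one column per reel frame. The URL pattern, `frames_per_reel`, and `build_hit_rows` are hypothetical, and the real column names would have to match whatever the COCO Attributes HIT template expects.

```python
import csv
import io
from typing import List


def build_hit_rows(clip_ids: List[str], frames_per_reel: int, base_url: str) -> str:
    """Return CSV text with one row per clip and one column per reel frame.

    Assumption: frames are hosted at <base_url>/<clip_id>/<NNN>.jpg.
    """
    header = ["clip_id"] + [f"frame_{i}" for i in range(1, frames_per_reel + 1)]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    for clip in clip_ids:
        urls = [f"{base_url}/{clip}/{i:03d}.jpg" for i in range(1, frames_per_reel + 1)]
        writer.writerow([clip] + urls)
    return buf.getvalue()


# Hypothetical clip IDs and image host.
csv_text = build_hit_rows(["clip_0001", "clip_0002"], frames_per_reel=5,
                          base_url="https://example.com/reels")
print(csv_text.splitlines()[0])  # clip_id,frame_1,frame_2,frame_3,frame_4,frame_5
```

In the HIT layout each frame would then be referenced as `${frame_1}`, `${frame_2}`, and so on; the single-frame condition is the same file with `frames_per_reel=1`.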