Published by Sybil Chandler. Modified over 9 years ago.
1
Beyond Mindless Labeling: Really Leveraging Humans to Build Intelligent Machines. Devi Parikh, Virginia Tech
2
Image Understanding (“Color College Avenue”, Blacksburg, VA, May 2012) Slide credit: Devi Parikh
3
State of Affairs [Chart: accuracy, machine vs. human] Slide credit: Devi Parikh
4
How do we teach machines today? Slide credit: Devi Parikh
12
And on, and on, and on… Slide credit: Devi Parikh
14
How do machines behave? Slide credit: Devi Parikh
15
Airplane Cabin, Amusement Park, Aquarium, Badminton Court, Bedroom [Xiao et al., CVPR 2010] Slide credit: Devi Parikh
16
Clarifai, April 10th, 2014
17
Interacting with Vision Systems: Need a better mode of communication! Slide credit: Devi Parikh
18
Attributes. Examples: furry, natural, young, etc. Mid-level; shareable across concepts; human understandable; machine detectable; allow for human-machine communication. Slide credit: Devi Parikh
19
Role of the Human: Communicator, Supervisor, User (Human, Machine).
Topics: Image Search; Instilling Domain Knowledge; Characterizing Failure Modes; Interpretable Models; Active and Interactive Learning.
Example statements: “My missing brother is fuller-faced than this boy.” “Polar bears are white and larger than rabbits.” “If the image is blurry or the face is not frontal, I may fail.” “I think this is a polar bear because this is a white and furry animal.”
References: [Parikh and Grauman, ICCV 2011] [Parkash and Parikh, ECCV 2012] [Biswas and Parikh, CVPR 2013] [Lad and Parikh, ECCV 2014] [Kovashka, Parikh and Grauman, CVPR 2012] [Parikh and Grauman, ICCV 2013] [Bansal, Farhadi and Parikh, ECCV 2014]
Slide credit: Devi Parikh
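The statement "My missing brother is fuller-faced than this boy" suggests how a comparative attribute could narrow an image search. The following is only a minimal sketch of that idea, not the method from any of the cited papers. The attribute scores, the image names, and the `refine` helper are all hypothetical, and real scores would have to come from learned attribute predictors.

```python
# Hypothetical attribute strengths in [0, 1]. In practice these would come
# from learned attribute predictors, which this sketch does not include.
candidates = {
    "face_a": {"fuller_faced": 0.82, "young": 0.40},
    "face_b": {"fuller_faced": 0.35, "young": 0.70},
    "face_c": {"fuller_faced": 0.61, "young": 0.55},
}


def refine(candidates, reference, attribute, direction):
    """Keep candidates that show 'more' or 'less' of an attribute than the reference."""
    if direction not in ("more", "less"):
        raise ValueError("direction must be 'more' or 'less'")
    ref_value = reference[attribute]
    kept = {}
    for name, attrs in candidates.items():
        value = attrs[attribute]
        is_more = direction == "more" and value > ref_value
        is_less = direction == "less" and value < ref_value
        if is_more or is_less:
            kept[name] = attrs
    return kept


# "My missing brother is fuller-faced than this boy."
reference_boy = {"fuller_faced": 0.50, "young": 0.60}
remaining = refine(candidates, reference_boy, "fuller_faced", "more")
print(sorted(remaining))
```

Each further comparative statement from the user could be applied to `remaining` in the same way, shrinking the candidate set step by step.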
20
Accessing Common Sense: Direct communication; learn by observing structure in our visual world? Slide credit: Devi Parikh
21
Two professors converse in front of a blackboard. Slide credit: Larry Zitnick
22
Two professors stand in front of a blackboard. Slide credit: Larry Zitnick
23
Two professors converse in front of a blackboard. Slide credit: Larry Zitnick
24
Challenges: Lacking visual density; annotations are expensive (and boring); computer vision doesn’t work well enough. Slide credit: Devi Parikh
25
Is photorealism necessary? Slide credit: Larry Zitnick
26
Jenny, Mike Slide credit: Larry Zitnick
29
Interface 2x
30
Mike fights off a bear by giving him a hotdog while Jenny runs away. Slide credit: Larry Zitnick
31
Dataset: 1,000 classes of semantically similar scenes (Class 1, Class 2, …, Class 1,000). 1,000 classes x 10 scenes per class = 10,000 scenes. Dataset online [Zitnick and Parikh, CVPR 2013] Slide credit: Larry Zitnick
32
Visual Features Slide credit: Larry Zitnick
33
Visual Features: Cloud, Cat, Basketball, Smile, Gaze, Person sitting, Tree, Person standing. Slide credit: Larry Zitnick
34
Visual Features: Cloud, Cat, Basketball, Smile, Gaze, Person sitting, Tree, Person standing. Which visual features are important for semantic meaning? Which words correlate with specific visual features? Slide credit: Devi Parikh
35
Generate and Retrieve Scenes. Input: Jenny is catching the ball. Mike is kicking the ball. The table is next to the tree. Tuples: [tuple contents lost in extraction]. Automatically Generated, Human Generated. Retrieval: score a database of scenes. [Zitnick, Parikh and Vanderwende, ICCV 2013] Slide credit: Devi Parikh
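The retrieval step, "score a database of scenes", can be illustrated with a toy example. This is only a sketch under stated assumptions and not the scoring model of the cited paper: it assumes each sentence has already been reduced to a (subject, relation, object) tuple, and it scores a scene by simple tuple overlap. The tuple contents, scene names, and the `score_scene` and `retrieve` helpers are all hypothetical.

```python
# Hypothetical (subject, relation, object) tuples for the three input sentences.
# The slide's own tuple contents did not survive extraction, so this form is assumed.
query = [
    ("Jenny", "catch", "ball"),
    ("Mike", "kick", "ball"),
    ("table", "next to", "tree"),
]

# Hypothetical database: each scene is annotated with the set of tuples it depicts.
database = {
    "scene_a": {("Jenny", "catch", "ball"), ("Mike", "kick", "ball"),
                ("table", "next to", "tree")},
    "scene_b": {("Mike", "kick", "ball"), ("Jenny", "sit", "table")},
    "scene_c": {("Jenny", "hold", "hotdog")},
}


def score_scene(query_tuples, scene_tuples):
    """Return the fraction of query tuples that appear in the scene."""
    if not query_tuples:
        return 0.0
    matched = sum(1 for t in query_tuples if t in scene_tuples)
    return matched / len(query_tuples)


def retrieve(query_tuples, scenes, top_k=2):
    """Score every scene in the database and return the top_k as (name, score)."""
    scored = [(name, score_scene(query_tuples, tuples))
              for name, tuples in scenes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = retrieve(query, database)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

Exact tuple matching is the simplest possible choice here. A learned model that relates tuples to visual features such as object presence, pose, and relative position would replace `score_scene` in a realistic system.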
36
Learning Fine-grained Interactions 3x [Antol, Zitnick and Parikh, ECCV 2014] Slide credit: Devi Parikh
37
Learning Fine-grained Interactions: Train on abstract, test on real
38
Results: 60 categories [Chart: accuracy %]. Domain adaptation; learn explicit mapping from abstract to real world; multi-label problem. [Antol, Zitnick and Parikh, ECCV 2014]
39
Visual Abstraction For… Studying mappings between images and text; zero-shot learning; studying image memorability, specificity, etc.; learning common sense knowledge; rich annotation modality (ask for descriptions, ask for scenes, show scene and ask for modification). Goes beyond “Jenny and Mike.” Study high-level image understanding tasks without waiting for lower-level vision tasks to be solved.
40
[Xinlei Chen]
41
Conclusion [Chart: accuracy, machine vs. human]. Give computer vision systems access to common-sense knowledge: communication with humans via attributes (text); visual abstraction. Use humans for more than just “labels”. Slide credit: Devi Parikh
42
Thank you. Slide credit: Devi Parikh