Beyond Mindless Labeling: Really Leveraging Humans to Build Intelligent Machines Devi Parikh Virginia Tech.

1 Beyond Mindless Labeling: Really Leveraging Humans to Build Intelligent Machines Devi Parikh Virginia Tech

2 Image Understanding “Color College Avenue”, Blacksburg, VA, May 2012 Slide credit: Devi Parikh

3 State of Affairs: Accuracy, Machine vs. Human Slide credit: Devi Parikh

4 How do we teach machines today? Slide credit: Devi Parikh

12 And on, and on, and on… Slide credit: Devi Parikh

14 How do machines behave? Slide credit: Devi Parikh

15 Airplane Cabin, Amusement Park, Aquarium, Badminton Court, Bedroom [Xiao et al., CVPR 2010] Slide credit: Devi Parikh

16 Clarifai, April 10th, 2014

17 Interacting with Vision Systems: Need a better mode of communication! Slide credit: Devi Parikh

18 Attributes. Examples: furry, natural, young, etc. Mid-level; shareable across concepts; human understandable; machine detectable; allow for human-machine communication. Slide credit: Devi Parikh
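The idea that attributes are both human understandable and machine detectable can be illustrated with a minimal sketch: a human describes each class by its attributes, and the machine compares detected attribute scores against those descriptions. The attribute names, class descriptions, and scores below are hypothetical and chosen only for illustration; this is not the method from any of the cited papers.

```python
# All names and numbers below are hypothetical, for illustration only.
# Each class is described by a human in terms of shared mid-level attributes.
CLASS_DESCRIPTIONS = {
    "polar bear": {"furry": 1.0, "white": 1.0, "large": 1.0},
    "rabbit": {"furry": 1.0, "white": 0.0, "large": 0.0},
    "airplane": {"furry": 0.0, "white": 1.0, "large": 1.0},
}


def classify_by_attributes(attribute_scores, class_descriptions=CLASS_DESCRIPTIONS):
    """Return the class whose attribute description best matches the scores.

    attribute_scores maps attribute name -> detector confidence in [0, 1].
    The mismatch is the sum of absolute differences over the attributes
    listed in each class description.
    """
    def mismatch(description):
        return sum(
            abs(attribute_scores.get(name, 0.0) - value)
            for name, value in description.items()
        )

    return min(class_descriptions, key=lambda label: mismatch(class_descriptions[label]))


# Assumed outputs of (hypothetical) attribute detectors for one image.
detected = {"furry": 0.9, "white": 0.8, "large": 0.7}
predicted_label = classify_by_attributes(detected)
```

Because the class descriptions are plain attribute lists, a person can add a new class without providing any labeled images, which is one reason attributes are described as shareable across concepts.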

19 Role of the Human: Supervisor, User, Communicator (Human, Machine). Image Search: “My missing brother is fuller-faced than this boy.” Instilling Domain Knowledge: “Polar bears are white and larger than rabbits.” Characterizing Failure Modes: “If the image is blurry or the face is not frontal, I may fail.” Interpretable Models: “I think this is a polar bear because this is a white and furry animal.” Active and Interactive Learning. [Parikh and Grauman, ICCV 2011] [Parkash and Parikh, ECCV 2012] [Biswas and Parikh, CVPR 2013] [Lad and Parikh, ECCV 2014] [Kovashka, Parikh and Grauman, CVPR 2012] [Parikh and Grauman, ICCV 2013] [Bansal, Farhadi and Parikh, ECCV 2014] Slide credit: Devi Parikh
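A comparative statement such as “fuller-faced than this boy” can be read as a constraint on search results. The sketch below shows one simple way to apply such feedback, assuming each image already has a numeric strength for the attribute. The image identifiers, the `fuller_faced` attribute name, and the strength values are hypothetical; the cited papers learn these strengths and use more elaborate ranking, which is not reproduced here.

```python
# Hypothetical per-image attribute strengths (for example, outputs of learned
# attribute rankers). The values are made up for illustration.
ATTRIBUTE_STRENGTHS = {
    "img_a": {"fuller_faced": 0.2},
    "img_b": {"fuller_faced": 0.5},
    "img_c": {"fuller_faced": 0.8},
    "img_d": {"fuller_faced": 0.6},
}


def refine_candidates(candidates, reference, attribute, more,
                      strengths=ATTRIBUTE_STRENGTHS):
    """Keep candidates that satisfy one piece of comparative feedback.

    more=True  means "the target has MORE of `attribute` than `reference`".
    more=False means "the target has LESS of `attribute` than `reference`".
    The reference image itself is always dropped.
    """
    reference_value = strengths[reference][attribute]
    kept = []
    for image_id in candidates:
        if image_id == reference:
            continue
        value = strengths[image_id][attribute]
        if (more and value > reference_value) or (not more and value < reference_value):
            kept.append(image_id)
    return kept


# "My missing brother is fuller-faced than this boy (img_b)."
remaining = refine_candidates(
    ["img_a", "img_b", "img_c", "img_d"],
    reference="img_b",
    attribute="fuller_faced",
    more=True,
)
```

Each additional statement from the user can be applied to the remaining candidates in the same way, so the result set narrows with every round of feedback.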

20 Accessing Common Sense: Direct communication. Learn by observing structure in our visual world? Slide credit: Devi Parikh

21 Two professors converse in front of a blackboard. Slide credit: Larry Zitnick

22 Two professors stand in front of a blackboard. Slide credit: Larry Zitnick

23 Two professors converse in front of a blackboard. Slide credit: Larry Zitnick

24 Challenges: Lacking visual density; annotations are expensive (and boring); computer vision doesn’t work well enough. Slide credit: Devi Parikh

25 Is photorealism necessary? Slide credit: Larry Zitnick

26 Jenny, Mike Slide credit: Larry Zitnick

29 Interface 2x

30 Mike fights off a bear by giving him a hotdog while Jenny runs away. Slide credit: Larry Zitnick

31 Dataset: 1,000 classes of semantically similar scenes (Class 1, Class 2, ..., Class 1,000). 1,000 classes x 10 scenes per class = 10,000 scenes. Dataset online [Zitnick and Parikh, CVPR 2013] Slide credit: Larry Zitnick

32 Visual Features Slide credit: Larry Zitnick

33 Visual Features: Cloud, Cat, Basketball, Smile, Gaze, Person sitting, Tree, Person standing Slide credit: Larry Zitnick

34 Visual Features: Cloud, Cat, Basketball, Smile, Gaze, Person sitting, Tree, Person standing. Which visual features are important for semantic meaning? Which words correlate with specific visual features? Slide credit: Devi Parikh

35 Generate and Retrieve Scenes. Input: Jenny is catching the ball. Mike is kicking the ball. The table is next to the tree. Tuples: [three tuples; contents lost in extraction]. Automatically Generated vs. Human Generated. Retrieval: score a database of scenes. [Zitnick, Parikh and Vanderwende, ICCV 2013] Slide credit: Devi Parikh
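The retrieval step (“score a database of scenes”) can be sketched as follows, assuming each input sentence has been reduced to a tuple and each stored scene is annotated with tuples of the same form. The `(subject, relation, object)` layout, the scene identifiers, and the exact-match score are assumptions made for this illustration; the slide does not give the tuple contents or the scoring function, and the cited paper's model over visual features is not reproduced here.

```python
# Hypothetical scene database: each scene is annotated with a set of
# (subject, relation, object) tuples. Layout and contents are assumptions.
SCENE_DATABASE = {
    "scene_1": {("jenny", "catch", "ball"), ("mike", "kick", "ball")},
    "scene_2": {("jenny", "catch", "ball"), ("mike", "sit", "table")},
    "scene_3": {("mike", "eat", "hotdog")},
}


def score_scene(query_tuples, scene_tuples):
    """Score a scene by how many query tuples it contains exactly."""
    return len(set(query_tuples) & set(scene_tuples))


def retrieve(query_tuples, database=SCENE_DATABASE):
    """Return scene ids ordered from best to worst match.

    Ties are broken alphabetically by scene id so the order is deterministic.
    """
    return sorted(
        database,
        key=lambda scene_id: (-score_scene(query_tuples, database[scene_id]), scene_id),
    )


# Tuples a parser might produce for the first two input sentences (assumed).
query = [("jenny", "catch", "ball"), ("mike", "kick", "ball")]
ranking = retrieve(query)
```

Exact matching is the simplest possible score; a learned model could instead assign partial credit to scenes whose visual features are merely likely given the tuple.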

36 Learning Fine-grained Interactions 3x [Antol, Zitnick and Parikh, ECCV 2014] Slide credit: Devi Parikh

37 Learning Fine-grained Interactions Train on abstract, test on real

38 Results: 60 categories (accuracy, %). Domain adaptation; learn explicit mapping from abstract to real world; multi-label problem. [Antol, Zitnick and Parikh, ECCV 2014]

39 Visual Abstraction For… Studying mappings between images and text; zero-shot learning; studying image memorability, specificity, etc.; learning common sense knowledge; rich annotation modality (ask for descriptions, ask for scenes, show scene and ask for modification). Goes beyond “Jenny and Mike.” Study high-level image understanding tasks without waiting for lower-level vision tasks to be solved.

40 [Xinlei Chen]

41 Conclusion. Accuracy: Machine vs. Human. Give computer vision systems access to common-sense knowledge: communication with humans via attributes (text); visual abstraction. Use humans for more than just “labels”. Slide credit: Devi Parikh

42 Thank you. Slide credit: Devi Parikh

