
1 Where are they looking?
Adrià Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba
Presented by Yinan Zhao

2 Outline Motivation Approach Dataset Experiments Extension


4 Motivation Humans have a remarkable ability to follow gaze.
It is crucial in interacting with other people and with the environment. Joint attention is an important part of early language learning for children.

6 Motivation No-look pass by Magic Johnson

7 Motivation Is it possible for machines to perform gaze following in natural settings, without restrictive assumptions, when only a single view is available?

9 Outline Motivation Approach Dataset Experiments Extension

10 Approach How do humans tend to follow gaze?
First, look at the person's head and eyes to estimate their field of view (head detection). Subsequently, reason about salient objects in their perspective.

14 Approach

15 Approach Convolutional layers combined with fully connected (FC) layers?

16 Approach Saliency Pathway
Sees the full image but not the person's location. Produces a spatial map of size D×D (D=13). We expect it to learn to find salient objects, independent of the person's viewpoint.

17 Approach Gaze Pathway Only has access to a close-up image of the person's head and the head location. Produces a spatial map of the same size D×D (D=13). We expect it to learn to predict the direction of gaze; head orientation is modeled implicitly.
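In the paper, the two D×D pathway outputs are combined with an elementwise product before the final prediction layers. The NumPy sketch below uses random stand-in values for the two maps (in the real model they come from the convnets); it only illustrates the combination step.

```python
import numpy as np

D = 13
rng = np.random.default_rng(0)

# Stand-ins for the two pathway outputs (real values come from the convnets):
saliency_map = rng.random((D, D))   # saliency pathway: where salient things are
gaze_mask = rng.random((D, D))      # gaze pathway: the person's estimated field of view

# Elementwise product: a location scores high only if it is both salient
# and inside the estimated gaze cone.
combined = saliency_map * gaze_mask
```

The product acts as a soft AND between the two pathways, which is why the roles of the pathways can emerge without explicit supervision of either one.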

18 Approach Shifted Grids
Formulate the problem as classification, which naturally supports multimodal outputs. Quantize the fixation location y into an N×N grid. Large N: harder learning, since there is no gradual penalty across spatial categories. Small N: poor precision. Shifted grids increase resolution while keeping learning easy. (Other approaches for multimodal distributions?)

20 Approach Shifted Grids
Solve several overlapping classification problems Average shifted outputs to produce the final prediction


22 Approach Shifted Grids
Solve several overlapping classification problems. Average the shifted outputs to produce the final prediction. The shifted cells now have different values: resolution is increased!
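A toy NumPy sketch of why averaging shifted grids raises resolution. The fine-pixel offsets, the 15×15 fine map, and the use of `np.roll` (which wraps at image edges) are illustrative simplifications of ours, not the paper's exact scheme; the coarse grids are random stand-ins for softmax outputs.

```python
import numpy as np

N, cell = 5, 3                  # 5x5 coarse grid; each cell covers 3x3 fine pixels
F = N * cell                    # fine map is 15x15
# Hypothetical offsets (in fine pixels) for the 5 shifted grids:
offsets = [(0, 0), (-1, -1), (-1, 1), (1, -1), (1, 1)]

rng = np.random.default_rng(0)
fine = np.zeros((F, F))
for dy, dx in offsets:
    coarse = rng.random((N, N))
    coarse /= coarse.sum()                        # stand-in for a softmax output
    up = np.kron(coarse, np.ones((cell, cell)))   # paint each coarse cell onto the fine grid
    fine += np.roll(up, (dy, dx), axis=(0, 1))    # apply this grid's shift (wraps at edges)
fine /= fine.sum()              # renormalize the averaged map

# The averaged map varies at fine-pixel granularity even though each grid is only 5x5:
pred = np.unravel_index(fine.argmax(), fine.shape)
```

Because the cell boundaries of the shifted grids fall in different places, the averaged map takes many more distinct values than any single 5×5 grid could.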

23 Approach Training
The whole model is differentiable and is trained end to end with backpropagation, using a softmax loss on each shifted grid. Supervision comes from gaze fixations only; the roles of the saliency and gaze pathways emerge automatically.
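The softmax (cross-entropy) loss per shifted grid can be sketched as below. The function names are ours, and summing the per-grid losses is a plausible reading of the slides rather than a verified detail of the paper's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

def shifted_grids_loss(logits_per_grid, target_per_grid):
    """Total softmax (cross-entropy) loss over all shifted grids.
    logits_per_grid: list of length-(N*N) score vectors, one per shifted grid
    target_per_grid: index of the cell containing the fixation in each grid
    """
    return sum(-np.log(softmax(z)[t])
               for z, t in zip(logits_per_grid, target_per_grid))

# With uniform scores over a 5x5 grid, each term is log(25):
logits = [np.zeros(25) for _ in range(5)]
targets = [12, 12, 11, 7, 17]      # made-up target cells, one per shifted grid
loss = shifted_grids_loss(logits, targets)
```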

26 Outline Motivation Approach Dataset Experiments Extension

27 Dataset GazeFollow A large-scale dataset annotated with the locations where people are looking: 1548 images from SUN, 9135 from Actions 40, 7791 from PASCAL, 508 from ImageNet, plus images from MS COCO and Places. Annotated with AMT: workers mark the center of the eyes and where the person is looking. Only people whose gaze location falls inside the image are kept.

28 Dataset GazeFollow

29 Outline Motivation Approach Dataset Experiments Extension

30 Experiments Implementation AlexNet architecture. Initialization: saliency pathway from Places-CNN, gaze pathway from ImageNet-CNN. Pathways fused with a 1×1×256 kernel; N=5 shifted grids. Training data augmented with flips and random crops. Train/test: 4782 people for testing, the rest for training; the test set has uniform fixation locations and 10 gaze annotations per person.
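The 10 gaze annotations per test person suggest distance-style evaluation against multiple annotators. The sketch below computes mean and minimum Euclidean distance from a prediction to the annotations; all coordinate values are invented for illustration.

```python
import numpy as np

# Hypothetical predicted fixation and 10 test annotations (normalized coordinates):
pred = np.array([0.62, 0.40])
annotations = np.array([[0.60, 0.41], [0.70, 0.35], [0.58, 0.44],
                        [0.65, 0.39], [0.61, 0.42], [0.55, 0.50],
                        [0.66, 0.38], [0.59, 0.43], [0.63, 0.40], [0.64, 0.37]])

dists = np.linalg.norm(annotations - pred, axis=1)
mean_dist = dists.mean()   # distance averaged over annotators
min_dist = dists.min()     # distance to the closest annotation
```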

31 Experiments Results Effectiveness of shifted grids
The model outperforms all baselines by a clear margin, but remains far from human performance.

34 Outline Motivation Approach Dataset Experiments Extension

35 Extension Gaze out of frame. Extension to videos? Motion? Co-gaze following. 2.5D.

36 References [1] Recasens, Adrià, et al. "Where are they looking?" Advances in Neural Information Processing Systems, 2015. [2] Pusiol, Guido, et al. "Discovering the signatures of joint attention in child-caregiver interaction." Proceedings of the 36th Annual Meeting of the Cognitive Science Society, Quebec City, Canada, 2014. [3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems, 2012.

37 Thanks!

