

Studying Relationships Between Human Gaze, Description, and Computer Vision
Kiwon Yun 1, Yifan Peng 1, Dimitris Samaras 1, Gregory J. Zelinsky 1,2, Tamara L. Berg 3
1 Department of Computer Science, Stony Brook University
2 Department of Psychology, Stony Brook University
3 Department of Computer Science, University of North Carolina
Computer Vision and Pattern Recognition (CVPR) 2013

Overview
User behavior while freely viewing images contains an abundance of information about user intent and depicted scene content.
- Human gaze → "where" the important things are in an image.
- Description → "what" is in an image, and which parts of the image are important to the viewer.
- Computer vision → "what" might be "where" in an image; however, it will always be noisy and have no knowledge of importance.

Overview
User behavior while freely viewing images contains an abundance of information about user intent and depicted scene content.
1. We conduct several experiments to better understand the relationship between gaze, description, and image content.
2. From these exploratory analyses, we build prototype applications for gaze-enabled object detection and annotation.
[Figure: human gaze, a description ("An old black woman wearing a turban and a headdress and a dog next to her wearing the same red headdress"), and image content for an example image.]

Datasets
- PASCAL VOC: 1,000 images, free-viewing for 3 seconds; eye movements from 3 observers; 5 natural-language descriptions per image from different observers.
- SUN09: images free-viewed for 5 seconds; eye movements from 8 observers, each of whom provided a scene description.

Experiments and Analyses: Gaze vs. Object Type
- People are more likely to look at people, other animals, televisions, and vehicles.
- People are less likely to look at chairs, bottles, potted plants, drawers, and rugs.
- Animate objects are much more likely to be fixated than inanimate objects (0.636 vs. ).
[Figure: probability of being fixated when present, for various object categories (top: PASCAL, bottom: SUN09).]
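The per-category fixation probability above is just a frequency, P(fixated | present): of the images in which a category is annotated, the fraction in which it receives at least one fixation. A minimal sketch of that computation (the data layout and names here are our own illustration, not the paper's code):

```python
from collections import defaultdict

def fixation_probabilities(images):
    """Estimate P(fixated | present) per object category.

    `images` is a list of dicts with two sets of category names:
      'present' - objects annotated in the image
      'fixated' - objects whose region received at least one fixation
    """
    present_counts = defaultdict(int)
    fixated_counts = defaultdict(int)
    for img in images:
        for cat in img["present"]:
            present_counts[cat] += 1
            if cat in img["fixated"]:
                fixated_counts[cat] += 1
    return {cat: fixated_counts[cat] / present_counts[cat]
            for cat in present_counts}

# Toy example: the person is fixated in both images, the chair in one.
data = [
    {"present": {"person", "chair"}, "fixated": {"person"}},
    {"present": {"person", "chair"}, "fixated": {"person", "chair"}},
]
probs = fixation_probabilities(data)
# probs["person"] == 1.0, probs["chair"] == 0.5
```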

Experiments and Analyses: Gaze vs. Location on Objects
[Figure: fixation density on objects for the categories person, horse, bird, bus, train, tv, bicycle, chair, table, cabinet, curtain, plant.]
What objects do people describe?
- Animate objects are much more likely to be described than inanimate objects (0.843 vs. ).

Experiments and Analyses: What is the relationship between gaze and description?
[Table: P(fixated | described) and P(described | fixated) for PASCAL and SUN09.]
Example (five descriptions of the same image):
- S1: A man is reading the label on a beverage bottle.
- S2: A man looking at the bottle of beer that he is holding.
- S3: The man in a white tee shirt is holding a beer bottle and looking at it.
- S4: The scraggly-haired man is holding up and admiring his bottle of beer.
- S5: Young man with curly black hair holding a beer bottle.
Fixated objects: bottle, person. Described objects: bottle, person.
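Both conditional probabilities can be estimated from per-image sets of fixated and described objects, counting how often the two behaviors pick out the same object. A sketch under our own assumed data layout (not the paper's implementation):

```python
def gaze_description_stats(images):
    """Compute P(fixated | described) and P(described | fixated).

    `images` is a list of dicts, each with two sets of category names:
      'fixated'   - objects receiving at least one fixation
      'described' - objects mentioned in the scene description
    """
    described = fixated = both = 0
    for img in images:
        described += len(img["described"])
        fixated += len(img["fixated"])
        both += len(img["described"] & img["fixated"])
    return both / described, both / fixated

# Toy example: every described object is fixated; the tv is
# fixated but never described.
images = [
    {"fixated": {"bottle", "person"}, "described": {"bottle", "person"}},
    {"fixated": {"person", "tv"},     "described": {"person"}},
]
p_fix_given_desc, p_desc_given_fix = gaze_description_stats(images)
# p_fix_given_desc == 1.0, p_desc_given_fix == 0.75
```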

Gaze-Enabled Computer Vision
Analysis of Human Gaze with Object Detectors
The potential for gaze to increase the performance of object detectors varies by object category.

Gaze-Enabled Computer Vision
Gaze-Enabled Object Detection and Annotation
We combine gaze and automated object detection methods to create a collaborative system for detection and annotation.
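One simple way to combine the two signals, sketched here under our own assumptions (the paper's actual combination rule may differ), is to re-weight each detection's confidence by the fraction of fixations that fall inside its bounding box:

```python
def rescore_detections(detections, fixations):
    """Boost detections whose boxes contain many fixations.

    detections: list of (score, (x1, y1, x2, y2)) from an object detector
    fixations:  list of (x, y) gaze points
    Returns detections re-scored as score * (1 + fraction of fixations
    inside the box), sorted by the new score.
    """
    n = len(fixations)
    rescored = []
    for score, (x1, y1, x2, y2) in detections:
        inside = sum(1 for (x, y) in fixations
                     if x1 <= x <= x2 and y1 <= y <= y2)
        frac = inside / n if n else 0.0
        rescored.append((score * (1.0 + frac), (x1, y1, x2, y2)))
    return sorted(rescored, key=lambda d: d[0], reverse=True)

# Toy example: the left box attracts 2 of 3 fixations, so its
# score is boosted from 0.6 to roughly 1.0 and it ranks first.
dets = [(0.6, (0, 0, 50, 50)), (0.5, (60, 0, 120, 50))]
gaze = [(10, 10), (20, 30), (70, 20)]
ranked = rescore_detections(dets, gaze)
```

The multiplicative form leaves detections untouched when no gaze data is available (the factor is 1), which suits a collaborative system where gaze is an optional extra signal.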

Gaze-Enabled Computer Vision Gaze-Enabled Object Detection and Annotation

Conclusion
Through a series of behavioral studies and experimental evaluations, we explored the information contained in eye movements and descriptions and analyzed their relationship with image content. We also examined the complex relationships between human gaze and the outputs of current visual detection methods. In future work, we will build on these findings to develop more intelligent human-computer interactive systems for image understanding.
[1] Kiwon Yun, Yifan Peng, Dimitris Samaras, Gregory J. Zelinsky, and Tamara L. Berg. Studying Relationships Between Human Gaze, Description, and Computer Vision. Computer Vision and Pattern Recognition (CVPR), 2013, Oregon, USA.
[2] Kiwon Yun, Yifan Peng, Hossein Adeli, Tamara L. Berg, Dimitris Samaras, and Gregory J. Zelinsky. Specifying the Relationships Between Objects, Gaze, and Descriptions for Scene Understanding. Vision Sciences Society (VSS), 2013, Florida, USA.