Above and below the object level


Object level: Single object recognition (ImageNet)

Below the object level: recognizing local configurations. The human visual system makes highly effective use of limited information: it can recognize not only whole objects, but also severely reduced sub-configurations, reduced in size or in resolution. These are ‘configurations’ rather than well-defined object parts.

Minimizing variability. Motivation: minimal configurations are useful for the interpretation of complex scenes; the basic reason is that they reduce variability. Generalization was much better with the reduced images.

Searching for minimal images: e.g., reducing a patch from 40 to 35 pixels on a side and testing whether it is still recognized. Over 15,000 Mechanical Turk subjects took part (‘Atoms of Recognition’, PNAS 2016).
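The search just described can be sketched as a greedy descent over patch reductions: from a recognizable patch, generate slightly reduced versions (corner crops and a lower-resolution copy), recurse into any child that subjects still recognize, and declare a patch a MIRC when none of its children are recognized. The sketch below is illustrative only: `is_recognized` stands in for the Mechanical Turk recognition test, and the patch representation and 80% reduction factors are assumptions, not the paper's exact procedure.

```python
def children(patch):
    """Candidate reductions of a patch (x, y, size, resolution):
    four corner crops at ~80% of the side length, plus one version
    at ~80% resolution. Factors are illustrative assumptions."""
    x, y, size, res = patch
    s = round(size * 0.8)
    return [
        (x, y, s, res),                        # top-left crop
        (x + size - s, y, s, res),             # top-right crop
        (x, y + size - s, s, res),             # bottom-left crop
        (x + size - s, y + size - s, s, res),  # bottom-right crop
        (x, y, size, round(res * 0.8)),        # lower resolution
    ]

def find_mircs(patch, is_recognized):
    """Greedy descent: assuming `patch` itself is recognized, a patch
    is a MIRC when none of its reduced children are recognized."""
    recognized = [c for c in children(patch) if is_recognized(c)]
    if not recognized:
        return [patch]
    mircs = []
    for c in recognized:
        mircs.extend(find_mircs(c, is_recognized))
    return mircs
```

With a toy oracle that "recognizes" any patch of at least 30 pixels shown at a resolution of at least 20, the descent bottoms out exactly at the patches that cannot be reduced further in either dimension.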

Pairs: parent – MIRC, child – ‘sub-MIRC’.

Example recognition rates (parent MIRC vs. sub-MIRC): 0.79 vs. 0.00; 0.88 vs. 0.14; 0.88 vs. 0.16.

On average, 16 MIRCs per class remain after removing highly overlapping ones (Jaccard overlap > 0.5).
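The overlap pruning can be sketched with a Jaccard (intersection-over-union) test over the MIRC image regions; the greedy scheme and the box representation below are illustrative assumptions, not the paper's exact procedure.

```python
def jaccard(a, b):
    """Jaccard index (intersection over union) of two axis-aligned
    boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def prune_overlapping(boxes, thresh=0.5):
    """Greedily keep boxes whose Jaccard overlap with every
    previously kept box is at most `thresh`."""
    kept = []
    for b in boxes:
        if all(jaccard(b, k) <= thresh for k in kept):
            kept.append(b)
    return kept
```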

Cover: on average 16.9 MIRCs per class. The cover is highly redundant, but each individual MIRC is non-redundant: every feature within it is important. The number of visual elements is about the same at different scales.

Testing computational models

DNNs do not reach human-level recognition on minimal images. Recognition of minimal images does not emerge from training any of the existing models tested: the large gap at the minimal level is not reproduced, recognition accuracy is lower than humans’, and the representations used by existing models do not capture differences that human recognition is sensitive to.
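The "large gap" can be stated operationally: a MIRC pair shows a sharp transition when the parent's human recognition rate is high while the child's collapses (as in the 0.79 vs. 0.00 example above), and a model reproduces the gap only if its own recognition drops comparably. The thresholds below are illustrative assumptions, not the paper's exact criteria.

```python
def sharp_transitions(pairs, parent_min=0.5, min_drop=0.5):
    """Return the (parent, child) recognition-rate pairs that show a
    MIRC-like sharp drop: the parent is recognized reliably while the
    child's recognition collapses. Thresholds are illustrative."""
    return [(p, c) for p, c in pairs if p >= parent_min and p - c >= min_drop]
```

Applied to the example rates from the slides, all three pairs qualify, while a pair with a shallow drop (say 0.9 vs. 0.8) would not.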

Minimal images: internal interpretation. Humans can interpret detailed sub-structures within the ‘atomic’ MIRC; this is another basic ability that people have and that current feed-forward models lack.

Internal interpretation is thus both a limitation of current models and a window onto the features and representations that humans use. Examples of internal interpretations, produced automatically by a model, were validated on Mechanical Turk. These sub-structures do not appear in the same way in false detections (generated here with the Felzenszwalb detector). Internal interpretations perceived by humans cannot be produced by existing feed-forward models.

Current recognition models are feed-forward only, which likely limits their ability to provide interpretation of fine details. A model that can produce interpretations of MIRCs uses a top-down stage (going back to V1): a model for full local image interpretation (Ben-Yossef et al., CogSci 2015).

Above the object level

Image Captioning

Automatic caption (2017): ‘A brown horse standing next to a building.’

Automatic caption: ‘a man is standing in front woman in white shirt’ (errors as produced by the model).

Stealing. Human description: ‘Two women sitting at restaurant table while a man in a black shirt takes one of their purses off the chair, while they are not looking.’

Components: man, woman, purse, chair. Properties: young, dark-haired, red. Relations: man grabs purse, purse hanging off chair, woman sitting on chair.

This is what vision produces, and we identify the event as ‘stealing’: a structure in which the man grabs the purse, the purse hangs off the chair, and the woman sits on the chair, with properties such as red, dark-haired, and not-looking attached to the nodes. We also connect each component back to the image, or hold it in some working memory.

‘Stealing’ in the abstract: person A grabs object X; person B is the owner of X and is not looking. Given an object and two people, we also need to know that person B is unaware of the grabbing. The representation is abstract and cognitive (shared with non-sighted people) and supports broad generalization.

The ‘ownership’ relation is added to the structure (the woman owns the purse), alongside grabbing, sitting, and hanging-off; this cognitive addition is what makes the scene ‘stealing’.
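The structure above can be sketched as a small set of relation triples, with ‘stealing’ detected as an abstract pattern over them. The names and relation vocabulary below are a hypothetical sketch, not an implemented system from the talk.

```python
# Scene as (subject, relation, object) triples; names are hypothetical.
scene = {
    ("man", "grabs", "purse"),
    ("purse", "hanging-off", "chair"),
    ("woman", "sitting-on", "chair"),
    ("woman", "not-looking", None),
    ("woman", "owns", "purse"),  # the cognitive addition
}

def is_stealing(triples):
    """Abstract 'stealing': A grabs X, a different person B owns X,
    and B is not looking (unaware of the grabbing)."""
    for a, rel, x in triples:
        if rel != "grabs":
            continue
        for b, rel2, x2 in triples:
            if (rel2 == "owns" and x2 == x and b != a
                    and (b, "not-looking", None) in triples):
                return True
    return False
```

Removing the ‘owns’ triple leaves the same percept but it is no longer ‘stealing’, mirroring the point that ownership is a cognitive addition rather than something vision delivers directly.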