A System for Observing and Recognizing Objects in the Real World J.O. Eklundh, M. Björkman, E. Hayman Computational Vision and Active Perception Lab Royal.

Slides:

Advertisements

Similar presentations

Bayesian Belief Propagation

Advertisements

Probabilistic Tracking and Recognition of Non-rigid Hand Motion

Change Detection C. Stauffer and W.E.L. Grimson, “Learning patterns of activity using real time tracking,” IEEE Trans. On PAMI, 22(8): , Aug 2000.

Analysis of Contour Motions Ce Liu William T. Freeman Edward H. Adelson Computer Science and Artificial Intelligence Laboratory Massachusetts Institute.

MASKS © 2004 Invitation to 3D vision Lecture 7 Step-by-Step Model Buidling.

Introduction To Tracking

Object Recognition & Model Based Tracking © Danica Kragic Tracking system.

Qualifying Exam: Contour Grouping Vida Movahedi Supervisor: James Elder Supervisory Committee: Minas Spetsakis, Jeff Edmonds York University Summer 2009.

Foreground Modeling The Shape of Things that Came Nathan Jacobs Advisor: Robert Pless Computer Science Washington University in St. Louis.

Tracking Multiple Occluding People by Localizing on Multiple Scene Planes Professor ：王聖智教授 Student ：周節.

Multiple People Detection and Tracking with Occlusion Presenter: Feifei Huo Supervisor: Dr. Emile A. Hendriks Dr. A. H. J. Stijn Oomes Information and.

EE462 MLCV Lecture 5-6 Object Detection – Boosting Tae-Kyun Kim.

Formation et Analyse d’Images Session 8

Robust Moving Object Detection & Categorization using self- improving classifiers Omar Javed, Saad Ali & Mubarak Shah.

Beyond bags of features: Part-based models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.

Tracking Objects with Dynamics Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/21/15 some slides from Amin Sadeghi, Lana Lazebnik,

1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.

A Study of Approaches for Object Recognition

Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2005 with a lot of slides stolen from Steve Seitz and.

High-Quality Video View Interpolation

Feature extraction: Corners and blobs

1 Integration of Background Modeling and Object Tracking Yu-Ting Chen, Chu-Song Chen, Yi-Ping Hung IEEE ICME, 2006.

A Novel 2D To 3D Image Technique Based On Object- Oriented Conversion.

Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2006 with a lot of slides stolen from Steve Seitz and.

December 2, 2014Computer Vision Lecture 21: Image Understanding 1 Today’s topic is.. Image Understanding.

Face Processing System Presented by: Harvest Jang Group meeting Fall 2002.

Stereo matching Class 10 Read Chapter 7 Tsukuba dataset.

Computer Vision Spring ,-685 Instructor: S. Narasimhan WH 5409 T-R 10:30am – 11:50am Lecture #15.

Computer Vision: Summary and Discussion

Distinctive Image Features from Scale-Invariant Keypoints By David G. Lowe, University of British Columbia Presented by: Tim Havinga, Joël van Neerbos.

Computer Vision James Hays, Brown

CSE 473/573 Computer Vision and Image Processing (CVIP) Ifeoma Nwogu Lecture 35 – Review for midterm.

Prakash Chockalingam Clemson University Non-Rigid Multi-Modal Object Tracking Using Gaussian Mixture Models Committee Members Dr Stan Birchfield (chair)

BraMBLe: The Bayesian Multiple-BLob Tracker By Michael Isard and John MacCormick Presented by Kristin Branson CSE 252C, Fall 2003.

SVCL Automatic detection of object based Region-of-Interest for image compression Sunhyoung Han.

1. Introduction Motion Segmentation The Affine Motion Model Contour Extraction & Shape Estimation Recursive Shape Estimation & Motion Estimation Occlusion.

A General Framework for Tracking Multiple People from a Moving Camera

The Correspondence Problem and “Interest Point” Detection Václav Hlaváč Center for Machine Perception Czech Technical University Prague

Dynamic 3D Scene Analysis from a Moving Vehicle Young Ki Baik (CV Lab.) (Wed)

Person detection, tracking and human body analysis in multi-camera scenarios Montse Pardàs (UPC) ACV, Bilkent University, MTA-SZTAKI, Technion-ML, University.

Computer Vision: Summary and Discussion Computer Vision ECE 5554 Virginia Tech Devi Parikh 11/21/

Augmenting Physical State Prediction Through Structured Activity Inference Nam Vo & Aaron Bobick ICRA 2015.

Computer Vision: Summary and Discussion Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/03/2011.

Learning the Appearance and Motion of People in Video Hedvig Sidenbladh, KTH Michael Black, Brown University.

Computer Vision, Robert Pless

Computer Vision 776 Jan-Michael Frahm 12/05/2011 Many slides from Derek Hoiem, James Hays.

Computer Vision Lecture #10 Hossam Abdelmunim 1 & Aly A. Farag 2 1 Computer & Systems Engineering Department, Ain Shams University, Cairo, Egypt 2 Electerical.

CSE 185 Introduction to Computer Vision Stereo. Taken at the same time or sequential in time stereo vision structure from motion optical flow Multiple.

Epitomic Location Recognition A generative approach for location recognition K. Ni, A. Kannan, A. Criminisi and J. Winn In proc. CVPR Anchorage,

Expectation-Maximization (EM) Case Studies

Chapter 5 Multi-Cue 3D Model- Based Object Tracking Geoffrey Taylor Lindsay Kleeman Intelligent Robotics Research Centre (IRRC) Department of Electrical.

Boosted Particle Filter: Multitarget Detection and Tracking Fayin Li.

Computer Science Readings: Reinforcement Learning Presentation by: Arif OZGELEN.

Feature extraction: Corners and blobs. Why extract features? Motivation: panorama stitching We have two images – how do we combine them?

Final Review Course web page: vision.cis.udel.edu/~cv May 21, 2003  Lecture 37.

Lecture 9 Feature Extraction and Motion Estimation Slides by: Michael Black Clark F. Olson Jean Ponce.

Learning video saliency from human gaze using candidate selection CVPR2013 Poster.

A computational model of stereoscopic 3D visual saliency School of Electronic Information Engineering Tianjin University 1 Wang Bingren.

Correspondence and Stereopsis. Introduction Disparity – Informally: difference between two pictures – Allows us to gain a strong sense of depth Stereopsis.

11/25/03 3D Model Acquisition by Tracking 2D Wireframes Presenter: Jing Han Shiau M. Brown, T. Drummond and R. Cipolla Department of Engineering University.

Learning Image Statistics for Bayesian Tracking Hedvig Sidenbladh KTH, Sweden Michael Black Brown University, RI, USA

Processing visual information for Computer Vision

Computer vision: models, learning and inference

Summary and Discussion

A Forest of Sensors: Using adaptive tracking to classify and monitor activities in a site Eric Grimson AI Lab, Massachusetts Institute of Technology

Eric Grimson, Chris Stauffer,

PRAKASH CHOCKALINGAM, NALIN PRADEEP, AND STAN BIRCHFIELD

Brief Review of Recognition + Context

Analysis of Contour Motions

Liyuan Li, Jerry Kah Eng Hoe, Xinguo Yu, Li Dong, and Xinqi Chu

Presentation transcript:

A System for Observing and Recognizing Objects in the Real World J.O. Eklundh, M. Björkman, E. Hayman Computational Vision and Active Perception Lab Royal Institute of Technology (KTH)

Vision in the Real World: Attending, Foveating and Recognizing Objects Motivation An autonomous agent moving about in a dynamic indoor environment, performing tasks such as finding, picking up and delivering known objects or classes of objects. What should the vision system of such an agent be capable of?

Vision in the Real World: Attending, Foveating and Recognizing Objects Capabilities “ where” what attention segmentation recognition what Should be dealt with jointly. Bootstrapping?

Vision in the Real World: Attending, Foveating and Recognizing Objects Recognition and Categorization Feifei et al 03

Vision in the Real World: Attending, Foveating and Recognizing Objects A robot looking at a table at 1.5 m Objects subtend only a fraction of the scene and are not centered (no attentional step)

Vision in the Real World: Attending, Foveating and Recognizing Objects Approach - themes 3D cues relevant in the scene Motion and stereo used for bootstrapping Integration of multiple cues A system interacting with the environment Fast processes and anytime algorithms desirable

Vision in the Real World: Attending, Foveating and Recognizing Objects The f-g-s problem Segmenting 3D objects from the background Computing motion, depth and ego-motion Acquiring appearance models Issues: Combining cues Demonstrate simple algorithms that suffice Monocular as well as binocular cases

Vision in the Real World: Attending, Foveating and Recognizing Objects Cue integration for f-g-s First, the dynamic monocular case Problem: classifying pixels as being foreground or background (or into layers) Cues: motion, colour, contrast + prediction (temporal continuity) Inference problem: observations from different spaces to be combined

Vision in the Real World: Attending, Foveating and Recognizing Objects Integration Two approaches: - probabilistic: likelihood of observing data, given a model of each layer - voting: each cue decides independently, form weighted combination Algorithm Online initialization of colour + texture models –Use segmentation from motion to train distributions Suppress unreliable cues Sequential algorithm

Vision in the Real World: Attending, Foveating and Recognizing Objects Related work Khan & Shah CVPR’01 Triesch & von der Malsburg ICAFGR’00 Spengler & Schiele ICVS’01 Toyama & Horvitz ACCV’00 Kragic & Christensen ICRA’99 Belongie et al ICCV’98 … Similarity to tracking

Vision in the Real World: Attending, Foveating and Recognizing Objects Voting The likelihood of observation from cue k at pixel i given model of layer j : Posterior probability of layer j : We set

Vision in the Real World: Attending, Foveating and Recognizing Objects Probabilistic fusion Assume independent observations The total likelihood of the observations given the combined model : The posterior estimate of layer membership: An independent opinion pool

Vision in the Real World: Attending, Foveating and Recognizing Objects Illustrations Assume a distribution for each layer for each cue f.g. model b.g. model 8 f ( Z | f.g ) = f ( Z | b.g ) = 0.4 Observation: Z

Vision in the Real World: Attending, Foveating and Recognizing Objects Cue combination: Weighted voting Each cue makes independent decision Combine using weighted sum Assuming equal weights: f.g.b.g. Colour Motion0.20.3

Vision in the Real World: Attending, Foveating and Recognizing Objects Cue combination: Probabilistic Compute total likelihood of observations for each layer Classify using Bayes’ Rule Assuming uniform priors: f.g.b.g. Colour Motion0.20.3

Vision in the Real World: Attending, Foveating and Recognizing Objects Pros and cons Simple! Easy to combine observations from very different spaces Not obvious how cues interact Graded output Mathematically well-founded One cue can easily dominate over others Almost binary output Voting: Probabilistic:

Vision in the Real World: Attending, Foveating and Recognizing Objects Training and adaptation Start with segmentation just from motion Use this to train colour and contrast distributions –Use EM algorithm to train Gaussian mixture models Subsequently adapted online –Recompute models from current data –Update model as weighted sum over time window (Raja et al ECCV’98)

Vision in the Real World: Attending, Foveating and Recognizing Objects Suppressing unreliable cues Cues unreliable during training No independent motion poor motion segmentation Unreliable in the past probably unreliable now! (Triesch and von der Malsburg ICAFGR’00) Mechanisms: Voting:weights Probabilistic:hyper-priors

Vision in the Real World: Attending, Foveating and Recognizing Objects The effect of hyper-priors Orginal pdf’sAfter marginalization

Vision in the Real World: Attending, Foveating and Recognizing Objects Results OriginalProbabilisticVoting Degree of membership to foreground

Vision in the Real World: Attending, Foveating and Recognizing Objects Cue combination Motion ColourPrediction Texture (contrast) Combined Probabilistic Combined Voting

Vision in the Real World: Attending, Foveating and Recognizing Objects Results OriginalForeground

Vision in the Real World: Attending, Foveating and Recognizing Objects Results: Probabilistic cue integration OriginalForeground mask

Vision in the Real World: Attending, Foveating and Recognizing Objects The cues Motion ColourPrediction Texture (contrast) Combined

Vision in the Real World: Attending, Foveating and Recognizing Objects Epipolar geometry Non-retinal info. Image features Disparity map Ego-motion Independent motion Additional retinal info. Regions of interest Fixation point Top-down control Process overview

Vision in the Real World: Attending, Foveating and Recognizing Objects Calibration Relative orientations has to be known to relate disparities to depths simplify estimation of disparities xyxy (1+x )  – y r xy  + r + x r 2 z zy =+ 1 z 1 – x t - y t Using corner features and optical flow model Unstable process => be careful We first assume r and r to be zero. zy

Vision in the Real World: Attending, Foveating and Recognizing Objects More examples

Vision in the Real World: Attending, Foveating and Recognizing Objects Final output

Vision in the Real World: Attending, Foveating and Recognizing Objects Many objects of interest static. Harder! Motion included in full system; now only stereo The cues in these examples –Stereo data - exist along contours –Color data/appearance between contours 3D Cues: stereo and motion

Vision in the Real World: Attending, Foveating and Recognizing Objects Proposed system structure Technically by a combination of wide field and foveal cameras A wide field for attention, recognition in foveated view Problem: transfer from wide field to foveal view Steps: Divide scene into 3D objects Select objects through attention (e.g.hue and expected size) Fixate (and track) object of interest Recognize objects in foveal view

Vision in the Real World: Attending, Foveating and Recognizing Objects Processes Recognition Attention Segmen- tation Hypotheses Knowledge Adaptation Shape and size Knowledge Region of Interest Where What

Vision in the Real World: Attending, Foveating and Recognizing Objects Flow of information Left SegmentationGlobal hueSIFT featuresLocal hue Attention Gaze direction Recognition Registration LeftRight FixationCalibration Wide fieldFoveal

Vision in the Real World: Attending, Foveating and Recognizing Objects Figure-ground segmentation Disparity map is sliced into layers. Widths are set to that of requested object.

Vision in the Real World: Attending, Foveating and Recognizing Objects Figure-ground segmentation Disparities using SAD correlations. Segmentation based on slicing the 3D world. BinoCuesBinoAttn

Vision in the Real World: Attending, Foveating and Recognizing Objects Hue based attention Local hue histograms correlated with that of requested object. Fast implementation using rotating sums.

Vision in the Real World: Attending, Foveating and Recognizing Objects Saliency peaks Peaks from blob detection of depth slices. Based on Differences of Gaussians. Hue saliency map used for weighting. Random value added before selection.

Vision in the Real World: Attending, Foveating and Recognizing Objects Fixation 0 0 c 0 0 d a b e F = The foveal system continuously tries to fixate done using corner features and affine essential matrix Zero disparity filters won’t work

Vision in the Real World: Attending, Foveating and Recognizing Objects Foveated segmentation To boost recognition Foveal segmentation based on disparities Rectification using affine fundamental matrix Only search for disparities around zero => Large number of false positives Points clustered in 3D using mean shift

Vision in the Real World: Attending, Foveating and Recognizing Objects Foveated segmentation

Vision in the Real World: Attending, Foveating and Recognizing Objects Foveated segmentation

Vision in the Real World: Attending, Foveating and Recognizing Objects Small object database in real-time experiments Models of SIFT features and hue histograms

Vision in the Real World: Attending, Foveating and Recognizing Objects Visual scene search

Vision in the Real World: Attending, Foveating and Recognizing Objects Segmentation robustness

Vision in the Real World: Attending, Foveating and Recognizing Objects Segmentation robustness

Vision in the Real World: Attending, Foveating and Recognizing Objects Effect of occlusions

Vision in the Real World: Attending, Foveating and Recognizing Objects Effect of rotations

Vision in the Real World: Attending, Foveating and Recognizing Objects Recognition in w-f-o-v

Vision in the Real World: Attending, Foveating and Recognizing Objects Recognition after foveation

Vision in the Real World: Attending, Foveating and Recognizing Objects Conclusions We have a running system. Objects normally found within three saccades Concern: dependency on corner features Current work: Focus on recognition and categorization More robust foveal segmentation Additional cues e.g. texture Learning and adaptation on all levels