1
Attentive People Finding
James Elder, Centre for Vision Research, York University, Toronto, Canada
Joint work with: Simon Prince, Bob Hou
2
Research Context
Collaborative Project: “Monitoring Changes to Urban Environments with a Network of Sensors”
Funding: GEOIDE (GEOmatics for Informed DEcisions), a Canadian research network: "This ‘network of networks’ brings together the skills, technology and people from different communities of practice, in order to develop and consolidate the Canadian competences in geomatics."
3
What is our project? Monitoring Changes to Urban Environments
"This project will study visual detection and interpretation of changes to urban environments using continuous and non-continuous sensing from a multiplicity of diverse sensors using networks of video cameras, augmented with high-resolution satellite imagery. It will also investigate the problem of how such information can be integrated and managed within a computer, leading to the development of a prototype information system for monitoring urban environments."
4
Project Team
University Principal Investigators: David Clausi (Waterloo), Geoffrey Edwards (Laval), James Elder (York), Frank Ferrie (McGill), Jim Little (UBC)
Main Industry Partners: CAE, Genetec, Aimetis
5
Timeframe April 2005 – March 2009
6
Objectives
1. Establishment of urban test facilities involving networks of multi-sensor wireless cameras with associated satellite data, and development of intercalibration software. (Elder, Ferrie, Little)
2. Development of algorithms for fusing offline satellite data with streaming video from terrestrial sensors for the construction of more complete 3D urban models. (Clausi)
3. Development of algorithms for inferring approximate intrinsic images from monocular video (ordinal depth maps, reflectance maps, …). (Elder, Ferrie, Little)
4. Development of algorithms for identifying and modeling typical dynamic events (e.g. pedestrian and automobile traffic, changes in climate, air quality, seasonal changes) and detecting unusual events. (Elder, Ferrie, Little)
5. Development of algorithms for deriving and updating navigational maps based upon derived models. (Edwards)
6. Development of an integrated demonstration system. (Ferrie)
7
Possible Application Areas
- Disaster management (e.g., earthquakes)
- Traffic monitoring (e.g., automobile, trucking, pedestrian)
- Security (e.g., people tracking, activity and identity recognition)
- Urban planning (e.g., 3D dynamic scene visualization)
- Environmental monitoring (e.g., air quality)
8
Pre-Attentive and Attentive Sensing (with S. Prince, Y. Hou, M. Sizintsev, E. Olevskey)
[Diagram: the wide-field image from the pre-attentive sensor drives pan/tilt of the attentive sensor, which returns a high-resolution foveal image.]
9
Homographic fusion of attentive and pre-attentive streams
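The fusion step can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the system's actual code: it presumes a 3×3 homography H mapping foveal pixels into wide-field coordinates has already been estimated (e.g. from matched calibration points), and the function name `fuse_streams` is hypothetical.

```python
# Minimal sketch of homographic fusion of the two streams, assuming a
# precomputed 3x3 homography H: foveal (attentive) -> wide-field
# (pre-attentive) coordinates. Names are illustrative.
import cv2
import numpy as np

def fuse_streams(wide, foveal, H):
    """Warp the foveal image into the wide-field frame and overlay it."""
    h, w = wide.shape[:2]
    warped = cv2.warpPerspective(foveal, H, (w, h))
    # Warp an all-white mask to find which wide-field pixels the
    # foveal view covers after the perspective transform.
    mask = cv2.warpPerspective(
        np.full(foveal.shape[:2], 255, dtype=np.uint8), H, (w, h))
    fused = wide.copy()
    fused[mask > 0] = warped[mask > 0]
    return fused
```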
10
Wide-Field Body Detection
Detected body sizes: min 15×2 pixels; max 98×78 pixels; median 52×14 pixels.
11
Wide-Field Face Detection
Detected face sizes: max 34×31 pixels; min 2×2 pixels; median 6×6 pixels.
12
Detecting people in realistic environments
13
Biological vision?
14
Motion Scaling (Johnston & Wright, 1986)
15
Biological Motion (Ikeda, Blake & Watanabe, 2005)
16
Structural Coherence (with L. Velisavljevic)
Psychophysical Method
[Trial sequence: 506 ms → 59 ms → 1000 ms → until response.]
17
Image Conditions
[Example stimuli: scrambled and coherent versions, in colour and monochrome.]
18
Results
[Bar chart: % correct (58–82%) for colour and monochrome (BW) images, coherent vs. incoherent; data and model both plotted.]
19
Spatial Coherence
[Plot: percent correct (50–90%) vs. mean distance from fixation (3–18°), for unscrambled vs. scrambled images, in colour and monochromatic conditions.]
20
Summary: Pre-Attentive (Peripheral) Vision
- Motion discrimination
- Colour discrimination
- Biological motion
- Contour integration
- Coherent structure
21
Preattentive System Design
[Block diagram, per cue (motion, foreground, skin): raw pixel → pixel model → pixel posterior → spatial integrator → region model → region likelihood ratio / region response; the cue outputs are multiplied (×) and combined with the system priors to give the system posterior.]
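Read as a probabilistic cascade, the diagram's stages can be sketched compactly. This is a minimal sketch under our own simplifying assumptions (independent pixels within a region, independent cues, log-domain combination); the function names are illustrative, not from the published system.

```python
# Illustrative sketch of the pre-attentive pipeline: per-cue pixel
# log-likelihood ratios are spatially integrated over a candidate
# region, combined across cues, and fused with the system prior.
import numpy as np

def region_llr(pixel_llr, mask):
    """Spatial integrator: pool pixel log-likelihood ratios over a
    region; treating pixels as independent makes the log ratios add."""
    return pixel_llr[mask].sum()

def system_posterior(cue_llrs, prior):
    """Combine per-cue region log-likelihood ratios (again assuming
    independence, so they add) and apply the prior via log odds."""
    log_odds = sum(cue_llrs) + np.log(prior / (1.0 - prior))
    return 1.0 / (1.0 + np.exp(-log_odds))
```

For example, `system_posterior([motion_llr, foreground_llr, skin_llr], prior=0.01)` returns the probability that a person occupies the region.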
22
Priors as Attentive Feedback
[Block diagram: high-resolution face detection on the attentive sensor yields confirmed face locations; these, via a mean body indicator and a motion kernel, form a spatial prior; prior × likelihood → posterior; a random sampler with non-max suppression issues gaze commands to the gaze control of the attentive sensor.]
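One way to realize this loop is sketched below, under our own assumptions: Gaussian kernels blur confirmed detections into a spatial prior, and gaze targets are picked greedily with non-max suppression (the deck's random sampler is replaced by an argmax for simplicity). All names are illustrative.

```python
# Sketch of attentive feedback: confirmed face locations from the
# attentive sensor are smoothed into a spatial prior for the
# pre-attentive map; the next gaze command comes from the posterior
# after suppressing previously fixated neighbourhoods.
import numpy as np
from scipy.ndimage import gaussian_filter

def spatial_prior(confirmed_xy, shape, sigma=10.0, floor=1e-3):
    """Place mass at confirmed locations, smooth, and keep a floor so
    no pixel's prior is ever exactly zero."""
    prior = np.zeros(shape)
    for x, y in confirmed_xy:
        prior[int(y), int(x)] += 1.0
    prior = gaussian_filter(prior, sigma) + floor
    return prior / prior.sum()

def next_gaze(posterior, inhibited, radius=20):
    """Pick the strongest uninhibited location, then suppress a disc
    around it (non-max suppression across successive fixations)."""
    p = posterior * ~inhibited
    y, x = np.unravel_index(np.argmax(p), p.shape)
    yy, xx = np.ogrid[:p.shape[0], :p.shape[1]]
    inhibited |= (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
    return (x, y), inhibited
```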
23
Pixel Posteriors
[Images: original frame with per-pixel posterior maps for motion, foreground, and skin; colour scale 0.5–1.]
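The posterior maps above follow directly from Bayes' rule. A one-line sketch, assuming each pixel carries a likelihood ratio and a prior probability of person presence (notation ours):

```python
# Pixel posterior from a likelihood-ratio map and a prior map:
#   L = p(obs | person) / p(obs | background)
#   p(person | obs) = L * pi / (L * pi + (1 - pi))
import numpy as np

def pixel_posterior(lr, prior):
    lr, prior = np.asarray(lr, float), np.asarray(prior, float)
    return lr * prior / (lr * prior + (1.0 - prior))
```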
24
Spatial Integration
25
Spatial Integration
[Plot: area under the ROC curve (0.70–0.86) vs. integration exponent g (log scale, 10⁻¹ to 1), for the motion, foreground, and skin cues.]
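The exponent g suggests a parametric pooling family. One plausible reading, which is our assumption rather than anything confirmed by the slide, is a power-mean (Minkowski) pool over pixel responses: g = 1 gives plain averaging, while g < 1 compresses strong responses so that many weakly responding pixels can outweigh a few strong ones.

```python
# Power-mean (Minkowski) pooling of non-negative per-pixel scores
# within a region, with exponent g. The AUC-vs-g plot above would
# correspond to sweeping g and scoring the pooled detector output.
import numpy as np

def minkowski_pool(responses, g):
    r = np.asarray(responses, dtype=float)
    return np.mean(r ** g) ** (1.0 / g)
```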
26
Spatial Integration
[Histograms: region log-likelihood ratios (−4 to 4) for the motion, foreground, skin, and joint cues.]
27
Combining Detectors
System evaluation on a distinct test database:
[ROC plot: p(Hit) vs. p(False Positive) for motion (20×20), foreground (13×20), skin (4×5), the combined detector, and the Xiong & Jaynes detector.]
28
Performance
System evaluation on a distinct test database:
- 74% of fixations capture human heads
- 83% of people are fixated at least once
30
Automatically Confirmed High-Resolution Faces
31
3D POSE PROBLEM
Capture training and test database:
- Horizontal pose varies over 180 degrees; the pose for each image is known precisely.
- Points on each face are identified and image regions extracted.
- Features are weighted sums of pixels in each region (see the sketch below).
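The last step, features as weighted sums of pixels, amounts to template dot products. A tiny sketch; the weight templates here are illustrative stand-ins:

```python
# Features as weighted sums of pixels within an extracted region.
import numpy as np

def extract_features(region, templates):
    """region: (h, w) patch; templates: (k, h, w) weight maps.
    Returns a k-dimensional feature vector of weighted pixel sums."""
    return (templates * region).sum(axis=(1, 2))
```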
32
An Alternate Approach: 2D to 3D (with VisionSphere Technologies)
33
Simon Prince
34
Attentive People Finding
Realistic environments and behaviour make this a hard problem.
Humans: primitive mechanisms are preserved in the periphery; more complex mechanisms are not.
Our approach: probabilistic combination of simple, weak cues.
Ongoing work: attentive feedback.
36
Colour Scaling (Rovamo & Iivanainen, 1991)
37
Contour Integration (Hess & Dakin, 1999)
38
Contour Integration (Hess & Dakin, 1999)
39
Interactive Attentive Sensing
Needed: Fast Saccadic Programming Algorithms!
40
Spatial Integration
[Plot: area under the ROC curve (0.70–0.86) vs. integration exponent g (log scale, 10⁻¹ to 1), for the motion, foreground, and skin cues.]
41
3D Hugh
42
Sal Khan (VisionSphere)
43
SUMMARY
A supervised method to make a feature set more invariant to a known nuisance parameter:
- Fast
- No knowledge of faces
- No knowledge of 3D transformations
A full 3D model, by contrast, is slower and uses lots of domain-specific knowledge, but gives better results:
EIGEN-LIGHTFIELDS (Gross, Matthews, Baker) < INVARIANCE (Prince, Elder) << 3D MODEL (Blanz et al.)
44
Algorithm Summary
TO TRAIN:
- Estimate the mean and covariance of the manifold as a function of the distractor variable.
- Alternately estimate the invariant vectors Ci and the transformations F1..n.
TO CALCULATE INVARIANT VECTORS:
- Estimate the nuisance value v.
- Transform by the appropriate Fv.
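A compact sketch of this alternation, under our own assumptions: the transforms are linear, there is one image per (person, pose), and each X[t] has already had the pose-t manifold mean subtracted. Fixing the first transform at the identity pins down the otherwise arbitrary scale. None of this is the published implementation; it is our reading of the slide.

```python
# Alternating estimation of invariant vectors C_i and per-pose
# transforms F_1..n. X[t] is a (d, n) array: n people at pose t,
# already centred on the pose-t mean.
import numpy as np

def train(X, n_iters=20):
    d, n = X[0].shape
    F = [np.eye(d) for _ in X]
    for _ in range(n_iters):
        # Fix the transforms: the best invariant vectors are the
        # per-person averages of the transformed features.
        C = np.mean([F[t] @ X[t] for t in range(len(X))], axis=0)
        # Fix the invariant vectors: refit each transform by least
        # squares (F[0] stays identity to avoid the trivial solution).
        for t in range(1, len(X)):
            F[t] = np.linalg.lstsq(X[t].T, C.T, rcond=None)[0].T
    return F, C

def invariant_vector(x_centred, F, v):
    """Transform a probe feature (centred on the pose-v mean) by the
    transformation for its estimated nuisance value v."""
    return F[v] @ x_centred
```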
45
Attentive Snapshots
46
PROBLEM STATEMENT
Problem: image variation due to nuisance parameters such as pose change is greater than variation due to identity, and this is reflected in most “features”.
The problem we are addressing concerns the type of features typically used for face recognition. Consider two instances each of two faces at different poses. We would like to measure features such that both instances of the same face have similar values. For typical “appearance”-based features, however, all profile faces are similar to each other and all frontal faces are similar to each other, so it is very hard to do face recognition when the face in the database is at a different pose from the probe face. This is also true for other dimensions such as lighting and expression. We term all of these irrelevant parameters “nuisance parameters”.
47
GOAL: Decompose the conventional feature vector into an invariant feature plus nuisance parameters.
[Diagram: conventional feature vector x1 → invariant vector c + nuisance parameters (f1, θ1); a second instance x2 of the same face → the same invariant vector c + nuisance parameters (f2, θ2).]
Our goal is to take the conventional feature vector and decompose it into the nuisance parameters plus a new vector that is independent of these distractor dimensions. If we take a second instance of the same face, we should similarly be able to extract a vector and decompose it into a second set of nuisance parameters and a second invariant vector; if all has gone well, this will be exactly the same as for the first instance.
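In symbols, paraphrasing the slide in our own notation:

```latex
% Two instances x_1, x_2 of the same face decompose into a shared
% invariant vector c plus per-instance nuisance parameters:
\[
  \mathbf{x}_1 \longmapsto (\mathbf{c},\, f_1, \theta_1),
  \qquad
  \mathbf{x}_2 \longmapsto (\mathbf{c},\, f_2, \theta_2),
\]
% and recognition compares the recovered c's, which should agree for
% one identity regardless of pose.
```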
48
TOY DATA SET – IN-PLANE ORIENTATION
TRAINING IMAGES – angle known; several images of each face present. TEST IMAGES – angle unknown. PROBE IMAGE – angle unknown.
Although I'm interested in more complicated situations, I'm going to demonstrate the ideas using a toy example: face recognition under an unknown in-plane rotation. We have some faces in our test database, and we are given a probe face which has the same identity as one of the faces in the database but is at a different angle. Our task is to predict which of the test faces matches the probe. Our method is supervised and requires a set of training data in which several individuals are seen at known poses. From these, we can learn statistically how faces at one pose are related to faces at another pose, and leverage this information to make pose-invariant vectors. For our initial data space, we project onto the first few eigenvectors of the training data set (choice of features: the first few eigenvectors).
49
THE FIRST TWO FEATURE DIMENSIONS
[Scatter plot: feature X2 vs. X1, colour-coded by increasing orientation θ, with the manifold mean drawn as a red line.]
This is a view of the first two features, that is to say, the distribution of the first two dimensions of the eigenspace. You can see some clear structure in the data. In fact, I have colour-coded the points by their pose, so that the darkest points come from faces that are near horizontal, and as we move anticlockwise around the manifold the faces become progressively more vertical. We can represent the progression of the manifold through space by plotting the mean of the data as a function of the pose: any given point on the red line represents the mean of all the training data at a given pose.
50
ESTIMATE NUISANCE PARAMETER
Similarly, we can represent the variance of the manifold as a function of the pose: for each point on the red line there is an associated covariance ellipse describing the variability of the data at that pose. This gives us an easy method for estimating the nuisance parameter. Given a new test point, it is easy to identify which ellipse it is most closely associated with, and hence to identify the correct pose. This also partitions the entire space into distinct regions associated with each ellipse. We are going to exploit this partitioning to create invariance, by associating a different function with each region of space such that the functions transform the data to a pose-invariant representation.
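A sketch of that pose estimate, assuming the per-pose means and covariances come from the training stage; the names and the scipy dependency are ours:

```python
# Estimate the nuisance (pose) of a new feature vector by choosing the
# pose-conditional Gaussian under which it is most likely, i.e. the
# covariance ellipse it is most closely associated with.
import numpy as np
from scipy.stats import multivariate_normal

def estimate_nuisance(x, mus, covs):
    logps = [multivariate_normal.logpdf(x, mean=m, cov=S)
             for m, S in zip(mus, covs)]
    return int(np.argmax(logps))
```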
51
TRANSFORM the feature vector differently, based on the estimated nuisance parameter, to an invariant vector.
[Plot: the manifold replotted before and after transformation; red line = manifold mean, black line = one individual's feature vectors as pose varies, with spokes connecting each point to the mean of its associated Gaussian; three example vectors shown.]
Here I have replotted the manifold. Again, the red line is the mean. The black line represents all of the data vectors belonging to a single individual as the pose of the picture moves around from horizontal to vertical. Our goal is to map all of the points on this line to a single vector. The spokes coming from the red line (the mean of the manifold) to the black line connect each point on the black line to the mean of its associated Gaussian.
Consider three feature vectors representing this person at three different orientations, and look at the vectors from the mean for each pose to their positions in feature space. Clearly these also vary considerably as a function of the pose. Now associate a different function with each of the three pose values, with the aim of mapping them to a new constant vector. It is obvious that this can be done for a single individual; the point of our technique is to estimate the parameters of these functions so that they map each individual in the test space to a constant vector in a least-squares sense.
One way of thinking about this whole system, which might be helpful to those familiar with the learning literature, is as a mixture of experts in which each expert is explicitly placed on the manifold mean at each given pose. What families of functions should be used? We have experimented with Euclidean rotations and linear transforms, but it is quite possible to use more elaborate non-linear functions in their place if you have enough training data.
Why might such simple transformations be helpful? Faces that look similar from the front usually look similar from the side, so local neighbourhood structure in different parts of the manifold may well be quite similar. This regularity in the space means that it is potentially possible to model the structure of the manifold in this low-dimensional way.
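The least-squares criterion described in these notes can be written compactly; this formalization is our own paraphrase, not the paper's notation:

```latex
% With pose means \mu_\theta and pose-specific transforms F_\theta,
% every image x_{i\theta} of person i should map to one invariant
% vector c_i, in a least-squares sense:
\[
  \{F_\theta\},\{\mathbf{c}_i\}
  \;=\; \arg\min \sum_{i,\theta}
  \bigl\| F_\theta\!\left(\mathbf{x}_{i\theta}-\boldsymbol{\mu}_\theta\right)
          - \mathbf{c}_i \bigr\|^2 .
\]
```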
52
3D POSE RESULTS
[Scatter plots: feature 2 vs. feature 1 in the original space and in the invariant space.]
53
FERET RESULTS FOR POSE
FERET data set: 100 individuals; pose varies over a range of angles. One example of each face in the database and one probe, never at the same pose; mean pose difference 71.3 degrees. Result: 71% first-choice match (depends on the features used).