1
Recognizing Objects and Actions in Images
Jitendra Malik, U.C. Berkeley
2
Many kinds of images…
Ordinary optical images/video
–Ubiquitous, cheap
X-ray tomography
–Volumetric data
Range sensors
–2.5D data
…
3
From images/video to objects
Labeled sets: tiger, grass, etc.
4
Recognition
Possible for both instances and object classes (Mona Lisa vs. faces, or Beetle vs. cars)
Tolerant to changes in pose and illumination, and to occlusion
5
Examples of Actions
Movement and posture change
–run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …
Object manipulation
–pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), …
Conversational gesture
–point, …
Sign language
6
Outline
Finding/recognizing faces
Recognizing objects
Recognizing actions
7
Face Detection (Carnegie Mellon University)
Results on various images submitted to the CMU on-line face detector
8
Accuracy of Face Detection (Carnegie Mellon University)
MIT-CMU test set
–94% detection rate with one false detection every 2 images
Ordinary consumer photographs (according to Gretag Imaging)
–88% detection rate with one false detection per image
H. Schneiderman and T. Kanade. “Object Detection Using the Statistics of Parts.” To appear in Int. Jour. of Comp. Vision, 2002.
9
DoD Counterdrug (courtesy J. Phillips)
http://www.dodcounterdrug.com/facialrecognition/
Test of commercial-off-the-shelf (COTS) facial recognition products
Test developed by NIST based on FERET
–13,872 x 13,872 image evaluation matrix, resulting in over 192 million matches
Formal test conducted May to June 2000
Results released in February 2001
Sponsored by the DoD Counterdrug Technology Development Program Office, National Institute of Justice, NAVSEA Crane Division and DARPA
10
Pose
Gallery: 200 people; probe set: 400 images
[Figure: probability of ID at pose angles 25°, 10°, 20°, 45°]
11
Illumination
Gallery 227; Outdoor 190; Indoor/Ambient 236
[Figure: probability of ID]
12
FRVT 2000 Results
Critical parameters
–Rotations
–Distance/resolution
–Duplicates
–Illumination
13
Outline
Finding/recognizing faces
Recognizing objects
Recognizing actions
14
Biological Shape
D’Arcy Thompson: On Growth and Form, 1917
–studied transformations between shapes of organisms
15
Matching Framework (Belongie, Malik & Puzicha, PAMI 2002)
Find correspondences between points on shape
Estimate transformation & measure similarity
[Figure: model, target, ...]
16
Comparing Pointsets
17
Comparing Shape Contexts
Compute matching costs C_ij between shape context histograms using the chi-squared distance
Recover correspondences by solving a linear assignment problem with costs C_ij [Jonker & Volgenant 1987]
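The two steps on this slide can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the histograms are toy values, the function names are made up for this example, and a brute-force search over permutations stands in for the Jonker & Volgenant solver (so it is only practical for a handful of points). The cost used is the standard chi-squared distance between normalized histograms, half the sum over bins of (g_k - h_k)^2 / (g_k + h_k).

```python
from itertools import permutations

def chi2_cost(g, h):
    """Chi-squared distance between two normalized histograms."""
    total = 0.0
    for gk, hk in zip(g, h):
        denom = gk + hk
        if denom > 0:  # bins that are empty in both histograms contribute nothing
            total += (gk - hk) ** 2 / denom
    return 0.5 * total

def cost_matrix(model_hists, target_hists):
    """C[i][j] = cost of matching model point i to target point j."""
    return [[chi2_cost(g, h) for h in target_hists] for g in model_hists]

def best_assignment(costs):
    """Brute-force linear assignment on a square cost matrix.

    Stand-in for a real solver; tries every one-to-one matching.
    """
    n = len(costs)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(costs[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total

# Toy data (hypothetical): the target histograms are a reordering of the model's.
model = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [1.0, 0.0, 0.0]]
target = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]

C = cost_matrix(model, target)
match, total = best_assignment(C)  # match[i] is the target index for model point i
```

For realistic point counts, a polynomial-time assignment solver would replace `best_assignment`; the cost matrix construction stays the same.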
18
Matching Framework
Find correspondences between points on shape
Estimate transformation & measure similarity
[Figure: model, target, ...]
19
Thin Plate Spline Model
Duchon (1977), Meinguet (1979), Wahba (1991)
2D counterpart to the cubic spline
Minimizes bending energy
Solve by inverting a linear system
Can be regularized when data is inexact
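A minimal sketch of the linear system behind the thin plate spline may help here. It assumes the common radial basis U(r) = r^2 log r^2 and fits a single scalar function f(x, y) = a0 + a1 x + a2 y + sum_i w_i U(|p_i - (x, y)|); a 2D warp would use two such functions, one per output coordinate. The names `fit_tps`, `eval_tps` and the parameter `lam` are hypothetical, and a small Gaussian elimination routine stands in for a numerical library. Setting `lam > 0` adds the regularization mentioned on the slide by relaxing exact interpolation.

```python
import math

def tps_kernel(r2):
    """Radial basis U(r) = r^2 log r^2, taking the squared distance; U(0) = 0."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_tps(points, values, lam=0.0):
    """Return (w, a): kernel weights and affine coefficients [a0, a1, a2].

    Assumes the control points are distinct and not all collinear.
    """
    n = len(points)
    A = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            A[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        A[i][i] += lam                   # regularization for inexact data
        A[i][n:] = [1.0, xi, yi]         # affine block P
        A[n][i], A[n + 1][i], A[n + 2][i] = 1.0, xi, yi  # transpose of P
    sol = solve(A, list(values) + [0.0, 0.0, 0.0])
    return sol[:n], sol[n:]

def eval_tps(points, w, a, x, y):
    """Evaluate the fitted spline at (x, y)."""
    out = a[0] + a[1] * x + a[2] * y
    for (xi, yi), wi in zip(points, w):
        out += wi * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)
    return out

# Toy control points (hypothetical): four corners pinned at 0, centre lifted to 1.
ctrl = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
vals = [0.0, 0.0, 0.0, 0.0, 1.0]
w, a = fit_tps(ctrl, vals)
```

With `lam=0.0` the spline passes through every control value. When the data are exactly affine, the kernel weights come out as zero, which matches the fact that an affine map has zero bending energy.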
20
Matching Example
[Figure: model, target]
21
Object Recognition Experiments
Handwritten digits
COIL 3D objects (Nayar-Murase)
Human body configurations
Trademarks
22
Terms in Similarity Score
Shape context difference
Local image appearance difference
–orientation
–gray-level correlation in Gaussian window
–… (many more possible)
Bending energy
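One simple way to combine these terms is a weighted sum, sketched below. The slide does not say how the terms are combined, so the function name, the linear form, and the default weights are all assumptions made for illustration; in practice the weights would be tuned on held-out data.

```python
def similarity_score(shape_context_cost, appearance_cost, bending_energy,
                     w_sc=1.0, w_ac=1.0, w_be=1.0):
    """Weighted sum of the three dissimilarity terms (lower means more similar).

    The weights are hypothetical placeholders, not values from the talk.
    """
    return (w_sc * shape_context_cost
            + w_ac * appearance_cost
            + w_be * bending_energy)

# Toy comparison (hypothetical numbers): candidate A matches better than B.
score_a = similarity_score(0.2, 0.1, 0.5)
score_b = similarity_score(0.6, 0.4, 1.5)
best = "A" if score_a < score_b else "B"
```

Because every term is a cost, the prototype with the lowest combined score is the best match.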
23
Handwritten Digit Recognition (test error rates, by training set size)
MNIST 60 000:
–linear: 12.0%
–40 PCA + quad: 3.3%
–1000 RBF + linear: 3.6%
–K-NN: 5%
–K-NN (deskewed): 2.4%
–K-NN (tangent dist.): 1.1%
–SVM: 1.1%
–LeNet 5: 0.95%
MNIST 600 000 (distortions):
–LeNet 5: 0.8%
–SVM: 0.8%
–Boosted LeNet 4: 0.7%
MNIST 20 000:
–K-NN, Shape Context matching: 0.63%
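The K-NN entries above all share the same classification rule and differ only in the distance used. A minimal sketch of that rule with a pluggable distance function follows; the function name, the toy 1D data, and the absolute-difference distance are placeholders, and in the shape context setting the distance would be a matching-based dissimilarity between two digit images.

```python
from collections import Counter

def knn_classify(query, prototypes, distance, k=3):
    """Label a query by majority vote among its k nearest labeled prototypes.

    prototypes: list of (item, label) pairs; distance: callable on two items.
    """
    ranked = sorted(prototypes, key=lambda p: distance(query, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    # On a tie, the label whose nearest member ranks first wins (Python 3.7+).
    return votes.most_common(1)[0][0]

# Toy 1D example (hypothetical data) with absolute difference as the distance.
protos = [(0.0, "zero"), (0.2, "zero"), (1.0, "one"), (1.1, "one"), (0.9, "one")]
abs_diff = lambda u, v: abs(u - v)
label_low = knn_classify(0.1, protos, abs_diff)
label_high = knn_classify(0.95, protos, abs_diff)
```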
25
COIL Object Database
26
Prototypes Selected for 2 Categories
Details in Belongie, Malik & Puzicha (NIPS 2000)
27
Error vs. Number of Views
28
Human body configurations
29
Deformable Matching (Mori & Malik, ECCV 2002)
Kinematic chain-based deformation model
Use iterations of correspondence and deformation
Keypoints on exemplars are deformed to locations on query image
30
Results
31
Tracking by Repeated Finding
32
Outline
Finding/recognizing faces
Recognizing objects
Recognizing actions
33
Examples of Actions
Movement and posture change
–run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …
Object manipulation
–pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), …
Conversational gesture
–point, …
Sign language
34
Activities and Situation Assessment
Example: withdrawing money from an ATM
Activities are constructed by composing actions. Partial order plans may be a good model.
Activities may involve multiple agents
Detecting unusual situations or activity patterns is facilitated by the video activity transform
35
Objects in space
–Segment/region-of-interest
–Features (points, curves, wavelet coefficients, ...)
–Correspondence and deform into alignment
–Recover parameters of generative model
–Discriminative classifier
Actions in spacetime
–Segment/volume-of-interest
–Features (points, curves, wavelets, motion vectors, ...)
–Correspondence and deform into alignment
–Recover parameters of generative model
–Discriminative classifier
36
Key cues for action recognition
“Morpho-kinesics” of action (shape and movement of the body)
Identity of the object(s)
Activity context
37
Image/Video → Stick figure → Action
Stick figures can be specified in a variety of ways or at various resolutions (degrees of freedom)
–2D joint positions
–3D joint positions
–Joint angles
Complete representation
Evidence that it is effectively computable
38
Mathematical Challenges
Modeling shape variation
Nearest neighbor search in high dimensions
Combining statistical optimality with computational efficiency
Reconstruction algorithms for novel sensing modalities