1
Recognizing objects and actions in images and video
Jitendra Malik
Computer Vision Group, University of California, Berkeley
2
Collaborators
Grouping: Jianbo Shi, Serge Belongie, Thomas Leung
Ecological statistics: Charless Fowlkes, David Martin, Xiaofeng Ren
Recognition: Serge Belongie, Jan Puzicha, Greg Mori
3
Motivation
Interaction with multimedia needs to be in terms that are meaningful to humans: objects, actions, events.
Feature-based CBIR systems (e.g. color histograms) fail for this purpose.
Presently, the best solutions use text, speech recognition, or human annotation.
Goal: automatic annotation.
4
From images/video to objects
Labeled sets: tiger, grass, etc.
7
Consistency
Perceptual organization forms a tree: an image decomposes into regions (e.g. background, left bird, right bird), which further decompose into parts (grass, bush, head, eye, beak, body, ...).
Two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization).
For segmentations A, B, C of the same image: A and C are refinements of B, A and C are mutual refinements, and A, B, C represent the same percept; attention accounts for the differences.
8
What enables us to parse a scene?
Low-level cues: color/texture, contours, motion
Mid-level cues: T-junctions, convexity
High-level cues: familiar objects, familiar motion
9
Outline
Finding boundaries
Recognizing objects
Recognizing actions
10
Finding boundaries: Is texture a problem or a solution?
(Figure: an image and its orientation energy.)
11
Statistically optimal contour detection
Use humans to segment a large collection of natural images.
Train a classifier for the contour/non-contour classification using orientation energy and texture gradient as features.
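A minimal sketch of the kind of classifier this implies: logistic regression on the two local cues. The feature values and labels below are synthetic placeholders standing in for the measured orientation energy, texture gradient, and human boundary labels; the variable names are illustrative, not from the original work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder features: in the real pipeline each row would hold the
# orientation energy and texture gradient measured at one pixel, and the
# label would say whether human subjects marked a boundary there.
rng = np.random.default_rng(0)
X = rng.random((10_000, 2))            # columns: [orientation_energy, texture_gradient]
y = (0.6 * X[:, 0] + 0.4 * X[:, 1]
     + 0.1 * rng.standard_normal(10_000) > 0.6).astype(int)

clf = LogisticRegression(class_weight="balanced").fit(X, y)

# P(boundary | cues) per pixel; thresholding or non-maximum suppression
# would follow to obtain a boundary map.
p_boundary = clf.predict_proba(X)[:, 1]
```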
12
Orientation energy
Gaussian 2nd derivative and its Hilbert pair
Can detect combinations of bar and edge features; also insensitive to linear shading [Perona & Malik 90]
Multiple scales
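For concreteness, the orientation energy at angle $\theta$ is conventionally written as the sum of squared responses of the even-symmetric filter (the Gaussian second derivative) and its odd-symmetric Hilbert pair; the standard form, stated here rather than copied from the slide, is

$$OE_\theta(x, y) = \bigl(I * f^{e}_{\theta}\bigr)^2(x, y) + \bigl(I * f^{o}_{\theta}\bigr)^2(x, y).$$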
13
Texture gradient = chi-square distance between texton histograms in the two half-disks on either side of an edge.
(Figure: example points i, j, k with chi-square distances such as 0.1 and 0.8.)
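For reference, the chi-square distance between two normalized texton histograms g and h is the standard

$$\chi^2(g, h) = \tfrac{1}{2} \sum_{k} \frac{\bigl(g(k) - h(k)\bigr)^2}{g(k) + h(k)},$$

summed over the texton channels; small values indicate similar texture on the two sides of the candidate boundary.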
17
ROC curve for local boundary detection
18
Outline
Finding boundaries
Recognizing objects
Recognizing actions
19
Biological shape
D’Arcy Thompson, On Growth and Form, 1917: studied transformations between the shapes of organisms.
20
Deformable templates: related work
Fischler & Elschlager (1973)
Grenander et al. (1991)
von der Malsburg (1993)
21
Matching framework
Find correspondences between points on the two shapes (model and target)
Fast pruning
Estimate transformation & measure similarity
22
Comparing Pointsets
23
Shape context
Count the number of points inside each log-polar bin (e.g. count = 4 in one bin, count = 10 in another).
Compact representation of the distribution of points relative to each point.
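A minimal sketch of computing such a descriptor for a 2-D point set; the bin layout and normalization here are illustrative choices, not necessarily the exact ones used in the papers.

```python
import numpy as np

def shape_contexts(points, n_r=5, n_theta=12):
    """Log-polar histogram of the relative positions of all other points,
    computed for every point; a sketch of the descriptor, not the exact
    binning from the original work."""
    n = len(points)
    diffs = points[None, :, :] - points[:, None, :]           # (n, n, 2)
    dists = np.linalg.norm(diffs, axis=2)
    angles = np.arctan2(diffs[..., 1], diffs[..., 0])          # in [-pi, pi]

    # Normalize distances by the mean pairwise distance (scale invariance),
    # then bin log r into n_r bins and the angle into n_theta bins.
    mean_dist = dists[dists > 0].mean()
    with np.errstate(divide="ignore"):
        log_r = np.log(dists / mean_dist)
    r_edges = np.linspace(log_r[dists > 0].min(), log_r[dists > 0].max(), n_r + 1)
    r_bin = np.clip(np.digitize(log_r, r_edges) - 1, 0, n_r - 1)
    t_bin = np.clip(((angles + np.pi) / (2 * np.pi) * n_theta).astype(int),
                    0, n_theta - 1)

    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i != j:
                hists[i, r_bin[i, j] * n_theta + t_bin[i, j]] += 1
    return hists
```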
24
Shape context
25
Shape contexts
Invariant under translation and scale
Can be made invariant to rotation by using the local tangent orientation frame
Tolerant to small affine distortion: log-polar bins make the spatial blur proportional to r
Cf. spin images (Johnson & Hebert) for range image registration
26
Comparing shape contexts
Compute matching costs C_ij between shape context histograms using the chi-squared distance.
Recover correspondences by solving the linear assignment problem with costs C_ij [Jonker & Volgenant 1987].
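A sketch of this step, using the standard chi-squared cost between histograms and SciPy's assignment solver as a stand-in for the Jonker-Volgenant algorithm cited above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_shape_contexts(H1, H2, eps=1e-10):
    """H1, H2: (n, K) arrays of shape context histograms for the two shapes.
    Returns matched index pairs and the total matching cost."""
    # Normalize histograms so the chi-squared cost is comparable across points.
    P = H1 / (H1.sum(axis=1, keepdims=True) + eps)
    Q = H2 / (H2.sum(axis=1, keepdims=True) + eps)

    # C[i, j] = 0.5 * sum_k (P_i(k) - Q_j(k))^2 / (P_i(k) + Q_j(k))
    diff = P[:, None, :] - Q[None, :, :]
    summ = P[:, None, :] + Q[None, :, :] + eps
    C = 0.5 * np.sum(diff ** 2 / summ, axis=2)

    rows, cols = linear_sum_assignment(C)   # optimal one-to-one correspondence
    return list(zip(rows, cols)), C[rows, cols].sum()
```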
27
Matching framework
Find correspondences between points on the two shapes (model and target)
Fast pruning
Estimate transformation & measure similarity
28
Fast pruning
Find the best match for the shape context at only a few random points and add up the costs.
29
Snodgrass results
30
Results
31
Matching framework
Find correspondences between points on the two shapes (model and target)
Fast pruning
Estimate transformation & measure similarity
32
Thin plate spline model
2D counterpart to the cubic spline: minimizes a bending energy.
Solve by inverting a linear system.
Can be regularized when the data are inexact.
Duchon (1977), Meinguet (1979), Wahba (1991)
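For reference (standard thin plate spline theory rather than text from the slide), the bending energy minimized by the interpolant f is

$$I_f = \iint_{\mathbb{R}^2} \bigl( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \bigr)\, dx\, dy,$$

and the regularized variant minimizes the data-fit error plus $\lambda I_f$, which is what tolerates inexact correspondences.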
33
Matching example (model vs. target)
34
Outlier test example
35
Synthetic test results
Fish with deformation + noise; fish with deformation + outliers.
Methods compared: ICP, shape context, Chui & Rangarajan.
36
Terms in the similarity score
Shape context difference
Local image appearance difference: orientation, gray-level correlation in a Gaussian window, ... (many more possible)
Bending energy
37
Object recognition experiments
Handwritten digits
COIL 3D objects (Nayar & Murase)
Human body configurations
Trademarks
38
Handwritten digit recognition (error rates)
MNIST 60,000:
  linear: 12.0%
  40 PCA + quadratic: 3.3%
  1000 RBF + linear: 3.6%
  K-NN: 5%
  K-NN (deskewed): 2.4%
  K-NN (tangent distance): 1.1%
  SVM: 1.1%
  LeNet 5: 0.95%
MNIST 600,000 (distortions):
  LeNet 5: 0.8%
  SVM: 0.8%
  Boosted LeNet 4: 0.7%
MNIST 20,000:
  K-NN with shape context matching: 0.63%
40
Results: digit recognition
1-NN classifier using: shape context + 0.3 * bending + 1.6 * image appearance
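Spelled out (symbol names chosen here for readability, not taken from the slide), the distance used by the 1-NN rule is

$$D = D_{\text{shape context}} + 0.3\, D_{\text{bending}} + 1.6\, D_{\text{appearance}}.$$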
41
COIL object database
42
Error vs. number of views
43
Prototypes selected for 2 categories
Details in Belongie, Malik & Puzicha (NIPS 2000)
44
Editing: K-medoids
Input: similarity matrix
Select: K prototypes
Minimize: mean distance to the nearest prototype
Algorithm: iterative; split the cluster with the most errors (see the sketch below)
Result: adaptive distribution of resources (cf. aspect graphs)
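A generic K-medoid prototype selection sketch, working directly from a pairwise distance matrix. The split-worst-cluster seeding mentioned on the slide is replaced here by random initialization for brevity; everything else is the standard alternating scheme.

```python
import numpy as np

def select_prototypes(D, K, n_iter=50, seed=0):
    """K-medoid prototype selection from a pairwise distance matrix D (n x n).
    Alternates assignment (nearest prototype for every example) with an
    update step (best medoid inside each cluster)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=K, replace=False)

    for _ in range(n_iter):
        # Assignment step: index of the nearest prototype for every example.
        assign = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        # Update step: within each cluster, pick the member that minimizes
        # the summed distance to the other members.
        for k in range(K):
            members = np.where(assign == k)[0]
            if len(members) == 0:
                continue
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids
```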
45
Error vs. number of views
46
Human body configurations
47
Deformable matching
Kinematic chain-based deformation model
Use iterations of correspondence and deformation (sketched below)
Keypoints on exemplars are deformed to locations on the query image
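One way to read the correspondence/deformation loop, reusing the shape context sketches from earlier slides and swapping in a plain least-squares affine warp where the actual system uses a thin plate spline or kinematic chain fit.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine warp as a simple stand-in for the deformation
    model (the actual work uses thin plate splines / kinematic chains)."""
    A = np.hstack([src, np.ones((len(src), 1))])          # (m, 3)
    M, _, _, _ = np.linalg.lstsq(A, dst, rcond=None)      # (3, 2)
    return lambda pts: np.hstack([pts, np.ones((len(pts), 1))]) @ M

def iterative_match(model_pts, query_pts, n_rounds=3):
    """Alternate correspondence and deformation, in the spirit of the
    framework above; relies on the shape_contexts / match_shape_contexts
    sketches given earlier."""
    warped = model_pts.copy()
    for _ in range(n_rounds):
        pairs, _ = match_shape_contexts(shape_contexts(warped),
                                        shape_contexts(query_pts))
        src = np.array([warped[i] for i, j in pairs])
        dst = np.array([query_pts[j] for i, j in pairs])
        warped = fit_affine(src, dst)(warped)
    return warped
```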
48
Results
49
Trademark similarity
50
Recognizing objects in scenes
51
Shape matching using multi-scale scanning
Shape context computation (10 Mops): scales * key-points * contour-points (10 * 100 * 10000)
Multi-scale coarse matching (100 Gops): scales * objects * views * samples * key-points * dim-sc (10 * 1000 * 10 * 100 * 100 * 100)
Deform into alignment (1 Gops): image-objects * shortlist * samples^2 * dim-sc (10 * 100 * 10000 * 100)
52
Shape matching using grouping
Complexity-determining step: find approximate nearest neighbors of 10^2 query points in a set of 10^6 stored points in the 100-dimensional space of shape contexts.
The naïve bound of 10^9 can be much improved using ideas from theoretical CS (Johnson-Lindenstrauss, Indyk-Motwani, etc.); see the random-projection sketch below.
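A minimal illustration of the Johnson-Lindenstrauss idea: randomly project to a much lower dimension before searching. The point counts are scaled down from the talk's figures, and the brute-force search after projection is a placeholder for a proper approximate-NN index.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stored, n_query, dim = 100_000, 100, 100   # talk quotes 10^6 stored points; scaled down here
stored = rng.random((n_stored, dim))          # placeholder shape context descriptors
queries = rng.random((n_query, dim))

# Johnson-Lindenstrauss style random projection: 100 -> 20 dimensions.
k = 20
R = rng.standard_normal((dim, k)) / np.sqrt(k)
stored_lo, queries_lo = stored @ R, queries @ R

# Pairwise distances are approximately preserved under the projection, so
# neighbors can be found in the low-dimensional space; brute force stands in
# here for a real approximate-NN index (LSH, k-d trees, ...).
d2 = ((queries_lo ** 2).sum(1)[:, None]
      + (stored_lo ** 2).sum(1)[None, :]
      - 2 * queries_lo @ stored_lo.T)
nearest = d2.argmin(axis=1)
```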
53
Putting grouping/segmentation on a sound foundation
Construct a dataset of human-segmented images
Measure the conditional probability distributions of various Gestalt grouping factors
Incorporate these in an inference algorithm
54
Outline
Finding boundaries
Recognizing objects
Recognizing actions
55
Examples of actions
Movement and posture change: run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), ...
Object manipulation: pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), ...
Conversational gesture: point, ...
Sign language
56
Activities and situation classification
Example: withdrawing money from an ATM
Activities are constructed by composing actions; partial-order plans may be a good model.
Activities may involve multiple agents.
Detecting unusual situations or activity patterns is facilitated by the video activity transform.
57
On the difficulty of action detection
Number of categories
Intra-category variation: clothing, illumination, person performing the action
Inter-category variation
58
Objects in space vs. actions in spacetime
Objects in space: segment / region-of-interest; features (points, curves, wavelet coefficients, ...); correspondence and deform into alignment; recover parameters of a generative model; discriminative classifier.
Actions in spacetime: segment / volume-of-interest; features (points, curves, wavelets, motion vectors, ...); correspondence and deform into alignment; recover parameters of a generative model; discriminative classifier.
59
Key cues for action recognition
“Morpho-kinesics” of the action (shape and movement of the body)
Identity of the object(s)
Activity context
60
Image/video → stick figure → action
Stick figures can be specified in a variety of ways or at various resolutions (degrees of freedom): 2D joint positions, 3D joint positions, or joint angles (see the sketch below).
Complete representation
Evidence that it is effectively computable
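A toy illustration of the joint-angle versus joint-position views of a stick figure: a small 2-D kinematic chain whose joint names and segment lengths are made up for illustration only.

```python
import numpy as np

# Each joint: (parent joint, segment length from parent). Names and lengths
# are hypothetical; a real model would cover the whole body.
SKELETON = {
    "hip":      (None,       0.0),
    "knee":     ("hip",      0.45),
    "ankle":    ("knee",     0.42),
    "shoulder": ("hip",      0.55),
    "elbow":    ("shoulder", 0.30),
    "wrist":    ("elbow",    0.25),
}

def joint_positions(angles, root=np.zeros(2)):
    """Forward kinematics: absolute 2-D joint positions from per-joint
    orientations (radians). Joint angles are one low-dimensional way of
    specifying the figure; 2D/3D joint positions are another."""
    pos = {}
    for name, (parent, length) in SKELETON.items():
        if parent is None:
            pos[name] = root
        else:
            direction = np.array([np.cos(angles[name]), np.sin(angles[name])])
            pos[name] = pos[parent] + length * direction
    return pos

# Example: a crude pose with the legs pointing down and one arm raised.
pose = joint_positions({"knee": -np.pi / 2, "ankle": -np.pi / 2,
                        "shoulder": np.pi / 2, "elbow": np.pi / 3,
                        "wrist": np.pi / 3})
```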
61
Tracking by repeated finding
62
Achievable goals in 3 years
Reasonable competence at object recognition at a crude category level (~1000 categories)
Detection/tracking of humans as kinematic chains, assuming adequate resolution
Recognition of ~10-100 actions and compositions thereof