Need volunteers…

From Monday's paper: a simple story about representations. Input signal: a moving edge. Model it with an auto-regressive (AR) model, using two different representations for the observations y. Representation 1: image-based. Representation 2: position-based.
Input signal (Representation 1)
Bases, n=8 (Representation 1)
Dynamics, n=8 (Representation 1)
Bases, n=20 (Representation 1)
Dynamics, n=20 (Representation 1)
Bases, n=50 (Representation 1)
Dynamics, n=50 (Representation 1)
What happens next? (Representation 1)
Representing the edge position (Representation 2). Input signal: y = [1:100]. What dimension of an auto-regressive model do we need to describe that signal?
n = 1: can only represent an exponentially decaying position (Representation 2).
n = 2: a 2-d model can handle uniform translation exactly (Representation 2).
The simple story: for a simple, canonical signal like a moving edge modelled with an AR model, the pixel-based representation requires a high-dimensional state vector and even then doesn't work very well, while the position-based representation works perfectly with a 2-dimensional state vector.
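The two-dimensional claim is easy to check numerically. A minimal NumPy sketch (the signal y = [1:100] is from the slides; everything else is illustrative): fit a 2nd-order AR model by least squares and recover the exact translation dynamics y[t] = 2 y[t-1] − y[t-2].

```python
import numpy as np

# Position-based signal from the slides: the edge position y = [1:100].
y = np.arange(1.0, 101.0)

# Fit a 2nd-order AR model y[t] = c1*y[t-1] + c2*y[t-2] by least squares.
X = np.column_stack([y[1:-1], y[:-2]])  # predictors y[t-1], y[t-2]
t = y[2:]                               # targets y[t]
coeffs, *_ = np.linalg.lstsq(X, t, rcond=None)

# Uniform translation is captured exactly by a 2-d state:
# y[t] = 2*y[t-1] - y[t-2].
print(np.round(coeffs, 6))              # -> [ 2. -1.]
```

The same fit with a 1-d model would be forced into an exponential, which is exactly the failure shown on the n = 1 slide.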
Separating style and content with bilinear models Bill Freeman, MIT AI Lab. Josh Tenenbaum, MIT Dept. Brain and Cognitive Sciences
Style and content example. Content: character; style: font. The rendered observation (the letter "A") is observed; the underlying content (Character #1) and style (Matura MT) are not. Synthesis renders observations from content and style; analysis recovers content and style from observations.
Many perception problems have this two-factor structure:

Domain               Content         Style
typography           letter          font face
face recognition     identity        head orientation
shape from shading   shape           lighting
color perception     object color    illuminant color
speech recognition   words           speaker
Color constancy demo
How much of what we may consider to be (high-level) visual style can we account for by a simple, low-level statistical model? Given: observations that are the result of two strongly interacting factors, can we separately analyze or manipulate those two factors?
Perceptual tasks
Common form of observations: a grid of rendered items (A, B, C, D, E, F, G, H, I, …), with one factor varying along the rows and the other along the columns.
General case: account for the observations by a rendering function f(a, b), where the a values index style and the b values index content-class. The observations form a grid:

f(a1, b1)  f(a1, b2)  f(a1, b3)  …
f(a2, b1)  f(a2, b2)  f(a2, b3)  …
…
Asymmetric bilinear model:

y_sc = f(A_s, b_c) = A_s b_c

where y_sc is the observation vector in style s and content c, A_s is the matrix for style s, and b_c is the vector for content element c.
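As a concrete reading of the equation, a tiny NumPy sketch (dimensions and values are illustrative; the font and letter names are just labels): one matrix per style, one vector per content class, and rendering is a matrix-vector product.

```python
import numpy as np

rng = np.random.default_rng(0)
K, J = 6, 3                       # observation dim, model dim (illustrative)

# One style matrix A_s per style, one content vector b_c per content class.
A = {s: rng.normal(size=(K, J)) for s in ("chicago", "mistral")}
b = {c: rng.normal(size=J) for c in ("A", "B")}

# y_sc = A_s b_c: render every content in every style.
y = {(s, c): A[s] @ b[c] for s in A for c in b}
print(len(y), y[("chicago", "A")].shape)   # -> 4 (6,)
```

Because style enters only through A_s, swapping in a different style matrix re-renders the same content vector in a new style.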
Asymmetric bilinear model, with identity as the style factor.
Symmetric bilinear model:

y_sc^k = f(a_s, b_c) = a_s^T W_k b_c

where y_sc^k is the k-th element of the observation vector in style s and content c, W_k is the matrix for element k of the observation vector, a_s is the vector for style s, and b_c is the vector for content element c.
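Here both style and content are vectors, coupled by one interaction matrix W_k per output element. A minimal sketch (all dimensions illustrative), evaluating every element at once with `einsum` and checking it against the element-by-element definition:

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 3, 4, 5                  # style dim, content dim, observation dim

W = rng.normal(size=(K, I, J))     # one I x J interaction matrix W_k per output k
a_s = rng.normal(size=I)           # style vector
b_c = rng.normal(size=J)           # content vector

# y_sc[k] = a_s^T W_k b_c, for every element k at once.
y_sc = np.einsum("i,kij,j->k", a_s, W, b_c)

# Same thing, element by element.
y_loop = np.array([a_s @ W[k] @ b_c for k in range(K)])
print(np.allclose(y_sc, y_loop))   # -> True
```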
Symmetric bilinear model
Fitting the model to training observations. The asymmetric model (y_sc = A_s b_c) is fit in closed form with an SVD of the stacked observation matrix; the symmetric model (y_sc^k = a_s^T W_k b_c) is fit by iterating SVDs (Magnus and Neudecker, 1988).
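The asymmetric fit can be sketched in a few lines of NumPy on synthetic data (dimensions illustrative; note the recovered factors are only identifiable up to an invertible J×J linear transform):

```python
import numpy as np

rng = np.random.default_rng(2)
S, C, K, J = 4, 5, 6, 3            # styles, contents, observation dim, model dim

# Synthetic training data from a true asymmetric model y_sc = A_s b_c.
A_true = rng.normal(size=(S, K, J))
B_true = rng.normal(size=(J, C))                       # content vectors as columns
Y = np.vstack([A_true[s] @ B_true for s in range(S)])  # stacked (S*K, C) matrix

# One SVD recovers stacked style matrices and content vectors.
U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
A_stack = U[:, :J] * sig[:J]       # [A_1; ...; A_S], up to a linear transform
B_fit = Vt[:J]                     # content vectors, same transform inverted

print(np.allclose(A_stack @ B_fit, Y))   # -> True
```

The symmetric model has no such one-shot solution, which is why the slides iterate SVDs instead.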
Example: face observations y factored into head pose and identity.
Vector transpose
Related work, bilinear models: Koenderink and Van Doorn, 1991, 1996; Tomasi and Kanade, 1992; Faugeras, 1993; Magnus and Neudecker, 1988; Marimont and Wandell, 1992; Turk and Pentland, 1991; Ullman and Basri, 1991; Murase and Nayar, 1995.
Related work, analyzing style: Hofstadter, 1995 and earlier papers; Grebert et al., 1992; SIGGRAPH papers regarding controls for animation or line style (typically hand-crafted, not learned); Brand and Hertzmann, 2000; Hertzmann et al., 2001; Efros and Freeman, 2001.
Procedure (1) Fit a bilinear model to the training data of content elements observed across different styles, using linear algebra techniques. (2) Use new data to find the parameters for a new, unknown style, or to classify new observations, or to generalize both style and content.
Task: classification. Domain: vowel phonemes. Training set: several speakers each uttering the phonemes ("ah", "eh", "ou", …); test data: utterances of those phonemes from a new speaker.
Benchmark dataset CMU machine learning repository Training: 8 speakers saying 11 different vowel phonemes. Testing: 7 new speakers Data representation: LPC coefficients.
Classification using bilinear models: use the EM (expectation-maximization) algorithm to build up a model of the new speaker's style simultaneously with classification of the content:

y_observed = A_new_speaker b_phoneme

where y_observed is vowel data from a speaker in a new style, A_new_speaker is the matrix describing the unknown style of the new speaker, and b_phoneme are the previously learned vowel (content) descriptors.
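A toy sketch of this EM loop on synthetic data (all dimensions, the noise level, and the warm start are assumptions for illustration, not the paper's settings): the E-step soft-assigns each utterance to a learned content vector under the current style estimate, and the M-step re-fits the style matrix by weighted least squares.

```python
import numpy as np

rng = np.random.default_rng(4)
K, J, C, N = 8, 3, 5, 60          # obs dim, model dim, classes, samples (toy)
sigma2 = 0.05 ** 2                # assumed observation noise variance

B = rng.normal(size=(J, C))       # previously learned content vectors
A_new = rng.normal(size=(K, J))   # the new speaker's true (unknown) style
labels = rng.integers(0, C, size=N)
Y = A_new @ B[:, labels] + 0.05 * rng.normal(size=(K, N))

A_hat = A_new + 0.1 * rng.normal(size=(K, J))  # warm start (e.g. an average style)
for _ in range(20):
    # E-step: responsibility of each class for each observation under A_hat.
    d = ((Y[:, None, :] - (A_hat @ B)[:, :, None]) ** 2).sum(axis=0)  # (C, N)
    logw = -0.5 * d / sigma2
    w = np.exp(logw - logw.max(axis=0))
    w /= w.sum(axis=0, keepdims=True)
    # M-step: weighted least-squares update of the style matrix,
    # A = (sum_nc w_cn y_n b_c^T)(sum_nc w_cn b_c b_c^T)^-1.
    G = B @ np.diag(w.sum(axis=1)) @ B.T
    A_hat = Y @ (B @ w).T @ np.linalg.inv(G)

# Classify with the final style estimate.
d = ((Y[:, None, :] - (A_hat @ B)[:, :, None]) ** 2).sum(axis=0)
accuracy = (d.argmin(axis=0) == labels).mean()
print(accuracy)   # near-perfect on this easy toy problem
```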
Example problem for Expectation Maximization (EM) algorithm “Find the probability that each point came from one of two random spatial processes”.
EM algorithm: alternate an E-step, which computes the probability that each point came from each process, with an M-step, which re-estimates each process's parameters from those soft assignments.
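The two-process example can be sketched directly (a minimal sketch, assuming two unit-variance 1-d Gaussians with illustrative means):

```python
import numpy as np

rng = np.random.default_rng(3)

# Points drawn from two 1-d "spatial processes" (unit-variance Gaussians).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

mu = np.array([-1.0, 1.0])    # initial means
pi = np.array([0.5, 0.5])     # initial mixing weights

for _ in range(50):
    # E-step: probability that each point came from each process.
    lik = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)
    r = lik / lik.sum(axis=1, keepdims=True)
    # M-step: re-estimate means and weights from the soft assignments.
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    pi = r.mean(axis=0)

print(np.sort(mu))   # close to the true means -2 and 3
```

The bilinear classification task uses the same alternation, with "process parameters" replaced by the unknown style matrix.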
Classification results, performance comparison:

Multi-layer perceptron:               51%
1-nearest neighbor (NN):              56%
Discriminant-adaptive NN:             62%
Bilinear model, data not grouped:     69%
Bilinear model, grouped by speaker:   76%
Task: Classification Domain: faces and pose.
Face pose classification results. Given observations of a new face, what percentage of the poses can we identify correctly?

Nearest-neighbor matching:                                    53%
Bilinear model (estimate A_s while classifying b_c with EM):  74%
Task: extrapolation. Domain: typography. Fonts: Chicago, Zaph, Times, Mistral, Times Bold, Monaco. (Rest of alphabet, used in training, not shown.)
Coulomb warp representation: describe each shape by the warp that a square of ink particles would have to undergo to form the shape.
Coulomb warping: a reference shape is warped into the target shape.

Coulomb warp representation: averages of shapes computed in the pixel representation vs. the Coulomb-warp representation.

Example: shapes S1 and S2, and their combination S1 + S2, in the pixel representation and in the Coulomb representation.
Basis functions for the asymmetric bilinear model: the content vector b for the letter "C" combines with the style matrices A_chicago, A_zaph, and A_mistral to render the letter in each font.
Controlling complexity in calculating the style matrix for the new font, compared against Monaco (true): the asymmetric model (173,280 parameters to fit); the symmetric model (5 parameters to fit); and the asymmetric model using the symmetric model as a prior.
Results of extrapolation to a new style: synthetic vs. actual renderings for Chicago, Zaph, Times, Mistral, Times Bold, and Monaco.
Leave-one-out results: Chicago, Zaph Chancery, Times, Mistral, Times Bold, Monaco.
Task: translation. Domain: shape and lighting. Factor 1: lighting; factor 2: identity (face shape). Training: (1) fit a symmetric bilinear model to the training data (pixel representation). Generalization: (2) solve for the parameters describing the face and the lighting of a new image.
Translation results. Factor 1: lighting; factor 2: identity (face shape).
Conclusion: bilinear models are useful for the perceptual tasks of translation, classification, and extrapolation.

Factor 1      Factor 2        Observation
letter #1     Matura MT       "A"
phoneme       speaker         "ahh"
pose 3        Hiro            (image)
illuminant    surface color   eye cone responses
End. The following slides are extras.
Style and content: an unsupervised version would make a good class project. Josh or I would be into working with someone on it.
Increase dimensionality to represent non-linearities. Say f(x) = p x^2 + q x + r. This parabola varies non-linearly with x, but it is a linear function of (x^2, x, 1). (Like "homogeneous coordinates" in graphics.)
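The trick in one NumPy sketch (coefficients illustrative): lift x to the features (x^2, x, 1), and the non-linear fit becomes ordinary linear least squares.

```python
import numpy as np

# f(x) = p*x**2 + q*x + r: non-linear in x, linear in (x**2, x, 1).
p, q, r = 2.0, -3.0, 5.0          # illustrative coefficients
x = np.linspace(-1.0, 1.0, 50)
y = p * x**2 + q * x + r

Phi = np.column_stack([x**2, x, np.ones_like(x)])  # lifted coordinates
coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(coeffs, 6))        # -> [ 2. -3.  5.]
```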
Fitting parabolas with 1-d, 2-d, and 3-d models.
Reconstruction from a low-dimensional model.
Eigenfaces for each pose
Task: classification. Domain: faces and pose. Factor 1: head pose; factor 2: identity. We build a bilinear model of how head pose and identity modify face appearance.
Basis images: pose-dependent basis functions for face appearance. One set of coefficients will reconstruct the same person in different poses.