Style/Content separation Evgeniy Bart, Dan Levi April 13, 2003
Artistic styles Photograph
Artistic styles Impressionist
Artistic styles Expressionist
Artistic styles Pointillist
Photographic styles *Pictures by Aya Aner-Wolf
Fonts ABCDE ABCDE ABCDE ABCDE Style Content
Faces *Images from FERET database
Tasks Extrapolation: Extrapolation of familiar style to new content
Tasks Extrapolation: Extrapolation of familiar content to new style
Tasks Translation:
Task specification by analogy: A : A′ :: B : ? (Image analogies, Hertzmann et al.)
Wei&Levoy Ashikhmin
Region growing Somewhat similar to quilting *Picture from presentation by Tal and Zeev
Region growing
Combining the two …
… hierarchically
Selecting best match
Arbitrating: ? (yes / no) → use Wei&Levoy value or use Ashikhmin value
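The arbitration step can be sketched in code. The slide's own test did not survive extraction, so this sketch assumes a rule modeled on the coherence weighting described by Hertzmann et al.: prefer the coherence (Ashikhmin) candidate unless the approximate-search (Wei&Levoy) candidate is clearly better, with a tolerance that shrinks at coarser pyramid levels. The function name, the argument names and the default `kappa` are hypothetical.

```python
def arbitrate(p_app, d_app, p_coh, d_coh, level, finest_level, kappa=5.0):
    """Choose between two candidate source pixels.

    p_app, d_app: candidate and squared feature distance from the
                  approximate (Wei&Levoy-style) search.
    p_coh, d_coh: candidate and distance from the coherence
                  (Ashikhmin-style) search.
    Assumed rule: accept the coherence candidate when its distance is
    within a factor (1 + kappa * 2**(level - finest_level)) of d_app.
    """
    tolerance = 1.0 + kappa * 2.0 ** (level - finest_level)
    if d_coh <= d_app * tolerance:
        return p_coh
    return p_app


# At the finest level the coherence candidate wins with some slack...
fine_choice = arbitrate((1, 1), 1.0, (2, 2), 1.5, level=3, finest_level=3, kappa=1.0)
# ...while at a coarser level the same distances favor the approximate match.
coarse_choice = arbitrate((1, 1), 1.0, (2, 2), 1.5, level=1, finest_level=3, kappa=1.0)
```

A larger `kappa` favors coherent, patch-like output; `kappa = 0` falls back to a pure distance comparison.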
Results – artistic filters
Super-resolution: Training 1
Super-resolution: Training 2
Super-resolution: Training 3
Super-resolution: Results
A lonely pine is standing
In the North where high winds blow.
He sleeps; and the whitest blanket
wraps him in ice and snow.

He dreams - dreams of a palm-tree
that far in an Orient land
Languishes, lonely and drooping,
Upon the burning sand.

H. Heine, translated by L. Untermeyer
Texture by numbers
Parameters
annEpsilon: [float] =
ashLastLevel: [bool] = false
biasPenalty: [float] =
cheesyBoundaries: [bool] = true
coherenceEps: [float] =
coherencePow: [float] =
createSrcLocHisto: [bool] = false
decayWeight: [double] =
filterColorspace: [enum] = {Lab, Luv, RGB, XYZ}
filterMM: [string] = (none!)
filterModeMask: [string] = (none!)
filterProcedure: [enum] = {Copy, Synthesize}
filteredFeatureType: [enum] = {Difference, Raw}
filteredPyramidType: [enum] = {Gaussian, Laplacian}
finalSourceFac: [float] =
gainPenalty: [float] =
heurAnnEpsilon: [float] =
heurMaxTSVQDepth: [int] = 7
histogramEq: [bool] = false
levelWeighting: [float] =
matchBtoA: [bool] = false
matchGrayHistogram: [bool] = false
matchMeanVariance: [bool] = false
maxTSVQDepth: [int] = 20
maxTSVQError: [float] =
modeMaskWeight: [float] =
neighborhoodWidth: [int] = 5
pyramidType: [enum] = {Gaussian, Laplacian, Steerable}
samplerEpsilon: [float] =
searchMethod: [enum] = {ANN, Ash, HeurANN, HeurTSVQ, Image, MLP, TSVQ, TSVQR, Vector}
sourceColorspace: [enum] = {Lab, Luv, RGB, XYZ}
srcWeight: [float] =
targetMM: [string] = (none!)
targetModeMask: [string] = (none!)
useBias: [bool] = false
useFilter: [bool] = true
useFilterModeMask: [bool] = false
useGain: [bool] = false
useInterface: [bool] = true
useRandomStart: [bool] = true
useSigmoidalDecay: [bool] = false
useSplineWeights: [bool] = true
useTargetModeMask: [bool] = false
useYIQ: [bool] = false
numHiddenNeurons: [int] = 20
numLevels: [int] = 2
numPasses: [int] = 1
numTSVQBacktracks: [int] = 8
onePixelSource: [bool] = false
oneway: [bool] = false
pyramidHeight: [int] = 4
3D rotation
What went wrong? There is some structure, but not simple correspondence. Need more knowledge about objects.
Rectangular parallelepipeds (cuboids)
Representation by 3D point coordinates Linear classes, Vetter&Poggio
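May combine linearly [figure: a sum of cuboids gives another cuboid]
Only 3 dimensions [figure: a cuboid written as a combination of three cuboids]. Call it a linear class.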
Linear operators. Linear operator L: $L(\alpha x + \beta y) = \alpha L(x) + \beta L(y)$
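Example: rotation [figure]
Rotation: if $X = \sum_i a_i X_i$, then $RX = \sum_i a_i\, R X_i$
Example: projection [figure]
Projection: if $X = \sum_i a_i X_i$, then $PX = \sum_i a_i\, P X_i$
Example: projection + rotation [figure]
Rotation + projection: if $X = \sum_i a_i X_i$, then $PRX = \sum_i a_i\, P R X_i$. But also $PX = \sum_i a_i\, P X_i$, with the same coefficients. We may work entirely in the 2D domain!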
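A minimal sketch of the 2D-only argument, assuming origin-centred cuboids, orthographic projection and rotation about the vertical axis (all helper names are hypothetical). The coefficients are estimated from one 2D view and then reused, unchanged, on 2D views of the basis cuboids in a new pose.

```python
import numpy as np


def cuboid(w, h, d):
    """Return the 8 vertices of an origin-centred cuboid, shape (8, 3)."""
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                     dtype=float)
    return 0.5 * signs * np.array([w, h, d])


def rot_y(theta):
    """Rotation matrix about the y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def view(points, R):
    """Rotate, then project orthographically; return a flat 2D vector."""
    return (points @ R.T)[:, :2].ravel()


# Three basis cuboids with linearly independent (w, h, d), and a novel one.
basis = [cuboid(1, 2, 3), cuboid(2, 1, 1), cuboid(3, 3, 1)]
novel = cuboid(2.5, 1.5, 4.0)
R1, R2 = rot_y(np.deg2rad(20)), rot_y(np.deg2rad(50))

# Estimate coefficients from 2D views in pose 1 only.
B1 = np.stack([view(b, R1) for b in basis], axis=1)      # (16, 3)
coeffs = np.linalg.lstsq(B1, view(novel, R1), rcond=None)[0]

# Predict the novel cuboid's 2D view in pose 2 from basis views in pose 2.
B2 = np.stack([view(b, R2) for b in basis], axis=1)
pred = B2 @ coeffs
true = view(novel, R2)
```

No 3D coordinates of the novel object are used after the views are formed. A frontal first pose would hide the depth and leave the coefficients underdetermined, which is why the first view is taken at 20 degrees.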
Working in 2D
Results
Can we apply this idea to faces? Linear class assumption: an object may be represented as a linear combination of other (similar) objects. Use raw images as basis: reconstruction quality will be poor. PCA reconstruction is much better. Can we use PCA?
Using linear classes
Eigenfaces do not correspond: cannot use the same coefficients
So far: solving specific tasks (image analogies, linear classes). Goal: a general style/content framework that learns the interaction and generalizes to new examples.
Motivation: Linear Models
Face images form a linear subspace: eigenfaces, an eigenvector basis used to reconstruct faces (Turk, Pentland ‘91)
Illumination variations (of the same face) can be modeled by a low-dimensional linear space (Hallinan ‘94)
Model: linear in style and in content
Bilinear Models (Tenenbaum, Freeman 2000,97): f(x, y) is bilinear if it is linear in x for constant y, and linear in y for constant x
Bilinear Forms: Example: x, y ∈ ℝ, f(x,y) = xy. Bilinear forms to model style and content. 2 models: symmetric and asymmetric
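Symmetric Bilinear Model
Image in style s, content c (pixel k = 1..K): $y_k^{sc} = \sum_{i=1}^{I} \sum_{j=1}^{J} w_{ijk}\, a_i^s\, b_j^c = (a^s)^T W_k\, b^c$
$a^s$: style vector (dimension I); $b^c$: content vector (dimension J); $W_k$: I×J interaction matrix for pixel k
Equivalently, with the style matrix $A^s$ (J×K), $a^s_{jk} = \sum_i w_{ijk}\, a_i^s$: $y_k^{sc} = \sum_j a^s_{jk}\, b_j^c$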
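A short sketch of how the symmetric model renders an image from a style vector, a content vector and the interaction tensor. The array layout `W[i, j, k]` and the function name are assumptions made for illustration; the last lines check numerically that the output is linear in the style vector.

```python
import numpy as np


def render_symmetric(a_s, W, b_c):
    """Return y with y[k] = sum_ij W[i, j, k] * a_s[i] * b_c[j]."""
    return np.einsum("i,ijk,j->k", a_s, W, b_c)


rng = np.random.default_rng(0)
I, J, K = 2, 3, 4                      # style dim, content dim, pixels
W = rng.normal(size=(I, J, K))         # interaction tensor (assumed layout)
a1, a2 = rng.normal(size=I), rng.normal(size=I)
b = rng.normal(size=J)

# Linearity in style for fixed content: f(a1 + a2, b) = f(a1, b) + f(a2, b)
y_sum = render_symmetric(a1 + a2, W, b)
y_parts = render_symmetric(a1, W, b) + render_symmetric(a2, W, b)
```

The same check with the roles of the two vectors swapped shows linearity in content, which is what makes the model bilinear.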
Symmetric Bilinear Model : basis vectors
Toy Example: Symmetric Model
Images (style: color, content: shape): (0,0,1,1), (2,2,2,0), (0,3,0,3)
Content: (0,0,1,1), (1,0,1,0), (0,1,1,1)
Style: [figure]
Face Example - Symmetric Style: Pose, Content: Person
Asymmetric Bilinear Model
Fold the style vector into the interaction matrices: $W_k^s = (a^s)^T W_k$, i.e. $a^s_{jk} = \sum_{i=1}^{I} w_{ijk}\, a_i^s$
Image in style s, content c (pixel k = 1..K): $y_k^{sc} = \sum_{j=1}^{J} a^s_{jk}\, b_j^c$
$b^c$: content vector (dimension J); $A^s$: style matrix (J×K), J content images
A style-specific basis, mixed by content coefficients
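A sketch of the asymmetric view: folding a style vector into the interaction tensor gives a style-specific basis, and the content vector only mixes that basis. The layout `W[i, j, k]` and the function names are assumptions, and the style matrix is stored here as K×J (the transpose of the slide's J×K) so that an image is simply `A_s @ b_c`.

```python
import numpy as np


def style_matrix(a_s, W):
    """Fold style into the interaction tensor: A[k, j] = sum_i W[i, j, k] * a_s[i]."""
    return np.einsum("i,ijk->kj", a_s, W)


def render_asymmetric(A_s, b_c):
    """Mix the J style-specific basis images (columns of A_s) by content."""
    return A_s @ b_c


rng = np.random.default_rng(1)
I, J, K = 2, 3, 5
W = rng.normal(size=(I, J, K))
a_s, b_c = rng.normal(size=I), rng.normal(size=J)

A_s = style_matrix(a_s, W)                           # (K, J)
y_asym = render_asymmetric(A_s, b_c)
y_sym = np.einsum("i,ijk,j->k", a_s, W, b_c)         # symmetric rendering
```

In the asymmetric model proper, `A_s` is a free parameter per style rather than being tied to a shared `W`, which is where the extra flexibility comes from.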
Toy Example: Asymmetric Model
Content: (0,0,1,1), (1,0,1,0), (0,1,1,1)
Style: [figure]
Face Example - Asymmetric Style: Pose, Content: Person
Training: image matrix indexed by style (illumination) and content (person)
Training – Model Fitting
Problem: given $\{y^{sc}(t)\}$, t = 1..T, find model parameters
Error minimization:
Asymmetric ($A^s$, $b^c$): closed-form SVD solution or quasi-Newton methods. Free parameter: J, the content vector dimension
Symmetric ($a^s$, $W_k$, $b^c$): iterative solution using SVD
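A minimal sketch of the closed-form asymmetric fit, assuming one image (or the mean image) per style/content pair, stacked so that block row s of `Y` holds the K pixels of style s and each column is one content class. The function name and the stacking convention are assumptions; the rank-J truncated SVD is the least-squares optimum for this factorization.

```python
import numpy as np


def fit_asymmetric(Y, S, K, J):
    """Factor the stacked (S*K, C) matrix Y as A @ B with rank J.

    Returns A with shape (S, K, J), one style matrix per style, and
    B with shape (J, C), one content vector per column.
    """
    U, sing, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :J] * sing[:J]            # scale the left singular vectors
    B = Vt[:J, :]
    return A.reshape(S, K, J), B


# Synthetic data generated exactly by a rank-2 asymmetric model.
rng = np.random.default_rng(2)
S, K, C, J = 3, 10, 5, 2
A_true = rng.normal(size=(S, K, J))
B_true = rng.normal(size=(J, C))
Y = A_true.reshape(S * K, J) @ B_true

A_fit, B_fit = fit_asymmetric(Y, S, K, J)
Y_hat = A_fit.reshape(S * K, J) @ B_fit
```

The recovered factors match the true ones only up to an invertible J×J transform, so the comparison is made on the reconstruction `Y_hat`, not on `A_fit` or `B_fit` directly.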
Translation - Faces. Content: person; style: illumination. Asymmetric model: cannot handle translation!
Translation – Symmetric Model
Problem: C = 23 (faces), S = 3 (illuminations)
Training: fit a symmetric model ($a^s$, $W_k$, $b^c$) using iterative SVD with I = S, J = C
Generalization: find $a^{s'}$, $b^{c'}$ that minimize $E^* = \sum_k \left( y_k^{s'c'} - (a^{s'})^T W_k\, b^{c'} \right)^2$ (alternating iterative linear solution)
Translation: produce $(a^{s'})^T W_k\, b^c$ and $(a^s)^T W_k\, b^{c'}$ for each s and c
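The alternating linear solution for the generalization step can be sketched as follows, assuming the interaction tensor is stored as `W[i, j, k]` and is already trained. With the content vector fixed the image is linear in the style vector and vice versa, so each half-step is an ordinary least-squares solve. The function name, the iteration count and the random initialization are assumptions.

```python
import numpy as np


def fit_new_style_and_content(y, W, n_iter=20, seed=0):
    """Alternate least-squares solves for a and b given one image y."""
    I, J, K = W.shape
    rng = np.random.default_rng(seed)
    b = rng.normal(size=J)                       # random start for content
    errors = []
    for _ in range(n_iter):
        M_a = np.einsum("ijk,j->ki", W, b)       # (K, I): y ~ M_a @ a
        a = np.linalg.lstsq(M_a, y, rcond=None)[0]
        M_b = np.einsum("ijk,i->kj", W, a)       # (K, J): y ~ M_b @ b
        b = np.linalg.lstsq(M_b, y, rcond=None)[0]
        errors.append(float(np.sum((y - M_b @ b) ** 2)))
    return a, b, errors


rng = np.random.default_rng(3)
W = rng.normal(size=(3, 4, 50))
a_true, b_true = rng.normal(size=3), rng.normal(size=4)
y = np.einsum("i,ijk,j->k", a_true, W, b_true)

a_new, b_new, errors = fit_new_style_and_content(y, W)
```

Each half-step can only lower the squared error, so `errors` is non-increasing; the result is still a local solution, and `a_new`, `b_new` are determined only up to a reciprocal scale factor.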
Translation - Results
Extrapolation. Content: letter; style: font. Main problem: image representation. Linear combinations of letters should look like a letter.
A Displacement Vector Warp Map
Coulomb Warp Map. For unique mapping: physical model of electrostatic forces. A linear combination of letters looks like a letter.
Extrapolation Scheme
Fit an asymmetric bilinear model ($A^s$, $b^c$) by closed-form SVD: S = 5 training fonts (styles), C = 62 characters (content), K = 2888 data dimensions
Given letters C = {c_1, …, c_M} in a new style s′, find the best-fitting $A^{s'}$
Minimize: $E^* = \sum_{c \in C} \| y^{s'c} - A^{s'} b^c \|^2$, by setting $\partial E^* / \partial A^{s'} = 0$
Set J high (~60): overfitting on test data (173,280 degrees of freedom!)
Constraint: Close To Symmetric
$A^{OLC} = \sum_s \alpha_s A^s$ (Optimal Linear Combination); $\alpha_s$: style parameters (symmetric)
Minimize: $E^* = \sum_{c \in C} \| y^{s'c} - A^{s'} b^c \|^2 + \lambda \| A^{s'} - A^{OLC} \|^2$, by setting $\partial E^* / \partial A^{s'} = 0$
Extrapolate missing letters by: $y^{s'c} = A^{s'} b^c$
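Setting the derivative of the regularized error to zero gives a linear system, which can be sketched as below. This assumes squared norms, a style matrix stored as K×J so that an image is `A @ b`, and uniform weights standing in for the style parameters (how those are actually obtained is not shown here). All names are hypothetical.

```python
import numpy as np


def fit_style_matrix(Y, B, A_olc, lam):
    """Minimize ||Y - A B||^2 + lam * ||A - A_olc||^2 over A.

    Y: (K, M) observed letters in the new style, one per column.
    B: (J, M) content vectors of those letters.
    A_olc: (K, J) prior built from the training style matrices.
    Stationarity gives A (B B^T + lam I) = Y B^T + lam A_olc.
    """
    J = B.shape[0]
    lhs = B @ B.T + lam * np.eye(J)          # symmetric, (J, J)
    rhs = Y @ B.T + lam * A_olc              # (K, J)
    return np.linalg.solve(lhs, rhs.T).T


rng = np.random.default_rng(4)
S, K, J, M = 3, 6, 4, 2                      # fewer observed letters than J
A_train = rng.normal(size=(S, K, J))
alpha = np.full(S, 1.0 / S)                  # placeholder style weights
A_olc = np.tensordot(alpha, A_train, axes=1) # (K, J)
B = rng.normal(size=(J, M))
Y = rng.normal(size=(K, M))
lam = 0.1

A_new = fit_style_matrix(Y, B, A_olc, lam)
grad = -2.0 * (Y - A_new @ B) @ B.T + 2.0 * lam * (A_new - A_olc)
```

With M smaller than J the unregularized problem has infinitely many exact fits; the penalty term selects the one closest to the prior, which is the role the constraint plays against overfitting.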
Results
Symmetric vs. Asymmetric
Symmetric: can reduce dimensionality of factors; learns the structure of factor interactions (handles translation)
Asymmetric: more flexible, but too flexible (overfitting); cannot handle translation
Can be overcome by combining both
Bilinear Models (Tenenbaum, Freeman 2000,97)
Pros:
General framework for two-factor problems
Explicit parameterized representations of each factor and their interaction
Natural generalization for extrapolation and translation tasks
Fast algorithms (SVD)
Cons:
Assumes linearity in each factor: find clever input representations, decompose into sub-problems
Example-Based Style Synthesis (Ido Drori, Hezi Yeshurun, Daniel Cohen-Or 03)
Algorithm Outline
1. Divide image into overlapping tiles
2. Find best match in each scene
3. Synthesize tiles by bilinear model
4. Image quilting
5. Image analogies
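Step 1 of the outline can be sketched as a simple sliding window. The tile size, the overlap and the function name are illustrative assumptions, not values from the original work; tiles that would run past the image border are skipped.

```python
import numpy as np


def overlapping_tiles(image, tile, overlap):
    """Return {(top, left): view} for square tiles sharing `overlap` pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    step = tile - overlap
    height, width = image.shape[:2]
    tiles = {}
    for top in range(0, height - tile + 1, step):
        for left in range(0, width - tile + 1, step):
            tiles[(top, left)] = image[top:top + tile, left:left + tile]
    return tiles


image = np.arange(100, dtype=float).reshape(10, 10)
tiles = overlapping_tiles(image, tile=4, overlap=2)
```

Neighboring tiles share a strip `overlap` pixels wide, which is the region a quilting step later uses to stitch the synthesized tiles together.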
Create Gaussian pyramids for example and input images. Apply the algorithm to each level, from coarse to fine.
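A minimal Gaussian pyramid sketch, assuming grayscale images, a 5-tap binomial kernel as the Gaussian approximation, edge padding and downsampling by 2. These choices and the function names are assumptions for illustration. The pyramid is returned coarsest level first, matching the coarse-to-fine processing order.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # binomial approximation


def blur(image):
    """Separable 5-tap blur with edge padding; output has the input's shape."""
    height, width = image.shape
    padded = np.pad(image, 2, mode="edge")
    # Horizontal pass: result has shape (height + 4, width).
    rows = sum(KERNEL[i] * padded[:, i:i + width] for i in range(5))
    # Vertical pass: result has shape (height, width).
    return sum(KERNEL[i] * rows[i:i + height, :] for i in range(5))


def gaussian_pyramid(image, levels):
    """Return a list of images, coarsest first."""
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(blur(pyramid[-1])[::2, ::2])
    return pyramid[::-1]


pyramid = gaussian_pyramid(np.full((16, 16), 3.0), levels=3)
shapes = [level.shape for level in pyramid]
```

A synthesis loop would then iterate over `pyramid` in order, using the result at each level to guide the next finer one.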
Finding Best Matching Fragment
Criteria: similar geometry, agreeing boundaries
V search = (,,,,, ): gradient, Laplacian, luminance
For each training scene: create V in every position and orientation
Search for nearest neighbor