“Pixels that Sound”: find pixels that correspond (correlate?) to sound. Kidron, Schechner, Elad, CVPR 2005.

Presentation transcript:


Audio-Visual Analysis: Applications
- Lip reading, detection of lips (or person): Slaney, Covell (2000); Bregler, Konig (1994)
- Analysis and synthesis of music from motion: Murphy, Andersen, Jensen (2003)
- Source separation based on vision: Li, Dimitrova, Li, Sethi (2003); Smaragdis, Casey (2003); Nock, Iyengar, Neti (2002); Fisher, Darrell, Freeman, Viola (2001); Hershey, Movellan (1999)
- Tracking: Vermaak, Gangnet, Blake, Pérez (2001)
- Biological systems: Gutfreund, Zheng, Knudsen (2002)

Problem: Different Modalities
Camera and microphone feed audio-visual analysis.
- Visual data: 25 frames/sec; each frame is 576 x 720 pixels
- Audio data: 44.1 kHz, few bands; not stereophonic

Previous Work
- Pointwise correlation: Nock, Iyengar, Neti (2002); Hershey, Movellan (1999)
  Ill-posed (lack of data)
  Canonical Correlation Analysis (CCA): Smaragdis, Casey (2003); Li, Dimitrova, Li, Sethi (2003); Slaney, Covell (2000)
- Cluster of pixels, linear superposition
  Mutual Information (MI): Fisher et al. (2001); Cutler, Davis (2000); Bregler, Konig (1994)
  Not typical; highly complex

CCA: Projection
[Figure: video axes Pixel #1, Pixel #2, Pixel #3; audio axes Band #1, Band #2; label: optimal visual components]

Visual Projection
Video features are projected onto a 1D variable v.
- Pixel intensities
- Transform coefficients (wavelet)
- Image differences

Audio Projection
Audio features are projected onto a 1D variable a.
- Average energy per frame
- Transform coefficients per frame

Canonical Correlation
- Representation: video, audio
- Projections (per time window): random variables (time dependent)
- Correlation coefficient
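A minimal sketch of the correlation coefficient between the two 1D projections. The names V, A, w_v, w_a and the synthetic data are assumptions for illustration; the slide's own formula is not in the transcript.

```python
import numpy as np

def projection_correlation(V, A, w_v, w_a):
    """Correlation coefficient between the projections v = V w_v and a = A w_a.

    V: (n_frames, n_pixels) visual features, one row per frame.
    A: (n_frames, n_bands) audio features, one row per frame.
    """
    Vc = V - V.mean(axis=0)  # remove the temporal mean of each pixel
    Ac = A - A.mean(axis=0)  # remove the temporal mean of each band
    v = Vc @ w_v             # 1D visual variable over time
    a = Ac @ w_a             # 1D audio variable over time
    return float(v @ a / (np.linalg.norm(v) * np.linalg.norm(a)))

# Synthetic data (hypothetical): pixel 0 follows audio band 0, the rest is noise.
rng = np.random.default_rng(0)
n_frames, n_pixels, n_bands = 200, 3, 2
A = rng.standard_normal((n_frames, n_bands))
V = rng.standard_normal((n_frames, n_pixels))
V[:, 0] = A[:, 0] + 0.1 * rng.standard_normal(n_frames)

rho_good = projection_correlation(V, A, np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0]))
rho_bad = projection_correlation(V, A, np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0]))
```

Weights that select the sounding pixel give a correlation near 1; weights that select an unrelated pixel give a correlation near 0.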

CCA Formulation
Yields an eigenvalue problem: Knutsson, Borga, Landelius (1995)
- Canonical correlation: equivalent to the largest eigenvalue
- Projections: the corresponding eigenvectors
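A sketch of CCA as an eigenvalue problem, using the standard form in which the largest eigenvalue of C_vv^{-1} C_va C_aa^{-1} C_av is the squared canonical correlation. The function name, the covariance names and the synthetic data are assumptions; the code only works when both covariance matrices are invertible, that is, with many more frames than features.

```python
import numpy as np

def cca_first_pair(V, A):
    """Largest canonical correlation and its projection vectors."""
    n = V.shape[0]
    Vc = V - V.mean(axis=0)
    Ac = A - A.mean(axis=0)
    C_vv = Vc.T @ Vc / n   # visual covariance
    C_aa = Ac.T @ Ac / n   # audio covariance
    C_va = Vc.T @ Ac / n   # cross-covariance
    # M = C_vv^{-1} C_va C_aa^{-1} C_av; its eigenvalues are squared correlations.
    M = np.linalg.solve(C_vv, C_va) @ np.linalg.solve(C_aa, C_va.T)
    eigvals, eigvecs = np.linalg.eig(M)
    k = int(np.argmax(eigvals.real))
    w_v = eigvecs[:, k].real
    # The matching audio projection is proportional to C_aa^{-1} C_av w_v.
    w_a = np.linalg.solve(C_aa, C_va.T @ w_v)
    rho = float(np.sqrt(max(eigvals[k].real, 0.0)))
    return rho, w_v, w_a

# Well-posed synthetic case (hypothetical): 200 frames, 3 pixels, 2 bands.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 2))
V = rng.standard_normal((200, 3))
V[:, 0] = A[:, 0] + 0.1 * rng.standard_normal(200)

rho, w_v, w_a = cca_first_pair(V, A)
rho_check = float(np.corrcoef(V @ w_v, A @ w_a)[0, 1])
```

Here rho_check recomputes the correlation directly from the projected signals, as a sanity check on the eigenvalue.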

Visual Data
[Figure: data matrix with axes t (frames) and spatial location (pixel intensities)]

Rank Deficiency
[Figure: data matrix with axes t (frames) and spatial location (pixel intensities)]

Estimation of Covariance
Rank deficient
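A small numerical illustration of why the estimated covariance is rank deficient: with fewer frames than pixels, the empirical covariance cannot have rank above the number of frames minus one (one degree of freedom goes to the mean). The sizes below are hypothetical and far smaller than a 576 x 720 frame.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_pixels = 30, 500          # far fewer frames than pixels
V = rng.standard_normal((n_frames, n_pixels))

Vc = V - V.mean(axis=0)               # center each pixel over time
C_vv = Vc.T @ Vc / n_frames           # (n_pixels, n_pixels) empirical covariance
rank = int(np.linalg.matrix_rank(C_vv))

print(f"covariance is {n_pixels} x {n_pixels}, rank {rank}")
```

A matrix of this kind has no inverse, so the eigenvalue form of CCA cannot be applied to it directly.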

Ill-Posedness
Prior solutions:
- Use many more frames → poor temporal resolution.
- Aggressive spatial pruning → poor spatial resolution.
- Trivial regularization
Impossible to invert!

A General Problem
- Small amount of data
- Large number of weights
- The problem is ill-posed
- Overfitting is likely

An Equivalent Problem
Maximizing the correlation is recast as minimizing an objective function.
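One standard way to see how a correlation maximization turns into a minimization is sketched here. It assumes centered data and unit-norm projections; the symbols V, A, w_v, w_a are assumptions, and this is not necessarily the exact objective shown on the slide.

```latex
% Projections: v = V w_v, a = A w_a (data centered over time).
\rho(\mathbf{w}_v,\mathbf{w}_a)
  = \frac{(V\mathbf{w}_v)^{\top}(A\mathbf{w}_a)}
         {\|V\mathbf{w}_v\|_2\,\|A\mathbf{w}_a\|_2}

% Expanding the squared distance between the projections:
\|V\mathbf{w}_v - A\mathbf{w}_a\|_2^2
  = \|V\mathbf{w}_v\|_2^2 + \|A\mathbf{w}_a\|_2^2
    - 2\,(V\mathbf{w}_v)^{\top}(A\mathbf{w}_a)

% With the normalization \|V w_v\|_2 = \|A w_a\|_2 = 1:
\|V\mathbf{w}_v - A\mathbf{w}_a\|_2^2 = 2\,(1-\rho)
```

Under that normalization, the smallest distance between the projections corresponds to the largest correlation, and a distance of zero means full correlation.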

Single Audio Band
A has a single column, and
(The denominator is non-zero)
Minimizing
Known data

[Figure: linear system relating V to a over time, with entries a(1), a(2), ..., a(t_i), ..., a(30)]
Full correlation if the system is satisfied.
Underdetermined system!

Detected correlated pixels
“Out of clutter, find simplicity. From discord, find harmony.” Albert Einstein

Sparse Solution
ℓ0-norm minimum
- Non-convex
- Exponential complexity

The ℓ1-norm criterion
ℓ1-norm minimum
- Sparse
- Convex
- Polynomial complexity in common situations
Donoho, Elad (2005)
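A sketch of the convex criterion as a linear program, assuming the norm meant here is ℓ1 and the constraint is the exact-fit system V w = a from the single-band case. The function name and the synthetic data are hypothetical; the split w = p - q with p, q >= 0 is the usual way to make the ℓ1 objective linear.

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_solution(V, a):
    """Minimize ||w||_1 subject to V w = a, written as a linear program."""
    n_frames, n_pixels = V.shape
    # Variables x = [p, q] with w = p - q and p, q >= 0; objective is sum(p) + sum(q).
    c = np.ones(2 * n_pixels)
    A_eq = np.hstack([V, -V])
    res = linprog(c, A_eq=A_eq, b_eq=a, bounds=(0, None))
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n_pixels] - res.x[n_pixels:]

# Underdetermined synthetic system (hypothetical): 30 frames, 200 pixels,
# audio generated by only two pixels.
rng = np.random.default_rng(1)
V = rng.standard_normal((30, 200))
w_true = np.zeros(200)
w_true[[10, 50]] = [1.0, -0.5]
a = V @ w_true

w_l1 = min_l1_solution(V, a)
residual = float(np.linalg.norm(V @ w_l1 - a))
```

The solver returns a weight vector that fits the audio exactly and whose ℓ1 norm is no larger than that of the two-pixel vector used to generate the data.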

The Minimum Norm Solution
ℓ2-norm minimum
- Energy spread
- Solving using the ℓ2-norm (pseudo-inverse, SVD, QR)
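A sketch of the minimum-norm solution through the pseudo-inverse, assuming the norm meant here is ℓ2 and the system is V w = a. The data are synthetic and hypothetical: the audio is generated by two pixels, yet the pseudo-inverse spreads the weights over many pixels.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_pixels = 30, 200          # underdetermined: fewer equations than unknowns
V = rng.standard_normal((n_frames, n_pixels))

w_true = np.zeros(n_pixels)
w_true[[10, 50]] = [1.0, -0.5]        # only two pixels "sound"
a = V @ w_true

# The pseudo-inverse picks, among all exact solutions, the one with the smallest l2 norm.
w_l2 = np.linalg.pinv(V) @ a

residual = float(np.linalg.norm(V @ w_l2 - a))
n_significant = int(np.sum(np.abs(w_l2) > 1e-3))
print(f"residual {residual:.2e}, pixels with non-negligible weight: {n_significant}")
```

The fit is exact, but the energy is spread across the image instead of being localized on the two generating pixels.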

Audio-visual events
- Maximum correlation: eigenproblem
- Minimum objective function G
- Linear programming: fully correlated, sparse, polynomial, no parameters to tweak

Multiple Audio Bands - Solution
The optimization problem:
- ℓ1-ball: non-convex constraint
- Convex
- Linear

Multiple Audio Bands
Faces: S1, S2, S3, S4
- Each face: linear programming
- No parameters to tweak
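A sketch of the face-by-face strategy. The problem form is an assumption, since the slide's formula is not in the transcript: minimize ||w_v||_1 subject to V w_v = A w_a and ||w_a||_1 = 1. On each face of the ℓ1 ball the signs of w_a are fixed, so the constraint becomes linear and one linear program is solved per face. The function name and data are hypothetical.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def solve_multiband(V, A):
    """Best (objective, w_v, w_a) over all faces of the l1 ball for w_a."""
    n_frames, n_pixels = V.shape
    n_bands = A.shape[1]
    best = None
    for signs in itertools.product([1.0, -1.0], repeat=n_bands):
        s = np.array(signs)               # one face = one sign pattern of w_a
        # Variables x = [p, q, u] >= 0 with w_v = p - q and w_a = s * u.
        c = np.concatenate([np.ones(2 * n_pixels), np.zeros(n_bands)])
        fit_rows = np.hstack([V, -V, -A * s])                  # V w_v - A w_a = 0
        norm_row = np.concatenate([np.zeros(2 * n_pixels), np.ones(n_bands)])
        A_eq = np.vstack([fit_rows, norm_row[None, :]])        # sum(u) = 1
        b_eq = np.concatenate([np.zeros(n_frames), [1.0]])
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
        if res.success and (best is None or res.fun < best[0]):
            w_v = res.x[:n_pixels] - res.x[n_pixels:2 * n_pixels]
            w_a = s * res.x[2 * n_pixels:]
            best = (float(res.fun), w_v, w_a)
    if best is None:
        raise RuntimeError("no face produced a feasible linear program")
    return best

# Synthetic data (hypothetical): band 0 equals pixel 5, band 1 is unrelated noise.
rng = np.random.default_rng(2)
V = rng.standard_normal((30, 100))
A = np.column_stack([V[:, 5], rng.standard_normal(30)])

objective, w_v, w_a = solve_multiband(V, A)
fit_error = float(np.linalg.norm(V @ w_v - A @ w_a))
```

With two audio bands there are four sign patterns, matching the four faces S1 to S4; the number of faces doubles with every added band.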

Sharp & Dynamic, Despite Distraction
[Frames shown: 9, 42, 68, 115, 146, 169]

[Frames shown: 51, 106, 83, 177]
- Sparse
- Localization on the proper elements
- False alarm: temporally inconsistent
- Handling dynamics
- Performing in audio noise

ℓ2-norm: Energy Spread
[Panels: Movie #1, Movie #2; Frame 83, Frame]

ℓ1-norm: Localization
[Panels: Movie #1, Movie #2; Frame 83, Frame]

The “Chorus Ambiguity”
Who's talking?
- Synchronized talk
- Not unique (ambiguous)
- Possible solutions: left, right, both

The “Chorus Ambiguity”
[Figure: norm-minimizing solution, two plots of feature 1 against feature 2; label: both]