11-755 Machine Learning for Signal Processing: Latent Variable Models and Signal Separation. Class 12, 11 Oct 2011. 11755/18797.

Summary So Far
- PLCA: the basic mixture-multinomial model for audio (and other data)
- Sparse Decomposition: the notion of sparsity and how it can be imposed on learning
- Sparse Overcomplete Decomposition: the notion of an overcomplete basis set
- Example-based representations: using the training data itself as our representation

Next up: Shift/Transform Invariance
Sometimes the "typical" structures that compose a sound are wider than one spectral frame
- E.g. in the example above we see multiple instances of a pattern that spans several frames

Next up: Shift/Transform Invariance
Sometimes the "typical" structures that compose a sound are wider than one spectral frame
- E.g. in the example above we see multiple instances of a pattern that spans several frames
Multiframe patterns may also be local in frequency
- E.g. the two green patches are similar only in the region enclosed by the blue box

Patches are more representative than frames
Four bars from a music example
The spectral patterns are actually patches
- Not all frequencies fall off in time at the same rate
The basic unit is a spectral patch, not a spectrum

Images: Patches often form the image
A typical image component may be viewed as a patch
- The alien invaders
- Face-like patches
- A car-like patch overlaid on itself many times

Shift-invariant modelling
A shift-invariant model permits individual bases to be patches
Each patch composes the entire image. The data is a sum of the compositions from the individual patches.

Shift Invariance in One Dimension
Our bases are now "patches"
- Typical spectro-temporal structures
The urns now represent patches
- Each draw results in a (t,f) pair, rather than only f
- Also associated with each urn: a shift probability distribution P(T|z)
The overall drawing process is slightly more complex. Repeat the following (a sketch of this process is given below):
- Select an urn Z with probability P(Z)
- Draw a shift T from P(T|Z)
- Draw a (t,f) pair from the urn
- Add to the histogram at (t+T, f)
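As a concrete illustration, here is a minimal sketch of that drawing process in Python. All sizes, distributions, and variable names are assumptions made up for illustration; only the sampling steps follow the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: number of urns, shifts, patch width (time), frequencies
n_z, n_T, n_t, n_f = 2, 50, 20, 64
P_z = np.ones(n_z) / n_z                                    # P(Z): probability of selecting an urn
P_T_given_z = rng.dirichlet(np.ones(n_T), size=n_z)         # P(T|Z): urn-specific shift distribution
P_tf_given_z = rng.dirichlet(np.ones(n_t * n_f), size=n_z)  # P(t,f|Z): the urns (patches), flattened

histogram = np.zeros((n_T + n_t, n_f))                      # the spectrogram being composed

for _ in range(100_000):                                    # repeat the drawing process
    z = rng.choice(n_z, p=P_z)                              # select an urn Z with probability P(Z)
    T = rng.choice(n_T, p=P_T_given_z[z])                   # draw a shift T from P(T|Z)
    tf = rng.choice(n_t * n_f, p=P_tf_given_z[z])           # draw a (t,f) pair from the urn
    t, f = divmod(tf, n_f)
    histogram[t + T, f] += 1                                # add to the histogram at (t+T, f)
```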

Shift Invariance in One Dimension
The process is shift-invariant because the probability of drawing a shift P(T|Z) does not affect the probability of selecting urn Z
Every location in the spectrogram has contributions from every urn patch

Probability of drawing a particular (t,f) combination
The parameters of the model:
- P(t,f|z) – the urns
- P(T|z) – the urn-specific shift distribution
- P(z) – the probability of selecting an urn
The ways in which (t,f) can be drawn:
- Select any urn z
- Draw T from the urn-specific shift distribution
- Draw (t-T, f) from the urn
The actual probability sums this over all shifts and urns (written out below).
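Written out explicitly, the summed probability implied by the description above (a reconstruction; the slide showed this as an equation image):

$$P(t,f) \;=\; \sum_{z} P(z) \sum_{T} P(T \mid z)\, P(t-T,\, f \mid z)$$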

Learning the Model
The parameters of the model are learned analogously to the manner in which mixture multinomials are learned
Given an observation of (t,f), if we knew which urn it came from and the shift, we could compute all probabilities by counting!
- If the shift is T and the urn is Z:
  - Count(Z) = Count(Z) + 1
  - For the shift probability: Count(T|Z) = Count(T|Z) + 1
  - For the urn: Count(t-T, f|Z) = Count(t-T, f|Z) + 1, since the value drawn from the urn was (t-T, f)
- After all observations are counted:
  - Normalize Count(Z) to get P(Z)
  - Normalize Count(T|Z) to get P(T|Z)
  - Normalize Count(t,f|Z) to get P(t,f|Z)
Problem: when learning the urns and shift distributions from a histogram, the urn (Z) and shift (T) for any draw of (t,f) are not known
- These are unseen variables

Learning the Model
Urn Z and shift T are unknown
- So (t,f) contributes partial counts to every value of T and Z
- Contributions are proportional to the a posteriori probabilities of Z and of T given Z
Each observation of (t,f) contributes:
- P(z|t,f) to the count of the total number of draws from the urn: Count(Z) = Count(Z) + P(z|t,f)
- P(z|t,f)P(T|z,t,f) to the count of shift T for the shift distribution: Count(T|Z) = Count(T|Z) + P(z|t,f)P(T|z,t,f)
- P(z|t,f)P(T|z,t,f) to the count of (t-T, f) for the urn: Count(t-T, f|Z) = Count(t-T, f|Z) + P(z|t,f)P(T|z,t,f)
(The required posteriors are written out below.)
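The a posteriori probabilities used for these partial counts follow from Bayes' rule applied to the model above (written here as a reconstruction from the description, not a transcription of the slide):

$$P(z, T \mid t, f) \;=\; \frac{P(z)\, P(T \mid z)\, P(t-T,\, f \mid z)}{\sum_{z'} \sum_{T'} P(z')\, P(T' \mid z')\, P(t-T',\, f \mid z')}$$

with $P(z \mid t,f) = \sum_T P(z, T \mid t, f)$ and $P(T \mid z, t, f) = P(z, T \mid t, f)\,/\,P(z \mid t, f)$.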

Shift-invariant model: Update Rules
Given data (spectrogram) S(t,f):
- Initialize P(Z), P(T|Z), P(t,f|Z)
- Iterate (the update equations are sketched below)
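The iteration alternates between computing the posterior P(z, T | t, f) shown above and re-estimating the distributions from the resulting partial counts, weighted by the data S(t,f). The slide presented these update equations as an image; the following is a reconstruction consistent with the counting argument, not a transcription:

$$P(z) \;\propto\; \sum_{t,f} S(t,f) \sum_{T} P(z, T \mid t, f) \qquad\qquad P(T \mid z) \;\propto\; \sum_{t,f} S(t,f)\, P(z, T \mid t, f)$$

$$P(t, f \mid z) \;\propto\; \sum_{T} S(t+T, f)\, P(z, T \mid t+T, f)$$

with each distribution normalized to sum to 1 after every iteration.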

Shift-invariance in time: example
An example: two distinct sounds occurring with different repetition rates within a signal
- Modelled as being composed from two time-frequency bases
- NOTE: the width of the patches must be specified
[Figure: input spectrogram; discovered time-frequency "patch" bases (urns); contribution of the individual bases to the recording]

Shift Invariance in Time: Dereverberation
Reverberation – a simple model
- The spectrogram of the reverberated signal is a sum of the spectrogram of the clean signal and several shifted and scaled versions of itself
- i.e. a convolution of the clean spectrogram and a room response (sketched below)
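In symbols, this model of reverberation is a convolution along time (a reconstruction of the idea above; the symbol names are assumptions, not from the slide):

$$S_{\text{reverb}}(t, f) \;=\; \sum_{T} R(T, f)\; S_{\text{clean}}(t - T,\, f)$$

where $R(T,f)$ is the (magnitude) room response that scales each shifted copy of the clean spectrogram $S_{\text{clean}}$.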

Dereverberation
Given the spectrogram of the reverberated signal:
- Learn a shift-invariant model with a single patch basis
- Sparsity must be enforced on the basis
- The "basis" represents the clean speech!

Shift Invariance in Two Dimensions
We now have urn-specific shifts along both T and F
The drawing process:
- Select an urn Z with probability P(Z)
- Draw shift values (T,F) from P_s(T,F|Z)
- Draw a (t,f) pair from the urn
- Add to the histogram at (t+T, f+F)
This is a two-dimensional shift-invariant model
- We have shifts in both time and frequency
- Or, more generically, along both axes
(The resulting probability of a (t,f) draw is written out below.)
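Analogously to the one-dimensional case, the probability of drawing a particular (t,f) sums over urns and over both shifts (a reconstruction from the description above):

$$P(t, f) \;=\; \sum_{z} P(z) \sum_{T, F} P_s(T, F \mid z)\, P(t-T,\, f-F \mid z)$$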

Learning the Model
Learning is analogous to the 1-D case
Given an observation of (t,f), if we knew which urn it came from and the shift, we could compute all probabilities by counting!
- If the shift is (T,F) and the urn is Z:
  - Count(Z) = Count(Z) + 1
  - For the shift probability: ShiftCount(T,F|Z) = ShiftCount(T,F|Z) + 1
  - For the urn: Count(t-T, f-F|Z) = Count(t-T, f-F|Z) + 1, since the value drawn from the urn was (t-T, f-F)
- After all observations are counted:
  - Normalize Count(Z) to get P(Z)
  - Normalize ShiftCount(T,F|Z) to get P_s(T,F|Z)
  - Normalize Count(t,f|Z) to get P(t,f|Z)
Problem: the shift and urn are unknown

Learning the Model
Urn Z and shift (T,F) are unknown
- So (t,f) contributes partial counts to every value of (T,F) and Z
- Contributions are proportional to the a posteriori probabilities of Z and of (T,F) given Z
Each observation of (t,f) contributes:
- P(z|t,f) to the count of the total number of draws from the urn: Count(Z) = Count(Z) + P(z|t,f)
- P(z|t,f)P(T,F|z,t,f) to the count of shift (T,F) for the shift distribution: ShiftCount(T,F|Z) = ShiftCount(T,F|Z) + P(z|t,f)P(T,F|z,t,f)
- P(z|t,f)P(T,F|z,t,f) to the count of (t-T, f-F) for the urn: Count(t-T, f-F|Z) = Count(t-T, f-F|Z) + P(z|t,f)P(T,F|z,t,f)

Shift-invariant model: Update Rules
Given data (spectrogram) S(t,f):
- Initialize P(Z), P_s(T,F|Z), P(t,f|Z)
- Iterate (the update equations are sketched below)
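As in the one-dimensional case, the iteration alternates a posterior computation with a re-estimation from partial counts. The slide presented these equations as an image; the following is a reconstruction consistent with the counting argument above:

$$P(z, T, F \mid t, f) \;=\; \frac{P(z)\, P_s(T, F \mid z)\, P(t-T,\, f-F \mid z)}{\sum_{z'} \sum_{T', F'} P(z')\, P_s(T', F' \mid z')\, P(t-T',\, f-F' \mid z')}$$

$$P(z) \;\propto\; \sum_{t,f} S(t,f) \sum_{T,F} P(z, T, F \mid t, f) \qquad\qquad P_s(T, F \mid z) \;\propto\; \sum_{t,f} S(t,f)\, P(z, T, F \mid t, f)$$

$$P(t, f \mid z) \;\propto\; \sum_{T,F} S(t+T,\, f+F)\, P(z, T, F \mid t+T,\, f+F)$$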

2-D Shift Invariance: The problem of indeterminacy
P(t,f|Z) and P_s(T,F|Z) are analogous
- It is difficult to specify which will be the "urn" and which the "shift"
Additional constraints are required to ensure that one of them is clearly the shift and the other the urn
Typical solution: enforce sparsity on P_s(T,F|Z)
- The patch represented by the urn then occurs in only a few locations in the data

Example: 2-D shift invariance
Only one "patch" used to model the image (i.e. a single urn)
- The learnt urn is an "average" face; the learned shifts show the locations of faces

Example: 2-D shift invariance
The original figure has multiple handwritten renderings of three characters
- In different colours
The algorithm learns the three characters and identifies their locations in the figure
[Figure: input data; discovered patches; patch locations]

Beyond shift-invariance: transform invariance
The draws from the urns may not only be shifted, but also transformed
The arithmetic remains very similar to the shift-invariant model
- We must now apply one of an enumerated set of transforms to (t,f), after shifting them by (T,F)
- In the estimation, the precise transform applied is an unseen variable

Transform invariance: Generation
The set of transforms is enumerable
- E.g. scaling by 0.9, scaling by 1.1, rotation right by 90 degrees, rotation left by 90 degrees, rotation by 180 degrees, reflection
- Transformations can be chosen by draws from a distribution over transforms, e.g. P(rotation by 90 degrees) = …
- The distributions are URN SPECIFIC
The drawing process (a sketch follows below):
- Select an urn Z (patch)
- Select a shift (T,F) from P_s(T,F|Z)
- Select a transform from P(txfm|Z)
- Select a (t,f) pair from P(t,f|Z)
- Transform (t,f) to txfm(t,f)
- Increment the histogram at txfm(t,f) + (T,F)
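A minimal sketch of a single draw under this process, in Python. The specific transform set, distributions, and array sizes here are illustrative assumptions (and boundary handling for the histogram is ignored); only the sequence of steps mirrors the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# An enumerable set of transforms on (t, f) coordinates (illustrative choices)
transforms = {
    "identity":   lambda t, f: (t, f),
    "reflect_f":  lambda t, f: (t, -f),     # reflection along the frequency axis
    "rotate_180": lambda t, f: (-t, -f),
}
names = list(transforms)

n_z, n_t, n_f, n_T, n_F = 2, 8, 8, 32, 32
P_z = np.ones(n_z) / n_z                                          # P(Z)
P_txfm_given_z = rng.dirichlet(np.ones(len(names)), size=n_z)     # urn-specific P(txfm|Z)
P_TF_given_z = rng.dirichlet(np.ones(n_T * n_F), size=n_z)        # urn-specific shift P_s(T,F|Z)
P_tf_given_z = rng.dirichlet(np.ones(n_t * n_f), size=n_z)        # the urns P(t,f|Z), flattened

z = rng.choice(n_z, p=P_z)                                        # select an urn Z (patch)
T, F = divmod(rng.choice(n_T * n_F, p=P_TF_given_z[z]), n_F)      # select a shift (T,F)
txfm = names[rng.choice(len(names), p=P_txfm_given_z[z])]         # select a transform from P(txfm|Z)
t, f = divmod(rng.choice(n_t * n_f, p=P_tf_given_z[z]), n_f)      # select a (t,f) pair from the urn
tt, ff = transforms[txfm](t, f)                                   # transform (t,f) to txfm(t,f)
location = (tt + T, ff + F)                                       # increment the histogram here
```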

Transform invariance
The learning algorithm must now estimate:
- P(Z) – the probability of selecting an urn/patch in any draw
- P(t,f|Z) – the urns/patches
- P(txfm|Z) – the urn-specific distribution over transforms
- P_s(T,F|Z) – the urn-specific shift distribution
Essentially, this determines what the basic shapes are, where they occur in the data, and how they are transformed
The mathematics for learning are similar to those for shift invariance
- With the addition that each instance of a draw must be fractured into urns, shifts AND transforms
Details of the learning are left as an exercise
- Alternatively, refer to Madhusudana Shashanka's PhD thesis at BU

Example: Transform Invariance
- Top left: original figure
- Bottom left: the two bases discovered
- Bottom right: left panel, positions of "a"; right panel, positions of "l"
- Top right: estimated distribution underlying the original figure

Transform invariance: model limitations and extensions
The current model only allows one transform to be applied at any draw
- E.g. a basis may be rotated or scaled, but not scaled and rotated
An obvious extension is to permit combinations of transformations
- The model must be extended to draw the combination from some distribution
Data dimensionality: all examples so far assume only two dimensions (e.g. a spectrogram or an image)
- The models are trivially extended to higher-dimensional data

Transform Invariance: Uses and Limitations
Not very useful for analyzing audio
May be used to analyze images and video
Main restriction: computational complexity
- Requires unreasonable amounts of memory and CPU
- An efficient implementation is an open issue

Example: Higher-dimensional data
Video example