Some Statistics Stuff (A.K.A. Shamelessly Stolen Stuff)


1 Some Statistics Stuff (A.K.A. Shamelessly Stolen Stuff)

2 Blind Source Separation

Suppose we have a data set that:

- Has many independent components or channels:
  - an audio track recorded from multiple microphones,
  - a series of brain images with multiple voxels,
  - seismographic data from multiple seismometers.
- We believe is driven by several independent processes:
  - different people speaking into the microphones,
  - different neuronal processes occurring within the brain,
  - different parts of the fault moving.
- And about which we have no a priori notion of what those processes look like.

Our goal is to figure out what the different processes are by grouping together data that is correlated.

3 ICA (Independent Component Analysis)

Put the data* into a vector X of length m. We want to find sources or "components", a vector S, of length n. We imagine that X is produced by some mixing matrix A:

    X = A * S

or, alternatively, that S is estimated by a weight matrix W:

    S = W * X

We want to find components that are (somehow) maximally independent, yet still generate X.

*Note: X isn't actually raw data; it is typically centered (and often whitened) first.
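To make the X = A * S / S = W * X picture concrete, here is a minimal sketch using the scikit-learn implementation of FastICA (the algorithm singled out on the "For More Info" slide). The sine/sawtooth sources and the mixing matrix are made up for illustration, not taken from the slides:

    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)

    # Two made-up independent sources S (n = 2): a sine and a sawtooth.
    S_true = np.c_[np.sin(2 * t), ((2 * t) % 1) - 0.5]

    # Mix them with a made-up matrix A to get the observations X (m = 2),
    # one row per time sample: X = S * A^T, i.e. each sample obeys X = A * S.
    A = np.array([[1.0, 0.5],
                  [0.4, 1.0]])
    X = S_true @ A.T

    # FastICA learns the unmixing from X alone and returns the estimated S(t).
    ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
    S_est = ica.fit_transform(X)   # estimated sources, up to scale and order
    A_est = ica.mixing_            # estimated mixing matrix A

Note that ICA can only recover the sources up to permutation and scaling, since any rescaling of a row of W can be absorbed into S.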

4 Spelling Things Out

The meaning of the ICA equation is that we're choosing a basis. For example, if W_11 = 0.6 and W_12 = 0.2, then S_1 = 0.6 * X_1 + 0.2 * X_2. That is, X_1 and X_2 are actually being generated (at least partly) by the process S_1.

X is typically a time series, that is, X is measured at discrete intervals. Our basis doesn't change, however, because the fundamental processes at work are presumed to be constant. Because of this, W is constant in time and S changes with time. The end result of ICA is then S(t) and W, which tell us the activity of each component and how to generate the original data from the components.
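A toy numeric check of this, using the hypothetical weights W_11 = 0.6 and W_12 = 0.2 from above (the second row of W and the values of X are filler):

    import numpy as np

    W = np.array([[0.6, 0.2],        # the hypothetical weights from above
                  [0.1, 0.9]])       # second row made up to fill out W
    X = np.array([[2.0, 1.0, 0.5],   # X_1 at three time steps
                  [1.0, 3.0, 2.0]])  # X_2 at the same time steps

    S = W @ X    # W is fixed in time; each column of S is one time step
    print(S[0])  # [1.4 1.2 0.7] == 0.6 * X_1 + 0.2 * X_2 at every step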

5 Er… "Maximally Independent"?

[Figure: two scatter plots, one labeled "Correlated" and one labeled "Uncorrelated"]

The definition is technical and depends somewhat on the algorithm being used. It involves more than cross-correlations: if two variables are independent they are uncorrelated, but uncorrelated variables are not necessarily independent, so ICA algorithms typically also exploit higher-order statistics (e.g., non-Gaussianity).

Images from a web page by Aapo Hyvärinen, http://www.cis.hut.fi/aapo/papers/NCS99web
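A quick illustration of the gap between the two notions (y = x^2 is a made-up example, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 100_000)  # symmetric about zero
    y = x ** 2                           # completely determined by x

    print(np.corrcoef(x, y)[0, 1])  # ~0: x and y are uncorrelated...
    # ...yet obviously dependent, so decorrelation alone cannot separate sources.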

6 Maximally Independent is NOT:

Orthogonal. In general, (A_1 S_1) · (A_2 S_2) ≠ 0: the contributions of different components to the data need not be orthogonal. This differs from many other techniques. ICA attempts to group correlated data, but sometimes different parts of the data overlap in a non-orthogonal manner.
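For instance, reusing the made-up mixing matrix from the slide-3 sketch, the directions in which the two components enter the data clearly overlap:

    import numpy as np

    A = np.array([[1.0, 0.5],    # the made-up mixing matrix from the
                  [0.4, 1.0]])   # sketch under slide 3
    print(A[:, 0] @ A[:, 1])     # 0.9, not 0: the components' footprints overlap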

7 Requirements

- At most one Gaussian-distributed element of the data.
- The number of independent data channels must be at least the number of components: m ≥ n. E.g., at least as many microphones as voices.

8 For More Info

Check out Wikipedia (seriously). The article on independent component analysis:

- Is actually good.
- Provides links to software packages for C++, Java, Matlab, etc. See especially FastICA.
- Has many external links that provide good overviews as well.

9 The Aftermath…

Great! Now that we have what we've always wanted (a list of "components"), what do we do with them? Since ICA is "blind", it doesn't tell us much about the components. We may simply be interested in data reduction, or in categorizing the mechanisms at work. Or we may be interested in components that correlate with some signal we drove the experiment with.

10 An Alternative or a Supplement: Partial Least Squares

Partial Least Squares (PLS) is a technique that is NOT "blind": it attempts to find components that correlate maximally with one or a few signals that drive the system. Example: psychology trials where patients alternate between different tests. Because the result of PLS will be orthogonal components, care must be taken that the signals driving the system are truly orthogonal (or, alternatively, they should be transformed to an orthogonal basis before using PLS).

11 Technical Detail

The basic PLS technique is to form a cross-correlation matrix between the signals and the data, then use singular-value decomposition (SVD) to find special vectors (analogous to eigenvectors) that explain the shared variance between signals and data. These singular vectors are the components, and they are the vectors that minimize the sum-of-squares error when attempting to reconstruct the data from the applied signals.
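A minimal numpy sketch of this recipe. The names (Y for the driving signals, X for the data) and the synthetic inputs are assumptions for illustration, not the slides' notation:

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, n_channels, n_signals = 200, 10, 2

    Y = rng.standard_normal((n_samples, n_signals))   # signals driving the system
    B = rng.standard_normal((n_signals, n_channels))  # unknown coupling (synthetic)
    X = Y @ B + 0.1 * rng.standard_normal((n_samples, n_channels))  # measured data

    # Form the cross-correlation matrix between (centered) signals and data.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Yc.T @ Xc                 # shape (n_signals, n_channels)

    # SVD: the singular vectors play the role of eigenvectors, ordered by
    # how much signal-data covariance each one explains.
    U, svals, Vt = np.linalg.svd(C, full_matrices=False)
    components = Vt               # rows: components in data (channel) space
    scores = Xc @ Vt.T            # time course of each component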

