Estimating Dependency and Significance for High-Dimensional Data


1 Estimating Dependency and Significance for High-Dimensional Data
Michael R. Siracusa*, Kinh Tieu*, Alexander T. Ihler§, John W. Fisher*§, Alan S. Willsky§. *Computer Science and Artificial Intelligence Laboratory; §Laboratory for Information and Decision Systems. Hi, I'm Michael Siracusa, and I'm going to talk to you about estimating dependency and significance for high-dimensional data. This is joint work with Kinh Tieu, Alex Ihler, John Fisher, and Alan Willsky.

2 Do these depend on each other (and how)?
Let's start with a simple example. We have four high-dimensional data sources, captured simultaneously. The source measurements are represented here in four different quadrants, reshaped to look like images. Now suppose we are interested in the statistical dependency between these measurements rather than within them. The question we are interested in is whether or not we can assess dependency between measurements without explicitly estimating the dependency internal to individual measurements. This is a simple question to answer with this seemingly complex high-dimensional data. And I do want to point out that if you stare at this long enough, you will probably be able to see that there is some dependence. We will come back to this example at the end of the talk.

3 How do we estimate the dependency? From a single realization?
Premise: In many high-dimensional data sources, statistical dependency can be well explained by a lower-dimensional latent variable. Intuition: The complexity of the problem is influenced more by the hypothesis space than by the data. How do we estimate the dependency? From a single realization? How do we avoid strong modeling assumptions? How do we estimate significance? We start with the simple premise that in many cases, dependency across high-dimensional measurements can be well explained by some lower-dimensional latent variable. While this is neither a purely geometric nor a purely statistical argument, we will resort primarily to a statistical point of view. We'll cast this problem as a hypothesis test between dependency structures. Additionally, we're interested in whether we can do this reliably from a single realization and without strong modeling assumptions. Finally, we'll discuss how to estimate significance values. The intuition is that the complexity of our problem is governed more by the complexity of the hypothesis space than by the complexity of the data. We will show that answers to some questions regarding high-dimensional data can be obtained with conceptually straightforward methods (although the underlying details may not be that straightforward). The questions we want to answer are how to estimate dependency from a single realization of data, and how to do so while avoiding strong modeling assumptions such as Gaussianity. Our approach is greatly aided by adopting an information-theoretic perspective. We will also discuss how we may estimate the significance of our result; this is the main point of the paper, though slightly less emphasized in this talk.

4 Structure (Graphical Model) Parameterization (Nuisance)
Dependency Structure (Graphical Model); Parameterization (Nuisance). It turns out that dependency naturally decomposes into two aspects: structure, indicated by a graphical model, and parameterization, which we'll treat as a nuisance.

5 Dependence: An example
VS. To be concrete, let's start with a simple example with three sources. Our hypothesis test is between independent data, indicated by this graphical model, and dependent data, shown in this graphical model. It's important to note that although the first hypothesis is simple, the second hypothesis is in fact composite, since the model on the right can explain this entire family of structures. More generally, both hypotheses would be composite.

6 Factorization Test (In General)
Let's consider what we call a factorization test, which can be loosely described as a hypothesis test between groupings of variables, which we then plug into a log-likelihood ratio. Conditioning on H1 implies both a graphical structure *and* the associated model parameters, and similarly for H0. We start by introducing this concept of a factorization test, discussed in Ihler and Fisher's journal paper. We have two different factorizations we are testing between. They introduce a set notation describing the distribution as a product of groups/subsets of data-source observations. In both of these hypotheses there are two groups: on the left they are {x1, x3} and {x2, x4}; on the right, x1 is its own group and the rest form another. They also define an intersection set, which in this case is simply independent data. The average log-likelihood ratio is defined as expected. Ignoring the notational detail, all we care about is that this tests between two different dependency structures, and we can show what it converges to asymptotically (or in expectation).
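Reconstructing the statistic from the description above (the slide's equations were lost in extraction; the groupings shown match the four-source example in the notes):

```latex
% Two candidate factorizations of the four sources:
%   H_1: p(x) = p(x^1, x^3)\, p(x^2, x^4)
%   H_0: p(x) = p(x^1)\, p(x^2, x^3, x^4)
% Average log-likelihood ratio over N i.i.d. observations:
\hat{L} \;=\; \frac{1}{N} \sum_{n=1}^{N}
  \log \frac{p(x_n^1, x_n^3)\, p(x_n^2, x_n^4)}
            {p(x_n^1)\, p(x_n^2, x_n^3, x_n^4)}
```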

7 Asymptotics: Independent vs. Some Dependency
Statistical dependence terms and model difference terms. Let's look at the asymptotics of the likelihood ratio. There are several points here, one of which is to relate log-likelihood ratios to information-theoretic measures, since going forward we'll treat them as equivalent. Additionally, for factorization tests, conditioned on either hypothesis, the KL divergence terms decompose into two sets: the first describes the dependency structure, while the second relates to the parameterization. Now let's specialize to the three-variable case we deal with for the majority of this talk, where the test is between mutual independence and some dependency; that is, H0 is that the data is fully independent. Three things happen: (1) since there is no dependency under H0, that term goes to zero; (2) we don't actually know these distributions, so assuming we have a consistent density estimator, we replace the p's with p-hats; (3) we only have a single realization of data and don't know which hypothesis it came from, so our consistent density estimators produce the same results for both densities in the model-difference terms, and those go away as well.
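In symbols, a reconstruction from the surrounding description (with p_i denoting the true distribution under hypothesis H_i):

```latex
% Asymptotics of the average log-likelihood ratio under p_i:
\frac{1}{N}\sum_{n=1}^{N} \log\frac{p_{H_1}(x_n)}{p_{H_0}(x_n)}
  \;\xrightarrow{N\to\infty}\;
  D\!\left(p_i \,\|\, p_{H_0}\right) - D\!\left(p_i \,\|\, p_{H_1}\right)
% For independent vs. some dependency, with consistent density
% estimates and a single realization, the surviving term is
\hat{L} \;\to\;
  D\!\left(\hat{p}(x^1, x^2, x^3) \,\middle\|\,
           \hat{p}(x^1)\,\hat{p}(x^2)\,\hat{p}(x^3)\right)
```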

8 Factorization Test (cont)
Questions: How do we obtain samples under each factorization? How do we estimate D(·||·) when x is high-dimensional? How do we estimate significance? So in general we are left with terms like this: an estimate of the KL divergence between the data under the two different factorizations. Note that it will be non-negative, and zero only if we receive independent data, in which case our joint estimate will equal the product of the marginals. This is just a generalization of what people normally think of when discussing dependence: for the two-variable case it is simply mutual information, and if we impose Gaussianity on two-dimensional data it comes down to estimating a correlation coefficient. But in general, how do we compute this term? We can use nonparametric density estimation. Still, some questions remain.
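As a concrete illustration of the nonparametric route the notes mention, here is a minimal plug-in sketch: fit kernel density estimates to the joint and to each marginal, and average the log ratio over the samples. This shows one standard approach, not necessarily the authors' exact estimator (the paper's entropy-estimation details differ); the function name and toy data are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_dependence_statistic(sources):
    """Plug-in estimate of D( p(x^1..x^K) || prod_k p(x^k) ).

    sources: list of K arrays, each of shape (d_k, N) -- one row
    per dimension, one column per observation (gaussian_kde layout).
    """
    joint = np.vstack(sources)                       # (sum d_k, N)
    log_joint = gaussian_kde(joint).logpdf(joint)
    log_marginals = sum(gaussian_kde(s).logpdf(s) for s in sources)
    # Average log-likelihood ratio over the N samples.
    return np.mean(log_joint - log_marginals)

# Toy usage: two 1-D sources, dependent vs. independent.
rng = np.random.default_rng(0)
z = rng.normal(size=(1, 500))
dep = [z + 0.3 * rng.normal(size=z.shape), z]        # share latent z
ind = [rng.normal(size=z.shape), rng.normal(size=z.shape)]
print(kde_dependence_statistic(dep))                 # clearly positive
print(kde_dependence_statistic(ind))                 # near zero
```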

9 Drawing Samples From a single realization
We only have one realization to estimate the joint. But we can obtain up to N! sample draws from H0. Given the structure of the hypothesis, it turns out that we can easily obtain samples from H0 even when the actual samples are from H1. We do this using permutations, a well-known technique in statistics.
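A minimal sketch of the permutation trick as described: independently shuffling the sample order within each source destroys cross-source dependency while leaving each marginal intact, giving a draw under the independence hypothesis. Names here are illustrative, not from the paper.

```python
import numpy as np

def permute_sources(sources, rng):
    """Draw one surrogate dataset from H0 (independence).

    Shuffling the N observations independently within each source
    breaks any dependency *across* sources while leaving each
    source's marginal distribution untouched.
    """
    n = sources[0].shape[1]
    return [s[:, rng.permutation(n)] for s in sources]
```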

10 High-Dimensional Data: From the Data Processing Inequality
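The slide's equations did not survive extraction; the standard bound it invokes is, for any statistic T applied to the data:

```latex
% Data processing inequality for divergences: mapping the data
% through any statistic T can only lose information,
D\big(p(x) \,\|\, q(x)\big) \;\ge\; D\big(p(T(x)) \,\|\, q(T(x))\big)
% with equality iff T is a sufficient statistic for
% distinguishing p from q.
```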

11 High Dimensional Data (cont)
Sufficiency: for high-dimensional data, maximize the left side of the bound. Gaussian with linear projections: closed-form solution (an eigenvalue problem), Kullback '68. Nonparametric: gradient descent, Ihler and Fisher '03. If T is sufficient for theta, then T is sufficient for H, independent of the distribution of theta.
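For intuition on the Gaussian closed form, here is a sketch of the two-source case, where maximizing dependence over linear projections reduces to a generalized eigenvalue problem (the canonical-correlation equations). This is an illustration consistent with the Kullback '68 reference, not necessarily the paper's exact formulation; all names are assumed.

```python
import numpy as np
from scipy.linalg import eigh

def max_dependence_projection(x, y, eps=1e-8):
    """Top linear projections a'x, b'y maximizing Gaussian dependence.

    For jointly Gaussian x, y the mutual information of the projected
    pair is -0.5*log(1 - rho^2), so maximizing dependence over linear
    projections means maximizing the canonical correlation rho.
    x: (dx, N), y: (dy, N); zero mean assumed for brevity.
    """
    n = x.shape[1]
    cxx = x @ x.T / n + eps * np.eye(x.shape[0])
    cyy = y @ y.T / n + eps * np.eye(y.shape[0])
    cxy = x @ y.T / n
    # Generalized eigenproblem: Cxy Cyy^-1 Cyx a = rho^2 Cxx a
    m = cxy @ np.linalg.solve(cyy, cxy.T)
    vals, vecs = eigh(m, cxx)
    a = vecs[:, -1]                        # top eigenvector
    rho2 = vals[-1]
    b = np.linalg.solve(cyy, cxy.T @ a)    # matching projection for y
    b /= np.linalg.norm(b)
    return a, b, -0.5 * np.log(max(1 - rho2, eps))  # projected MI (nats)
```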

12 Swiss Roll: 3D Data, PCA 2D Projection, MaxKL 2D Optimization
So now let's differentiate between the purely geometric and the statistical points of view and look at a well-known manifold example, the Swiss roll. We just want to show that our dimensionality-reduction approach (i.e., optimizing the divergence over T) can largely capture the interesting structure of this geometric object. We compare it to a standard PCA approach. [Figure: the 3D data alongside the PCA and MaxKL 2D projections.]

13 Measuring significance
p-value. Lastly, I'll briefly describe significance. It turns out we can again use the standard permutation trick for assessing significance. This was the original motivation for permutations in the statistics community.
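A minimal sketch of the permutation test for significance, reusing the hypothetical helpers above (permute_sources, and any dependence statistic such as kde_dependence_statistic): the p-value is the fraction of permuted datasets whose statistic is at least as large as the observed one.

```python
import numpy as np

def permutation_pvalue(sources, statistic, n_perm=200, seed=0):
    """Fraction of H0 surrogates scoring >= the observed statistic."""
    rng = np.random.default_rng(seed)
    observed = statistic(sources)
    null = [statistic(permute_sources(sources, rng))
            for _ in range(n_perm)]
    # The +1 terms give the standard bias-corrected permutation p-value.
    return (1 + sum(s >= observed for s in null)) / (n_perm + 1)
```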

14 Synthetic Data: Dependency via a Low-Dimensional Latent Variable
High-dimensional observations = projected latent variable + distracter + noise in the high-dimensional space. M controls the number of dimensions the dependency information is uniformly distributed over; D controls the total dimensionality of our K observations. We will start with some synthetic data. This high-dimensional data x is generated from some low-dimensional latent variable theta that captures all the dependency information through its joint distribution. This latent variable is linearly projected into the high-dimensional space and added to a distracter as well as some noise.
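A sketch of a generator matching this description. The specific distributions for the latent variable, distracter, and noise, and all names, are assumptions for illustration; the paper's exact construction may differ.

```python
import numpy as np

def synthetic_sources(n, K=2, M=4, D=20, noise=0.5, seed=0):
    """Dependent high-dim sources driven by a shared low-dim latent.

    theta: shared latent (here 1-D) carrying all cross-source
    dependency; it is spread uniformly over M of the D dimensions
    of each source, then a distracter and noise are added.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n)                      # shared latent
    sources = []
    for _ in range(K):
        A = np.zeros((D, 1))
        A[rng.choice(D, M, replace=False), 0] = 1 / np.sqrt(M)
        distracter = rng.normal(size=(D, n))        # independent structure
        x = A @ theta[None, :] + distracter + noise * rng.normal(size=(D, n))
        sources.append(x)
    return sources
```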

15 Experiments: 100 Trials w/ Samples of Dependent Data
100 Trials w/ Samples of Independent Data. Each trial gives a statistic and a significance (p-value).

16 Gaussian Data. The bottom pink curve is given the optimal linear projection (sufficient statistic) and measures dependence in that optimal space. The green curve uses optimization to learn a linear projection. And the blue curve measures dependency in the original high-dimensional space.

17 Gaussian

18 3D Ball Data

19

20 Significance Results

21 Multi-camera

22 Conclusions
We presented a method for estimating statistical dependency across high-dimensional measurements via factorization tests. We exploited a bound on lower-dimensional projections. We made use of permutations for drawing from the independence hypothesis given a single realization, and again to get reliable significance estimates. This was done using a small number of samples relative to the dimensionality of the data. Finally, we presented some brief analysis on synthetic and real data.

23 Thank You Questions?

24

25

26 Problem Statement Given N i.i.d. observations for K sources
Determine whether the K sources are independent or not: obtain a dependency measure, and estimate the significance of this measurement. So here we just introduce some notation. We have N i.i.d. observations from K data sources. A particular observation is represented by x_n, and we use superscripts to distinguish individual data sources.
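In symbols, a reconstruction consistent with the notation described:

```latex
% N i.i.d. observations of K sources:
x_n = \left(x_n^1, \dots, x_n^K\right), \quad n = 1, \dots, N
% Test mutual independence against any dependency:
H_0:\; p(x^1, \dots, x^K) = \prod_{k=1}^{K} p(x^k)
\qquad \text{vs.} \qquad
H_1:\; p(x^1, \dots, x^K) \neq \prod_{k=1}^{K} p(x^k)
```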

27 Applications

28

29 Hypothesis Test Two Hypotheses: Assuming we know the distributions:
Given N i.i.d. observations:
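The equations on this slide were lost in extraction; the standard likelihood-ratio test being described presumably has the form:

```latex
% H_0: x \sim p_0, \qquad H_1: x \sim p_1
% Given N i.i.d. observations, the average log-likelihood ratio test:
\frac{1}{N} \sum_{n=1}^{N}
  \log \frac{p_1(x_n)}{p_0(x_n)}
  \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \gamma
```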

30 Factorization Test Two Factorizations:
But we don't know the distributions. Our best approximation (like a GLR): Notation simplification:
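Likewise, a reconstruction of the plug-in (GLR-like) approximation the bullet refers to:

```latex
% With the true densities unknown, plug in consistent estimates
% under each factorization (a generalized-likelihood-ratio flavor):
\hat{L} \;=\; \frac{1}{N} \sum_{n=1}^{N}
  \log \frac{\hat{p}_{H_1}(x_n)}{\hat{p}_{H_0}(x_n)}
```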

31 Factorization Test (cont)
[Diagram: relationships among the true joint distribution, the estimated joint, the true independent (product) distribution, and the estimated product of marginals.]

32 Significance

33 Applications
What vision problems can we solve w/ accurate measures of dependency? Data association and correspondence; feature selection; learning structure. More specifically, we explicitly address the following problems, which have dependency as their main focus. We will specifically discuss correspondence (for multi-camera tracking) and audio-visual association.

34 Audio-Visual Association
Useful for: Speaker localization (to help improve human-computer interaction and help source separation); automatic transcription of archival video (who is speaking? are they seen by the camera?). I have been interested in audio-visual association. Take a look at this video. Who is speaking? Now focus on the first person, and raise your hand when he is speaking. So we see that even this simple problem is not so easy, but at its core it is measuring whether or not a single audio stream belongs to any of the video segments. There are lots of complex things going on, but how much work do we have to do to answer this simple question? Hi, I'm Michael, and I'm interested in multimodal data association; specifically, for my master's I worked on audio-visual data association. Take for example this toy problem: given audio, our task would be to identify which, if any, of these video lips is associated with the audio. This task is not so hard for humans, and we would like the computer to be able to do it. We have some basic questions, like how we should measure this association and how well we can do with and without a model of human speech, i.e., treating it as a generic data-association problem or using domain-specific knowledge.

35 Multi-camera Tracking

36 Hypotheses: Camera X vs. Camera Y

37 Maximal Correspondence

38 Distributions of Transition Times

39 Discussion and Future Work
Dependence underlies various vision-related problems. We studied a framework for measuring dependence. Future work: measure significance (how confident are you?) and make the framework more robust.

40

41

42 Math (oh no!): the 2-variable case
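The math on this slide was lost in extraction; for the two-variable case the standard identities are:

```latex
% The factorization-test divergence reduces to mutual information:
D\big(p(x^1, x^2) \,\|\, p(x^1)\,p(x^2)\big) \;=\; I(x^1; x^2)
% For jointly Gaussian scalars with correlation coefficient \rho:
I(x^1; x^2) \;=\; -\tfrac{1}{2}\,\log\!\left(1 - \rho^2\right)
```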

43 Outline Applications: (for computer vision)
Problem Formulation: (Hypothesis Testing) Computation: (Non-parametric entropy estimation) Curse of Dimensionality: (Informative Statistics) Correspondence: (Markov Chain Monte Carlo)

44

45 The question is not how to measure it; it is that you should measure it.
What does all this mean? One question: are there principled ways of assessing dependency without explicitly choosing a model?

46 Previous Talks Greg: Model dependence between features and class
Kristen: model dependence between features and a scene. Ariadna: model dependency between intra-class features. Wanmei: dependency between protocol signal and voxel response. Chris: audio and video dependence with events. Antonio: contextual dependence. Corey: "inferring dependencies." We should understand the tools before we use them, right everyone? Certain things come up: KL divergence, measuring correlation, details about density estimation; some people throw some information theory at you. The devil is in the details. It seems like everyone is worrying more about the specific details, so we are going to explore the more general problem formulation; our material is more directly related (clustering and classification are other tools everyone uses). Some of these talks have a precise definition of dependence and a particular model, while for others it's a little more fuzzy, and measuring dependence may just be some preprocessing step to set up the problem. The point is that dependency comes up over and over again, and we would like a precise way to discuss it and some well-understood tools to characterize it. Most people are comfortable discussing tools for classification or clustering; we want to be just as comfortable discussing dependency, particularly for problems where characterizing dependency is the focus. The fundamental question: what does it mean to assess dependency? We need to define it and learn how to compute it. At the end: what is the strength and the nature of dependence? We are dealing with problems where measurements are high-dimensional and probabilistic models don't fit into nice parametric families.

