Nonparametric Hypothesis Tests for Dependency Structures


Nonparametric Hypothesis Tests for Dependency Structures
John Fisher (MIT CSAIL/LIDS), Alexander Ihler, Alan Willsky

Do these depend on each other (and how)?

Motivation/Problem Domain Heterogeneous sensors: E/O, I/R, acoustic, MRI, fMRI, etc. We need to perform local fusion to support global inference. How do we aggregate localized measurements of a distributed phenomenon? How do we do principled statistical inference when the model is only partially specified? There is a critical need to understand the statistical relationships between sensor outputs in the face of many modes of uncertainty (sensors, scene, geometry, etc.).

Observations When asking simple questions we can use simple/approximate models; performance and data-fusion complexity are dictated by the complexity of the hypotheses. Many problems involving data fusion can be cast as hypothesis tests between graphical models. In such cases the problem decomposes into terms related to statistical dependency and terms related to modeling assumptions.

AV Association at the Signal Level Sounds and motions which are consistent can be attributed to a common cause. Question: how do we quantify "consistent"? (figure: consistent vs. inconsistent examples)

Data Association as a Hypothesis Test Information-theoretic quantities such as mutual information and Kullback-Leibler divergence arise naturally in the context of hypothesis testing.

A/V Association as a Hypothesis Test Assuming independent sources, the hypotheses are factorizations of the joint density. Asymptotic comparison of known models to those estimated from a single realization.
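In the two-stream case the hypotheses and the resulting statistic can be sketched as follows (a reconstruction; a and v denote the audio and video measurements, p_0 and p_1 the hypothesized models, N the number of samples):

```latex
H_0:\; p(a,v) = p_0(a)\,p_0(v)
\qquad\text{vs.}\qquad
H_1:\; p(a,v) = p_1(a,v),
\qquad
\frac{1}{N}\log\Lambda \;=\; \frac{1}{N}\sum_{i=1}^{N}\log\frac{p_1(a_i,v_i)}{p_0(a_i)\,p_0(v_i)}
```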

Asymptotics of Likelihood Ratio The likelihood ratio decomposes into two sets of terms: statistical dependencies (groupings), and differences in model parameterizations.

Asymptotics of Likelihood Ratio If we estimate the densities from a single realization, the statistical dependence terms remain while the model divergences go away.
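One way to write the two-variable decomposition (a sketch consistent with the surrounding slides; p is the true joint density of (X, Y) with marginals p_x, p_y, and the expectation is taken under p):

```latex
\frac{1}{N}\,\mathbb{E}\!\left[\log \frac{p_1(x,y)}{p_0(x)\,p_0(y)}\right]
= \underbrace{I(X;Y)}_{\text{statistical dependence}}
\;\underbrace{-\,D\big(p \,\|\, p_1\big) + D\big(p_x \,\|\, p_0\big) + D\big(p_y \,\|\, p_0\big)}_{\text{model divergences}}
```

When the densities are instead estimated from the same single realization, the plug-in estimates converge to p and its marginals, so the three divergence terms vanish asymptotically and only the mutual-information term remains.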

High Dimensional Data Learn low-dimensional auxiliary variables which summarize statistical dependency of measurements

AV Association/Correspondence Association matrix for 8 subjects. The table contains the MI (equivalently, likelihood) score for each possible association; the hypothesis over any two possible associations is a difference of terms in the table.

AV Association/Correspondence Association matrix for 8 subjects. The Hungarian algorithm (or auction algorithms) provides an efficient means of evaluating all possible associations. MI scores are the natural statistic with which to fill the table. Note that the complexity of filling the table is O(N^2) in the number of sources.
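A minimal sketch of scoring associations from such a table (the score values below are made up for illustration). For small N a brute-force search over permutations suffices; the Hungarian or auction algorithm does the same job in polynomial time for larger N:

```python
from itertools import permutations

def best_association(mi):
    """Pick the audio-video association maximizing the total MI score.

    mi[i][j] is the (hypothetical) mutual-information score between
    audio source i and video source j.  Brute force over all N!
    assignments, returning the best permutation as a list.
    """
    n = len(mi)
    best = max(permutations(range(n)),
               key=lambda p: sum(mi[i][p[i]] for i in range(n)))
    return list(best)

# Toy 3x3 score table (made-up numbers): the diagonal is the true pairing.
scores = [[0.68, 0.19, 0.10],
          [0.20, 0.61, 0.15],
          [0.05, 0.12, 0.55]]
print(best_association(scores))  # [0, 1, 2] -- the diagonal pairing
```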

General Structure Tests Generalization to hypothesis tests over graphical structures: how are the observations related to each other?

General Structure Tests Intersection sets are the groupings on which the hypotheses (H1 vs. H2) agree. Nominal set algebra: most sets are empty; at most D (the number of variables) are non-empty; by the pigeon-hole principle, each variable can appear in at most one set, so the complexity is of order D^2.
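The intersection sets can be computed directly; a minimal sketch, with each hypothesis's grouping represented as a list of frozensets of variable indices (the example groupings are illustrative, not from the talk):

```python
def intersection_sets(groups1, groups2):
    """Non-empty pairwise intersections of two groupings of the same
    D variables.  Because each grouping partitions the variables, at
    most D of the D^2 candidate intersections are non-empty."""
    return [g1 & g2
            for g1 in groups1
            for g2 in groups2
            if g1 & g2]

# H1 groups variables {0,1} together; H2 groups {1,2} together.
h1 = [frozenset({0, 1}), frozenset({2})]
h2 = [frozenset({0}), frozenset({1, 2})]
print(intersection_sets(h1, h2))  # three singleton sets: {0}, {1}, {2}
```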

General Structure Tests The asymptotics have a decomposition similar to the 2-variable case (via the intersection sets).

General Structure Tests The extension of the earlier data-association treatment to such tests is straightforward. Estimation from a single realization incurs a reduction in separability only in the model-difference terms. The "curse of dimensionality" (with respect to density estimation) arises in two ways: individual measurements may be of high dimension (we can still design low-dimensional auxiliary variables), and a group may contain many variables (new results provide a solution).

General Structure Tests The test implies potentially six joint densities, but it is simplified by looking at the intersection sets.

General Structure Tests For high-dimensional variables, learning auxiliary variables reduces dimensionality in one respect, but we would still have to estimate a 3-dimensional density, and this only gets worse with larger groupings.

K-L Divergence with Permutations A simple idea which mitigates many of the dimensionality issues, exploiting the fact that the structures are distinguished by their groupings of variables. Key ideas: permuting the sample order between groupings maintains the statistical dependency structure, and D(X||Y) >= D(f(X)||f(Y)) (the data-processing inequality). This has the advantage that we can design a single (possibly vector-valued) function of all variables rather than one function for each variable.
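A minimal sketch of the permutation idea in the two-variable case, using a plug-in discrete MI estimate (the talk's method uses learned statistics f and continuous density estimates; this toy version only illustrates why permuting sample order yields a sample from the product distribution):

```python
import math
import random
from collections import Counter

def discrete_mi(xs, ys):
    """Plug-in mutual information estimate (in nats) for discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

random.seed(0)
xs = [random.randint(0, 1) for _ in range(2000)]
ys = xs[:]                      # a fully dependent pair

# Permuting the sample order of one grouping breaks the dependence
# while preserving both marginals -- i.e., it draws a sample from the
# H0 (product-of-marginals) distribution.
ys_perm = ys[:]
random.shuffle(ys_perm)

print(discrete_mi(xs, ys))       # close to log 2 (full dependence)
print(discrete_mi(xs, ys_perm))  # near zero
```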

K-L Divergence with Permutations

High-Dimensional, Multi-modal Data Association (a) grayscale, (b) grayscale, (c) hue, (d) image difference. The top row is what the "sensor" sees; the bottom row is the original image, to make it easier for the audience to see which image goes with which.

Data Association Example LLR estimates for pair-wise associations (left), compared to the distribution under the null hypothesis. Distribution of the full association (middle). The incorrect-association likelihood shows some global scene dependence (e.g., due to common lighting changes).

More General Structures The analysis has been extended to comparisons between triangulated graphs, which can be expressed as sums and differences of product terms. This admits a wide class of Markov processes.
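For triangulated graphs, the "sums and differences of product terms" come from the usual junction-tree factorization over maximal cliques C and separators S (a sketch; C_i and S_i denote the cliques and separators of hypothesis H_i):

```latex
p(x) = \frac{\prod_{C} p(x_C)}{\prod_{S} p(x_S)}
\quad\Longrightarrow\quad
\log \frac{p_{H_1}(x)}{p_{H_2}(x)}
= \sum_{C \in \mathcal{C}_1} \log p(x_C) - \sum_{S \in \mathcal{S}_1} \log p(x_S)
- \sum_{C \in \mathcal{C}_2} \log p(x_C) + \sum_{S \in \mathcal{S}_2} \log p(x_S)
```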

Modeling Group Interactions Object 3 tries to interpose itself between objects 1 and 2. The graph describes the state (position) dependency structure.

Modeling Group Interactions

Association vs Generative Models One instantiation of the IT fusion approach is equivalent to learning a latent-variable model of the audio-video measurements. Random variables: the parameters and the appearance bases. We simultaneously learn the statistics of the joint audio/video variables and the parameters as the statistic of association (consistent with the theory).

Incorporating Nuisance Parameters Extension of multi-modal fusion to include nuisance parameters. Audio is an indirect pointer to the object of interest. We combine a motion model (nuisance parameters) with the audio-video appearance model.

Incorporating Motion Parameters (figure: example frames; average image without the motion model vs. with the motion model)

High-Dimensional, Multi-modal Data Association (a) grayscale, (b) grayscale, (c) hue, (d) image difference. The top row is what the "sensor" sees; the bottom row is the original image, to make it easier for the audience to see which image goes with which.
