Dependence Dependence = NOT independent. There is only one way to be independent (but many ways to be dependent).

Estimating Dependency and Significance for High-Dimensional Data Michael R. Siracusa* Kinh Tieu*, Alexander T. Ihler §, John W. Fisher *§, Alan S. Willsky § * Computer Science and Artificial Intelligence Laboratory § Laboratory for Information and Decision Systems

Applications

Problem Statement Given N i.i.d. observations from K sources, determine whether the K sources are independent by (1) calculating a dependency measure and (2) estimating the significance of this measurement.

Hypothesis Test Two hypotheses: H1, the sources are dependent, vs. H0, they are independent. Assuming we know the distributions, the optimal test given N i.i.d. observations is a likelihood ratio test.
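The equations on this slide did not survive the transcript; a hedged reconstruction of the standard test it describes, for K sources and N i.i.d. observations:

```latex
H_1 : \mathbf{x} \sim p(x_1,\dots,x_K)
\quad \text{vs.} \quad
H_0 : \mathbf{x} \sim \prod_{k=1}^{K} p_k(x_k),
\qquad
\frac{1}{N}\sum_{n=1}^{N} \log
\frac{p\!\left(x_1^{(n)},\dots,x_K^{(n)}\right)}
     {\prod_{k=1}^{K} p_k\!\left(x_k^{(n)}\right)}
\;\underset{H_0}{\overset{H_1}{\gtrless}}\; \gamma .
```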

Factorization Test Two factorizations of the same distribution. But we don't know the true distributions; our best approximation (in the spirit of a generalized likelihood ratio, GLR) plugs in density estimates learned from the N i.i.d. observations.
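Since the slide's formulas were lost in transcription, a hedged reconstruction of the plug-in statistic it alludes to, with the true densities replaced by estimates:

```latex
\hat{L} \;=\; \frac{1}{N}\sum_{n=1}^{N} \log
\frac{\hat{p}\!\left(x_1^{(n)},\dots,x_K^{(n)}\right)}
     {\prod_{k=1}^{K} \hat{p}_k\!\left(x_k^{(n)}\right)},
```

choosing factorization F1 (full joint) over F0 (product of marginals) when $\hat{L}$ exceeds a threshold.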

Factorization Test (cont) Given N i.i.d. observations. (Diagram: the empirical statistic compares the true joint distribution against the true independent distribution, i.e., the product of the marginals.)

Factorization Test (cont) The statistic for the 2-variable case, for the 2-variable Gaussian case, and in general. Questions: How do we do density estimation? Can we compute this value when x is high-dimensional? How do we make our decision between F1 and F0?
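The formulas these slide titles point to are standard and can be restated: as N grows the statistic converges to the mutual information (two variables), has a closed form in the Gaussian case, and estimates a KL divergence in general:

```latex
\frac{1}{N}\sum_{n} \log\frac{p\!\left(x_1^{(n)},x_2^{(n)}\right)}{p_1\!\left(x_1^{(n)}\right)p_2\!\left(x_2^{(n)}\right)}
\;\xrightarrow{\;N\to\infty\;}\; I(x_1;x_2);
\qquad
I(x_1;x_2) = -\tfrac{1}{2}\log\!\left(1-\rho^2\right)
\;\text{(Gaussian, correlation } \rho\text{)};
```

```latex
\text{in general: }\;
D\!\left(p(x_1,\dots,x_K)\,\middle\|\,\textstyle\prod_{k} p_k(x_k)\right).
```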

Sample Based Density Estimates
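The slide's plots did not survive the transcript. As a sketch of what "sample-based density estimates" means here, a minimal 1-D Gaussian kernel density estimator (the function name and bandwidth are illustrative choices, not from the talk):

```python
import numpy as np

def kde_log_density(x_eval, samples, bandwidth):
    """Log of a Gaussian kernel density estimate, evaluated at points x_eval (1-D)."""
    diffs = (x_eval[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    # Average the kernels over the samples, then rescale by the bandwidth.
    return np.log(kernels.mean(axis=1) / bandwidth)

rng = np.random.default_rng(0)
samples = rng.normal(size=2000)          # draws from a standard normal
log_p0 = kde_log_density(np.array([0.0]), samples, bandwidth=0.3)[0]
```

With enough samples and a small bandwidth, `log_p0` approaches the true standard-normal log density at zero; plugging such estimates into the factorization statistic gives its nonparametric version.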

High Dimensional Data From the Data Processing Inequality: dependence measured between low-dimensional summaries of the data lower-bounds the dependence between the original high-dimensional observations.
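The inequality the slide invokes (its equation was lost in transcription) is the data processing inequality: deterministic processing cannot create information, so for any summaries f and g,

```latex
I\!\left(f(X);\, g(Y)\right) \;\le\; I(X; Y),
```

which motivates choosing the summaries to make the left side as large as possible.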

High Dimensional Data (cont) Sufficiency. For high-dimensional data, maximize the left side of the bound. Gaussian with linear projections: closed-form solution (an eigenvalue problem): Kullback '68. Nonparametric: gradient descent: Ihler and Fisher '03.
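For the Gaussian case with a single pair of linear projections, the eigenvalue problem the slide mentions coincides with finding the top canonical correlation. A minimal sketch under that assumption (the function name and test data are mine, not from the talk):

```python
import numpy as np

def top_canonical_correlation(X, Y):
    """Largest canonical correlation between samples X (n x dx) and Y (n x dy).
    For Gaussian data, the mutual information through 1-D linear projections
    is maximized by the top canonical pair -- an eigenvalue problem."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    # Eigenvalues of Sxx^{-1} Sxy Syy^{-1} Syx are squared canonical correlations.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    return np.sqrt(max(np.linalg.eigvals(M).real.max(), 0.0))

# Two 3-D sources sharing one latent: corr 0.9 in one coordinate, noise elsewhere.
rng = np.random.default_rng(1)
z = rng.normal(size=(5000, 1))
X = np.hstack([z + rng.normal(size=(5000, 1)) / 3, rng.normal(size=(5000, 2))])
Y = np.hstack([z + rng.normal(size=(5000, 1)) / 3, rng.normal(size=(5000, 2))])
r = top_canonical_correlation(X, Y)
```

The recovered correlation is close to the 0.9 built into the shared coordinate; the nonparametric (gradient descent) variant replaces the Gaussian objective but keeps the same projection idea.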

Swiss Roll (figure panels: 3D data, PCA 2D projection, MaxKL 2D optimization).

Significance
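Significance is assessed by permutation, as the later "Permutations" slide notes: shuffling one source's samples destroys any dependence while preserving the marginals, so the shuffled statistics are draws from the null. A minimal sketch with a Gaussian plug-in statistic (all names are mine, not from the talk):

```python
import numpy as np

def gaussian_dependence(x, y):
    """Gaussian plug-in dependence statistic: -0.5 * log(1 - rho^2)."""
    rho = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

def permutation_pvalue(x, y, stat=gaussian_dependence, n_perm=500, seed=0):
    """Permuting y breaks dependence with x, sampling the null distribution
    of the statistic; the p-value is the fraction of null draws >= observed."""
    rng = np.random.default_rng(seed)
    observed = stat(x, y)
    null = [stat(x, rng.permutation(y)) for _ in range(n_perm)]
    return observed, (1 + sum(s >= observed for s in null)) / (1 + n_perm)

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 0.75 * x + np.sqrt(1 - 0.75 ** 2) * rng.normal(size=300)  # rho ~ 0.75
obs, p = permutation_pvalue(x, y)
```

For this dependent pair the observed statistic sits far in the tail of the permutation null, so the p-value is small; for independent inputs it would be roughly uniform on (0, 1].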

More significance

Synthetic data (figure): dependency enters via a low-dimensional latent variable, which is embedded, along with noise and distracter dimensions, into a high-dimensional observation space. M controls the number of dimensions the dependency information is uniformly distributed over; D controls the total dimensionality of our K observations.
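One hypothetical way to realize such a generator (the function name, noise levels, and uniform spreading vector are my assumptions, not details from the talk):

```python
import numpy as np

def make_dependent_pair(n, d, m, rho=0.75, seed=0):
    """A rho-correlated latent pair, spread uniformly over m of d dimensions
    (M on the slide); the remaining d - m dimensions are independent distracters."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 1))                    # shared latent variable
    a = np.sqrt(1.0 / rho - 1.0)                   # gives corr(x_lat, y_lat) = rho
    x_lat = z + a * rng.normal(size=(n, 1))
    y_lat = z + a * rng.normal(size=(n, 1))

    def embed(lat):
        u = np.ones(m) / np.sqrt(m)                # spread evenly over m dims
        signal = lat @ u[None, :] + 0.1 * rng.normal(size=(n, m))
        distract = rng.normal(size=(n, d - m))     # D - M distracter dims
        return np.hstack([signal, distract])

    return embed(x_lat), embed(y_lat)

X, Y = make_dependent_pair(n=2000, d=5, m=2)
```

Raising d with m fixed makes the dependency a smaller fraction of the observation, which is exactly the regime where dimensionality reduction before the dependency test matters.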

Experiments 100 trials with samples of dependent data and 100 trials with samples of independent data. Each trial gives a statistic and its significance.

Gaussian Data

Gaussian data, ρ = 0.75

Three D Ball Data

Significance: permutations work well.

Multi-camera

Conclusions A nice general framework. Permutations allow us to draw independent samples. We have shown cases where Gaussian assumptions fail, and where PCA is inadequate for dimensionality reduction.

Future Work More experiments on real data Better optimization procedure

Applications What vision problems can we solve with accurate measures of dependency? Data association and correspondence, feature selection, learning structure. More specifically, we explicitly address problems that have dependency as their main focus. We will specifically discuss: correspondence (for multi-camera tracking) and audio-visual association.

Audio-Visual Association Useful for: speaker localization (to help improve human-computer interaction and help source separation) and automatic transcription of archival video (Who is speaking? Are they seen by the camera?). I have been interested in audio-visual association. Take a look at this video: who is speaking? Now focus on the first person, and raise your hand when he is speaking. So we see that even this simple problem is not so easy, but at its core it is measuring whether or not a single audio stream belongs to any of the video segments. There are lots of complex things going on, but how much work do we have to do to answer this simple question? Hi, I'm Michael, and I'm interested in multimodal data association; specifically, for my master's I worked on audio-visual data association. Take for example this toy problem: given audio, our task would be to identify which, if any, of these videos' lips is associated with the audio. This task is not so hard for humans, and we would like the computer to be able to do it. We have some basic questions, like how we should measure this association and how well we can do with and without a model of human speech, i.e., treating it as a generic data association problem versus using domain-specific knowledge.

Multi-camera Tracking

Hypotheses (figure): observations from Camera X and Camera Y under the two hypotheses, corresponding vs. not corresponding.

Maximal Correspondence

Distributions of Transition Times

Discussion and Future Work Dependence underlies various vision-related problems. We studied a framework for measuring dependence and its significance (how confident you are in the measurement). Future work: make it more robust.

Math (oh no!) For 2 variable case

Outline Applications: (for computer vision) Problem Formulation: (Hypothesis Testing) Computation: (Non-parametric entropy estimation) Curse of Dimensionality: (Informative Statistics) Correspondence: (Markov Chain Monte Carlo)

The question is not only how to measure it; it is that you should measure it. What does all this mean? One question: are there principled ways of assessing dependency without explicitly choosing a model?

Previous Talks Greg: modeled dependence between features and a class. Kristen: modeled dependence between features and a scene. Ariadna: modeled dependency between intra-class features. Wanmei: dependency between a protocol signal and voxel response. Chris: audio and video dependence with events. Antonio: contextual dependence. Corey: "inferring dependencies." We should understand the tools before we use them, right, everyone? Certain things come up: KL divergence, measuring correlation, details about density estimation; some people throw some information theory at you. The devil is in the details. It seems like everyone is worrying about the specific details, so we are going to explore the more general problem formulation (clustering and classification are other tools everyone uses). Some of these talks have a precise definition of dependence and a particular model, while for others it is a little fuzzier, and measuring dependence may just be a preprocessing step to set up the problem. The point is that dependency comes up over and over again, and we would like a precise way to discuss it and some well-understood tools to characterize it. Most people are comfortable discussing tools for classification or clustering; we want to be just as comfortable discussing dependency, particularly for our problems where characterizing dependency is the focus. The fundamental question: what does it mean to assess dependency? We need to define it and learn how to compute it. At the end: what is the strength and the nature of the dependence? We are dealing with problems whose measurements are high-dimensional, where probabilistic models don't fit into nice parametric families.