On optimal quantization rules for some sequential decision problems, by X. Nguyen, M. Wainwright & M. Jordan. Discussion led by Qi An, ECE, Duke University.

Outline
– Introduction
– Background
– Approximate design
– Suboptimality of stationary designs
– Discussion

Introduction
What is the decentralized detection problem?
– A hypothesis-testing problem in which the decision maker has no direct access to the raw data, but instead must infer the underlying hypothesis based only on a set of local quantization functions and a sequence of summary statistics.
What is the goal?
– The goal is to design both the local quantization functions and the global decision rule so as to predict the underlying hypothesis H in a manner that optimally trades off accuracy and delay.

Introduction
In a general framework for sequential decentralized problems, Veeravalli defined five problems (cases A through E), distinguished by the amount of information available to the local sensors. Veeravalli used a likelihood-ratio test to solve case E, where each local sensor has access to its current observation O_i^n as well as the summary statistics from all of the other local sensors.

In Veeravalli's paper, he conjectured that stationary local decision functions may actually be optimal for case A, where neither local memory nor feedback is assumed to be available. In this paper, the authors show that stationary decision functions are, in fact, not optimal for the decentralized problem of case A.

[System diagram: sensors S_1, ..., S_t observe O_1^n, ..., O_t^n, apply the quantization functions U_i^n = φ^n(O_i^n), and forward the messages to a fusion center, which applies the decision rule Ĥ = γ^n(U_1^1, ..., U_t^1, U_1^2, ..., U_t^2, ..., U_1^n, ..., U_t^n).]

Usually O ∈ {1, ..., M} and U ∈ {1, ..., K}, where M >> K. Let X^n = [O_1^n, ..., O_t^n] and Z^n = [U_1^n, ..., U_t^n].
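To make this architecture concrete, here is a minimal Python simulation sketch, assuming a single shared stationary quantizer and illustrative observation distributions; the alphabet sizes, threshold, and distributions below are assumptions for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

M, t = 8, 3   # observation alphabet size and number of sensors (K = 2 messages below)

def quantize(o, threshold=4):
    """Illustrative stationary binary quantizer phi: U = 1 if O <= threshold, else U = 2."""
    return 1 if o <= threshold else 2

# Hypothetical class-conditional observation distributions (not from the paper).
p0 = np.full(M, 1.0 / M)            # under H = 0: uniform over {1, ..., M}
p1 = np.linspace(1.0, 2.0, M)       # under H = 1: tilted toward larger observations
p1 = p1 / p1.sum()

def run_round(p_obs):
    """One time step: each sensor draws O_i^n, quantizes it, and sends U_i^n to the fusion center."""
    obs = rng.choice(np.arange(1, M + 1), size=t, p=p_obs)   # X^n = [O_1^n, ..., O_t^n]
    msgs = np.array([quantize(o) for o in obs])              # Z^n = [U_1^n, ..., U_t^n]
    return obs, msgs

obs, msgs = run_round(p1)
print("X^n =", obs, "  Z^n =", msgs)
```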

Sequential detection
Consider a general centralized sequential detection problem. Let P_0 and P_1 represent the class-conditional distributions of Z, conditioned on {H=0} and {H=1}, and let f^0 and f^1 denote the corresponding density functions. Focusing on the Bayesian formulation, we let π^0 and π^1 denote the prior probabilities of the two hypotheses. The cost function we want to minimize is a weighted sum of the probability of error and the expected stopping time, with a cost c per step. We want to choose the pair (stopping time N, final decision Ĥ) so as to minimize the expected loss.
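Written out, a standard way to express this objective is the following (a sketch using the notation just introduced, with N the stopping time and Ĥ the final decision; this is the textbook Bayesian formulation rather than a formula transcribed from the slides):

```latex
% Expected loss: per-step cost c times the delay N, plus the probability of a wrong final decision.
J(N, \hat{H}) \;=\; \mathbb{E}\big[\, cN + \mathbb{I}\{\hat{H} \neq H\} \,\big]
\;=\; c\big(\pi^0\,\mathbb{E}_0[N] + \pi^1\,\mathbb{E}_1[N]\big)
\;+\; \pi^0\,\mathbb{P}_0(\hat{H}=1) \;+\; \pi^1\,\mathbb{P}_1(\hat{H}=0).
```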

Dynamic programming
The preceding centralized sequential detection problem can be solved by a dynamic programming (DP) approach, iteratively updating the cost function over the horizon. However, it is not straightforward to apply the DP approach to decentralized versions of sequential detection.
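A sketch of the kind of recursion meant here, written for the posterior probability p = P(H=1 | messages so far); this is the standard optimal-stopping formulation under the cost above, stated as an assumption rather than quoted from the slides:

```latex
% Bellman recursion over the posterior p: either stop and pay the Bayes error min(p, 1-p),
% or pay the per-step cost c and continue with the updated posterior p^+.
V(p) \;=\; \min\Big\{\, \min(p,\, 1-p), \;\; c + \mathbb{E}\big[\, V(p^{+}) \mid p \,\big] \Big\}.
```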

Wald's approximation
The optimal stopping rule for this cost function takes the form of a sequential probability ratio test (SPRT): stop the first time the likelihood ratio exits an interval defined by a lower threshold a and an upper threshold b. The optimal decision rule then declares Ĥ = 1 if the upper threshold is crossed and Ĥ = 0 if the lower threshold is crossed. Let us define the two types of errors: the false-alarm probability α = P_0(Ĥ=1) and the miss probability β = P_1(Ĥ=0).
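Concretely, the classical Wald SPRT form of these rules is sketched below; the likelihood ratio is written in terms of the densities f^0 and f^1 of the quantized messages Z, which is a notational assumption consistent with the earlier slide:

```latex
% Likelihood ratio after n messages, two-threshold stopping rule, and decision rule.
L_n \;=\; \prod_{i=1}^{n} \frac{f^1(Z^i)}{f^0(Z^i)}, \qquad
N \;=\; \inf\{\, n \ge 1 \,:\, L_n \notin (a, b) \,\}, \qquad
\hat{H} \;=\; \begin{cases} 1, & L_N \ge b,\\[2pt] 0, & L_N \le a, \end{cases}

% Error probabilities of the resulting test.
\alpha \;=\; \mathbb{P}_0(\hat{H}=1), \qquad \beta \;=\; \mathbb{P}_1(\hat{H}=0).
```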

Wald's approximation
The cost function J(a, b) of the decision rule based on the envelope (a, b) can be written in terms of the two error probabilities and the expected stopping times under the two hypotheses. It is hard to calculate this cost function exactly, but it can easily be approximated. The errors α, β are related to a and b by the classical Wald inequalities α ≤ (1−β)/b and β ≤ a(1−α).

Wald's approximation
If we ignore the overshoot and replace the inequalities with equalities, i.e. set b = (1−β)/α and a = β/(1−α), then Wald's identity yields closed-form approximations for the expected stopping times, and we therefore obtain an approximate cost function G(α, β).
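A sketch of the resulting approximate cost, obtained from Wald's identity under the no-overshoot assumption; here μ_1 and μ_0 denote the two KL divergences between the quantized message distributions (the subscript φ marks their dependence on the quantizer), and minor details may differ from the paper's exact expression:

```latex
% Wald approximations to the expected stopping times (no overshoot) ...
\mathbb{E}_1[N] \approx \frac{(1-\beta)\log\frac{1-\beta}{\alpha} + \beta\log\frac{\beta}{1-\alpha}}{\mu_1},
\qquad
\mathbb{E}_0[N] \approx \frac{(1-\alpha)\log\frac{1-\alpha}{\beta} + \alpha\log\frac{\alpha}{1-\beta}}{\mu_0},

% ... and the resulting approximate cost.
G(\alpha,\beta) \;=\; \pi^0\alpha \;+\; \pi^1\beta \;+\; c\,\pi^1\,\mathbb{E}_1[N] \;+\; c\,\pi^0\,\mathbb{E}_0[N],
\qquad
\mu_1 = D\big(f^1_{\phi} \,\|\, f^0_{\phi}\big), \quad
\mu_0 = D\big(f^0_{\phi} \,\|\, f^1_{\phi}\big).
```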

Approximate design
Now consider the decentralized setting. Given a fixed stationary quantizer φ, Wald's approximation suggests the following strategy:
– For a given set of admissible errors α and β, first assign the values of the thresholds a and b as above.
– Then use the quantity G(α, β) as an approximation to the true cost J(a, b).
Under suitable assumptions, the authors prove a guarantee on the accuracy of this approximation (the precise assumption and statement are given in the paper).
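As an illustration of this recipe, here is a small Python sketch: it sets the thresholds from target errors (α, β) via the no-overshoot equalities, computes the approximate cost G(α, β), and checks it against a Monte Carlo estimate of the true cost of the resulting SPRT. The message distributions, priors, and per-step cost are hypothetical, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical quantized-message distributions f^0, f^1 over U in {0, 1},
# priors, and per-step cost (illustrative values only).
f0 = np.array([0.7, 0.3])
f1 = np.array([0.4, 0.6])
pi0, pi1, c = 0.5, 0.5, 0.01

def kl(p, q):
    """KL divergence D(p || q) for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def approx_cost(alpha, beta):
    """Wald's approximate cost G(alpha, beta) for this fixed message distribution."""
    mu1, mu0 = kl(f1, f0), kl(f0, f1)
    e1 = ((1 - beta) * np.log((1 - beta) / alpha) + beta * np.log(beta / (1 - alpha))) / mu1
    e0 = ((1 - alpha) * np.log((1 - alpha) / beta) + alpha * np.log(alpha / (1 - beta))) / mu0
    return pi0 * alpha + pi1 * beta + c * (pi1 * e1 + pi0 * e0)

def run_sprt(h, log_a, log_b):
    """Run one SPRT path under hypothesis h; return (decision, stopping time N)."""
    llr, n = 0.0, 0
    f = f1 if h == 1 else f0
    while log_a < llr < log_b:
        u = rng.choice(2, p=f)              # one quantized message U
        llr += np.log(f1[u] / f0[u])        # accumulate the log-likelihood ratio
        n += 1
    return (1 if llr >= log_b else 0), n

alpha, beta = 0.05, 0.05
log_b = np.log((1 - beta) / alpha)          # no-overshoot (Wald) thresholds
log_a = np.log(beta / (1 - alpha))

# Monte Carlo estimate of the true Bayesian cost J = c*E[N] + P(error).
trials, total = 5000, 0.0
for _ in range(trials):
    h = int(rng.random() < pi1)
    dec, n = run_sprt(h, log_a, log_b)
    total += c * n + (dec != h)
print(f"G(alpha, beta) = {approx_cost(alpha, beta):.4f}   simulated J = {total / trials:.4f}")
```

The printed comparison gives a sense of how tight the approximation is for these particular hypothetical values; the same G can then be compared across candidate quantizers φ.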

Suboptimality of the stationary design
It was shown by Tsitsiklis that the optimal quantizers φ^n take the form of threshold rules based on the likelihood ratio. Veeravalli asked whether these rules can be taken to be stationary, a problem that had remained open.
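For concreteness, in the binary-message case (K = 2) such a threshold rule has the generic shape below, where f^0(O) and f^1(O) denote the class-conditional densities of a single raw observation and t is a threshold (both notational assumptions); for K > 2 the likelihood-ratio axis is partitioned by several thresholds:

```latex
% Binary likelihood-ratio threshold quantizer with threshold t.
\phi(O) \;=\;
\begin{cases}
1, & f^1(O)/f^0(O) \,\ge\, t,\\
0, & \text{otherwise.}
\end{cases}
```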

Suboptimality of the stationary design
A simple counterexample shows that the optimal quantizer is not stationary. Consider a problem in which each observation takes one of three values, with fixed conditional distributions under H=0 and H=1, fixed priors, and a fixed cost per step (the numerical values are given in the paper). If we restrict attention to binary quantizers, then there are only three possible quantizers, corresponding to the three ways of splitting the three observation values into two bins: call them Design A, Design B, and Design C. Each design induces a two-point pmf for the message Z^n under each hypothesis.

Suboptimality of the stationary design
[Table comparing the costs of the three stationary designs with a nonstationary design, marked *.] The design marked * corresponds to the nonstationary design obtained by applying design A for only the first step and design B for all remaining steps; it achieves a lower cost than any of the stationary designs.

Asymptotic suboptimality of stationary designs
As we can see, the approximate cost function is composed of two KL divergences. If we want to achieve a small cost, we need to choose a quantizer φ that makes both divergences, μ_1 and μ_0, as large as possible; in general, no single quantizer maximizes both simultaneously, as the sketch below illustrates.
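To see the tension concretely, the following Python sketch enumerates the three binary quantizers of a ternary observation alphabet and computes μ_1 and μ_0 for the induced message distributions. The conditional distributions are hypothetical, chosen only so that different designs win on different divergences; they are not the values from the paper's counterexample.

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical conditional distributions of O over {1, 2, 3} (not the paper's numbers).
p0 = np.array([0.60, 0.30, 0.10])   # under H = 0
p1 = np.array([0.10, 0.30, 0.60])   # under H = 1

# The three binary quantizers: each puts one symbol alone in its own bin.
for lone in range(3):
    rest = [i for i in range(3) if i != lone]
    q0 = np.array([p0[lone], p0[rest].sum()])   # induced message pmf under H = 0
    q1 = np.array([p1[lone], p1[rest].sum()])   # induced message pmf under H = 1
    mu1, mu0 = kl(q1, q0), kl(q0, q1)           # mu_1 = D(f^1_phi || f^0_phi), mu_0 = D(f^0_phi || f^1_phi)
    print(f"bin {{{lone + 1}}} vs rest:  mu1 = {mu1:.4f}  mu0 = {mu0:.4f}")
```

With these illustrative numbers, one design attains the largest μ_1 while a different design attains the largest μ_0, so no single stationary choice is best for both divergences at once.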

Discussion
The problem of decentralized sequential detection encompasses a wide range of problems involving different assumptions about memory and feedback. The authors have provided an asymptotic characterization of the cost of the optimal sequential test in the setting of case A. They have also provided an explicit counterexample to the stationarity conjecture and showed that, under some conditions, there is a guaranteed range of prior probabilities for which stationary strategies are suboptimal.