Detecting Periodicity in Point Processes

John Rice, University of California, Berkeley

"All animals act as if they can make decisions." (I. J. Good)

Joint work with Peter Bickel and Bas Kleijn. Thanks to Seth Digel, Patrick Nolan and Tom Loredo.

Outline
- Introduction and motivation
- A family of score tests
- Assessing significance in a blind search
- The need for empirical comparisons

Motivation Many gamma-ray sources are unidentified and may be pulsars, but establishing that these sources are periodic is difficult: one might collect only ~1500 photons during a 10-day period.

Difficulties
- Frequency unknown
- Spin-down
- Large search space
- Glitches
- Celestial foreground
Computational demands for a blind search are very substantial. A heroic search did not find any previously unknown gamma-ray pulsars in EGRET data (Chandler et al. 2001).

Detection problem

Unpleasant fact: there is no optimal test. A detection algorithm optimal for one pulse profile will not be optimal for another. No matter how clever you are, no matter how rich the dictionary from which you adaptively compose a detection statistic, no matter how multilayered your hierarchical prior, your procedure will not be globally optimal. The pulse profile is an infinite-dimensional object, and any test can achieve high asymptotic power against local alternatives in at most a finite number of directions. In other words, associated with any particular test is a finite-dimensional collection of targets, and it is only for such targets that the test is highly sensitive. Consequence: you have to be a [closet] Bayesian and choose directions a priori. Lehmann & Romano, Testing Statistical Hypotheses, Chapter 14.

Specifying a target

Likelihood function and score test Let the point spread function be w(z|e), and write w_j = w(z_j | e_j). The likelihood is expressed in terms of the times (t), energies (e), and locations (z) of the photons. A score test (Rao test) is formed by differentiating the log likelihood with respect to the signal parameter and evaluating the derivative at zero; a remainder term is negligible when the period is much shorter than the observation span T. Unlike a generalized likelihood ratio test, a Rao test does not require fitting parameters under the alternative, but only under the null hypothesis.

Phase-invariant statistic Square and integrate out the phase. Neglecting the second term, the statistic, apart from the psf weighting, was proposed by Beran as the locally most powerful invariant test in the direction specified by the harmonic coefficients at frequency f. Truncating at n = 1 gives the Rayleigh test; truncating at n = M gives Z_M^2. A particular choice of coefficients gives Watson's test (an invariant form of Cramer-von Mises). Mardia (1972), Statistics of Directional Data.
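As a concrete sketch of the unweighted statistic (the modulation frequency, time span, photon count, and thinning scheme below are illustrative, and the psf weights are omitted):

```python
import math
import random

def z_m_squared(times, f, M=1):
    """Z_M^2 statistic at trial frequency f: fold the arrival times at f,
    then sum the Rayleigh powers of the first M harmonics.
    M = 1 is the Rayleigh test."""
    n = len(times)
    phases = [2.0 * math.pi * ((f * t) % 1.0) for t in times]
    z = 0.0
    for k in range(1, M + 1):
        c = sum(math.cos(k * p) for p in phases)
        s = sum(math.sin(k * p) for p in phases)
        z += c * c + s * s
    return 2.0 * z / n

# Illustrative data: photons thinned from a modulated Poisson-like rate at f0.
random.seed(0)
f0, t_span, n_photons = 0.3, 1000.0, 500
times = []
while len(times) < n_photons:
    t = random.uniform(0.0, t_span)
    # Accept with probability proportional to 1 + cos(2*pi*f0*t).
    if random.random() < 0.5 * (1.0 + math.cos(2.0 * math.pi * f0 * t)):
        times.append(t)

on = z_m_squared(times, f0, M=1)       # large when the trial frequency is right
off = z_m_squared(times, 0.1234, M=1)  # roughly chi-square(2) otherwise
```

On the true frequency the statistic grows linearly with the photon count; off-frequency it stays at the chi-square(2) noise level, which is why a threshold can separate the two.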

Relationship to tests based on density estimation In the unweighted case the test statistic can be expressed in terms of a kernel density estimate of the phase distribution: a continuous version of a chi-square goodness-of-fit test, using a kernel density estimate rather than binning. But note that a kernel for density estimation is usually taken to be sharply peaked, and thus has substantial high-frequency content. Such a choice of kernel will not match low-frequency targets.
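A small numerical check of this correspondence (the phases and kernel coefficients c_n below are hypothetical): for a circular kernel k(phi) = (1/2pi)(1 + 2 sum_n c_n cos(n phi)), Parseval's identity gives the integrated squared deviation of the KDE from uniformity as (1/(pi N^2)) sum_n c_n^2 (C_n^2 + S_n^2), a weighted sum of squared empirical Fourier amplitudes; a sharply peaked kernel has slowly decaying c_n and so weights high harmonics heavily.

```python
import math

def fourier_powers(phases, M):
    """C_n^2 + S_n^2 for n = 1..M, with C_n = sum_j cos(n*phi_j), etc."""
    out = []
    for n in range(1, M + 1):
        c = sum(math.cos(n * p) for p in phases)
        s = sum(math.sin(n * p) for p in phases)
        out.append(c * c + s * s)
    return out

def kde_statistic(phases, coeffs):
    """Integrated squared deviation of a circular KDE from the uniform
    density, computed by numerical quadrature on [0, 2*pi)."""
    N, M, grid = len(phases), len(coeffs), 4096
    total = 0.0
    for i in range(grid):
        phi = 2.0 * math.pi * i / grid
        fhat = 0.0
        for p in phases:
            d = phi - p
            fhat += 1.0 + 2.0 * sum(coeffs[n - 1] * math.cos(n * d)
                                    for n in range(1, M + 1))
        fhat /= 2.0 * math.pi * N
        total += (fhat - 1.0 / (2.0 * math.pi)) ** 2
    return total * 2.0 * math.pi / grid

phases = [0.3, 1.1, 2.0, 2.05, 4.4, 5.9]   # hypothetical folded phases
coeffs = [0.9, 0.5, 0.2]                    # hypothetical kernel coefficients
N = len(phases)
direct = sum(c * c * pw for c, pw in
             zip(coeffs, fourier_powers(phases, len(coeffs)))) / (math.pi * N * N)
quadrature = kde_statistic(phases, coeffs)
assert abs(direct - quadrature) < 1e-6
```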

Power

Tradeoffs The n-th harmonic will only contribute to the power if the signal's n-th harmonic amplitude is substantial and if the frequency mismatch at the n-th harmonic is small compared to the spacing of the Fourier frequencies, 1/T. That is, inclusion of harmonics is only helpful if the signal contains substantial power in those harmonics and if the frequency sampling is fine; otherwise the cost in variance of including higher harmonics may more than offset the potential gains. Viewed from this perspective, tests based on density estimation with a small bandwidth are not attractive unless the light curve has substantial high-frequency components and the target frequency is very close to the actual frequency.

Integration versus discretization Rather than finely discretizing frequency, consider integrating the test statistic over a frequency band using a symmetric probability density g(f).

The integrated statistic requires a number of operations quadratic in the number of photons. However, the quadratic form can be diagonalized in an eigenfunction expansion, resulting in a number of operations linear in the number of photons. (In the case that g(.) is uniform, the eigenfunctions are the prolate spheroidal wave functions, as in multitaper spectral estimation.) Power is still lost in high frequencies unless the support of g is small. The procedure can be extended to integrate over tiles in the frequency/spin-down plane when the frequency derivative is unknown.
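A sketch for the simplest case (g uniform on a band, the statistic being the Rayleigh power; the band edges, time span, and photon times below are illustrative): the band integral can be written exactly as a double sum over photon pairs, which is the quadratic cost mentioned above, and a frequency-grid Riemann sum reproduces it.

```python
import math
import random

def rayleigh_power(times, f):
    c = sum(math.cos(2.0 * math.pi * f * t) for t in times)
    s = sum(math.sin(2.0 * math.pi * f * t) for t in times)
    return 2.0 * (c * c + s * s) / len(times)

def integrated_pair_form(times, f_lo, f_hi):
    """Integral of the Rayleigh power against uniform g on [f_lo, f_hi],
    written as a double sum over photon pairs (quadratic in photon count)."""
    n = len(times)
    total = 0.0
    for tj in times:
        for tk in times:
            tau = tj - tk
            if abs(tau) < 1e-12:
                total += 1.0
            else:
                total += ((math.sin(2.0 * math.pi * f_hi * tau)
                           - math.sin(2.0 * math.pi * f_lo * tau))
                          / (2.0 * math.pi * (f_hi - f_lo) * tau))
    return 2.0 * total / n

def integrated_grid(times, f_lo, f_hi, n_grid=400):
    """Midpoint-rule approximation of the same band integral."""
    df = (f_hi - f_lo) / n_grid
    return sum(rayleigh_power(times, f_lo + (i + 0.5) * df)
               for i in range(n_grid)) * df / (f_hi - f_lo)

random.seed(3)
times = [random.uniform(0.0, 50.0) for _ in range(30)]
a = integrated_pair_form(times, 0.20, 0.21)
b = integrated_grid(times, 0.20, 0.21)
assert abs(a - b) < 1e-3
```

The eigenfunction expansion described on the slide replaces this O(n^2) pair sum with an O(n) computation; the sketch above only verifies the quadratic form itself.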

Example: Vela

Assessing significance At a single frequency, significance can be assessed easily through simulation. In a broadband blind search this is not feasible, and furthermore one may feel nervous about using the traditional chi-square approximations in the extreme tail (it can be shown that the limiting null distribution of the integrated test statistic is that of a weighted sum of chi-square random variables). We are thus investigating the use of classical extreme value theory in conjunction with affordable simulation.
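For instance, a quick Monte Carlo check at a single frequency (the photon and simulation counts are illustrative) confirms that the Rayleigh statistic's null tail matches the chi-square(2) survival function exp(-x/2) at moderate x; it is the extreme tail, and the maximum over many frequencies, that need more care.

```python
import math
import random

def rayleigh_from_phases(phases):
    n = len(phases)
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return 2.0 * (c * c + s * s) / n

random.seed(1)
n_photons, n_sims, x = 200, 5000, 6.0
exceed = 0
for _ in range(n_sims):
    # Under the null hypothesis the folded phases are uniform on [0, 2*pi).
    phases = [random.uniform(0.0, 2.0 * math.pi) for _ in range(n_photons)]
    if rayleigh_from_phases(phases) > x:
        exceed += 1

emp_tail = exceed / n_sims
chi2_tail = math.exp(-x / 2.0)  # survival function of chi-square with 2 df
assert abs(emp_tail - chi2_tail) < 0.02
```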

Gumbel Approximation
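The idea in a simplified form (treating the trial frequencies as independent, which real, correlated frequency bins are not; K is illustrative): the maximum of K chi-square(2) variables has CDF (1 - exp(-x/2))^K, which is approximately exp(-K exp(-x/2)), a Gumbel law.

```python
import math
import random

random.seed(2)
K, n_sims = 500, 4000
x = 2.0 * (math.log(K) + 1.0)   # a point near the Gumbel location 2*log(K)

below = 0
for _ in range(n_sims):
    # chi-square(2) variables via -2*log(U), U uniform on (0, 1];
    # take the maximum over K independent trial frequencies.
    m = max(-2.0 * math.log(1.0 - random.random()) for _ in range(K))
    if m <= x:
        below += 1

emp_cdf = below / n_sims
gumbel_cdf = math.exp(-K * math.exp(-x / 2.0))  # limiting Gumbel form
assert abs(emp_cdf - gumbel_cdf) < 0.03
```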

Example

Tail Approximations According to this approximation, in order for a Bonferroni-corrected p-value to be less than 0.01, a test statistic of about 11 standard deviations or more would be required.

log[- log F(t)] versus t

Need for theoretical and empirical comparisons Since no procedure is a priori optimal, comparisons are needed. Suppose we are comparing a testing procedure such as the one we have described with two Bayesian procedures:
- a Gregory-Loredo procedure based on a step-function model for the phased light curve;
- a prior on Fourier coefficients, e.g. independent mean-zero Gaussians with decreasing variance.
Also note that within each of these two, the particular prior is important. Even in traditional low-dimensional models, the Bayes factor is sensitive to the prior on model parameters, in contrast to the prior's small effect in estimation. Kass & Raftery (1995), JASA, p. 773.
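A toy illustration of that prior sensitivity (the model and observed value are hypothetical, not from the talk): for a single observation x ~ N(theta, 1), testing theta = 0 against theta ~ N(0, tau^2) gives a closed-form Bayes factor, and widening the prior inflates the evidence for the null without bound (the Lindley-Bartlett effect), while the posterior mean of theta barely moves.

```python
import math
from statistics import NormalDist

def bayes_factor_01(x, tau):
    """BF_01 for one observation x ~ N(theta, 1):
    H0: theta = 0  versus  H1: theta ~ N(0, tau^2).
    The marginal density of x under H1 is N(0, 1 + tau^2)."""
    m0 = NormalDist(0.0, 1.0).pdf(x)
    m1 = NormalDist(0.0, math.sqrt(1.0 + tau * tau)).pdf(x)
    return m0 / m1

x = 2.0                          # hypothetical observation
for tau in (1.0, 10.0, 100.0):   # hypothetical prior scales
    print(tau, bayes_factor_01(x, tau))
```

With x = 2 the data are two standard errors from the null, yet the Bayes factor swings from favoring H1 to strongly favoring H0 as tau grows, which is exactly why the particular prior matters when comparing procedures.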

Example We run the procedure discussed earlier and compare it with a Bayesian procedure on a test signal. How do the detection procedures compare?

To compare a suite of frequentist and Bayesian procedures, we would like to understand the behavior of the Bayes factors both when there is no signal and when there is a signal. (Box suggested that statistical models should be Bayesian but should be tested using sampling theory.) Is there theory for the Bayesian models above? It might be possible to convert p-values to Bayes factors. It might be possible to evaluate posterior probabilities of all the competing models and perform composite inference (this would involve massive computing). Inference is constrained by computation. The touchstone: blind comparisons on test signals. We understand that GLAST will be making such comparisons. Box (1981), in Bayesian Statistics: Valencia I; Good (1992), JASA.

Conclusion NERSC's IBM SP, Seaborg, has 6,080 CPUs. Problems are daunting, but with imagination, some theory, and a lot of computing power, there is hope for progress.