Lecture 8 Source detection NASSP Masters 5003S - Computational Astronomy
Different sorts of model:
–All models
–Background + signal
–Background + many similar signals
$b + s$ vs $b + \sum_i s_i$
s may often be assumed to be:
–slowly varying with r;
–with compact support.
Source detection. The basic idea is related to null-hypothesis (NH) testing…
–But if the sources can be assumed to be localized, we can cut the data up and test each source-sized bit at a time: a sliding window.
Survival function. Some missed jargon: the probability at the intercept is called the P-value (you can google it).
Testing the NH: Not all tests are equally good at finding signals! E.g. the Cash statistic is better than $\chi^2$ (in circumstances where the Cash test is appropriate, e.g. when the background is a subset of the signal model). The Cash statistic makes use of knowledge about the signal shape; in general any statistic which does the same (e.g. a matched filter) will also perform well. There is an infinite variety of ‘statistics’ to choose from.
Source detection.
–If the SF probability in each patch (the P-value) is smaller than a previously chosen cutoff, we can call this a positive detection. BUT! Note that there is no certainty.
–Sometimes the null model will by chance give a large $\chi^2$ => ‘false positives’. For given data, background and cutoff, there will be a fixed number of false positives expected in the source list => ‘reliability’. More on this later.
–Sometimes a real source will give a small null-hypothesis $\chi^2$ => ‘false negatives’, real sources which are missed => ‘completeness’. More on this later.
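The cutoff decision above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function names `chi2_sf` and `is_detection` are hypothetical, and the survival function is built from the standard closed forms for integer degrees of freedom so that only the Python standard library is needed.

```python
import math

def chi2_sf(x, dof):
    """Survival function P(chi^2 >= x) for an integer number of degrees of freedom."""
    if dof < 1 or int(dof) != dof:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values: dof = 1 (odd) or dof = 2 (even).
    if dof % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence Q(k + 2) = Q(k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1).
    while k < dof:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def is_detection(chi2_null, dof, p_cutoff):
    """Call a patch a detection if its null-hypothesis P-value is below the cutoff."""
    return chi2_sf(chi2_null, dof) < p_cutoff

# Hypothetical example values: a patch of 4 pixels with chi^2 = 25 against the null model.
p_value = chi2_sf(25.0, 4)
detected = is_detection(25.0, 4, p_cutoff=1e-3)
```

A smaller `p_cutoff` gives fewer false positives (better reliability) at the price of more false negatives (worse completeness).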
Problems with the NH approach:
We don’t have exact knowledge of the background.
–We have to estimate it either from separate data, in which case we need the separate data! (We don’t always have the luxury.) Or from the same data… but this may be dominated by the source...
–Or our background model may be wrong.
Same issues as other model fitting. In particular:
–$\chi^2$ has to be used with care when the noise is Poisson.
But where are the sources? Applying some sort of NH test in a sliding window will return a new random signal, which is now correlated. Finding the sources consists rather of looking for peaks in this random signal. The simplest example is when the noise is uncorrelated and the source peaks have width=0.
Looking for sources 1 channel at a time: In each channel, we test the NH with N=1.
–Since there are no fitted parameters, $\nu = 1$ also.
–If the source occupies a single channel, this procedure is optimal.
–If, however, the source is spread over several channels (as is usual), this procedure is not efficient.
–We want a statistic which uses the maximum amount of information about the source shape.
A generic source-detection algorithm. We shall assume that:
–The data is ‘binned’ (e.g. CCD data).
–We have a good independent estimate of the background.
–The sources are sparsely distributed, such that we can deal with them one at a time.
–The shape of the source profile is known.
–The source position is unknown.
–The source amplitude is unknown (but >0).
Generic source-detection algorithm: The algorithm has 3 steps:
1: Calculate a sliding-window map.
2: Find the peaks in this map.
3: For each peak, calculate the probability P that it could arise by chance from the background (the null-hypothesis P-value).
Choose a $P_{\rm cutoff}$. Decision: is $P < P_{\rm cutoff}$? Yes: sources. No: rejects.
1: The sliding window. [Figure: data values y; sliding-window output values U.]
1: The sliding window. For each position of the sliding window, a single number U is calculated from the values falling within the window. The output is a map of the U values.
The intent is to:
–Raise the signal-to-noise
–Improve sensitivity
–Amplify the sources at the expense of the noise.
(These are three ways of saying the same thing.)
Sliding-window processing only has value when the source has a width > 1 pixel. Edges need special treatment.
1: Window functions
A weighted sum (= a convolution).
–Simplest with all weights = 1: “sliding box”.
–Optimum weights: a “matched filter”. For uniform Gaussian noise, $w_{\rm opt} = s$. Trickier to optimize for Poisson noise.
Per-window null-hypothesis $\chi^2$.
–With either an independent value of the background (in which case degrees of freedom = number of pixels $N_w$ in the window), or…
–…one fitted from the data (degrees of freedom = $N_w - 1$).
Likelihood (same background provisions as $\chi^2$).
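The weighted-sum windows can be illustrated with a short sketch. The function name `sliding_window_sum` and the toy data are assumptions made for this example; edges are handled in the simplest possible way, by keeping only window positions that lie fully inside the data.

```python
def sliding_window_sum(y, weights):
    """Weighted sum U of the data y under a window, for every fully covered position.

    Output element i uses y[i] .. y[i + len(weights) - 1], so for an odd-sized
    window it corresponds to a window centred on pixel i + len(weights) // 2.
    """
    n_w = len(weights)
    return [
        sum(w * y[i + k] for k, w in enumerate(weights))
        for i in range(len(y) - n_w + 1)
    ]

# Toy data (hypothetical): a source with profile 1, 2, 1 on zero background, no noise.
y = [0, 0, 1, 2, 1, 0, 0]

u_box = sliding_window_sum(y, [1, 1, 1])      # "sliding box": all weights = 1
u_matched = sliding_window_sum(y, [1, 2, 1])  # matched filter: weights = source profile s
```

For a symmetric profile, as here, the weighted sum and the convolution are identical; for an asymmetric profile the matched filter uses the profile as the weights directly, without flipping it.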
1: Window functions [Figure: panels: parent function; data.]
1: Window functions [Figure: panels: parent function; matched filter, size=10; chi squared, size=100; log-likelihood, size=100.]
2: Peak finding [Figure: Gaussian noise, convolved with a Gaussian filter.] …don’t get the Gaussians mixed up!
2: Peak finding. How best to do it? There’s no single neat prescription. Naive prescription:
–Pixel i is a peak pixel if $y_i$ is greater than every other y within a patch of pixels from i-j to i+j.
This probably looks familiar to you. But what value to choose for j? Things to avoid are:
–j too small: results in more than 1 peak per source;
–j too large: misses a close adjacent source.
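The naive prescription, and both failure modes, can be seen in a small sketch. The function name `find_peaks` and the toy map are hypothetical; near the edges the patch is simply truncated, which is one of several possible choices.

```python
def find_peaks(u, j):
    """Indices i where u[i] is strictly greater than every other value in u[i-j .. i+j]."""
    n = len(u)
    peaks = []
    for i in range(n):
        lo, hi = max(0, i - j), min(n, i + j + 1)  # patch truncated at the edges
        if all(u[i] > u[k] for k in range(lo, hi) if k != i):
            peaks.append(i)
    return peaks

# Toy map (hypothetical): a noisy double-humped source around pixels 2-4,
# and a fainter adjacent source at pixel 8.
u = [0, 1, 5, 4, 5.5, 1, 0, 0, 3, 0, 0]

peaks_small = find_peaks(u, 1)   # j too small: the first source is found twice
peaks_medium = find_peaks(u, 2)  # one peak per source
peaks_large = find_peaks(u, 4)   # j too large: the adjacent source is missed
```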
2: Peak finding [Figure: two panels: box too small; box too large.]
3: Decision time – is it a source or not? To calculate a P-value we need the probability distribution of peaks in the post-window map of U values (given the null hypothesis). This is not the same as the probability distribution of the original data values… …nor is it even the same as the probability distribution of U values. In fact, little work seems to have been done on $p_{\rm peaks}$. (Though there is quite a lot on the distribution of extrema, which is not quite the same thing.)
3: The decision. ‘Map’ vs ‘peak’ distributions for Gaussian noise. [Figure: black: all pixels; red: peaks.]
3: Cash to the rescue. A practical recipe for applying Cash to source detection goes as follows:
–Choose a window area surrounding each peak.
–Within this window, calculate $L_{\rm null}$ with model $m_i = b_i$ (the background map values).
–Calculate $L_{\rm best}$ by fitting a model $m_i = b_i + \theta_1 s(r_i - \theta_r)$. Degrees of freedom $\nu$ = 1 (the amplitude) + d (the dimensions of the spatial fit).
–The Cash statistic $2(L_{\rm best} - L_{\rm null})$ behaves like $\chi^2$ with 1+d degrees of freedom.
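A minimal 1-D sketch of this recipe follows, under several assumptions that are not in the slides: the source profile is taken to be a Gaussian (`gaussian_profile`, with a made-up width `sigma`), the fit is a brute-force grid search rather than a proper optimizer, and the Poisson log-likelihood is written as $L = \sum_i (y_i \ln m_i - m_i)$ with the model-independent term dropped. All names and data values are hypothetical.

```python
import math

def poisson_loglike(y, m):
    """Poisson log-likelihood sum(y ln m - m); the model-independent ln(y!) term is dropped."""
    return sum(yi * math.log(mi) - mi for yi, mi in zip(y, m))

def gaussian_profile(dr, sigma=1.0):
    """Assumed source shape s(r - theta_r): a unit-height Gaussian of width sigma."""
    return math.exp(-0.5 * (dr / sigma) ** 2)

def cash_statistic(y, b, positions, amplitudes, profile=gaussian_profile):
    """Grid-search fit of m_i = b_i + amp * s(i - pos); returns (Cash, amp, pos)."""
    l_null = poisson_loglike(y, b)          # null model: background only
    l_best, amp_best, pos_best = l_null, 0.0, None
    for pos in positions:
        shape = [profile(i - pos) for i in range(len(y))]
        for amp in amplitudes:              # amplitudes > 0 only
            m = [bi + amp * si for bi, si in zip(b, shape)]
            l_fit = poisson_loglike(y, m)
            if l_fit > l_best:
                l_best, amp_best, pos_best = l_fit, amp, pos
    return 2.0 * (l_best - l_null), amp_best, pos_best

# Hypothetical window of counts around a peak, with a flat background of 2 counts/pixel.
y = [2, 1, 3, 2, 7, 12, 7, 2, 3, 1, 2]
b = [2.0] * len(y)
positions = [3.0 + 0.5 * k for k in range(9)]    # trial positions 3.0 .. 7.0
amplitudes = [0.5 * k for k in range(1, 41)]     # trial amplitudes 0.5 .. 20.0

cash, amp_fit, pos_fit = cash_statistic(y, b, positions, amplitudes)

# In 1-D, d = 1, so the comparison distribution is chi^2 with 2 degrees of
# freedom, whose survival function is exactly exp(-x/2).
p_nominal = math.exp(-0.5 * cash)
```

The value `p_nominal` is only the per-window figure; it does not yet account for the fact that the window was chosen because it contains a peak.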
3: Cash to the rescue. The only difficult point (which is a problem for every method) is to calculate the fraction of pixels which are peaks.
–Monte Carlo
–Possibly a Fourier technique?
Also, we don’t want to use the fit for final parameter values. A Mighell fit is better. [Figure: from my 2009 Cash paper.]
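A Monte Carlo estimate of the peak fraction might look like the sketch below. It assumes pure Gaussian noise as the null model, a sliding-box window, and the naive nearest-neighbour peak definition; the function names, the seed and the map length are arbitrary choices for illustration.

```python
import random

def smooth_box(x, width):
    """Sliding-box sum over 'width' pixels, keeping only fully covered positions."""
    return [sum(x[i:i + width]) for i in range(len(x) - width + 1)]

def peak_fraction(u, j=1):
    """Fraction of interior pixels that exceed every neighbour within +/- j pixels."""
    interior = range(j, len(u) - j)
    n_peaks = sum(
        1 for i in interior
        if all(u[i] > u[k] for k in range(i - j, i + j + 1) if k != i)
    )
    return n_peaks / len(interior)

rng = random.Random(5003)  # fixed seed so the run is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # null model: noise only

frac_white = peak_fraction(noise)                   # uncorrelated map
frac_smooth = peak_fraction(smooth_box(noise, 5))   # correlated, post-window map
```

For uncorrelated noise and j = 1 the expected fraction is 1/3, since each of three independent pixels is equally likely to be the largest; the sliding window correlates neighbouring pixels and lowers the fraction, which is why it has to be estimated for the post-window map rather than for the raw data.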
What is the best detection method? From my 2009 Cash paper.