R.K. Bock, Durham, March
Gamma/Hadron separation in atmospheric Cherenkov telescopes
Overview
- multi-wavelength astrophysics
- imaging Cherenkov telescopes (IACT-s)
- image classification methods under study
- trying for a rigorous comparison
Wavelength regimes in astrophysics
- wavelength regimes extend over 20 orders of magnitude in energy, if one adds infrared, radio and microwave observations
- Cherenkov telescopes use visible light, but few quanta: 'imaging' takes on a different meaning
- some instruments have to be satellite-based, due to the absorbing effect of the atmosphere
Full sky at different wavelengths
An AGN at different wavelengths
Objects of interest: active galactic nuclei
- black holes spin and develop a jet with shock waves: electrons and protons get accelerated and impart their energy to high-E γ-rays
Principle of imaging Cherenkov telescopes
- a shower develops in the atmosphere; charged relativistic particles emit Cherenkov radiation (at wavelengths from the visible to the UV)
- some photons arrive at sea level and get reflected by a mirror onto a camera
- high sensitivity and good time resolution are vital, precision is not: high-reflectivity mirrors, the best possible photomultipliers in the camera
Principle of imaging Cherenkov telescopes
Principle of image parameters
- hadron showers (cosmics) dominate the hardware trigger; image analysis must discriminate gammas from hadrons
- gamma and hadron showers show different characteristics (as in any calorimeter): feature extraction using principal component analysis and other characteristics must be used, and experimented with, in view of the best separation (a minimal sketch follows below)
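As a minimal sketch of such feature extraction, the snippet below applies principal component analysis to a hypothetical array of Hillas-type image parameters; the parameter names, array shapes and number of components are illustrative placeholders, not the MAGIC analysis choices.

```python
import numpy as np
from sklearn.decomposition import PCA

# hypothetical image parameters, e.g. width, length, size, dist, alpha
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise before PCA

pca = PCA(n_components=3).fit(X)
features = pca.transform(X)                # decorrelated features for later cuts
```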
One of the predecessor telescopes (HEGRA) in 1999
Photomontage of the MAGIC telescope in La Palma (2000)
Installing the mirror dish of MAGIC, La Palma, Dec 2001
Multivariate classification
- cuts are made in the n-space of features (in our case image parameters); the problem gets unwieldy even at low n
- correlations between the features make simple cuts on individual variables an ineffective method
- decorrelation by standard methods (e.g. Karhunen-Loève) does not solve the problem, being a linear operation
- finding new variables does help; so do cut parameters along one axis that depend on features along a different axis: dynamic cuts (subjective!) - see the sketch after this list
- ideally, a transformation to a single test statistic should be found
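A minimal sketch of such a dynamic cut, assuming a WIDTH threshold that depends on another image parameter (here log10(SIZE)); the coefficients and parameter names are purely illustrative, not tuned values from any IACT analysis.

```python
import numpy as np

def dynamic_width_cut(width, size, a=0.08, b=0.01):
    """Dynamic cut: the WIDTH threshold is not fixed but varies with
    log10(SIZE).  Coefficients a, b are illustrative placeholders."""
    threshold = a + b * np.log10(size)
    return width < threshold          # True -> event kept as gamma candidate

# usage on hypothetical arrays of image parameters
width = np.array([0.07, 0.12, 0.09])
size  = np.array([300., 5000., 800.])
mask = dynamic_width_cut(width, size)
```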
Different classification methods
- cuts in the image parameters (including dynamic cuts)
- mathematically optimized cuts in the image parameters: classification and regression trees (CART), commercial products available
- linear discriminant analysis (LDA)
- composite (2-D) probabilities (CP)
- kernel methods
- artificial neural networks (ANN)
There are many general methods on the market (this slide from A. Faruque, Mississippi State University)
Method details and comments: cuts and supercuts
- wide experience exists in many physics experiments and for all IACT-s; any method claiming to be superior must use results from these as a yardstick
- needs an optimization criterion; does not result in a relation between gamma acceptance and hadron contamination (i.e. no single test statistic)
- usually leads to separate studies and approximations for each new data set (this is past experience) - often difficult to reproduce
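A minimal sketch of a fixed, supercuts-style selection and of how one working point is quantified on labelled Monte Carlo; the cut windows below are illustrative placeholders, not the published supercuts values.

```python
import numpy as np

def supercuts(width, length, alpha):
    # fixed windows on a few image parameters (placeholder values)
    return (width < 0.15) & (length < 0.30) & (alpha < 10.0)

def acceptance_and_contamination(passed, is_gamma):
    # with labelled MC one gets gamma acceptance and hadron contamination,
    # but only for this single working point - no full curve, i.e. no
    # single test statistic
    gamma_acc = passed[is_gamma].mean()
    hadron_cont = passed[~is_gamma].mean()
    return gamma_acc, hadron_cont
```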
Method details and comments: CART
- developed originally by statisticians to do away with the randomness in optimizing cuts (Breiman, Friedman, Olshen, Stone, 1984)
- now developed into a data mining method, commercially available from several companies
- basic operations: growing a tree, pruning it, splitting the leaves again - done in some heuristic succession
- the problem is to find a robust measure to choose among the many trees that are (or can be) grown
- made for large samples: no experience with IACT-s, but there are promising early results (a minimal sketch follows below)
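A minimal CART sketch, using scikit-learn as a stand-in for the commercial packages mentioned on the slide; the data, labels and pruning strength are hypothetical placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# hypothetical image parameters and true class (1 = gamma, 0 = hadron)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                 # e.g. width, length, size, alpha
y = (X[:, 0] + 0.5 * X[:, 1] < 0).astype(int)

# growing the tree; ccp_alpha controls cost-complexity pruning
tree = DecisionTreeClassifier(ccp_alpha=1e-3).fit(X, y)
gamma_like_prob = tree.predict_proba(X)[:, 1]  # can serve as a test statistic
```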
Method details and comments: LDA
- parametric method, finding linear combinations of the original image parameters such that the separation between signal (gamma) and background (hadron) distributions is maximized
- fast, simple and (probably) very robust
- ignores non-linear correlations in n-dimensional space (because of the linear transformation)
- little experience with LDA in IACT-s; early tests show that higher-order variables are needed (e.g. x, y -> x²y)
Method details and comments: LDA
Method details and comments: LDA
- Like principal component analysis (PCA), LDA is used for data classification and dimensionality reduction. LDA maximizes the ratio of between-class variance to within-class variance, for any pair of data sets; this guarantees maximal separability.
- The prime difference between LDA and PCA is that PCA performs feature classification (e.g. image parameters!) while LDA performs data classification. PCA changes both the shape and location of the data in its transformed space, whereas LDA provides more class separability by building a decision region between the classes.
- The formalism is simple: the transformation into the 'best separable space' is performed by the eigenvectors of a matrix readily derived from the data (for our application: two classes, gammas and hadrons).
- Caveat: both PCA and LDA are linear transformations; they may be of limited efficiency when non-linearity is involved.
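A minimal LDA sketch on hypothetical image parameters; with two classes (gammas vs hadrons) LDA yields a single linear combination of the inputs, i.e. exactly one discriminant value per event. The sample sizes and distributions below are illustrative only.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
gammas  = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
hadrons = rng.normal(loc=1.0, scale=1.5, size=(500, 4))
X = np.vstack([gammas, hadrons])
y = np.array([1] * 500 + [0] * 500)

lda = LinearDiscriminantAnalysis().fit(X, y)
score = lda.transform(X).ravel()   # the single discriminant variable
# non-linear correlations can be partly captured by adding products such
# as x*y or x**2 to the input parameters, as suggested on the slide
```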
Method details and comments: kernel
- kernel density estimation is a nonparametric multivariate classification technique; its advantage is the generality of the class-conditional, consistently estimated densities
- uses individual event likelihoods, defined as the closeness to the population of gamma events or hadron events in n-dimensional space; the closeness is expressed by a kernel function acting as a metric
- mathematically convincing, but leads to practical problems, including limitations in dimensionality; there is also some randomness in choosing the kernel function
- has been toyed with in Whipple (the earliest functioning IACT) and the results look convincing; however, Whipple still uses supercuts; only first experience with kernels in MAGIC: positive
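A minimal kernel-density sketch: estimate the gamma and hadron densities separately in the space of image parameters and use the per-event log-likelihood ratio as the test statistic. The Gaussian kernel and the bandwidth value are arbitrary illustrative choices, echoing the randomness in choosing the kernel noted above.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def likelihood_ratio(X_gamma_train, X_hadron_train, X_test, bandwidth=0.5):
    # fit one density per class, then compare log-likelihoods per event
    kde_g = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_gamma_train)
    kde_h = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_hadron_train)
    return kde_g.score_samples(X_test) - kde_h.score_samples(X_test)
```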
Method details and comments: kernel
Method details and comments: composite probabilities (2-D)
- intuitive determination of event probabilities by multiplying the probabilities in all 2-D projections that can be made from the image parameters, using constant bin content
- shown on some IACT data to at least match the best existing results (but strict comparisons suffered from moving data sets)
Method details and comments: composite probabilities (2-D)
- the CP program uses same-content binning in 2 dimensions
- bins are set up for gammas (red), probabilities are evaluated for protons (blue)
- all possible 2-D projections are used (a minimal sketch follows below)
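A minimal sketch of the composite-probability idea under the assumptions above: for every pair of image parameters, build a 2-D grid whose bin edges are quantiles of the gamma training sample (same-content binning for gammas), turn the gamma counts into per-bin probabilities, and sum the log-probabilities an event picks up in all 2-D projections. Bin numbers and the regularisation constant are illustrative.

```python
import numpy as np
from itertools import combinations

def composite_probability(X_gamma, X_test, nbins=5):
    n_par = X_gamma.shape[1]
    logp = np.zeros(len(X_test))
    for i, j in combinations(range(n_par), 2):
        ex = np.quantile(X_gamma[:, i], np.linspace(0, 1, nbins + 1))
        ey = np.quantile(X_gamma[:, j], np.linspace(0, 1, nbins + 1))
        hist, _, _ = np.histogram2d(X_gamma[:, i], X_gamma[:, j], bins=[ex, ey])
        prob = hist + 1e-9            # avoid empty bins
        prob /= prob.sum()
        ix = np.clip(np.searchsorted(ex, X_test[:, i]) - 1, 0, nbins - 1)
        iy = np.clip(np.searchsorted(ey, X_test[:, j]) - 1, 0, nbins - 1)
        logp += np.log(prob[ix, iy])
    return logp                       # higher value = more gamma-like
```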
Method details and comments: ANN-s
- the method has been presented often in the past - it resembles the CART method, but works on locally linearly transformed data
- substantial randomness in choosing the depth of the network, the training method, the transfer function, ...
- so far no convincing results on IACT-s; Whipple have tried and rejected it
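A minimal ANN sketch (scikit-learn's multilayer perceptron as a stand-in). The hidden-layer size, activation ("transfer function") and solver are exactly the kind of arbitrary choices the slide warns about; data and labels are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))                    # hypothetical image parameters
y = (X[:, 0] ** 2 + X[:, 1] < 1).astype(int)      # hypothetical labels

net = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="adam", max_iter=500).fit(X, y)
gammaness = net.predict_proba(X)[:, 1]            # usable as a test statistic
```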
Gamma events in MAGIC before and after cleaning
Proton events in MAGIC before and after cleaning
Comparison MC gammas / MC protons
Different methods on the same data set
- typically, optimization parameters are fully defined by cost, purity, and sample size
We are running a comparative study: criteria
- strictly defined disjoint training and control samples
- must give estimators for hadron contamination and gamma acceptance (purity and cost)
- should ideally result in a smooth function relating purity with cost, i.e. result in a single test statistic (a sketch of such a curve follows below)
- if not, must show results for several optimization criteria, e.g. estimated hadron contamination at fixed gamma acceptance values, significance, etc.
- for MC events, results can be controlled by comparing the classification to the known origin of the events
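A minimal sketch of the last point but one: once a method delivers a single test statistic per event, the purity/cost relation is obtained simply by scanning the cut on that statistic, with the true class known from Monte Carlo. The number of scan points is arbitrary.

```python
import numpy as np

def acceptance_vs_contamination(statistic, is_gamma, n_points=50):
    # scan cut values over the observed range of the test statistic
    cuts = np.quantile(statistic, np.linspace(0.0, 1.0, n_points))
    acc, cont = [], []
    for c in cuts:
        passed = statistic >= c
        acc.append(passed[is_gamma].mean())       # gamma acceptance (cost side)
        cont.append(passed[~is_gamma].mean())     # hadron contamination (purity side)
    return np.array(acc), np.array(cont)
```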
Even if there were a clear conclusion... there remain some serious caveats
- these methods all assume an abstract space of image parameters, which is OK in Monte Carlo situations only; real data are subject to influences that distort this space: starfield and night-sky background, atmospheric conditions, unavoidable detector changes and malfunctions
- no method can invent new independent parameters
- we assume that in the final analysis the gammas will be Monte Carlo and the measurements on/off: we must deal with variables which may not be representative in Monte Carlo events and yet influence the observed image parameters; e.g. the zenith angle changes continuously, and the energy is something we want to observe, hence unknown
- some compromise between frequent Monte Carlo-ing and parametric corrections to the parameters is the likely solution