R.K. Bock, Durham, March
Gamma/Hadron separation in atmospheric Cherenkov telescopes
Overview
- multi-wavelength astrophysics
- imaging Cherenkov telescopes (IACT-s)
- image classification methods under study
- trying for a rigorous comparison
Wavelength regimes in astrophysics
- wavelength regimes extend over 20 orders of magnitude in energy, if one adds infrared, radio and microwave observations
- Cherenkov telescopes use visible light, but few quanta: 'imaging' takes on a different meaning
- some instruments have to be satellite-based, due to the absorbing effect of the atmosphere
Full sky at different wavelengths
An AGN at different wavelengths
Objects of interest: active galactic nuclei
- black holes spin and develop a jet with shock waves: electrons and protons get accelerated and impart their energy to high-E γ-rays
Principle of imaging Cherenkov telescopes
- a shower develops in the atmosphere; charged relativistic particles emit Cherenkov radiation (at wavelengths from the visible to the UV)
- some photons arrive at sea level and get reflected by a mirror onto a camera
- high sensitivity and good time resolution are vital, precision is not: high-reflectivity mirrors, the best possible photomultipliers in the camera
Principle of imaging Cherenkov telescopes
Principle of image parameters
- hadron showers (cosmics) dominate the hardware trigger; image analysis must discriminate gammas from hadrons
- gamma and hadron showers show different characteristics (as in any calorimeter): feature extraction using principal component analysis and other characteristics must be used, and experimented with, in view of the best separation (a minimal sketch follows below)
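As a minimal sketch of such feature extraction, the snippet below applies principal component analysis to a hypothetical array of Hillas-type image parameters; the parameter names, array shapes and number of components are illustrative placeholders, not the MAGIC analysis choices.

```python
import numpy as np
from sklearn.decomposition import PCA

# hypothetical image parameters, e.g. width, length, size, dist, alpha
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise before PCA

pca = PCA(n_components=3).fit(X)
features = pca.transform(X)                # decorrelated features for later cuts
```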
One of the predecessor telescopes (HEGRA) in 1999
Photomontage of the MAGIC telescope in La Palma (2000)
Installing the mirror dish of MAGIC, La Palma, Dec 2001
Multivariate classification
- cuts are made in the n-space of features (in our case image parameters); the problem gets unwieldy even at low n
- correlations between the features make simple cuts on individual variables an ineffective method
- decorrelation by standard methods (e.g. Karhunen-Loève) does not solve the problem, being a linear operation
- finding new variables does help; so do cut parameters along one axis that depend on features along a different axis: dynamic cuts (subjective!) - see the sketch after this list
- ideally, a transformation to a single test statistic should be found
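A minimal sketch of such a dynamic cut, assuming a WIDTH threshold that depends on another image parameter (here log10(SIZE)); the coefficients and parameter names are purely illustrative, not tuned values from any IACT analysis.

```python
import numpy as np

def dynamic_width_cut(width, size, a=0.08, b=0.01):
    """Dynamic cut: the WIDTH threshold is not fixed but varies with
    log10(SIZE).  Coefficients a, b are illustrative placeholders."""
    threshold = a + b * np.log10(size)
    return width < threshold          # True -> event kept as gamma candidate

# usage on hypothetical arrays of image parameters
width = np.array([0.07, 0.12, 0.09])
size  = np.array([300., 5000., 800.])
mask = dynamic_width_cut(width, size)
```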
Different classification methods
- cuts in the image parameters (including dynamic cuts)
- mathematically optimized cuts in the image parameters: classification and regression trees (CART), commercial products available
- linear discriminant analysis (LDA)
- composite (2-D) probabilities (CP)
- kernel methods
- artificial neural networks (ANN)
There are many general methods on the market (this slide from A. Faruque, Mississippi State University)
Method details and comments: cuts and supercuts
- wide experience exists in many physics experiments and for all IACT-s; any method claiming to be superior must use results from these as a yardstick
- needs an optimization criterion; does not result in a relation between gamma acceptance and hadron contamination (i.e. no single test statistic)
- usually leads to separate studies and approximations for each new data set (this is past experience) - often difficult to reproduce
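A minimal sketch of a fixed, supercuts-style selection and of how one working point is quantified on labelled Monte Carlo; the cut windows below are illustrative placeholders, not the published supercuts values.

```python
import numpy as np

def supercuts(width, length, alpha):
    # fixed windows on a few image parameters (placeholder values)
    return (width < 0.15) & (length < 0.30) & (alpha < 10.0)

def acceptance_and_contamination(passed, is_gamma):
    # with labelled MC one gets gamma acceptance and hadron contamination,
    # but only for this single working point - no full curve, i.e. no
    # single test statistic
    gamma_acc = passed[is_gamma].mean()
    hadron_cont = passed[~is_gamma].mean()
    return gamma_acc, hadron_cont
```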
Method details and comments: CART
- developed originally by statisticians to do away with the randomness in optimizing cuts (Breiman, Friedman, Olshen, Stone, 1984)
- now developed into a data mining method, commercially available from several companies
- basic operations: growing a tree, pruning it, splitting the leaves again - done in some heuristic succession
- the problem is to find a robust measure to choose among the many trees that are (or can be) grown
- made for large samples: no experience with IACT-s, but there are promising early results (a minimal sketch follows below)
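A minimal CART sketch, using scikit-learn as a stand-in for the commercial packages mentioned on the slide; the data, labels and pruning strength are hypothetical placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# hypothetical image parameters and true class (1 = gamma, 0 = hadron)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                 # e.g. width, length, size, alpha
y = (X[:, 0] + 0.5 * X[:, 1] < 0).astype(int)

# growing the tree; ccp_alpha controls cost-complexity pruning
tree = DecisionTreeClassifier(ccp_alpha=1e-3).fit(X, y)
gamma_like_prob = tree.predict_proba(X)[:, 1]  # can serve as a test statistic
```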
Method details and comments: LDA
- parametric method, finding linear combinations of the original image parameters such that the separation between signal (gamma) and background (hadron) distributions is maximized
- fast, simple and (probably) very robust
- ignores non-linear correlations in n-dimensional space (because of the linear transformation)
- little experience with LDA in IACT-s; early tests show that higher-order variables are needed (e.g. x, y -> x²y)
Method details and comments: LDA
Method details and comments: LDA
- Like principal component analysis (PCA), LDA is used for data classification and dimensionality reduction. LDA maximizes the ratio of between-class variance to within-class variance, for any pair of data sets; this guarantees maximal separability.
- The prime difference between LDA and PCA is that PCA performs feature classification (e.g. image parameters!) while LDA performs data classification. PCA changes both the shape and location of the data in its transformed space, whereas LDA provides more class separability by building a decision region between the classes.
- The formalism is simple: the transformation into the 'best separable space' is performed by the eigenvectors of a matrix readily derived from the data (for our application: two classes, gammas and hadrons).
- Caveat: both PCA and LDA are linear transformations; they may be of limited efficiency when non-linearity is involved.
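A minimal LDA sketch on hypothetical image parameters; with two classes (gammas vs hadrons) LDA yields a single linear combination of the inputs, i.e. exactly one discriminant value per event. The sample sizes and distributions below are illustrative only.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
gammas  = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
hadrons = rng.normal(loc=1.0, scale=1.5, size=(500, 4))
X = np.vstack([gammas, hadrons])
y = np.array([1] * 500 + [0] * 500)

lda = LinearDiscriminantAnalysis().fit(X, y)
score = lda.transform(X).ravel()   # the single discriminant variable
# non-linear correlations can be partly captured by adding products such
# as x*y or x**2 to the input parameters, as suggested on the slide
```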
Method details and comments: kernel
- kernel density estimation is a nonparametric multivariate classification technique; its advantage is the generality of the class-conditional, consistently estimated densities
- uses individual event likelihoods, defined as the closeness to the population of gamma events or hadron events in n-dimensional space; the closeness is expressed by a kernel function acting as a metric
- mathematically convincing, but leads to practical problems, including limitations in dimensionality; there is also some randomness in choosing the kernel function
- has been toyed with in Whipple (the earliest functioning IACT) and the results look convincing; however, Whipple still uses supercuts; only first experience with kernels in MAGIC: positive
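A minimal kernel-density sketch: estimate the gamma and hadron densities separately in the space of image parameters and use the per-event log-likelihood ratio as the test statistic. The Gaussian kernel and the bandwidth value are arbitrary illustrative choices, echoing the randomness in choosing the kernel noted above.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def likelihood_ratio(X_gamma_train, X_hadron_train, X_test, bandwidth=0.5):
    # fit one density per class, then compare log-likelihoods per event
    kde_g = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_gamma_train)
    kde_h = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_hadron_train)
    return kde_g.score_samples(X_test) - kde_h.score_samples(X_test)
```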
Method details and comments: kernel
Method details and comments: composite probabilities (2-D)
- intuitive determination of event probabilities by multiplying the probabilities in all 2-D projections that can be made from the image parameters, using constant bin content
- shown on some IACT data to at least match the best existing results (but strict comparisons suffered from moving data sets)
Method details and comments: composite probabilities (2-D)
- the CP program uses same-content binning in 2 dimensions
- bins are set up for gammas (red), probabilities are evaluated for protons (blue)
- all possible 2-D projections are used (a minimal sketch follows below)
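A minimal sketch of the composite-probability idea under the assumptions above: for every pair of image parameters, build a 2-D grid whose bin edges are quantiles of the gamma training sample (same-content binning for gammas), turn the gamma counts into per-bin probabilities, and sum the log-probabilities an event picks up in all 2-D projections. Bin numbers and the regularisation constant are illustrative.

```python
import numpy as np
from itertools import combinations

def composite_probability(X_gamma, X_test, nbins=5):
    n_par = X_gamma.shape[1]
    logp = np.zeros(len(X_test))
    for i, j in combinations(range(n_par), 2):
        ex = np.quantile(X_gamma[:, i], np.linspace(0, 1, nbins + 1))
        ey = np.quantile(X_gamma[:, j], np.linspace(0, 1, nbins + 1))
        hist, _, _ = np.histogram2d(X_gamma[:, i], X_gamma[:, j], bins=[ex, ey])
        prob = hist + 1e-9            # avoid empty bins
        prob /= prob.sum()
        ix = np.clip(np.searchsorted(ex, X_test[:, i]) - 1, 0, nbins - 1)
        iy = np.clip(np.searchsorted(ey, X_test[:, j]) - 1, 0, nbins - 1)
        logp += np.log(prob[ix, iy])
    return logp                       # higher value = more gamma-like
```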
Method details and comments: ANN-s
- the method has been presented often in the past - it resembles the CART method, but works on locally linearly transformed data
- substantial randomness in choosing the depth of the network, the training method, the transfer function, ...
- so far no convincing results on IACT-s; Whipple have tried and rejected it
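A minimal ANN sketch (scikit-learn's multilayer perceptron as a stand-in). The hidden-layer size, activation ("transfer function") and solver are exactly the kind of arbitrary choices the slide warns about; data and labels are hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))                    # hypothetical image parameters
y = (X[:, 0] ** 2 + X[:, 1] < 1).astype(int)      # hypothetical labels

net = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="adam", max_iter=500).fit(X, y)
gammaness = net.predict_proba(X)[:, 1]            # usable as a test statistic
```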
Gamma events in MAGIC before and after cleaning
Proton events in MAGIC before and after cleaning
Comparison MC gammas / MC protons
Different methods on the same data set
- typically, optimization parameters are fully defined by cost, purity, and sample size
We are running a comparative study: criteria
- strictly defined disjoint training and control samples
- must give estimators for hadron contamination and gamma acceptance (purity and cost)
- should ideally result in a smooth function relating purity with cost, i.e. result in a single test statistic (a sketch of such a curve follows below)
- if not, must show results for several optimization criteria, e.g. estimated hadron contamination at fixed gamma acceptance values, significance, etc.
- for MC events, results can be controlled by comparing the classification to the known origin of the events
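A minimal sketch of the last point but one: once a method delivers a single test statistic per event, the purity/cost relation is obtained simply by scanning the cut on that statistic, with the true class known from Monte Carlo. The number of scan points is arbitrary.

```python
import numpy as np

def acceptance_vs_contamination(statistic, is_gamma, n_points=50):
    # scan cut values over the observed range of the test statistic
    cuts = np.quantile(statistic, np.linspace(0.0, 1.0, n_points))
    acc, cont = [], []
    for c in cuts:
        passed = statistic >= c
        acc.append(passed[is_gamma].mean())       # gamma acceptance (cost side)
        cont.append(passed[~is_gamma].mean())     # hadron contamination (purity side)
    return np.array(acc), np.array(cont)
```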
Even if there were a clear conclusion... there remain some serious caveats
- these methods all assume an abstract space of image parameters, which is OK in Monte Carlo situations only; real data are subject to influences that distort this space: starfield and night-sky background, atmospheric conditions, unavoidable detector changes and malfunctions
- no method can invent new independent parameters
- we assume that in the final analysis the gammas will be Monte Carlo and the measurements on/off: we must deal with variables which may not be representative in Monte Carlo events and yet influence the observed image parameters; e.g. the zenith angle changes continuously, and the energy is something we want to observe, hence unknown
- some compromise between frequent Monte Carlo-ing and parametric corrections to the parameters is the likely solution