Multivariate Methods of Data Analysis in Cosmic Ray Astrophysics
A. Chilingarian, A. Vardanyan
Cosmic Ray Division, Yerevan Physics Institute, Armenia
ACAT 2002, June, Moscow, Russia
Topics
- Main tasks to be solved in cosmic ray astrophysics
- Analysis methods
- Preprocessing and identification of the best parameters
- Neural Networks for the main data analysis
- Multi-start random search learning algorithm
- Training, validation and generalization errors
- Overtraining control
- Individual event weights
- Results of NN classification and estimation
- Examples of applications
The MAGIC telescope for detecting γ-rays from point sources
The MAKET-ANI installation for the registration of Extensive Air Showers (EAS)
The development of an extensive air shower induced by a primary cosmic ray particle in the atmosphere
Monte Carlo simulation is the key element of any physical inference in indirect experiments
What tasks do we want to solve by measuring EAS characteristics?
The inverse problem to be solved:
- Simulated data: E, A (N_e, N_μ, N_h, S, …): the primary energy E and mass A are known for each simulated set of observables
- Experimental data: ?, ? (N_e, N_μ, N_h, S, …): the same observables are measured, but E and A are unknown
The tasks are the identification of the primary particle type and the estimation of the primary particle energy.
Why Neural Networks?
- Neural Networks belong to the general class of nonparametric methods that do not require any assumption about the parametric form of the statistical model they use
- They are an appropriate technique for classification and estimation tasks
- They are able to treat multidimensional input data
Neural information techniques
The central issue of Neural Networks is a bounded mapping of an n-dimensional input to an m-dimensional output: F: R^n → R^m. The functional form of F is accumulated in the NN parameters (weights) w during the NN training process. The training process consists in the iterative processing of simulated events; its aim is to find the weight vector w* that provides the minimum of the error (quality) function
Q(w) = Σ_i (o(v_i; w) − t_i)²,
where o(v_i; w) is the NN output for the simulated event v_i and t_i is its known target.
A Feed-Forward Neural Network
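The mapping F and the quality function Q(w) above can be made concrete with a minimal Python sketch of such a feed-forward network. This is an illustration only, not the authors' implementation: the sigmoid activations and the quadratic form of Q are assumptions.

```python
import numpy as np

def forward(x, weights):
    """Forward pass of a small feed-forward net, F: R^n -> R^m.
    `weights` is a list of (W, b) pairs, one per layer; sigmoid
    activations are assumed throughout."""
    a = np.asarray(x, float)
    for W, b in weights:
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))
    return a

def quality(weights, events, targets):
    """Quadratic error (quality) function Q(w), averaged over the
    simulated training events with known targets t_i."""
    outputs = np.array([forward(v, weights) for v in events])
    return float(np.mean(np.sum((outputs - np.asarray(targets)) ** 2, axis=1)))
```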
An example of the NN output distribution for a classification task
Common drawbacks of the NN training process
- Training only one network can lead to suboptimal generalization
- Insufficient training events and the risk of overtraining
Multi-start random search algorithm
The random search learning algorithm implements the following steps:
1. The initial values of the NN weights are chosen randomly from a Gaussian distribution with mean μ = 0.
2. A random step in the multidimensional space of NN weights is performed from the initial point; the weights are altered according to
w^(i+1) = w^(i) + κ · (2·RNDM − 1) · Q(w^(i)),
where w^(i) is the NN weight vector at the i-th iteration, κ is the step size, RNDM is a random number from the [0,1] interval, and the term Q(w^(i)) introduces and controls the degree of dependence of the random step value on the already achieved quality function.
3. The quality function is calculated at each iteration by presenting all the training events to the NN.
4. If Q(w^(i)) ≤ Q(w^(i−1)), the vector w^(i) is kept as the new weights of the NN and the next step is initialized from that point in the space of NN weights; otherwise, a return to the previous point is performed and a new random step is taken.
The multi-start technique consists in training many Neural Nets starting from different initial weights and using different step size parameters, which allows scanning many points in the multidimensional space of NN weights (see the sketch below).
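A compact sketch of steps 2 to 4 and the multi-start loop follows. The update formula is the reconstruction given above and may differ in detail from the original code; `quality_fn` stands in for Q(w) evaluated on the training sample, and the quadratic toy objective is purely illustrative.

```python
import numpy as np

def train_random_search(w0, quality_fn, kappa=0.5, n_iter=2000, seed=0):
    """Random-search minimisation of quality_fn (steps 2-4 above).
    The step kappa*(2*RNDM - 1)*Q(w) follows the reconstructed
    update formula on the slide."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, float)
    Q = quality_fn(w)
    for _ in range(n_iter):
        trial = w + kappa * (2.0 * rng.random(w.shape) - 1.0) * Q
        Q_trial = quality_fn(trial)
        if Q_trial <= Q:           # step 4: keep the improving point,
            w, Q = trial, Q_trial  # otherwise stay and draw a new step
    return w, Q

# Multi start: several nets from different Gaussian initial weights
# and different step sizes (toy quadratic quality function shown).
runs = [train_random_search(np.random.default_rng(s).normal(0.0, 1.0, 10),
                            lambda w: float(np.sum(w ** 2)), kappa=k, seed=s)
        for s, k in [(1, 0.1), (2, 0.5), (3, 1.0)]]
```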
Training and Validation errors, Overtraining control
An acceptable procedure to avoid overtraining (sketched in code below):
- After each successful iteration of the learning process, the net error is calculated for the validation sample.
- If the validation error is less than the one obtained at the previous iteration, the NN weights obtained at the current training iteration are memorized.
- Otherwise, the NN weights obtained at the previous iteration are kept.
At the end of the training process, the weights which provide the minimal error on the validation sample are used as the final best weights of the NN.
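In code, this overtraining control amounts to checkpointing the weights on the validation error. A minimal sketch extending the random-search loop above, where `q_train` and `q_val` are assumed quality functions evaluated on the training and validation samples:

```python
import numpy as np

def train_with_validation(w0, q_train, q_val, kappa=0.5, n_iter=2000, seed=0):
    """Random-search training with overtraining control: after each
    successful training iteration, the validation error decides
    whether the current weights are memorised."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, float)
    Q = q_train(w)
    best_w, best_val = w.copy(), q_val(w)
    for _ in range(n_iter):
        trial = w + kappa * (2.0 * rng.random(w.shape) - 1.0) * Q
        Q_trial = q_train(trial)
        if Q_trial <= Q:                     # successful training iteration
            w, Q = trial, Q_trial
            val = q_val(w)
            if val < best_val:               # memorise on validation gain
                best_w, best_val = w.copy(), val
    return best_w, best_val                  # minimal-validation-error weights
```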
The multi-start RS technique makes it possible to select the NN with the best performance on the control data set, as sketched below.
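Combining the two sketches above, the multi-start selection might look as follows. Here `q_control` is a hypothetical quality function evaluated on the independent control set, and the toy stand-ins merely keep the snippet runnable; it assumes `train_with_validation` from the previous block.

```python
import numpy as np

# Hypothetical stand-ins for the training / validation / control errors
q_train = q_val = q_control = lambda w: float(np.sum(w ** 2))

candidates = [train_with_validation(np.random.default_rng(s).normal(0.0, 1.0, 10),
                                    q_train, q_val, kappa=k, seed=s)[0]
              for s, k in [(1, 0.1), (2, 0.5), (3, 1.0)]]

# Select the NN with the best performance on the control data set
best_net = min(candidates, key=q_control)
```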
Results of energy estimation by NN
Results of mass classification by NN
Application of NN to the gamma/hadron separation task in gamma-ray astronomy
Cosmic ray differential energy spectra obtained by NN classification and estimation