Statistical Learning Methods in HEAP
Jens Zimmermann, Christian Kiesling
Max-Planck-Institut für Physik, München / MPI für extraterrestrische Physik, München / Forschungszentrum Jülich GmbH
C. Kiesling, MPI for Physics, Munich - ACAT03 Workshop, KEK, Japan, Dec

Outline:
- Statistical Learning: introduction with a simple example
- Occam's Razor
- Decision Trees
- Local Density Estimators
- Methods Based on Linear Separation
- Examples: triggers in HEP and astrophysics
- Conclusion
Statistical Learning
- Uses no prior knowledge: "no theory required"
- Learns only from examples: "trial and error", "learning by reinforcement"
- Two classes of statistical learning:
  - discrete output 0/1: "classification"
  - continuous output: "regression"
- Applications in high-energy and astrophysics:
  - background suppression, purification of events
  - estimation of parameters not directly measured
A Simple Example: Preparing a Talk
[Scatter plot: # formulas vs. # slides for experimentalists and theorists]
Database established by Jens during the Young Scientists Meeting at MPI.
Discriminating Theorists from Experimentalists: A First Analysis
[Two scatter plots of # formulas vs. # slides: first talks handed in; talks a week before the meeting]
First Problems
[Two scatter plots of # formulas vs. # slides]
- Complicated boundary: completely separable, but only via a complicated boundary
- Simple "model": but no complete separation
- New talk by Ludger: 28 formulas on 31 slides
- At this point we cannot know which feature is "real"! Use train/test or cross-validation!
See Overtraining - Want Generalization - Need Regularization
[Error E on training set and test set vs. training epochs: the test-set error turns upward while the training-set error keeps falling (overtraining)]
We want to tune the parameters of the learning algorithm depending on the overtraining seen!
See Overtraining - Want Generalization - Need Regularization
[Error E on training set and test set vs. training epochs]
Regularization will ensure adequate performance (e.g. via VC dimensions): limit the complexity of the model.
"Factor 10" rule ("Uncle Bernie's Rule #2"): use roughly ten times more training examples than free parameters.
Philosophy: Occam's Razor (14th century)
"Pluralitas non est ponenda sine necessitate." - Do not make assumptions unless they are really necessary. Among theories that describe the same phenomenon equally well, choose the one containing the fewest assumptions.
- First razor: Given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself. - Yes! But not of much use.
- Second razor: Given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error. - No! ("No free lunch" theorem, Wolpert 1996)
Decision Trees
Split the sample by successive cuts on single variables:
- all events: #formulas < 20 -> exp; #formulas > 60 -> th; rest (20 < #formulas < 60) -> split further
- subset 20 < #formulas < 60: #slides > 40 -> exp; #slides < 40 -> th
Classify Ringaile: 31 formulas on 32 slides -> th
Regularization: pruning
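The tree above can be written out directly as nested cuts; the following is a minimal sketch using the slide's cut values (the function name is illustrative only):

```python
# Decision tree from the slide: cut on #formulas first, then on
# #slides for the intermediate subset.

def classify_talk(n_formulas, n_slides):
    if n_formulas < 20:
        return "exp"
    if n_formulas > 60:
        return "th"
    # 20 <= n_formulas <= 60: split the remaining subset on #slides
    return "exp" if n_slides > 40 else "th"

# Ringaile's talk: 31 formulas on 32 slides
print(classify_talk(31, 32))  # th
```

Pruning (the slide's regularization) would merge leaves whose split does not improve the test-set error.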
Local Density Estimators
Search for similar events already classified within a specified region; count the members of the two classes in that region.
[Scatter plots of # formulas vs. # slides]
Maximum Likelihood
[Projections of the two classes onto the # formulas and # slides axes; evaluation at 31 formulas, 32 slides]
The correlation between the variables gets lost completely by projection!
Regularization: binning
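A minimal sketch of the projected-likelihood classifier: each input variable is histogrammed separately per class and the per-bin probabilities are multiplied, which is exactly why inter-variable correlations are lost. The training points, ranges, and binning below are made up for illustration:

```python
# Projected (1-D histogram) likelihood: bin each variable per class,
# multiply the bin probabilities -- correlations are discarded.

def make_hist(values, lo, hi, nbins):
    counts = [0] * nbins
    for v in values:
        b = min(nbins - 1, max(0, int((v - lo) / (hi - lo) * nbins)))
        counts[b] += 1
    total = float(sum(counts))
    return [c / total for c in counts], lo, hi, nbins

def hist_prob(hist, v):
    probs, lo, hi, nbins = hist
    b = min(nbins - 1, max(0, int((v - lo) / (hi - lo) * nbins)))
    return probs[b]

# toy training samples: (formulas, slides) per class
exp = [(5, 40), (10, 50), (15, 45), (25, 55)]
th  = [(60, 20), (70, 25), (80, 15), (35, 18)]

hists = {}
for label, sample in (("exp", exp), ("th", th)):
    hists[label] = [make_hist([s[i] for s in sample], 0, 100, 10)
                    for i in range(2)]

def likelihood_ratio(x):
    # product of per-variable bin probabilities, normalized th vs. exp
    p = {label: hist_prob(h[0], x[0]) * hist_prob(h[1], x[1])
         for label, h in hists.items()}
    return p["th"] / (p["th"] + p["exp"] + 1e-12)

print(likelihood_ratio((70, 20)))  # close to 1 -> theorist-like
```

The binning is the regularization knob: fewer bins smooth the densities, more bins follow the training sample more closely.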
k-Nearest-Neighbour
[Scatter plot of # formulas vs. # slides, with outputs for k = 1, 2, 3, 4, 5]
For every evaluation position the distances to each training position need to be determined!
Regularization: parameter k
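The method fits in a few lines; this sketch uses made-up training points and outputs the fraction of "theorist" labels among the k closest neighbours:

```python
# k-NN: compute the distance to *all* training points (the O(N) cost
# noted on the slide), keep the k closest, average their labels.

import math

train = [((5, 40), 0), ((10, 50), 0), ((15, 45), 0),   # class 0: exp
         ((60, 20), 1), ((70, 25), 1), ((80, 15), 1)]  # class 1: th

def knn_output(query, k):
    nearest = sorted(train, key=lambda p: math.dist(query, p[0]))[:k]
    return sum(label for _, label in nearest) / k

print(knn_output((65, 22), k=3))  # 1.0 -> theorist-like
```

Larger k smooths the output (the slide's regularization), at the price of blurring the class boundary.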
Range Search
[k-d tree over the training points; small and large search boxes around the query point]
The tree needs to be traversed only partially if the box size is small enough!
- Small box: only points 1, 2, 4, 9 checked
- Large box: all points checked
Regularization: box size
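The output itself is just a class count inside a box around the query point; the k-d tree on the slide only speeds up finding the points in the box. A brute-force sketch with made-up data:

```python
# Range search: count training points of each class inside a box of
# given half-width around the query; the box size regularizes.

train = [((5, 40), 0), ((10, 50), 0), ((15, 45), 0),   # class 0: exp
         ((60, 20), 1), ((70, 25), 1), ((80, 15), 1)]  # class 1: th

def range_search_output(query, half_width):
    inside = [label for (x, y), label in train
              if abs(x - query[0]) <= half_width
              and abs(y - query[1]) <= half_width]
    if not inside:
        return 0.5          # empty box: undecided
    return sum(inside) / len(inside)

print(range_search_output((65, 22), half_width=15))  # 1.0
```

With a k-d tree, subtrees whose bounding region lies entirely outside the box are skipped, which is why a small box means only a partial traversal.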
Methods Based on Linear Separation
Divide the input space into regions separated by one or more hyperplanes. Extrapolation is done!
[Scatter plots of # formulas vs. # slides; LDA (Fisher discriminant)]
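For LDA the separating direction has a closed form, w = S_W^{-1}(m1 - m0), with S_W the within-class scatter matrix. A two-dimensional sketch on made-up data (2x2 inverse done by hand):

```python
# Fisher linear discriminant: project onto w = Sw^-1 (m1 - m0),
# cut halfway between the projected class means.

def mean(pts):
    n = len(pts)
    return [sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n]

def scatter(pts, m):
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in pts:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

exp = [(5, 40), (10, 50), (15, 45)]
th  = [(60, 20), (70, 25), (80, 15)]

m0, m1 = mean(exp), mean(th)
s0, s1 = scatter(exp, m0), scatter(th, m1)
Sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]

# invert the 2x2 within-class scatter matrix
det = Sw[0][0] * Sw[1][1] - Sw[0][1] * Sw[1][0]
inv = [[Sw[1][1] / det, -Sw[0][1] / det],
       [-Sw[1][0] / det, Sw[0][0] / det]]
dm = [m1[0] - m0[0], m1[1] - m0[1]]
w = [inv[0][0] * dm[0] + inv[0][1] * dm[1],
     inv[1][0] * dm[0] + inv[1][1] * dm[1]]

def project(p):
    return w[0] * p[0] + w[1] * p[1]

thr = 0.5 * (project(m0) + project(m1))
print("th" if project((70, 20)) > thr else "exp")
```

Because the boundary is a global hyperplane, the classifier extrapolates to regions far from any training point, as noted on the slide.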
Neural Networks
[Network diagram: inputs # slides and # formulas, hidden layer, one output; arbitrary numbers of inputs and hidden neurons]
Example: network with two hidden neurons, trained by gradient descent.
Regularization: number of hidden neurons, weight decay
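A minimal sketch of such a 2-2-1 network trained by plain gradient descent. The toy data, sigmoid activations, logistic loss, learning rate, and seed are all made up for illustration; they are not the slide's actual training setup:

```python
# Two-hidden-neuron network, per-sample gradient descent.
import math, random

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

# toy sample (inputs O(1)); target 1 = theorist, 0 = experimentalist
data = [((0.5, 4.0), 0), ((1.0, 5.0), 0), ((1.5, 4.5), 0),
        ((6.0, 2.0), 1), ((7.0, 2.5), 1), ((8.0, 1.5), 1)]

random.seed(1)
wh = [[random.uniform(-0.2, 0.2) for _ in range(3)]  # hidden: w0, w1, bias
      for _ in range(2)]
wo = [random.uniform(-0.2, 0.2) for _ in range(3)]   # output: w0, w1, bias
eta = 0.5

def forward(x):
    h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in wh]
    return h, sig(wo[0] * h[0] + wo[1] * h[1] + wo[2])

for epoch in range(3000):
    for x, t in data:
        h, out = forward(x)
        d_out = out - t                       # logistic-loss gradient
        for j in range(2):
            d_h = d_out * wo[j] * h[j] * (1.0 - h[j])
            wh[j][0] -= eta * d_h * x[0]
            wh[j][1] -= eta * d_h * x[1]
            wh[j][2] -= eta * d_h
        wo[0] -= eta * d_out * h[0]
        wo[1] -= eta * d_out * h[1]
        wo[2] -= eta * d_out

print(forward((7.0, 2.0))[1])  # should come out close to 1 (theorist)
```

Weight decay, one of the slide's regularizers, would add a term proportional to each weight to the updates, shrinking unused weights toward zero.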
Support Vector Machines
- Separating hyperplane with maximum distance to each data point: maximum margin classifier
- Found by setting up the condition for correct classification, y_i (w·x_i + b) >= 1, and minimizing ||w||^2 / 2, which leads to the Lagrangian L = ||w||^2 / 2 - Σ_i α_i [ y_i (w·x_i + b) - 1 ]
- A necessary condition for a minimum is w = Σ_i α_i y_i x_i with Σ_i α_i y_i = 0
- The output becomes f(x) = sgn( Σ_i α_i y_i (x_i·x) + b )
- Only linear separation? No! Replace the dot products x_i·x by a kernel K(x_i, x): the mapping to feature space is hidden in the kernel
- Non-separable case: introduce slack variables ξ_i and minimize ||w||^2 / 2 + C Σ_i ξ_i
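The dual decision function is easy to sketch in the one case where the QP solution is known in closed form: a single support vector per class, where α_+ = α_- = 2/||x_+ - x_-||² and the hyperplane is the perpendicular bisector. The points below are made up:

```python
# SVM dual form with two support vectors (closed-form solution).

xp, xm = (6.0, 2.0), (1.0, 4.0)        # support vectors, labels +1 / -1
d2 = sum((a - b) ** 2 for a, b in zip(xp, xm))
alpha = 2.0 / d2                        # from the margin conditions

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w = [alpha * (a - b) for a, b in zip(xp, xm)]   # w = sum_i alpha_i y_i x_i
b = 1.0 - dot(w, xp)                            # from y_+ (w.x_+ + b) = 1

def f(x, kernel=dot):
    # sum_i alpha_i y_i K(x_i, x) + b; swapping in another kernel
    # (e.g. an RBF) gives a non-linear boundary in input space
    return alpha * kernel(xp, x) - alpha * kernel(xm, x) + b

print(f(xp), f(xm))   # +1.0 and -1.0: both points sit on the margin
```

With more points, the α_i come from maximizing the Lagrangian numerically (e.g. by SMO); only the support vectors end up with α_i > 0, so the sum in f stays short.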
Physics Applications: Neural Network Trigger at HERA (H1)
Keep physics, reject background.
Trigger for J/ψ Events (H1)
Performance:
- NN 99.6%
- SVM 98.3%
- k-NN 97.7%
- RS 97.5%
- C4.5 ...%
- ML 91.2%
- LDA 82%
Triggering Charged Current Events
[Event displays: signal vs. background]
Performance:
- NN 74%
- SVM 73%
- C4.5 72%
- RS 72%
- k-NN 71%
- LDA 68%
- ML 65%
Astrophysics: MAGIC - Gamma/Hadron Separation
[Shower images: photon vs. hadron]
- Training with data and MC, evaluation with data
- Figure of merit: signal (photon) enhancement factor
- Random Forest: 93.3; Neural Net: 96.5
Future Experiment XEUS: Position of X-ray Photons
(Application of statistical learning in regression problems)
[CCD pixel structure: ~300 µm pixel size, ~10 µm reconstruction; electron potential, transfer direction]
Accuracy of reconstruction in µm:
- NN 3.6
- SVM 3.6
- k-NN 3.7
- RS 3.7
- ETA 3.9
- CCOM 4.0
Conclusion
- Statistical learning theory is full of subtle details (models, statistics)
- Widely used statistical learning methods were studied:
  - Decision trees
  - Local density estimators: ML, k-NN, RS
  - Linear separation: LDA, neural networks, SVMs
- Neural networks were found superior in the HEP and astrophysics applications (classification, regression) studied so far
- Further applications (trigger, offline analyses) are under study
From Classification to Regression
[Fit of a Gaussian by k-NN, RS, and NN]
Network with two hidden neurons:
a = σ(-2.1x - 1), b = σ(+2.1x - 1), out = σ(-12.7a - 12.7b + 9.4)
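The slide's regression network can be evaluated directly, assuming the activation (lost in extraction) is the logistic sigmoid — the standard choice, and the weights then indeed produce a Gaussian-like bump:

```python
# The two-hidden-neuron network from the slide, with logistic
# activations assumed for the lost symbol.
import math

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def net(x):
    a = sig(-2.1 * x - 1.0)
    b = sig(+2.1 * x - 1.0)
    return sig(-12.7 * a - 12.7 * b + 9.4)

# bump-shaped output: maximal near x = 0, falling off symmetrically
print(net(0.0), net(3.0), net(-3.0))
```

The two mirrored hidden sigmoids form rising and falling edges; the output neuron combines them into the bump that fits the Gaussian.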