
Slide 1: Statistical Learning Methods in HEAP
Jens Zimmermann, Christian Kiesling
Max-Planck-Institut für Physik, München; MPI für extraterrestrische Physik, München; Forschungszentrum Jülich GmbH
Outline: Statistical learning, introduction with a simple example; Occam's razor; decision trees; local density estimators; methods based on linear separation; examples: triggers in HEP and astrophysics; conclusion.

Slide 2: Statistical Learning
Statistical learning does not use prior knowledge ("no theory required"); it learns only from examples ("trial and error", "learning by reinforcement").
Two classes of statistical learning: discrete output 0/1 ("classification") and continuous output ("regression").
Applications in high-energy and astrophysics: background suppression and purification of event samples; estimation of parameters not directly measured.

Slide 3: A Simple Example: Preparing a Talk
Data base established by Jens during the Young Scientists Meeting at MPI: for each talk the number of formulas and the number of slides, labelled "experimentalist" or "theorist".
(# formulas, # slides) per talk: (42, 21), (28, 8), (71, 19), (64, 31), (29, 36), (15, 34), (48, 44), (56, 51), (25, 55), (12, 16).
[Scatter plot: # slides vs. # formulas (axes in units of 10), experimentalists vs. theorists]

Slide 4: Discriminating Theorists from Experimentalists: A First Analysis
[Histograms of # formulas and of # slides for experimentalists vs. theorists; scatter plots of # slides vs. # formulas for the first talks handed in and for the talks available a week before the meeting]

Slide 5: First Problems
The classes are completely separable, but only via a complicated boundary; a simple "model" gives no complete separation. New talk by Ludger: 28 formulas on 31 slides. At this point we cannot know which feature is "real"! Use a train/test split or cross-validation!
[Scatter plots: complicated boundary with complete separation vs. simple boundary with some misclassification]
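To make the train/test idea concrete, here is a minimal Python sketch (not from the talk: scikit-learn is an assumption, the data are the (formulas, slides) pairs read off slide 3, and the class labels are assumed for illustration):

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# (formulas, slides) pairs as read off slide 3; the labelling
# (1 = theorist, 0 = experimentalist) is an assumption.
X = np.array([[42, 21], [28, 8], [71, 19], [64, 31], [29, 36],
              [15, 34], [48, 44], [56, 51], [25, 55], [12, 16]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# Hold-out test: train on part of the talks, judge on unseen ones.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(max_depth=2).fit(X_tr, y_tr)
print("hold-out accuracy:", model.score(X_te, y_te))

# Cross-validation: rotate the held-out part so every talk is tested once.
scores = cross_val_score(DecisionTreeClassifier(max_depth=2), X, y, cv=3)
print("3-fold CV accuracy:", scores.mean())
```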

Slide 6: See Overtraining, Want Generalization, Need Regularization
We want to tune the parameters of the learning algorithm depending on the overtraining seen!
[Plot: error E vs. training epochs for the training set and the test set, illustrating overtraining; scatter plots of the train and test samples]

Slide 7: See Overtraining, Want Generalization, Need Regularization
Regularization will ensure adequate generalization performance (e.g. via VC dimensions): limit the complexity of the model. "Factor 10" rule ("Uncle Bernie's Rule #2"): use roughly ten times more training examples than free parameters.
[Plot: error E vs. training epochs for the training set and the test set]
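A numpy sketch of the train/test curves behind this picture (my illustration, not the talk's code): a simple sigmoid model trained by gradient descent while the test error is monitored, keeping the epoch where it was lowest; data and labels are the assumed pairs used above.

```python
import numpy as np

# Illustrative (formulas, slides) data, scaled down; labels assumed.
X = np.array([[42, 21], [28, 8], [71, 19], [64, 31], [29, 36],
              [15, 34], [48, 44], [56, 51], [25, 55], [12, 16]]) / 10.0
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
X_tr, y_tr, X_te, y_te = X[::2], y[::2], X[1::2], y[1::2]

w, b, eta = np.zeros(2), 0.0, 0.1

def test_error(Xs, ys):
    p = np.clip(1 / (1 + np.exp(-(Xs @ w + b))), 1e-9, 1 - 1e-9)
    return -np.mean(ys * np.log(p) + (1 - ys) * np.log(1 - p))

best, best_epoch = np.inf, 0
for epoch in range(2000):                      # plain gradient descent
    p = 1 / (1 + np.exp(-(X_tr @ w + b)))
    w -= eta * X_tr.T @ (p - y_tr) / len(y_tr)
    b -= eta * np.mean(p - y_tr)
    e = test_error(X_te, y_te)
    if e < best:                               # remember the best test epoch;
        best, best_epoch = e, epoch            # training past it = overtraining
print(f"lowest test error {best:.3f} at epoch {best_epoch}")
```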

Slide 8: Philosophy: Occam's Razor (14th century)
"Pluralitas non est ponenda sine necessitate." Do not make assumptions unless they are really necessary: from theories which describe the same phenomenon equally well, choose the one which contains the fewest assumptions.
First razor: given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself. Yes! But not of much use.
Second razor: given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error. No! ("No free lunch" theorem, Wolpert 1996.)

Slide 9: Decision Trees
Split all events into subsets by successive cuts on single variables: #formulas < 20 → experimentalist; #formulas > 60 → theorist; for the remaining subset with 20 < #formulas < 60, cut on the slides: #slides > 40 → experimentalist, #slides < 40 → theorist.
Classify Ringaile: 31 formulas on 32 slides → theorist.
Regularization: pruning.
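The cuts on the slide translate directly into code; a short sketch (the behaviour at exactly 20, 40 or 60 is my assumption, since the slide only shows strict inequalities):

```python
def classify_talk(formulas: int, slides: int) -> str:
    """Decision tree from the slide: successive cuts on single variables."""
    if formulas < 20:
        return "experimentalist"
    if formulas > 60:
        return "theorist"
    # 20 <= formulas <= 60: the remaining subset is split on # slides
    return "experimentalist" if slides > 40 else "theorist"

print(classify_talk(31, 32))  # Ringaile's talk -> "theorist", as on the slide
```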

Slide 10: Local Density Estimators
Search for similar events already classified within a specified region, and count the members of the two classes in that region.
[Scatter plots: a query point and its neighbourhood region in the (# formulas, # slides) plane]

Slide 11: Maximum Likelihood
Project the training data onto each input variable and bin the projections; the output for a new event (e.g. 31 formulas, 32 slides) is built from the products of the per-variable bin contents for the two classes. Correlation gets lost completely by the projection! Regularization: binning.
[Histograms of # formulas and # slides with the query event marked]
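A numpy sketch of this projection-likelihood classifier (data and labels assumed as in the earlier sketches; the bin edges are an arbitrary choice, and they are exactly the regularization knob the slide mentions):

```python
import numpy as np

# Illustrative data; labels assumed (1 = theorist, 0 = experimentalist).
X = np.array([[42, 21], [28, 8], [71, 19], [64, 31], [29, 36],
              [15, 34], [48, 44], [56, 51], [25, 55], [12, 16]], float)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
edges = np.linspace(0, 80, 5)        # coarse binning = the regularization

def likelihood(x, cls):
    """Product of per-variable binned densities: correlations are lost."""
    like = 1.0
    for d in range(X.shape[1]):
        hist, _ = np.histogram(X[y == cls, d], bins=edges, density=True)
        idx = np.clip(np.searchsorted(edges, x[d]) - 1, 0, len(hist) - 1)
        like *= hist[idx]
    return like

x_new = np.array([31.0, 32.0])       # the query event from the slide
l1, l0 = likelihood(x_new, 1), likelihood(x_new, 0)
print("output:", l1 / (l1 + l0 + 1e-12))
```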

Slide 12: k-Nearest-Neighbour
The output is the class fraction among the k nearest training events (the slide shows the output for k = 1, ..., 5). For every evaluation position the distances to each training position need to be determined! Regularization: parameter k.
[Scatter plot with the query point and its nearest neighbours]
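A brute-force k-NN sketch in numpy (data and labels assumed as above), which also makes the slide's cost remark visible: every evaluation computes the distance to every training point.

```python
import numpy as np

# Illustrative data; labels assumed (1 = theorist, 0 = experimentalist).
X = np.array([[42, 21], [28, 8], [71, 19], [64, 31], [29, 36],
              [15, 34], [48, 44], [56, 51], [25, 55], [12, 16]], float)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

def knn_output(x_new, k):
    """Brute force, as the slide notes: distance to every training point."""
    dist = np.linalg.norm(X - x_new, axis=1)
    nearest = np.argsort(dist)[:k]          # indices of the k closest talks
    return y[nearest].mean()                # fraction of theorists among them

for k in range(1, 6):                       # the slide shows k = 1 ... 5
    print(k, knn_output(np.array([31.0, 32.0]), k))
```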

Slide 13: Range Search
Store the training events in a tree whose nodes split alternately on x and y; to classify, count the class members inside a box around the query point. The tree needs to be traversed only partially if the box size is small enough! Small box: only nodes 1, 2, 4, 9 checked; large box: all nodes checked. Regularization: box size.
[Diagram: tree with nodes 1-10 and the corresponding partition of the plane]
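A sketch of the idea with a small k-d tree in Python (my implementation, not the talk's; the concrete boxes are illustrative): a subtree is entered only if the query box reaches its side of the split, so a small box visits only a few nodes.

```python
import numpy as np

class Node:
    def __init__(self, point, label, axis):
        self.point, self.label, self.axis = point, label, axis
        self.left = self.right = None

def build(pts, lab, depth=0):
    """k-d tree: alternate the split axis (x, y, x, ...) with depth."""
    if len(pts) == 0:
        return None
    axis = depth % 2
    order = np.argsort(pts[:, axis])
    pts, lab = pts[order], lab[order]
    m = len(pts) // 2                      # median point becomes the node
    node = Node(pts[m], lab[m], axis)
    node.left = build(pts[:m], lab[:m], depth + 1)
    node.right = build(pts[m + 1:], lab[m + 1:], depth + 1)
    return node

def range_count(node, lo, hi, counts, visited):
    """Count class members inside the box [lo, hi]; subtrees that cannot
    intersect the box are skipped: the slide's partial traversal."""
    if node is None:
        return
    visited.append(node.point)
    if np.all(node.point >= lo) and np.all(node.point <= hi):
        counts[int(node.label)] += 1
    a = node.axis
    if lo[a] <= node.point[a]:
        range_count(node.left, lo, hi, counts, visited)
    if hi[a] >= node.point[a]:
        range_count(node.right, lo, hi, counts, visited)

# Illustrative data; labels assumed (1 = theorist, 0 = experimentalist).
X = np.array([[42, 21], [28, 8], [71, 19], [64, 31], [29, 36],
              [15, 34], [48, 44], [56, 51], [25, 55], [12, 16]], float)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

tree = build(X, y)
q = np.array([31.0, 32.0])
for half in (10.0, 40.0):                  # small box vs. large box
    counts, visited = {0: 0, 1: 0}, []
    range_count(tree, q - half, q + half, counts, visited)
    print(f"half-width {half}: counts {counts}, nodes visited {len(visited)}")
```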

Slide 14: Methods Based on Linear Separation
Divide the input space into regions separated by one or more hyperplanes. Extrapolation is done! Example: LDA (Fisher discriminant).
[Scatter plots: separating line in the (# formulas, # slides) plane]
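A numpy sketch of the Fisher discriminant on the illustrative data (labels assumed; placing the cut halfway between the projected class means is a common convention, not something the slide specifies):

```python
import numpy as np

# Illustrative data; labels assumed (1 = theorist, 0 = experimentalist).
X = np.array([[42, 21], [28, 8], [71, 19], [64, 31], [29, 36],
              [15, 34], [48, 44], [56, 51], [25, 55], [12, 16]], float)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# Fisher discriminant: w ~ Sw^{-1} (m1 - m0), Sw = within-class scatter.
X0, X1 = X[y == 0], X[y == 1]
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
w = np.linalg.solve(Sw, m1 - m0)

cut = w @ (m0 + m1) / 2.0                   # halfway between projected means
print("theorist" if w @ [31.0, 32.0] > cut else "experimentalist")
```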

Slide 15: Neural Networks
Network with two hidden neurons, trained by gradient descent; inputs # formulas and # slides, output between 0 and 1. Generalizes to arbitrary numbers of inputs and hidden neurons. Regularization: number of hidden neurons, weight decay.
[Network diagram with example weights (-50, +0.1, +1.1, -1.1, +20, +0.2, +3.6, -1.8) and the resulting decision boundary]
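A from-scratch sketch of such a network: two hidden sigmoid neurons trained by batch gradient descent with weight decay (the architecture matches the slide; the data, labels, learning rate and decay constant are my choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative (formulas, slides) data, scaled; labels assumed.
X = np.array([[42, 21], [28, 8], [71, 19], [64, 31], [29, 36],
              [15, 34], [48, 44], [56, 51], [25, 55], [12, 16]]) / 10.0
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # two hidden neurons
W2, b2 = rng.normal(size=2), 0.0
eta, decay = 0.5, 1e-4                          # learning rate, weight decay

for epoch in range(5000):                       # batch gradient descent
    h = sigmoid(X @ W1 + b1)                    # hidden activations
    out = sigmoid(h @ W2 + b2)                  # network output in (0, 1)
    d_out = (out - y) * out * (1 - out)         # squared-error gradient
    d_h = np.outer(d_out, W2) * h * (1 - h)     # backpropagated to hidden
    W2 -= eta * (h.T @ d_out / len(X) + decay * W2)
    b2 -= eta * d_out.mean()
    W1 -= eta * (X.T @ d_h / len(X) + decay * W1)
    b1 -= eta * d_h.mean(axis=0)

print(np.round(out, 2))                         # outputs after training
```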

Slide 16: Support Vector Machines
Separating hyperplane with maximum distance to each data point: the maximum-margin classifier. It is found by setting up the condition for correct classification, $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$, and minimizing $\tfrac{1}{2}\|\mathbf{w}\|^2$, which leads to the Lagrangian
$L = \tfrac{1}{2}\|\mathbf{w}\|^2 - \sum_i \alpha_i \,[\,y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1\,]$.
A necessary condition for a minimum is $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, and the output becomes
$f(\mathbf{x}) = \operatorname{sign}\big(\sum_i \alpha_i y_i \,\mathbf{x}_i\cdot\mathbf{x} + b\big)$.
Only linear separation? No! Replace the dot products by a kernel, $\mathbf{x}_i\cdot\mathbf{x} \to K(\mathbf{x}_i, \mathbf{x})$: the mapping to feature space is hidden in the kernel. Non-separable case: slack variables $\xi_i \ge 0$ with $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i$ and penalty term $C\sum_i \xi_i$.
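In practice the whole construction is a few lines; a sketch with scikit-learn's SVC (my choice of tool, the talk's own implementation is not shown), where the RBF kernel plays the role of the hidden feature-space mapping and C handles the non-separable case:

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative data; labels assumed (1 = theorist, 0 = experimentalist).
X = np.array([[42, 21], [28, 8], [71, 19], [64, 31], [29, 36],
              [15, 34], [48, 44], [56, 51], [25, 55], [12, 16]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# The RBF kernel hides the feature-space mapping; C sets the slack penalty.
svm = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, y)
print(svm.predict([[31, 32]]))              # classify Ringaile's talk
print("support vectors:", svm.support_)     # only these enter the output sum
```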

Slide 17: Physics Applications: Neural Network Trigger at HERA
The H1 experiment: keep physics, reject background.
[H1 event displays: a physics event to be kept and a background event to be rejected]

Slide 18: Trigger for J/ψ Events (H1)
Efficiency at 95% background rejection: NN 99.6%, SVM 98.3%, k-NN 97.7%, RS 97.5%, C4.5 97.5%, ML 91.2%, LDA 82%.

Slide 19: Triggering Charged Current Events
Efficiency at 80% background rejection (signal vs. background): NN 74%, SVM 73%, C4.5 72%, RS 72%, k-NN 71%, LDA 68%, ML 65%.

Slide 20: Astrophysics: MAGIC - Gamma/Hadron Separation
Training with data and MC, evaluation with data. Signal (photon) enhancement factor: neural net 96.5, random forest 93.3.
[Plots: photon vs. hadron separation]

Slide 21: Future Experiment XEUS: Position of X-ray Photons
An application of statistical learning in regression problems: reconstruct the X-ray photon position from the charge collected in the CCD.
σ of reconstruction in µm: NN 3.6, SVM 3.6, k-NN 3.7, RS 3.7, ETA 3.9, CCOM 4.0.
[Detector sketch: scales ~300 µm and ~10 µm, electron potential, transfer direction]

Slide 22: Conclusion
Statistical learning theory is full of subtle details (models, statistics). Widely used statistical learning methods were studied: decision trees; local density estimators (ML, k-NN, RS); linear separation (LDA, neural nets, SVMs). Neural networks were found superior in the HEP and astrophysics applications (classification, regression) studied so far. Further applications (trigger, offline analyses) are under study.

Slide 23: From Classification to Regression
The same methods deliver continuous outputs: k-NN and range search average the target values of the training events in the neighbourhood (the slide's example values: 3, 4, 5, 3, 2, 2, 5) instead of counting class members; a neural network fits the target function directly, e.g. a Gauss-like bump via a = σ(-2.1x - 1), b = σ(+2.1x - 1), out = σ(-12.7a - 12.7b + 9.4).
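A quick numpy check (mine, not the talk's) that the quoted network indeed produces a Gauss-like bump, reading σ as the logistic sigmoid; for k-NN or range-search regression one would simply replace the class counting by an average such as `y[nearest].mean()`:

```python
import numpy as np

def sigma(z):                      # logistic sigmoid
    return 1.0 / (1.0 + np.exp(-z))

def net(x):
    """The slide's two-hidden-neuron network fitting a Gauss-like bump."""
    a = sigma(-2.1 * x - 1)
    b = sigma(+2.1 * x - 1)
    return sigma(-12.7 * a - 12.7 * b + 9.4)

x = np.linspace(-5, 5, 11)
print(np.round(net(x), 3))         # peaks near x = 0, falls off on both sides
```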

