Statistical Learning Methods in HEAP
Jens Zimmermann, Christian Kiesling
Max-Planck-Institut für Physik, München / MPI für extraterrestrische Physik, München / Forschungszentrum Jülich GmbH
C. Kiesling, MPI for Physics, Munich - ACAT03 Workshop, KEK, Japan, Dec. 2003

Outline:
Statistical Learning: Introduction with a simple example
Occam's Razor
Decision Trees
Local Density Estimators
Methods Based on Linear Separation
Examples: Triggers in HEP and Astrophysics
Conclusion

Statistical Learning
Does not use prior knowledge ("no theory required"). Learns only from examples ("trial and error", "learning by reinforcement").
Two classes of statistical learning: discrete output 0/1: "classification"; continuous output: "regression".
Applications in High Energy and Astro-Physics: background suppression and purification of events; estimation of parameters not directly measured.

A Simple Example: Preparing a Talk
[Plots: # formulas vs. # slides (axes in units of 10) for experimentalists and theorists]
Data base established by Jens during the Young Scientists Meeting at MPI.

Discriminating Theorists from Experimentalists: A First Analysis
[Plots: # formulas vs. # slides for experimentalists and theorists; first talks handed in, and talks a week before the meeting]

First Problems
[Plots: # formulas vs. # slides with two candidate decision boundaries]
One boundary separates the classes completely, but only via a complicated shape; the other is a simple "model", but gives no complete separation.
New talk by Ludger: 28 formulas on 31 slides. At this point we cannot know which feature is "real"! Use train/test or cross-validation!
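
A minimal train/test and cross-validation sketch in Python (scikit-learn). The "formulas vs. slides" numbers, the class labels, and the choice of a decision tree as the flexible model are illustrative assumptions, not the sample from the talk:

```python
# Toy train/test and cross-validation check for overtraining (illustrative data only).
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([15, 45], [8, 10], (50, 2)),   # "experimentalists": few formulas, many slides
               rng.normal([60, 30], [15, 8], (50, 2))])  # "theorists": many formulas, fewer slides
y = np.array([0] * 50 + [1] * 50)                        # 0 = experimentalist, 1 = theorist

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))    # typically ~1.0 -> overtraining
print("test accuracy :", clf.score(X_test, y_test))      # the number that actually matters

# Cross-validation gives a more stable estimate of the generalization error.
print("5-fold CV accuracy:", cross_val_score(DecisionTreeClassifier(), X, y, cv=5).mean())
```

The later sketches reuse the same kind of toy sample so that each stays self-contained.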

See Overtraining - Want Generalization - Need Regularization
[Plots: decision boundary evaluated on the training and the test sample; error E vs. training epochs for training set and test set, illustrating overtraining]
We want to tune the parameters of the learning algorithm depending on the overtraining seen!

See Overtraining - Want Generalization - Need Regularization
[Plots: train/test decision boundaries; error E vs. training epochs for training and test set]
Regularization will ensure adequate performance (e.g. VC dimensions): limit the complexity of the model. "Factor 10" rule ("Uncle Bernie's Rule #2"): use roughly ten times more training examples than free parameters of the model.
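
As a rough illustration of the factor-10 rule, read here as "about ten training examples per free parameter" (my reading of the rule, not a statement from the talk), one can count the weights of a small feed-forward network:

```python
# Rough "factor 10" bookkeeping for a fully connected net with one hidden layer.
def n_weights(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    # weights plus biases of the hidden layer, plus weights plus biases of the output layer
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

w = n_weights(n_inputs=2, n_hidden=2)   # the 2-input, 2-hidden-neuron net used later in the talk
print(w, "free parameters ->", 10 * w, "training talks suggested by the factor-10 rule")
# 9 free parameters -> on the order of 100 labelled talks would be comfortable.
```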

Philosophy: Occam's Razor (14th century)
"Pluralitas non est ponenda sine necessitate." Do not make assumptions unless they are really necessary: from theories which describe the same phenomenon equally well, choose the one which contains the least number of assumptions.
First razor: given two models with the same generalization error, the simpler one should be preferred because simplicity is desirable in itself. Yes! But not of much use.
Second razor: given two models with the same training-set error, the simpler one should be preferred because it is likely to have lower generalization error. No! ("No free lunch" theorem, Wolpert 1996)

Decision Trees
[Diagram: recursive splitting of the (# formulas, # slides) plane]
All events: #formulas < 20 → exp; #formulas > 60 → th; rest (20 < #formulas < 60) → split the subset further: #slides > 40 → exp, #slides < 40 → th.
Classify Ringaile: 31 formulas on 32 slides → th.
Regularization: pruning.
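
A hedged scikit-learn sketch of such a tree on the toy sample from above; the learned cuts will differ from the hand-drawn ones on the slide, and `max_depth`/`ccp_alpha` stand in for the pruning step:

```python
# Decision tree on the toy (# formulas, # slides) sample, with pruning as regularization.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([15, 45], [8, 10], (50, 2)),    # experimentalists (label 0)
               rng.normal([60, 30], [15, 8], (50, 2))])   # theorists (label 1)
y = np.array([0] * 50 + [1] * 50)

tree = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01)   # depth limit + cost-complexity pruning
tree.fit(X, y)
print(export_text(tree, feature_names=["n_formulas", "n_slides"]))

# Classify a new talk, e.g. 31 formulas on 32 slides:
print("prediction:", tree.predict([[31, 32]])[0])            # 0 = experimentalist, 1 = theorist
```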

Local Density Estimators
Search for similar events already classified within a specified region and count the members of the two classes in that region.
[Plots: # formulas vs. # slides with a local neighbourhood around the query point]

Maximum Likelihood
[Plots: 1D projections (histograms) of # formulas and # slides for the two classes; the query talk with 31 formulas on 32 slides is evaluated on each projection and the per-variable likelihoods are combined into the output]
Correlation gets lost completely by projection! Regularization: binning.
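
A minimal NumPy sketch of the projected (binned) likelihood on the same kind of toy sample; since each variable is histogrammed separately per class, any correlation between formulas and slides is indeed lost. Binning and the floor value for empty bins are arbitrary choices:

```python
# Projected ("maximum likelihood") classifier: one 1D histogram per variable and per class.
import numpy as np

rng = np.random.default_rng(0)
X_exp = rng.normal([15, 45], [8, 10], (50, 2))   # experimentalists
X_th  = rng.normal([60, 30], [15, 8], (50, 2))   # theorists

def binned_likelihood(X_class, edges_list, x):
    """Product of per-variable bin densities of the point x under one class."""
    p = 1.0
    for d, edges in enumerate(edges_list):
        hist, _ = np.histogram(X_class[:, d], bins=edges, density=True)
        idx = np.clip(np.searchsorted(edges, x[d]) - 1, 0, len(hist) - 1)
        p *= max(hist[idx], 1e-9)                 # floor to avoid exactly empty bins
    return p

edges = [np.linspace(0, 120, 13), np.linspace(0, 80, 9)]   # the binning is the regularization knob
x_new = np.array([31, 32])                                  # 31 formulas on 32 slides
L_exp = binned_likelihood(X_exp, edges, x_new)
L_th  = binned_likelihood(X_th,  edges, x_new)
print("out =", L_th / (L_th + L_exp))                       # closer to 1 -> more theorist-like
```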

k-Nearest-Neighbour
[Plot: # formulas vs. # slides; the output for the query point is shown for k = 1, 2, 3, 4, 5]
For every evaluation position the distances to each training position need to be determined! Regularization: parameter k.
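
A brute-force k-NN sketch in NumPy, illustrating that every training distance is evaluated; the Euclidean metric on the raw (# formulas, # slides) axes is an assumption one would normally revisit, since the relative scaling of the axes matters:

```python
# Brute-force k-nearest-neighbour output: fraction of theorists among the k closest talks.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([15, 45], [8, 10], (50, 2)),    # experimentalists (label 0)
               rng.normal([60, 30], [15, 8], (50, 2))])   # theorists (label 1)
y = np.array([0] * 50 + [1] * 50)

def knn_output(X_train, y_train, x, k):
    d = np.linalg.norm(X_train - x, axis=1)    # distance to *every* training event
    nearest = np.argsort(d)[:k]
    return y_train[nearest].mean()             # between 0 and 1, acts as the classifier output

x_new = np.array([31, 32])
for k in (1, 2, 3, 4, 5):                      # k is the regularization parameter
    print(f"k={k}: out = {knn_output(X, y, x_new, k):.2f}")
```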

Range Search
[Diagram: training events stored in a k-d tree on (# formulas, # slides); a box around the query point is searched]
The tree needs to be traversed only partially if the box size is small enough! Small box: only nodes 1, 2, 4, 9 checked; large box: all nodes checked. Regularization: box size.
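
A sketch of the same counting with a k-d tree, using SciPy's `cKDTree`; querying with the Chebyshev metric (`p=inf`) makes the search region an axis-aligned box, and its half-width plays the role of the box-size regularization. The box sizes are arbitrary:

```python
# Range search with a k-d tree: count the two classes inside a box around the query point.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([15, 45], [8, 10], (50, 2)),    # experimentalists (label 0)
               rng.normal([60, 30], [15, 8], (50, 2))])   # theorists (label 1)
y = np.array([0] * 50 + [1] * 50)

tree = cKDTree(X)
x_new = np.array([31, 32])
for half_width in (5.0, 20.0):                             # small box vs. large box
    idx = tree.query_ball_point(x_new, r=half_width, p=np.inf)   # p=inf -> square box
    if idx:
        print(f"box +-{half_width}: {len(idx)} events in box, out = {y[idx].mean():.2f}")
    else:
        print(f"box +-{half_width}: empty box, no prediction")
```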

Methods Based on Linear Separation
Divide the input space into regions separated by one or more hyperplanes. Extrapolation is done!
[Plots: # formulas vs. # slides with the separating hyperplane found by LDA (Fisher discriminant)]
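
A minimal Fisher discriminant (the LDA direction) in NumPy on the toy sample; the pooled within-class covariance and a threshold halfway between the projected class means are standard but not the only possible choices:

```python
# Fisher linear discriminant: w = Sw^-1 (mu_th - mu_exp), then cut on the projection w.x.
import numpy as np

rng = np.random.default_rng(0)
X_exp = rng.normal([15, 45], [8, 10], (50, 2))   # experimentalists
X_th  = rng.normal([60, 30], [15, 8], (50, 2))   # theorists

mu_exp, mu_th = X_exp.mean(axis=0), X_th.mean(axis=0)
S_w = np.cov(X_exp, rowvar=False) + np.cov(X_th, rowvar=False)   # within-class scatter
w = np.linalg.solve(S_w, mu_th - mu_exp)                         # normal vector of the hyperplane
threshold = 0.5 * (w @ mu_exp + w @ mu_th)

x_new = np.array([31, 32])
print("theorist" if w @ x_new > threshold else "experimentalist")
```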

Neural Networks
[Diagram: feed-forward network with inputs # formulas and # slides, a hidden layer and one output; decision boundary of a network with two hidden neurons trained by gradient descent]
Generalizes to arbitrary numbers of inputs and hidden neurons. Regularization: number of hidden neurons, weight decay.
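
A hedged NumPy sketch of such a network (two sigmoid hidden neurons, one sigmoid output) trained by plain batch gradient descent on the toy sample; learning rate, number of epochs and initialization are arbitrary choices, and weight decay is only indicated in a comment:

```python
# Feed-forward net: 2 inputs -> 2 sigmoid hidden neurons -> 1 sigmoid output, batch gradient descent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([15, 45], [8, 10], (50, 2)),    # experimentalists (label 0)
               rng.normal([60, 30], [15, 8], (50, 2))])   # theorists (label 1)
y = np.array([0] * 50 + [1] * 50)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)                 # standardize the inputs

W1, b1 = rng.normal(size=(2, 2)) * 0.5, np.zeros(2)
W2, b2 = rng.normal(size=2) * 0.5, 0.0
lr = 1.0

for epoch in range(5000):
    h = sigmoid(Xs @ W1 + b1)                             # hidden activations
    out = sigmoid(h @ W2 + b2)                            # network output
    d_out = (out - y) * out * (1 - out) / len(y)          # gradient of the mean squared error
    d_h = np.outer(d_out, W2) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum()       # (weight decay would subtract lr*lambda*W here)
    W1 -= lr * Xs.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

out = sigmoid(sigmoid(Xs @ W1 + b1) @ W2 + b2)
print("training accuracy:", ((out > 0.5) == y).mean())
```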

Support Vector Machines
Separating hyperplane with maximum distance to each data point: maximum margin classifier.
Found by setting up the condition for correct classification, y_i (w·x_i + b) ≥ 1, and minimizing |w|²/2, which leads to the Lagrangian
L = |w|²/2 - Σ_i α_i [ y_i (w·x_i + b) - 1 ].
A necessary condition for a minimum is w = Σ_i α_i y_i x_i with Σ_i α_i y_i = 0. The output becomes
out(x) = sign( Σ_i α_i y_i (x_i·x) + b ).
Only linear separation? No! Replace the dot products by a kernel, x_i·x → K(x_i, x): the mapping to feature space is hidden in the kernel.
Non-separable case: allow margin violations with slack variables ξ_i ≥ 0 and minimize |w|²/2 + C Σ_i ξ_i.
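
A hedged scikit-learn sketch of the kernel trick in practice; the RBF kernel and the values of C and gamma are illustrative choices, and the soft-margin parameter C corresponds to the non-separable case above:

```python
# Soft-margin SVM with an RBF kernel on the toy sample; C penalizes margin violations (slack).
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([15, 45], [8, 10], (50, 2)),    # experimentalists (label 0)
               rng.normal([60, 30], [15, 8], (50, 2))])   # theorists (label 1)
y = np.array([0] * 50 + [1] * 50)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X, y)
print("support vectors per class:", svm.named_steps["svc"].n_support_)
print("31 formulas / 32 slides ->",
      "theorist" if svm.predict([[31, 32]])[0] == 1 else "experimentalist")
```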

Physics Applications: Neural Network Trigger at HERA
[Figures from the H1 experiment: keep physics, reject background]

Trigger for J/ψ Events (H1)
Efficiency by method: NN 99.6%, SVM 98.3%, k-NN 97.7%, RS 97.5%, C4.5 (value missing), ML 91.2%, LDA 82%.

Triggering Charged Current Events
[Event displays: signal vs. background]
Efficiency by method: NN 74%, SVM 73%, C4.5 72%, RS 72%, k-NN 71%, LDA 68%, ML 65%.

Astrophysics: MAGIC - Gamma/Hadron Separation
[Plots: photon vs. hadron shower images]
Training with data and MC, evaluation with data. Signal (photon) enhancement factor: Random Forest 93.3, Neural Net 96.5.

Future Experiment XEUS: Position of X-ray Photons (application of statistical learning in regression problems)
[Diagram: XEUS detector geometry (~300 µm and ~10 µm scales), electron potential, transfer direction]
σ of the reconstruction in µm: NN 3.6, SVM 3.6, k-NN 3.7, RS 3.7, ETA 3.9, CCOM 4.0.

Conclusion
Statistical learning theory is full of subtle details (models, statistics).
Widely used statistical learning methods studied: decision trees; local density estimators: ML, k-NN, RS; linear separation: LDA, neural nets, SVMs.
Neural networks found superior in the HEP and astrophysics applications (classification, regression) studied so far.
Further applications (trigger, offline analyses) under study.

From Classification to Regression
[Plots: regression of a Gaussian-shaped target with k-NN, RS, a neural net, and a Gaussian fit]
Example network with two hidden neurons: a = σ(-2.1x - 1), b = σ(+2.1x - 1), out = σ(-12.7a - 12.7b + 9.4).
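
The quoted network can be evaluated directly. A small NumPy check that the composition of the three sigmoids produces a Gaussian-like bump around x = 0; only the network weights come from the slide, the comparison Gaussian (unit width, unit height) is my own choice:

```python
# Evaluate the 2-hidden-neuron regression network quoted above and compare to a Gaussian bump.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net(x):
    a = sigmoid(-2.1 * x - 1.0)
    b = sigmoid(+2.1 * x - 1.0)
    return sigmoid(-12.7 * a - 12.7 * b + 9.4)

x = np.linspace(-5, 5, 11)
gauss = np.exp(-0.5 * x**2)           # unit-width Gaussian, purely for visual comparison
for xi, ni, gi in zip(x, net(x), gauss):
    print(f"x={xi:+.1f}  net={ni:.3f}  gauss={gi:.3f}")
```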