Computing and Statistical Data Analysis Stat 5: Multivariate Methods

1 Computing and Statistical Data Analysis Stat 5: Multivariate Methods
London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515
Glen Cowan, Physics Department, Royal Holloway, University of London
Course web page:
G. Cowan Computing and Statistical Data Analysis / Stat 5

2 Finding an optimal decision boundary
In particle physics one usually starts by making simple “cuts”:
x_i < c_i
x_j < c_j
Maybe later try some other type of decision boundary.
[Figures: regions labelled H0 and H1, separated first by rectangular cuts and then by other types of decision boundary]
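As a minimal sketch of the two kinds of boundary on this slide, the Python below applies rectangular cuts and, for comparison, a linear boundary to toy two-dimensional events. The Gaussian toy samples, the cut values, the weights and the convention that H1 events are the ones to be accepted are all assumptions made for illustration; none of them come from the lecture.

```python
import random

def passes_cuts(x, cuts):
    """Rectangular cuts: accept if every component x[i] is below cuts[i]."""
    return all(xi < ci for xi, ci in zip(x, cuts))

def passes_linear(x, weights, threshold):
    """Linear boundary: accept if the weighted sum of components is below threshold."""
    return sum(w * xi for w, xi in zip(weights, x)) < threshold

def efficiency(events, selector):
    """Fraction of events accepted by the selector."""
    return sum(1 for x in events if selector(x)) / len(events)

rng = random.Random(1)
# Toy samples (assumed): H1 centred at (0, 0), H0 centred at (2, 2), unit width.
h1_events = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5000)]
h0_events = [(rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)) for _ in range(5000)]

cuts = (1.0, 1.0)            # hypothetical cut values c_i, c_j
weights, threshold = (1.0, 1.0), 2.0   # hypothetical linear boundary

eff_h1_cuts = efficiency(h1_events, lambda x: passes_cuts(x, cuts))
eff_h0_cuts = efficiency(h0_events, lambda x: passes_cuts(x, cuts))
eff_h1_lin = efficiency(h1_events, lambda x: passes_linear(x, weights, threshold))
eff_h0_lin = efficiency(h0_events, lambda x: passes_linear(x, weights, threshold))

print("cuts:   H1 efficiency", eff_h1_cuts, " H0 efficiency", eff_h0_cuts)
print("linear: H1 efficiency", eff_h1_lin, " H0 efficiency", eff_h0_lin)
```

Comparing the accepted fractions of the two samples for each selector is one simple way to judge which boundary separates the hypotheses better on a given data set.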

3 Multivariate methods
Many new (and some old) methods: Fisher discriminant, neural networks, kernel density methods, support vector machines, decision trees, boosting, bagging.
New software for HEP, e.g., TMVA (Höcker, Stelzer, Tegenfeldt, Voss, Voss, physics/) and StatPatternRecognition (I. Narsky, physics/).

24 Overtraining
If the decision boundary is too flexible, it will conform too closely to the training points → overtraining.
Monitor this by applying the classifier to an independent validation sample.
[Figure panels: training sample; independent validation sample]
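The monitoring step can be sketched in a few lines of Python. This is an illustration under assumptions, not the lecture's own example: the classifier is a k-nearest-neighbour vote (k = 1 gives a very flexible boundary), the samples are toy two-dimensional Gaussians, and the error is taken to be the misclassification rate.

```python
import random

def make_sample(rng, n):
    """Toy 2D sample (assumed): label 1 centred at (1, 1), label 0 at (0, 0)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        point = (rng.gauss(label, 1.0), rng.gauss(label, 1.0))
        data.append((point, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k odd)."""
    def dist2(item):
        p = item[0]
        return (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2
    nearest = sorted(train, key=dist2)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def error_rate(train, sample, k):
    """Fraction of events in sample that the classifier gets wrong."""
    wrong = sum(1 for x, y in sample if knn_predict(train, x, k) != y)
    return wrong / len(sample)

rng = random.Random(42)
train = make_sample(rng, 200)   # used to build the classifier
valid = make_sample(rng, 200)   # independent validation sample

# Very flexible boundary: k = 1 reproduces every training label.
train_err_flexible = error_rate(train, train, k=1)
valid_err_flexible = error_rate(train, valid, k=1)

# Smoother boundary: k = 15 averages over more neighbours.
train_err_smooth = error_rate(train, train, k=15)
valid_err_smooth = error_rate(train, valid, k=15)

print("k=1 : training error", train_err_flexible, " validation error", valid_err_flexible)
print("k=15: training error", train_err_smooth, " validation error", valid_err_smooth)
```

A training error far below the validation error, as for k = 1 here, is the signature of overtraining described on the slide.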

25 Computing and Statistical Data Analysis / Stat 5
Choose the classifier that minimizes the error function for the validation sample.
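A minimal sketch of this selection rule, under stated assumptions: the candidate classifiers are one-dimensional histogram classifiers that differ only in the number of bins (more bins means a more flexible boundary), the toy data are Gaussian, and the "error function" is taken to be the misclassification rate. All function names here are hypothetical.

```python
import random

def make_sample(rng, n):
    """Toy 1D sample (assumed): label 1 ~ Gauss(+1, 1), label 0 ~ Gauss(-1, 1)."""
    sample = []
    for _ in range(n):
        y = rng.randint(0, 1)
        sample.append((rng.gauss(2.0 * y - 1.0, 1.0), y))
    return sample

def bin_index(x, n_bins, lo=-5.0, hi=5.0):
    """Map x to a bin number, clamping values outside [lo, hi) to the edge bins."""
    i = int((x - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)

def train_histogram(train, n_bins):
    """Count training labels per bin; the counts define the classifier."""
    counts = [[0, 0] for _ in range(n_bins)]
    for x, y in train:
        counts[bin_index(x, n_bins)][y] += 1
    return counts

def predict(counts, x):
    """Predict the majority training label in the bin containing x (ties give 0)."""
    n0, n1 = counts[bin_index(x, len(counts))]
    return 1 if n1 > n0 else 0

def error_rate(counts, sample):
    """Misclassification rate of the histogram classifier on a sample."""
    return sum(1 for x, y in sample if predict(counts, x) != y) / len(sample)

rng = random.Random(7)
train = make_sample(rng, 300)
valid = make_sample(rng, 300)   # independent of the training sample

candidates = [1, 2, 4, 8, 16, 32, 64, 128, 256]
valid_errors = {}
for n_bins in candidates:
    model = train_histogram(train, n_bins)
    valid_errors[n_bins] = error_rate(model, valid)

# Keep the candidate with the smallest validation error.
best_bins = min(candidates, key=lambda n: valid_errors[n])
print("validation errors:", valid_errors)
print("selected number of bins:", best_bins)
```

Because the validation sample is then used to make a choice, a third independent sample is commonly used to quote the final performance.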

26 Neural network example from LEP II
Signal: e+e- → W+W- (often 4 well-separated hadron jets)
Background: e+e- → qqgg (4 less well-separated hadron jets)
Input variables are based on jet structure, event shape, ...; none by itself gives much separation.
[Figure: neural network output] (Garrido, Juste and Martinez, ALEPH )

32 Kernel-based PDE (KDE, Parzen window)
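Consider d dimensions and N training events, x_1, ..., x_N. Estimate f(x) with the standard kernel estimator

$$\hat{f}(\mathbf{x}) = \frac{1}{N h^d} \sum_{i=1}^{N} K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)$$

where h is the bandwidth (smoothing parameter) and K is the kernel. Use e.g. a Gaussian kernel:

$$K(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}} \, e^{-|\mathbf{x}|^2/2}$$

Evaluating the function requires summing N terms (slow); faster algorithms only count events in the vicinity of x (k-nearest neighbor, range search).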
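The estimator can be sketched directly in Python. This is a plain illustration assuming the standard Parzen-window form with a Gaussian kernel and a single bandwidth h for all dimensions; the function names, the toy sample and the bandwidth value are hypothetical choices. It sums over all N events for each evaluation, which is the slow approach mentioned on the slide.

```python
import math
import random

def gaussian_kernel(u):
    """Standard Gaussian kernel in d = len(u) dimensions."""
    d = len(u)
    norm = (2.0 * math.pi) ** (d / 2.0)
    return math.exp(-0.5 * sum(c * c for c in u)) / norm

def kde(x, events, h):
    """Kernel density estimate at point x from training events, bandwidth h.

    Computes 1 / (N h^d) times the sum over events of K((x - x_i) / h).
    """
    d = len(x)
    total = 0.0
    for xi in events:
        u = [(a - b) / h for a, b in zip(x, xi)]
        total += gaussian_kernel(u)
    return total / (len(events) * h ** d)

# Toy check (assumed data): 2000 events from a 2D standard Gaussian.
rng = random.Random(3)
events = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2000)]
h = 0.3  # hypothetical bandwidth

estimate_at_origin = kde((0.0, 0.0), events, h)
true_at_origin = 1.0 / (2.0 * math.pi)
print("estimate:", estimate_at_origin, " true density:", true_at_origin)
```

The estimate is biased slightly low at the peak because the kernel smears the density; a smaller h reduces this bias at the cost of larger statistical fluctuations.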

35 Computing and Statistical Data Analysis / Stat 5
Find these on next homework assignment.

