1
Benefits of Minimizing the Number of Discriminators Used in a Multivariate Analysis
Sherry Towers, State University of New York at Stony Brook
2
The case for fewer discriminators… Using a large number of variables indiscriminately can indicate a lack of forethought in the design and conceptualization of an analysis.
3
The case for fewer discriminators… Also, each added variable makes it harder to determine whether the modelling of the data is sound, and makes the analysis harder to understand. And each added variable adds statistical noise… This can degrade overall discrimination power!
4
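Optimising discrimination… Maximise S/sqrt(S+B), where S and B are the numbers of signal and background events passing the selection. [The alternative expression that followed "or:" on the slide is not preserved.]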
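A minimal sketch of the figure of merit, assuming unweighted event counts and a simple "keep events with score >= cut" selection. The function names `significance` and `best_cut` are illustrative and not from the talk.

```python
import math


def significance(s, b):
    """Return S/sqrt(S+B); defined as 0.0 when no events are selected."""
    total = s + b
    return s / math.sqrt(total) if total > 0 else 0.0


def best_cut(signal_scores, background_scores):
    """Scan every observed score as a threshold (keep score >= cut).

    Returns (cut, S/sqrt(S+B)) for the threshold with the largest figure of merit.
    """
    best = (None, 0.0)
    for cut in sorted(set(signal_scores) | set(background_scores)):
        s = sum(1 for x in signal_scores if x >= cut)
        b = sum(1 for x in background_scores if x >= cut)
        fom = significance(s, b)
        if fom > best[1]:
            best = (cut, fom)
    return best


# 100 signal and 300 background events pass: 100/sqrt(400)
print(significance(100, 300))  # 5.0
```

The scan is quadratic in the number of events, which is fine for a toy but would normally be replaced by a sorted cumulative count.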
5
The curse of too many variables: a simple example
Signal: 5D Gaussian, μ = (1,0,0,0,0), σ = (1,1,1,1,1)
Bkgnd: 5D Gaussian, μ = (0,0,0,0,0), σ = (1,1,1,1,1)
The only difference between signal and background is in the first dimension. The other four dimensions are "useless" discriminators.
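The toy setup above can be explored with a short simulation. This is a sketch under stated assumptions, not the study shown in the talk: the discriminant is a projection onto the difference of the sample means (reasonable here because both classes have unit, uncorrelated widths), the training size of 50 events per class and the seed are arbitrary, and `true_separation` is a hypothetical helper name. Because the true means differ only in the first dimension, the projected signal and background are unit-width Gaussians separated by w[0]/|w|, which cannot exceed 1 and shrinks as noise from the useless dimensions leaks into w.

```python
import math
import random


def sample(mean, n, rng):
    """Draw n events from a Gaussian with the given mean vector and unit widths."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]


def mean_vector(events):
    """Component-wise sample mean of a list of events."""
    n = len(events)
    return [sum(e[i] for e in events) / n for i in range(len(events[0]))]


def true_separation(n_vars, n_train, rng):
    """Train a mean-difference discriminant on the first n_vars dimensions.

    Returns the true signal/background separation (in units of sigma) along
    the learned direction, which is w[0]/|w| for this toy model.
    """
    sig_mean = [1.0, 0.0, 0.0, 0.0, 0.0][:n_vars]
    bkg_mean = [0.0] * n_vars
    sig = sample(sig_mean, n_train, rng)
    bkg = sample(bkg_mean, n_train, rng)
    w = [s - b for s, b in zip(mean_vector(sig), mean_vector(bkg))]
    norm = math.sqrt(sum(x * x for x in w))
    return w[0] / norm


rng = random.Random(12345)  # arbitrary seed, for reproducibility only
for n_vars in (1, 2, 3, 4, 5):
    print(n_vars, round(true_separation(n_vars, 50, rng), 3))
```

With a single variable the separation is 1 by construction; each extra useless variable can only lower it, and the loss grows as the training sample gets smaller.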
6
The curse of too many variables: a simple example
7
The curse of too many variables: a simple example
8
Optimising the number of variables (the easy way)… Use a "build-up" process:
1) Start with a set of possible discriminators.
2) Choose the one that gives maximal S/sqrt(S+B).
3) Add in the others one at a time, calculating S/sqrt(S+B) for each combination.
4) Choose the combination that maximises S/sqrt(S+B) (as long as S/sqrt(S+B) gets bigger!).
5) Repeat steps 3 and 4 until S/sqrt(S+B) stops improving.
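The build-up steps can be sketched as a greedy forward selection. This is an illustration only: `build_up` and its `figure_of_merit` argument are hypothetical names, and `figure_of_merit` stands for whatever user-supplied routine trains a discriminant on a list of variables and returns the resulting S/sqrt(S+B).

```python
def build_up(candidates, figure_of_merit):
    """Greedy forward selection of discriminating variables.

    candidates      -- list of variable names
    figure_of_merit -- callable taking a list of names and returning
                       S/sqrt(S+B) for a discriminant built from them
    Returns (selected_variables, best_figure_of_merit).
    """
    selected = []
    best_fom = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Step 3: try each remaining variable on top of the current selection.
        trials = [(figure_of_merit(selected + [v]), v) for v in remaining]
        fom, var = max(trials, key=lambda t: t[0])
        # Step 4: keep the best addition only if the figure of merit improves.
        if fom <= best_fom:
            break
        selected.append(var)
        remaining.remove(var)
        best_fom = fom
    return selected, best_fom
```

Each pass costs one training per remaining variable, so the total number of trainings grows roughly quadratically with the number of candidates.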
9
Optimising the number of variables (method II)
1) Start with a set of possible discriminators.
2) Choose the one that gives maximal S/sqrt(S+B).
3) Add in the others one at a time, calculating S/sqrt(S+B) for each combination. Also add in, one at a time, N "dummy" variables. The mean and RMS of S/sqrt(S+B) with the dummies form the basis for a "null hypothesis" test.
10
Optimising the number of variables (method II)…
4) Choose the combination of real variables that maximises S/sqrt(S+B) (as long as S/sqrt(S+B) is X standard deviations better than S/sqrt(S+B) from the previous iteration).
5) Repeat steps 3 and 4 until no further variables pass.
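Method II can be sketched in the same style. The slides leave some details open, so this code makes explicit assumptions: the null reference is taken to be the mean of S/sqrt(S+B) over the dummy trials, the "standard deviation" is the RMS spread of those trials, and a real variable passes if it beats mean + X·RMS. Using the previous iteration's value as the reference instead is an equally plausible reading. The names `build_up_with_dummies`, `figure_of_merit` and `n_sigma` are hypothetical.

```python
import statistics


def build_up_with_dummies(candidates, dummies, figure_of_merit, n_sigma=2.0):
    """Forward selection with a dummy-variable null test.

    candidates      -- names of real candidate variables (at least one)
    dummies         -- names of dummy variables (at least one)
    figure_of_merit -- callable: list of names -> S/sqrt(S+B)
    n_sigma         -- the "X" of the slides (assumed default of 2)
    Returns the list of selected real variables.
    """
    remaining = list(candidates)
    # Step 2: seed the selection with the best single variable.
    first = max(remaining, key=lambda v: figure_of_merit([v]))
    selected = [first]
    remaining.remove(first)
    while remaining:
        # Null hypothesis: what adding a dummy variable does to the result.
        null = [figure_of_merit(selected + [d]) for d in dummies]
        threshold = statistics.mean(null) + n_sigma * statistics.pstdev(null)
        # Best real variable at this iteration.
        trials = [(figure_of_merit(selected + [v]), v) for v in remaining]
        fom, var = max(trials, key=lambda t: t[0])
        if fom <= threshold:
            break  # no real variable is significantly better than a dummy
        selected.append(var)
        remaining.remove(var)
    return selected
```

Compared with the plain build-up, a variable that improves S/sqrt(S+B) by less than the typical fluctuation from a dummy variable is rejected rather than kept.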
11
Implementing the procedure… Very easy to implement in analysis code! TerraFerMA, a program that interfaces to MLPfit, Jetnet, PDE methods, Fisher Discriminant, etc., includes this variable sorting method. Users can quickly and easily sort potential discriminators. http://www-d0.fnal.gov/~smjt/ferma.ps
12
A "real-world" example… A Tevatron Run I analysis used a 7-variable NN to discriminate between signal and background. Were all 7 needed? The signal and background n-tuples were run through the TerraFerMA interface to the sorting method…
13
A "real-world" example…
14
Another "real-world" example… A Tevatron "physics-object-ID" method uses 9 variables in the analysis. How many are actually needed?
15
Another "real-world" example…
16
Summary: Careful examination of the discriminators used in a multivariate analysis is always a good idea! Reducing the number of variables can simplify the analysis considerably, and can even increase discrimination power!