Benefits of Minimizing the Number of Discriminators Used in a Multivariate Analysis Sherry Towers State University of New York at Stony Brook.

1 Benefits of Minimizing the Number of Discriminators Used in a Multivariate Analysis Sherry Towers State University of New York at Stony Brook

2 S.Towers The case for fewer discriminators… • Using a large number of variables indiscriminately can indicate a lack of forethought in the design and conceptualization of an analysis

3 S.Towers The case for fewer discriminators… • Also, each added variable makes it more difficult to determine whether the modelling of the data is sound, and makes the analysis more difficult to understand • And each added variable adds statistical noise… This can degrade overall discrimination power!

4 S.Towers Optimising discrimination… • Maximise S/sqrt(S+B), or:
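As a rough illustration of the figure of merit on this slide, the sketch below computes S/sqrt(S+B) for a selection and scans a simple threshold on one discriminator. It assumes S and B are unweighted counts of selected signal and background events and that events above the threshold are kept; the function names `figure_of_merit` and `best_cut` are hypothetical and do not come from the talk.

```python
import numpy as np


def figure_of_merit(n_sig, n_bkg):
    """Return S/sqrt(S+B); defined here as 0 when no events are selected."""
    total = n_sig + n_bkg
    return n_sig / np.sqrt(total) if total > 0 else 0.0


def best_cut(sig_values, bkg_values, n_steps=100):
    """Scan thresholds on one discriminator, keeping events with value > cut.

    Returns (cut, figure of merit) for the best threshold found.
    Unit event weights are an assumption of this sketch.
    """
    sig_values = np.asarray(sig_values, dtype=float)
    bkg_values = np.asarray(bkg_values, dtype=float)
    lo = min(sig_values.min(), bkg_values.min())
    hi = max(sig_values.max(), bkg_values.max())
    best = (lo, 0.0)
    for cut in np.linspace(lo, hi, n_steps):
        fom = figure_of_merit(np.sum(sig_values > cut), np.sum(bkg_values > cut))
        if fom > best[1]:
            best = (cut, fom)
    return best
```

A real analysis would normally use weighted event yields scaled to the expected luminosity rather than raw counts.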

5 S.Towers The curse of too many variables: a simple example Signal: 5D Gaussian, μ = (1,0,0,0,0), σ = (1,1,1,1,1) Bkgnd: 5D Gaussian, μ = (0,0,0,0,0), σ = (1,1,1,1,1) Only difference between signal and background is in first dimension. Other four dimensions are “useless” discriminators
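The toy setup on this slide can be reproduced with a few lines of NumPy. This is only a sketch: it assumes uncorrelated Gaussian dimensions with unit width, and the names `make_toy_samples` and `separation`, the sample size, and the seed are illustrative choices, not taken from the talk.

```python
import numpy as np


def make_toy_samples(n_events, seed=0):
    """Generate the 5D Gaussian signal and background samples of the toy example."""
    rng = np.random.default_rng(seed)
    sig_mean = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # signal differs only in dimension 0
    bkg_mean = np.zeros(5)
    sigma = np.ones(5)  # unit width in every dimension, no correlations assumed
    signal = rng.normal(sig_mean, sigma, size=(n_events, 5))
    background = rng.normal(bkg_mean, sigma, size=(n_events, 5))
    return signal, background


def separation(signal, background):
    """Per-variable difference of means in units of the pooled standard deviation."""
    pooled = np.sqrt(0.5 * (signal.var(axis=0) + background.var(axis=0)))
    return (signal.mean(axis=0) - background.mean(axis=0)) / pooled


signal, background = make_toy_samples(100_000)
print(np.round(separation(signal, background), 2))
```

Only the first variable should show a separation near one; the other four should be consistent with zero, which is what makes them pure noise for any multivariate method trained on a finite sample.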

6 S.Towers The curse of too many variables: a simple example

7 S.Towers The curse of too many variables: a simple example

8 S.Towers Optimising the number of variables (the easy way)… Use a “build-up” process: 1) Start with a bunch of possible discriminators 2) Choose the one that gives maximal S/sqrt(S+B) 3) Add in the others one-at-a-time, calculating S/sqrt(S+B) for each combo 4) Choose the combo that maximises S/sqrt(S+B) (as long as S/sqrt(S+B) gets bigger!) 5) Repeat steps 3 and 4
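The build-up steps above amount to a greedy forward selection. The sketch below is one possible reading of them, not the TerraFerMA implementation: `build_up` is a hypothetical name, and `fom` stands for whatever procedure the analysis uses to train a discriminant on a given combination of variables and return its S/sqrt(S+B).

```python
def build_up(candidates, fom):
    """Greedy forward selection of discriminators.

    candidates: iterable of variable labels (hypothetical names).
    fom: callable taking a tuple of labels and returning S/sqrt(S+B)
         for a discriminant built from that combination.
    """
    selected = []
    remaining = list(candidates)
    best_fom = float("-inf")
    while remaining:
        # Steps 2 and 3: try each remaining variable on top of the current set.
        trials = {v: fom(tuple(selected + [v])) for v in remaining}
        winner = max(trials, key=trials.get)
        # Step 4: keep the best combination only if the figure of merit grew.
        if trials[winner] <= best_fom:
            break
        selected.append(winner)
        remaining.remove(winner)
        best_fom = trials[winner]
        # Step 5: loop back and try the variables that are left.
    return selected, best_fom


# Toy figure of merit for illustration only: each variable adds a fixed amount.
toy_gain = {"a": 5.0, "b": 1.0, "c": -0.5}
print(build_up(["c", "a", "b"], lambda subset: sum(toy_gain[v] for v in subset)))
```

Because the stopping rule only asks that the figure of merit gets bigger, a variable can be accepted on the strength of a statistical fluctuation, which is the weakness the second method addresses.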

9 S.Towers Optimising the number of variables (method II) 1) Start with a bunch of possible discriminators 2) Choose the one that gives maximal S/sqrt(S+B) 3) Add in the others one-at-a-time, calculating S/sqrt(S+B) for each combo. Also add in, one-at-a-time, N “dummy” variables. The mean and RMS of S/sqrt(S+B) with the dummies form the basis for a “null hypothesis” test.

10 S.Towers Optimising the number of variables (method II)… 4) Choose the combo of real variables that maximises S/sqrt(S+B) (as long as S/sqrt(S+B) is X standard deviations better than S/sqrt(S+B) from the previous iteration) 5) Repeat steps 3 and 4 until no further variables pass
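Method II can be sketched as the same greedy loop with a significance requirement. The slides do not spell out exactly how the dummy trials enter the test, so the code below makes an explicit assumption: the RMS of S/sqrt(S+B) over the dummy-variable trials is used as the "standard deviation", and a real variable passes only if it improves on the previous iteration by more than X times that RMS. The names `build_up_with_dummies`, `fom`, `dummy_foms`, and `n_sigma` are hypothetical.

```python
import numpy as np


def build_up_with_dummies(candidates, fom, dummy_foms, n_sigma=2.0):
    """Forward selection with a dummy-variable null-hypothesis test.

    fom(subset): S/sqrt(S+B) for a tuple of real variables.
    dummy_foms(subset): sequence of S/sqrt(S+B) values, one for each of the
        N dummy variables added one at a time to `subset`.
    n_sigma: the threshold X from the slide (assumed value, not from the talk).
    """
    remaining = list(candidates)
    if not remaining:
        return [], []
    # Step 2: seed with the single best variable (no significance test).
    first = max(remaining, key=lambda v: fom((v,)))
    selected = [first]
    remaining.remove(first)
    current = fom(tuple(selected))
    history = []
    while remaining:
        # Step 3: try each real variable, and the dummies, on the current set.
        trials = {v: fom(tuple(selected + [v])) for v in remaining}
        winner = max(trials, key=trials.get)
        null = np.asarray(dummy_foms(tuple(selected)), dtype=float)
        null_mean, null_rms = null.mean(), null.std()
        history.append((winner, trials[winner], null_mean, null_rms))
        # Step 4: require an improvement of more than X dummy-RMS units.
        if trials[winner] - current <= n_sigma * null_rms:
            break
        selected.append(winner)
        remaining.remove(winner)
        current = trials[winner]
        # Step 5: repeat until no further variable passes.
    return selected, history
```

The `history` list keeps the dummy mean and RMS for each iteration so that the null distribution can be inspected alongside the accepted variables.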

11 S.Towers Implementing the procedure… Very easy to implement in analysis code! TerraFerMA, a program that interfaces to MLPfit, Jetnet, PDE methods, Fisher Discriminant, etc, etc, etc, includes this variable sorting method. User can quickly and easily sort potential discriminators. http://www-d0.fnal.gov/~smjt/ferma.ps

12 S.Towers A “real-world” example… A Tevatron Run I analysis used a 7-variable NN to discriminate between signal and background. Were all 7 needed? Ran the signal and background n-tuples through the TerraFerMA interface to the sorting method…

13 S.Towers A “real-world” example…

14 S.Towers Another “real-world” example… A Tevatron “physics-object-ID” method uses 9 variables in the analysis. How many are actually needed?

15 S.Towers Another “real-world” example…

16 S.Towers Summary • Careful examination of discriminators used in a multivariate analysis is always a good idea! • Reduction of number of variables can simplify analysis considerably, and can even increase discrimination power!
