Benefits of Minimizing the Number of Discriminators Used in a Multivariate Analysis Sherry Towers State University of New York at Stony Brook.

Presentation transcript:

Benefits of Minimizing the Number of Discriminators Used in a Multivariate Analysis Sherry Towers State University of New York at Stony Brook

S.Towers The case for fewer discriminators… Using a large number of variables indiscriminately can indicate a lack of forethought in the design and conceptualization of an analysis.

S.Towers The case for fewer discriminators… Also, each added variable makes it more difficult to determine whether the modelling of the data is sound, and makes the analysis more difficult to understand. Each added variable also adds statistical noise, which can degrade the overall discrimination power!

S.Towers Optimising discrimination…  Maximise S/sqrt(S+B), or:
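
As a rough illustration of the figure of merit named on this slide, the sketch below computes S/sqrt(S+B) and scans cut values on a one-dimensional discriminant. The function names `significance` and `best_cut`, and the convention of keeping events with a score at or above the cut, are assumptions made for this example and do not come from the talk.

```python
import math
from typing import Optional, Sequence, Tuple


def significance(s: float, b: float) -> float:
    """Return S/sqrt(S+B), or 0.0 when no events pass the selection."""
    total = s + b
    return s / math.sqrt(total) if total > 0 else 0.0


def best_cut(signal_scores: Sequence[float],
             background_scores: Sequence[float]) -> Tuple[float, Optional[float]]:
    """Scan every observed score as a cut and keep events with score >= cut.

    Returns (best S/sqrt(S+B), cut value that achieves it).
    """
    best: Tuple[float, Optional[float]] = (0.0, None)
    for cut in sorted(set(signal_scores) | set(background_scores)):
        s = sum(1 for x in signal_scores if x >= cut)
        b = sum(1 for x in background_scores if x >= cut)
        fom = significance(s, b)
        if fom > best[0]:
            best = (fom, cut)
    return best
```

For example, `significance(100, 300)` evaluates to 100/sqrt(400) = 5.0.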

S.Towers The curse of too many variables: a simple example. Signal: 5D Gaussian, μ = (1,0,0,0,0), σ = (1,1,1,1,1). Background: 5D Gaussian, μ = (0,0,0,0,0), σ = (1,1,1,1,1). The only difference between signal and background is in the first dimension. The other four dimensions are “useless” discriminators.
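
The toy samples on this slide can be generated with a few lines of standard-library Python. This is only a sketch: it assumes the two vectors on the slide are the per-dimension mean and width of uncorrelated Gaussians, and the helper names, sample size and random seed are arbitrary choices made for illustration.

```python
import random
from typing import List, Sequence


def gaussian_sample(mean: Sequence[float], sigma: Sequence[float],
                    n: int, rng: random.Random) -> List[List[float]]:
    """Draw n events from an uncorrelated multi-dimensional Gaussian."""
    return [[rng.gauss(m, s) for m, s in zip(mean, sigma)] for _ in range(n)]


def column_mean(events: Sequence[Sequence[float]], i: int) -> float:
    """Sample mean of dimension i."""
    return sum(e[i] for e in events) / len(events)


rng = random.Random(42)  # fixed seed so the example is repeatable
signal = gaussian_sample([1, 0, 0, 0, 0], [1] * 5, 2000, rng)
background = gaussian_sample([0, 0, 0, 0, 0], [1] * 5, 2000, rng)

# Only dimension 0 separates the two samples; dimensions 1-4 carry no
# information and differ only by statistical fluctuation.
for i in range(5):
    print(i, round(column_mean(signal, i) - column_mean(background, i), 3))
```

Feeding all five dimensions to a discriminant trained on a finite sample like this one gives it four extra directions in which to fit noise.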

S.Towers The curse of too many variables: a simple example

S.Towers The curse of too many variables: a simple example

S.Towers Optimising the number of variables (the easy way)… Use a “build-up” process:
1) Start with a set of candidate discriminators
2) Choose the one that gives the maximal S/sqrt(S+B)
3) Add in the others one at a time, calculating S/sqrt(S+B) for each combination
4) Choose the combination that maximises S/sqrt(S+B) (as long as S/sqrt(S+B) gets bigger!)
5) Repeat steps 3 and 4
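
The build-up process can be sketched as a greedy forward selection. In the sketch below, `fom` stands for whatever procedure trains the discriminant on a given set of variables and returns its best S/sqrt(S+B). That callable, the function name `build_up`, and the toy numbers are hypothetical stand-ins and are not taken from TerraFerMA.

```python
from typing import Callable, List, Sequence, Tuple

Fom = Callable[[Tuple[str, ...]], float]


def build_up(candidates: Sequence[str], fom: Fom) -> Tuple[List[str], float]:
    """Greedy build-up: keep adding the variable that most improves fom."""
    chosen: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        # Steps 2 and 3: try each unused variable on top of those already chosen.
        trial_fom, trial_var = max(
            (fom(tuple(chosen + [v])), v) for v in remaining
        )
        # Step 4: accept only if the figure of merit actually gets bigger.
        if trial_fom <= best:
            break
        chosen.append(trial_var)
        remaining.remove(trial_var)
        best = trial_fom
    return chosen, best


# Toy figure of merit (hypothetical): each variable adds a fixed amount.
toy_gain = {"x1": 3.0, "x2": 1.0, "x3": -0.5}


def toy_fom(subset: Tuple[str, ...]) -> float:
    return sum(toy_gain[v] for v in subset)


selected, best_value = build_up(list(toy_gain), toy_fom)
print(selected, best_value)
```

In the toy case the loop picks `x1`, then `x2`, and stops because adding `x3` would lower the figure of merit.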

S.Towers Optimising the number of variables (method II)
1) Start with a set of candidate discriminators
2) Choose the one that gives the maximal S/sqrt(S+B)
3) Add in the others one at a time, calculating S/sqrt(S+B) for each combination. Also add in, one at a time, N “dummy” variables. The mean and RMS of S/sqrt(S+B) with the dummies form the basis for a “null hypothesis” test.

S.Towers Optimising the number of variables (method II)…
4) Choose the combination of real variables that maximises S/sqrt(S+B) (as long as S/sqrt(S+B) is X standard deviations better than S/sqrt(S+B) from the previous iteration)
5) Repeat steps 3 and 4 until no further variables pass
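
One possible reading of method II is sketched below. It assumes the "standard deviation" in step 4 is the spread of S/sqrt(S+B) obtained when each dummy variable is added in turn, and that a real variable passes when its improvement over the previous iteration exceeds X times that spread. The slides do not spell out the exact test, so this criterion, the function name `select_with_dummies`, the `fom` callable and the toy numbers are all assumptions.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Fom = Callable[[Tuple[str, ...]], float]


def select_with_dummies(real: Sequence[str], dummies: Sequence[str],
                        fom: Fom, x_sigma: float = 2.0) -> List[str]:
    """Forward selection with a dummy-variable null test (assumed criterion)."""
    remaining = list(real)
    # Step 2: seed with the single best real variable.
    first = max(remaining, key=lambda v: fom((v,)))
    chosen = [first]
    remaining.remove(first)
    previous = fom((first,))
    while remaining:
        # Step 3: best real candidate, plus the null distribution from dummies.
        trial_fom, trial_var = max(
            (fom(tuple(chosen + [v])), v) for v in remaining
        )
        null = [fom(tuple(chosen + [d])) for d in dummies]
        null_rms = statistics.pstdev(null)
        # Step 4: require an improvement of more than x_sigma null widths.
        if trial_fom - previous <= x_sigma * null_rms:
            break
        chosen.append(trial_var)
        remaining.remove(trial_var)
        previous = trial_fom
    return chosen


# Toy figure of merit (hypothetical): fixed gain per variable; the dummies
# only shift the result by small amounts that mimic statistical noise.
toy_gain = {"x1": 3.0, "x2": 1.0, "x3": 0.05,
            "d1": 0.1, "d2": -0.1, "d3": 0.0}


def toy_fom(subset: Tuple[str, ...]) -> float:
    return sum(toy_gain[v] for v in subset)


kept = select_with_dummies(["x1", "x2", "x3"], ["d1", "d2", "d3"], toy_fom)
print(kept)
```

Here `x3` is rejected because its small gain is indistinguishable from the fluctuations produced by the dummy variables, which is the behaviour the null-hypothesis test is meant to provide.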

S.Towers Implementing the procedure… Very easy to implement in analysis code! TerraFerMA, a program that interfaces to MLPfit, Jetnet, PDE methods, Fisher Discriminant, etc., includes this variable-sorting method. The user can quickly and easily sort potential discriminators.

S.Towers A “real-world” example… A Tevatron Run I analysis used a 7-variable NN to discriminate between signal and background. Were all 7 needed? The signal and background n-tuples were run through the TerraFerMA interface to the sorting method…

S.Towers A “real-world” example…

S.Towers Another “real-world” example… A Tevatron “physics-object-ID” method uses 9 variables in the analysis. How many are actually needed?

S.Towers Another “real-world” example…

S.Towers Summary: Careful examination of the discriminators used in a multivariate analysis is always a good idea! Reducing the number of variables can simplify the analysis considerably, and can even increase discrimination power!