Published by Ophelia Johnston. Modified over 6 years ago.
1
Multivariate Analysis Past, Present and Future
Harrison B. Prosper Florida State University PHYSTAT 2003 10 September 2003 Multivariate Analysis PHYSTAT Harrison B. Prosper Multivariate Analysis Harrison B. Prosper Durham, UK 2002
2
Outline
  Introduction
  Historical Note
  Current Practice
  Issues
  Summary
3
Introduction
  Data are invariably multivariate
    Particle physics (η, φ, E, f)
    Astrophysics (θ, φ, E, t)
4
Introduction – II A Textbook Example
  Objects
    Jet 1 (b)    3
    Jet 2        3
    Jet 3        3
    Jet 4 (b)    3
    Positron     3
    Neutrino     2
    Total       17
5
Introduction – III
  Astrophysics/Particle physics: Similarities
    Events
      Interesting events occur at random
      Poisson processes
    Backgrounds are important
    Experimental response functions
    Huge datasets
6
Introduction – IV
  Differences
    In particle physics we control when events occur and under what conditions
    We have detailed predictions of the relative frequency of various outcomes
7
Introduction – V All we do is Count!
  Our experiments are ideal Bernoulli trials
    At Fermilab, each collision, that is, each trial, is conducted the same way every 400 ns
    de Finetti’s analysis of exchangeable trials is an accurate model of what we do
  [Figure axis: Time →]
8
Introduction – VI
  Typical analysis tasks
    Data Compression
    Clustering and cluster characterization
    Classification/Discrimination
    Estimation
    Model selection/Hypothesis testing
    Optimization
9
Historical Note
  Karl Pearson (1857 – 1936)
  R.A. Fisher (1890 – 1962)
  P.C. Mahalanobis (1893 – 1972)
10
Historical Note – Iris Data
  Iris Versicolor
  Iris Setosa
  R.A. Fisher, The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, v. 7, p (1936)
11
Iris Data
  Variables
    X1 Sepal length
    X2 Sepal width
    X3 Petal length
    X4 Petal width
  “What linear function of the four measurements will maximize the ratio of the difference between the specific means to the standard deviations within species?” R.A. Fisher
12
Fisher Linear Discriminant (1936)
  Solution:
  Which is the same, within a constant, as
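The formulas for this slide are not reproduced in the text. As a minimal sketch, assuming the standard textbook form of Fisher's solution, w ∝ S_W^{-1} (mean_A − mean_B) with S_W the pooled within-class scatter matrix, the direction can be computed as follows. The function name and the toy Gaussian samples are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def fisher_direction(class_a, class_b):
    """Return w proportional to S_W^{-1} (mean_a - mean_b) (assumed textbook form)."""
    mean_a = class_a.mean(axis=0)
    mean_b = class_b.mean(axis=0)
    # Pooled within-class scatter: sum of the two unnormalized covariance matrices.
    scatter = (np.cov(class_a, rowvar=False) * (len(class_a) - 1)
               + np.cov(class_b, rowvar=False) * (len(class_b) - 1))
    return np.linalg.solve(scatter, mean_a - mean_b)

# Toy data (hypothetical): two correlated Gaussian classes with different means.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
class_a = rng.multivariate_normal([1.0, -1.0], cov, size=500)
class_b = rng.multivariate_normal([0.0, 0.0], cov, size=500)

w = fisher_direction(class_a, class_b)
proj_a = class_a @ w   # one number per event: the linear discriminant
proj_b = class_b @ w
```

Projecting onto w reduces each event to a single number; by construction the mean of `proj_a` lies above the mean of `proj_b`.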
13
Current Practice in Particle Physics
  Reducing number of variables
    Principal Component Analysis (PCA)
  Discrimination/Classification
    Fisher Linear Discriminant (FLD)
    Random Grid Search (RGS)
    Feedforward Neural Network (FNN)
    Kernel Density Estimation (KDE)
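To make the variable-reduction step concrete, here is a minimal PCA sketch using a singular value decomposition of the centered data. The function name `pca_reduce` and the toy dataset (three variables driven by one hidden quantity) are assumptions made for illustration.

```python
import numpy as np

def pca_reduce(data, n_components):
    """Project data onto its leading principal axes."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:n_components].T, explained

# Toy data (hypothetical): three measured variables, one underlying degree of freedom.
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 1))
data = np.hstack([latent, 2.0 * latent, -latent]) + 0.01 * rng.normal(size=(1000, 3))

reduced, explained = pca_reduce(data, n_components=1)
```

Here `explained[0]` is the fraction of the total variance carried by the first component; for this toy sample nearly all of it survives the reduction from three variables to one.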
14
Current Practice – II
  Parameter Estimation
    Maximum Likelihood (ML)
    Bayesian (KDE and analytical methods), e.g., see talk by Florencia Canelli (12A)
  Weighting
    Usually 0, 1, referred to as “cuts”
    Sometimes use the R. Barlow method
15
Cuts (0, 1 weights)
  Points that lie below the cuts are “cut out”
  We refer to (x0, y0) as a cut-point
  [Figure legend: S, B]
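A cut can be written as a weight function that returns 0 or 1 per event. The sketch below assumes that an event survives only if x > x0 and y > y0; the function name and the four sample events are hypothetical.

```python
import numpy as np

def cut_weights(points, cut_point):
    """Weight 1 for events above the cut-point in every variable, else 0."""
    return np.all(points > np.asarray(cut_point), axis=1).astype(int)

# Four hypothetical events in (x, y).
events = np.array([[0.2, 0.9],
                   [0.6, 0.7],
                   [0.8, 0.3],
                   [0.9, 0.95]])

weights = cut_weights(events, cut_point=(0.5, 0.5))
# weights is [0, 1, 0, 1]: the first event fails the x cut, the third fails the y cut.
```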
16
Grid Search
  Apply cuts at each grid point, compute some measure of their effectiveness, and choose the most effective cuts
  Curse of dimensionality: number of cut-points ~ Nbin^Ndim
  [Figure legend: S, B]
17
Random Grid Search
  Take each point of the signal class as a cut-point
  n = # events in sample
  k = # events after cuts
  fraction = k/n
  [Figure axes: x, y; Signal fraction, Background fraction]
  H.B.P. et al, Proceedings, CHEP 1995
18
Example: DØ Top Discovery (1995)
19
Optimal Discrimination
  r(x,y) = constant defines the optimal decision boundary
  Bayes Discriminant
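The definition of r(x,y) is not reproduced in the text. The sketch below assumes the usual reading: r is the ratio of signal to background densities, weighted by the prior odds, and the Bayes discriminant is D = r / (1 + r), the posterior probability of signal. The two Gaussian densities and all function names are hypothetical.

```python
import numpy as np

def gaussian_density(points, mean, sigma):
    """Isotropic 2-D Gaussian density."""
    diff = points - np.asarray(mean)
    return np.exp(-0.5 * np.sum(diff**2, axis=1) / sigma**2) / (2.0 * np.pi * sigma**2)

def density_ratio(points):
    # Assumed toy model: signal centered at (1, 1), background at (0, 0).
    signal = gaussian_density(points, mean=(1.0, 1.0), sigma=1.0)
    background = gaussian_density(points, mean=(0.0, 0.0), sigma=1.0)
    return signal / background

def bayes_discriminant(points, prior_signal=0.5):
    r = density_ratio(points) * prior_signal / (1.0 - prior_signal)
    return r / (1.0 + r)

points = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
d = bayes_discriminant(points)
```

Because D is a monotonic function of r, a contour of constant D is also a contour of constant r. In this toy model the point (0.5, 0.5) is equidistant from both means, so it sits on the D = 0.5 boundary.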
20
FeedForward Neural Networks
  Applications
    Discrimination
    Parameter estimation
    Function and density estimation
  Basic Idea
    Encode a mapping (Kolmogorov, 1950s) using a set of 1-D functions.
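As an illustration of building a multivariate mapping from 1-D functions, here is a minimal forward pass of a network with one hidden layer. The choice of tanh and sigmoid as the 1-D functions, the layer sizes, and the random weights are assumptions for the sketch, not details from the talk.

```python
import numpy as np

def feedforward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer: sums of inputs passed through 1-D functions."""
    # tanh is a 1-D function applied to each weighted sum separately.
    hidden = np.tanh(inputs @ w_hidden + b_hidden)
    # A sigmoid squashes the output into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-(hidden @ w_out + b_out)))

rng = np.random.default_rng(2)
inputs = rng.normal(size=(5, 3))     # 5 events, 3 input variables
w_hidden = rng.normal(size=(3, 4))   # 4 hidden nodes
b_hidden = np.zeros(4)
w_out = rng.normal(size=(4, 1))      # 1 output node
b_out = np.zeros(1)

outputs = feedforward(inputs, w_hidden, b_hidden, w_out, b_out)
```

Training would adjust the weights to minimize a risk function; only the mapping itself is shown here.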
21
Example: DØ Search for LeptoQuarks
  [Figure: diagram with labels LQ, g, q, LQ]
22
Issues
  Method choice
    Life is short and data finite; so how should one choose a method?
  Model complexity
    How to reduce dimensionality of data, while minimizing loss of “information”?
    How many model parameters?
    How should one avoid over-fitting?
23
Issues – II
  Model robustness
    Is a cut on a multivariate discriminant necessarily more sensitive to modeling errors than a cut on each of its input variables?
    What is a practical, but useful, way to assess sensitivity to modeling errors and robustness with respect to assumptions?
24
Issues – III
  Accuracy of predictions
    How should one place “error bars” on multivariate-based results?
    Is a Bayesian approach useful?
  Goodness of fit
    How can this be done in multiple dimensions?
25
Summary
  After ~80 years of effort we have many powerful methods of analysis, a few of which are now used routinely in physics analyses
  The most pressing need is to understand some issues better so that when the data tsunami strikes we can respond sensibly
26
FNN – Probabilistic Interpretation
  Minimize the empirical risk function with respect to w
  Solution (for large N)
  If t(x) = kδ[1-I(x)], where I(x) = 1 if x is of class k, 0 otherwise
  D.W. Ruck et al., IEEE Trans. Neural Networks 1(4), (1990)
  E.A. Wan, IEEE Trans. Neural Networks 1(4), (1990)
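The result cited here can be checked numerically in a toy setting. The sketch assumes that the empirical risk is the mean squared error between targets and outputs, and that the target is 1 for class k and 0 otherwise. Instead of a network, a table with one free value per x plays the role of the output function; the posterior values are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(3)
true_posterior = np.array([0.2, 0.5, 0.9])   # assumed p(class k | x) for x = 0, 1, 2
x = rng.integers(0, 3, size=200_000)
# Target is 1 if the event belongs to class k, 0 otherwise.
t = (rng.random(x.size) < true_posterior[x]).astype(float)

# One free parameter per x value stands in for the network output n(x, w).
output = np.full(3, 0.5)
for _ in range(200):
    residual = output[x] - t
    # Gradient of the empirical risk (1/N) * sum_i (t_i - n(x_i))^2.
    grad = 2.0 * np.bincount(x, weights=residual, minlength=3) / x.size
    output -= grad   # plain gradient descent with unit step size
```

After minimization, `output` agrees with `true_posterior` to within statistical fluctuations: the function that minimizes the squared-error risk with 0/1 targets estimates the class probability given x.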
27
Self Organizing Map
  Basic Idea (Kohonen, 1988)
    Map each of K feature vectors X = (x1,..,xN)^T into one of M regions of interest defined by the vector wm so that all X mapped to a given wm are closer to it than to all remaining wm.
    Basically, perform a coarse-graining of the feature space.
28
Support Vector Machines
  Basic Idea
    Data that are non-separable in N-dimensions have a higher chance of being separable if mapped into a space of higher dimension
    Use a linear discriminant to partition the high dimensional feature space.
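A small example of the mapping idea, under the assumption of a hand-picked feature map x → (x, x²) and a hand-picked linear discriminant. A real support vector machine would also choose the separating hyperplane by maximizing the margin, which is not shown here.

```python
import numpy as np

# 1-D toy data (hypothetical): class +1 if |x| > 1, class -1 otherwise.
x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
labels = np.where(np.abs(x) > 1.0, 1, -1)

def separable_by_threshold(values, labels):
    """True if a single threshold on a 1-D variable separates the classes."""
    sorted_labels = labels[np.argsort(values)]
    # Separable only if the label changes at most once along the line.
    return np.count_nonzero(np.diff(sorted_labels)) <= 1

# Map each point into 2-D: (x, x^2).
features = np.column_stack([x, x**2])

# Hand-picked linear discriminant in the mapped space: x^2 - 1.
w, b = np.array([0.0, 1.0]), -1.0
predicted = np.sign(features @ w + b)
```

In one dimension the classes interleave, so `separable_by_threshold(x, labels)` is False, while in the mapped space the line x² = 1 reproduces every label.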
29
Independent Component Analysis
  Basic Idea
    Assume X = (x1,..,xN)^T is a linear sum X = AS of independent sources S = (s1,..,sN)^T.
    Both A, the mixing matrix, and S are unknown.
    Find a de-mixing matrix T such that the components of U = TX are statistically independent
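The notation can be illustrated with simulated data. In this sketch A is known because it is generated by hand, so the de-mixing matrix is simply its inverse; an actual ICA algorithm has to estimate T from X alone, and can do so only up to a permutation and rescaling of the sources. The source distributions and the matrix entries are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples = 5000

# Two independent, non-Gaussian sources S, one per row.
sources = np.vstack([rng.uniform(-1.0, 1.0, n_samples),
                     rng.laplace(size=n_samples)])

mixing = np.array([[1.0, 0.5],
                   [0.3, 1.0]])        # A (hypothetical values)
observed = mixing @ sources            # X = A S

# T is available here only because A was simulated.
demixing = np.linalg.inv(mixing)
recovered = demixing @ observed        # U = T X

corr_observed = np.corrcoef(observed)[0, 1]
corr_recovered = np.corrcoef(recovered)[0, 1]
```

The observed components are strongly correlated, while the recovered ones are not. Zero correlation is necessary but not sufficient for independence, which is why ICA methods use stronger criteria than decorrelation.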