9/9/2003 PHYSTAT2003 Slide 1
Application of Adaptive Mixtures and Fractal Dimension Analysis
• Adaptive Mixtures
• KDELM
• Fractal Dimension
Sang-Joon Lee (Rice University)

9/9/2003 PHYSTAT2003 Slide 2
Adaptive Mixtures*
• Combines the strengths of Kernel Estimation and Finite Mixtures while discarding their weaknesses.
  Kernel Estimation: robust, but needs intensive CPU power.
  Finite Mixtures: fast to compute, but imposes strong assumptions on the underlying density and the initial state.
• The algorithm determines the number of kernels: for a new data point x_i, a kernel is added only when its Mahalanobis distance from the existing components exceeds a pre-defined threshold T_c.
* Priebe, Carey (1994), "Adaptive Mixtures", JASA, 89.

9/9/2003 PHYSTAT2003 Slide 3
Update/Creation in Adaptive Mixtures
• Update Rule
• Create Rule
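The rule formulas on this slide were images and are not in the transcript. As a hedged reconstruction for the 1-D Gaussian case, following the recursive scheme in Priebe (1994) (exact constants, in particular the new component's starting width, vary by implementation): after $n-1$ points the estimate is $f_{n-1}(x)=\sum_{j=1}^{k}\pi_j\,\phi(x;\mu_j,\sigma_j^2)$, and for a new point $x_n$ the responsibilities are

$$\tau_j = \frac{\pi_j\,\phi(x_n;\mu_j,\sigma_j^2)}{\sum_{m=1}^{k}\pi_m\,\phi(x_n;\mu_m,\sigma_m^2)}.$$

The update rule is then

$$\pi_j \leftarrow \pi_j + \frac{\tau_j-\pi_j}{n},\qquad
\mu_j \leftarrow \mu_j + \frac{\tau_j}{n\,\pi_j}(x_n-\mu_j),\qquad
\sigma_j^2 \leftarrow \sigma_j^2 + \frac{\tau_j}{n\,\pi_j}\left[(x_n-\mu_j)^2-\sigma_j^2\right],$$

and the create rule fires when $\min_j (x_n-\mu_j)^2/\sigma_j^2 > T_c$ (the Mahalanobis criterion of the previous slide), adding a component with $\mu_{k+1}=x_n$, a smoothed starting width, and weight $\pi_{k+1}=1/n$, with the remaining weights renormalized.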

9/9/2003 PHYSTAT2003 Slide 4
Performance of Adaptive Mixtures

9/9/2003 PHYSTAT2003 Slide 5
Conclusion in Adaptive Mixtures
• For the 1-D Gaussian example, Adaptive Mixtures over-fits.
• Poor consistency in the 1-D exponential example.
• A better iteration algorithm is needed to prevent over-fitting.

9/9/2003 PHYSTAT2003 Slide 6
KDELM (Kernel Density Estimation with Likelihood Maximization)
• A kernel is added only when it results in a better fit.
• The goodness of fit is measured by the negative log-likelihood, minimized with MINUIT:
  $-\ln L = -\sum_{i=1}^{n} \ln \hat{f}(x_i)$, where $\hat{f}(x)=\sum_{j}\pi_j\,\phi(x;\mu_j,\sigma_j)$ with $\sum_j \pi_j = 1$, and $\phi(x;\mu,\sigma)$ is a normal probability density function with mean $\mu$ and standard deviation $\sigma$.
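As a concrete illustration, here is a minimal Python sketch of the KDELM loop. It is not the original implementation: MINUIT is replaced by scipy's Nelder-Mead minimizer, kernel weights are held equal to keep the parameter vector short, and both the seeding of a new kernel at the lowest-density point and the stopping tolerance tol are assumptions.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def nll(theta, x, k):
    """Negative log-likelihood of an equal-weight k-component normal mixture.
    theta = [mu_1..mu_k, log(sigma_1)..log(sigma_k)]."""
    mu, sigma = theta[:k], np.exp(theta[k:])
    dens = np.mean([norm.pdf(x, m, s) for m, s in zip(mu, sigma)], axis=0)
    return -np.sum(np.log(dens + 1e-300))

def kdelm_fit(x, max_kernels=20, tol=0.5):
    """Greedy KDELM loop: add a kernel only if -ln L improves by more than tol."""
    k = 1
    best = minimize(nll, np.array([x.mean(), np.log(x.std())]),
                    args=(x, k), method="Nelder-Mead")
    while k < max_kernels:
        mu, sigma = best.x[:k], np.exp(best.x[k:])
        dens = np.mean([norm.pdf(x, m, s) for m, s in zip(mu, sigma)], axis=0)
        seed = x[np.argmin(dens)]              # place the new kernel at the worst-fit point
        theta0 = np.concatenate([best.x[:k], [seed],
                                 best.x[k:], [np.log(x.std() / (k + 1))]])
        trial = minimize(nll, theta0, args=(x, k + 1), method="Nelder-Mead")
        if best.fun - trial.fun <= tol:        # no real likelihood gain: stop adding kernels
            break
        k, best = k + 1, trial
    return k, best

# usage: k, fit = kdelm_fit(np.random.normal(size=500))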

9/9/2003 PHYSTAT2003 Slide 7
Performance of KDELM

9/9/2003 PHYSTAT2003 Slide 8
Performance of KDELM (2)
• Discrimination of tau signal events from generic QCD backgrounds.
• Discriminant function: see the note below.
• At a signal efficiency of 50%, S/B = 26.32.
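The discriminant formula itself was an image on the original slide and is not in the transcript. A standard choice once signal and background densities $f_S$ and $f_B$ have been estimated with KDELM is the likelihood-ratio form below; this is an assumed reconstruction, not confirmed by the source:

$$D(x) = \frac{f_S(x)}{f_S(x)+f_B(x)},$$

with the cut on $D(x)$ chosen to give the quoted 50% signal efficiency.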

9/9/2003 PHYSTAT2003 Slide 9
Conclusions in KDELM
• KDELM is robust.
• KDELM is computationally fast.
• KDELM gives good background rejection in tau lepton identification.
• A new algorithm may be needed for a better fit to extreme distributions such as the 1-D exponential.

9/9/2003 PHYSTAT2003 Slide 10
Fractal Dimension
• The fractal dimension, also called the capacity dimension, is defined (per MathWorld) by $n(\epsilon)=\epsilon^{-D}$, where $n(\epsilon)$ is the minimum number of open sets of diameter $\epsilon$ needed to cover the set.
• The fractal dimension quantifies the increase in structural definition that magnification yields.

9/9/2003 PHYSTAT2003 Slide 11
Mandelbrot's Example
• Consider measuring the length of a coastline (an example given by Mandelbrot). Using a meter stick, you might get a good estimate of the length, yet using a centimeter stick (and, of course, more time) you can get an even better measurement.
• The fractal dimension quantifies this increase in detail that occurs by magnifying, or in this case, by switching rulers.

9/9/2003 PHYSTAT2003 Slide 12
Calculation of Fractal Dimension
• There are several techniques, but all estimate the dimension from the slope of a log-log power-law plot, using the power-law relationship on the earlier slide (made explicit below):
• Box Counting Technique
• Radial Covering Method
• Fourier Estimator
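Making the slope estimate explicit: taking logarithms of the covering relation $n(\epsilon)=\epsilon^{-D}$ from slide 10 gives

$$\log n(\epsilon) = -D\,\log\epsilon \quad\Longrightarrow\quad D = -\lim_{\epsilon\to 0}\frac{\log n(\epsilon)}{\log\epsilon},$$

so each of the methods listed above reduces to a straight-line fit in the log-log plane, with $D$ read off as minus the slope.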

9/9/2003 PHYSTAT2003 Slide 13
Box Counting Technique*
• Grids (boxes) of varying side lengths are placed over the data set.
• For each grid size, the number of boxes containing data points is counted; these counts form the power-law plot.
• The dimension is derived from the slope of a least-squares fit (a minimal sketch follows below).
* The specific implementation used was coded by John Sarraille and Peter DiFalco of CSU, who based their algorithm on "A Fast Algorithm To Determine Fractal Dimensions By Box Counting" by Liebovitch and Toth.
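A minimal box-counting sketch in Python (not the Sarraille/DiFalco code referenced above; the dyadic grid sizes and the number of scales are illustrative choices):

import numpy as np

def box_counting_dimension(points, n_scales=6):
    """Estimate the capacity (box-counting) dimension of a point set.

    points: (n_events, n_vars) array, e.g. one pair of kinematic variables.
    Returns minus the slope of a least-squares fit of log N(eps) vs log eps.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    unit = (points - lo) / np.where(hi > lo, hi - lo, 1.0)   # map into the unit cube

    log_eps, log_n = [], []
    for k in range(1, n_scales + 1):
        n_boxes = 2 ** k                                     # boxes per axis; eps = 1/n_boxes
        cells = np.minimum((unit * n_boxes).astype(int), n_boxes - 1)
        occupied = len({tuple(c) for c in cells})            # boxes holding >= 1 point
        log_eps.append(np.log(1.0 / n_boxes))
        log_n.append(np.log(occupied))

    slope, _ = np.polyfit(log_eps, log_n, 1)                 # log N = -D log eps + const
    return -slope

With only a few hundred events (as on the next slide) the usable scaling region is narrow, so in practice the fit should be restricted to the range of scales where the log-log plot is actually linear.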

9/9/2003 PHYSTAT2003 Slide 14
Box Counting Technique (2)
• Goal: to find combinations of variables that create a clearer distinction between signal and background.
• ttbar MC data composed of 13 different variables was used: 96 background events and 158 signal events.
• The fractal dimension was calculated for pairs of variables, over varying mixtures of signal and background events, to see whether its value really helps indicate signal or background (see the loop sketched below).
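As a usage illustration of the box-counting sketch above (the array names and shapes are hypothetical, not the original analysis code):

from itertools import combinations

# 'signal' (158, 13) and 'background' (96, 13) are hypothetical MC arrays
for i, j in combinations(range(13), 2):        # 13 * 12 / 2 = 78 variable pairs
    d_sig = box_counting_dimension(signal[:, [i, j]])
    d_bkg = box_counting_dimension(background[:, [i, j]])
    if abs(d_sig - d_bkg) > 0.4:               # 0.4 threshold quoted on slide 17
        print(f"pair ({i},{j}): D_sig = {d_sig:.2f}, D_bkg = {d_bkg:.2f}")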

9/9/2003 PHYSTAT2003 Slide 15
Results
• The fractal dimension was calculated for the full signal sample, the full background sample, and a mixed sample composed of 96 events from each.
• Of the 78 distinct variable-pair combinations, 37 appear to show some significance in indicating signal or background.
• These 37 combinations show fractal dimension values which, in the mixed case, interpolate between the pure-sample values.

9/9/2003 PHYSTAT2003 Slide 16
Results (2)
• The fractal dimension was also calculated for 96-event mixtures with varying proportions of signal and background (0%-100%, 25%-75%, 50%-50%, 75%-25%, 100%-0%).
• 15 of the combinations continue to interpolate across the mixes.
• Many others reach a maximum or minimum fractal dimension at an intermediate mixture; even so, the signal-rich mixes (75%-25%, 100%-0%) have similar fractal dimension values to each other, and the background-rich mixes (25%-75%, 0%-100%) likewise share similar values.

9/9/2003 PHYSTAT2003 Slide 17
Difference of Fractal Dimension
• Signal-background fractal dimension differences were examined in 2-D: with 13 variables, the number of possible variable pairs is 13 × 12 / 2 = 78.
• A difference of 0.4 is significant compared with a typical 2-D fractal dimension of 1.0.

9/9/2003 PHYSTAT2003 Slide 18
Conclusions in Fractal Dimension
• The fractal dimension appears to be useful as a discriminating feature for some combinations of variables.
• The next step would be to take the pairs of variables that show promise and use them as features in one of the classifier techniques (kernel density estimation, decision trees, etc.).

9/9/2003 PHYSTAT2003 Slide 19
Acknowledgements
I would like to thank:
Professor Paul Padley (Rice University)
Professor David Scott (Rice University)
Professor Bruce Knuteson (MIT)
Bradley Chase (Rice University)