Multi-dimensional likelihood

Multi-dimensional likelihood
Bill Murray, CCLRC RAL
w.murray@rl.ac.uk, home.cern.ch/~murray
ATLAS UK Higgs meeting, RAL, 17 June 2006
Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester

Outline:
- Likelihood
- An n-D implementation
- Application to b-tag

What is likelihood?
- Likelihood is the product, over all observed events, of the probability of observing those events.
- Maximising the likelihood by varying model parameters is an extremely powerful fitting technique.
- Separating events into signal and background according to the ratio of the likelihoods of the two hypotheses is the optimal technique.
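In symbols (standard notation; the symbols below are mine, not from the slides):

```latex
% Likelihood of N independent observed events x_1, ..., x_N under a model
% with parameters theta and per-event density f(x | theta):
L(\theta) = \prod_{i=1}^{N} f(x_i \mid \theta)

% For separating two hypotheses, the per-event discriminant is the ratio of
% the signal and background densities evaluated at the observed event x:
\lambda(x) = \frac{L_s(x)}{L_b(x)} = \frac{f_s(x)}{f_b(x)}
```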

The right answer
- Consider separating a dataset into 2 classes; call them background and signal.
- The events live in an n-dimensional space of observables, e.g. ETmiss, number of leptons.
- [Figure: overlapping background and signal populations in the observable space]
- A simple cut is not optimal.

The right answer II
- What is optimal?
- [Figure: the same background and signal populations with a candidate selection contour]
- Maybe something like this might be...

The right answer III
- For a given efficiency, we want to minimize the background.
- Sort the space by the s/b ratio in small boxes.
- Accept all regions with s/b above some threshold.
- This is the likelihood ratio.
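A minimal sketch of that recipe, assuming binned signal and background densities on a common grid (the toy histograms, bin count and threshold scan below are illustrative choices, not from the talk):

```python
import numpy as np

def accept_mask(sig_hist, bkg_hist, efficiency):
    """Order bins by s/b and accept the highest-ratio bins until the
    requested signal efficiency is reached (brute-force n-D histograms)."""
    s = sig_hist.ravel().astype(float)
    b = bkg_hist.ravel().astype(float)
    # s/b per bin; bins with no background are accepted first (they cost nothing)
    ratio = np.divide(s, b, out=np.full_like(s, np.inf), where=b > 0)
    order = np.argsort(ratio)[::-1]          # best s/b first
    cum_sig = np.cumsum(s[order]) / s.sum()  # signal efficiency as bins are added
    n_keep = np.searchsorted(cum_sig, efficiency) + 1
    mask = np.zeros_like(s, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(sig_hist.shape)

# Toy example: 2-D histograms of simulated signal and background
rng = np.random.default_rng(0)
sig = rng.normal(+1.0, 1.0, size=(10000, 2))
bkg = rng.normal(-1.0, 1.5, size=(10000, 2))
edges = [np.linspace(-5, 5, 21)] * 2
sig_h, _ = np.histogramdd(sig, bins=edges)
bkg_h, _ = np.histogramdd(bkg, bins=edges)
selected = accept_mask(sig_h, bkg_h, efficiency=0.80)
print("background kept:", bkg_h[selected].sum() / bkg_h.sum())
```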

This is the right answer
- For optimal separation, order events by Ls/Lb.
- This is identical to ordering by Ls+b/Lb.
- More powerful still is to fit in the 1-D space defined by this ratio.
- This is the right answer.
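A one-line check of that equivalence, writing the signal-plus-background density as a mixture (standard algebra, not from the slides):

```latex
L_{s+b}(x) = \frac{s\,L_s(x) + b\,L_b(x)}{s+b}
\;\;\Longrightarrow\;\;
\frac{L_{s+b}(x)}{L_b(x)} = \frac{s}{s+b}\,\frac{L_s(x)}{L_b(x)} + \frac{b}{s+b}
% This is a monotonically increasing function of L_s/L_b, so ordering events
% by either ratio gives the same ranking.
```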

Determination of s, b densities
- We may know the matrix elements,
- but not for e.g. a b-tag,
- and in any case there are detector effects.
- The densities are usually taken from simulation.

Using MC to calculate density
- Brute force: divide our n-D space into hypercubes, with m divisions of each axis.
- That gives m^n elements, and we need ~100·m^n events for a 10% estimate per bin, e.g. 1,000,000,000 for 7 dimensions and 10 bins in each.
- This assumed a uniform density; in practice far more events are needed.
- And the purpose was to separate different distributions.
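The arithmetic behind that number, as a tiny sketch (the 100-events-per-bin figure for ~10% statistical precision is the rule of thumb quoted above):

```python
def brute_force_mc_events(n_dims, bins_per_axis, events_per_bin=100):
    """Events needed to populate every bin of an n-D histogram to ~10% precision
    (sqrt(100)/100 = 10%), assuming a uniform density across the space."""
    n_bins = bins_per_axis ** n_dims
    return n_bins, n_bins * events_per_bin

bins, events = brute_force_mc_events(n_dims=7, bins_per_axis=10)
print(bins, events)   # 10,000,000 bins -> 1,000,000,000 events
```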

Better likelihood estimation
- Clever binning: starts to lead to tree techniques.
- Kernel density estimators: the kernel size grows with the number of dimensions, and edges are an issue.
- Ignore correlations between variables: very commonly done ('I used likelihood'); sketched below.
- Pretend measured = true and correct later: linked to OO techniques, bias correction. Problem for a b-tag: there is no 'true'.
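A minimal sketch of the 'ignore correlations' option, building the discriminant from 1-D histograms of each variable per class (the class name, binning and toy inputs are illustrative assumptions, not from the talk):

```python
import numpy as np

class ProjectiveLikelihood:
    """Product of 1-D PDFs per class, ignoring correlations between variables."""
    def __init__(self, sig_samples, bkg_samples, n_bins=40):
        # Per-variable 1-D histograms (normalised to unit area) for each class
        self.edges, self.sig_pdf, self.bkg_pdf = [], [], []
        for d in range(sig_samples.shape[1]):
            lo = min(sig_samples[:, d].min(), bkg_samples[:, d].min())
            hi = max(sig_samples[:, d].max(), bkg_samples[:, d].max())
            edges = np.linspace(lo, hi, n_bins + 1)
            s, _ = np.histogram(sig_samples[:, d], bins=edges, density=True)
            b, _ = np.histogram(bkg_samples[:, d], bins=edges, density=True)
            self.edges.append(edges)
            self.sig_pdf.append(s)
            self.bkg_pdf.append(b)

    def ratio(self, x):
        """Approximate L_s/L_b for one event x, multiplying the 1-D densities."""
        ls, lb = 1.0, 1.0
        for d, edges in enumerate(self.edges):
            i = np.clip(np.searchsorted(edges, x[d]) - 1, 0, len(edges) - 2)
            ls *= self.sig_pdf[d][i]
            lb *= self.bkg_pdf[d][i]
        return ls / lb if lb > 0 else np.inf

# Toy usage with uncorrelated Gaussian signal and background samples
rng = np.random.default_rng(2)
pl = ProjectiveLikelihood(rng.normal(+1, 1, (5000, 3)), rng.normal(-1, 1, (5000, 3)))
print(pl.ratio(np.array([0.4, 0.0, 1.2])))
```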

Alternative approaches
- Cloud of cuts: use a random cut generator to optimise cuts.
- Neural nets: well known, good in high dimensions.
- Support vector machines: computationally easier than kernel methods.
- Decision trees: boosted or not?

Kernel likelihoods
- Directly estimate the PDFs from the training-sample events.
- Some kernel, usually Gaussian, smears the sample and so increases the widths.
- Fully optimal iff the MC statistics are infinite.
- The kernel metric (size, aspect ratio) is hard to optimize; watch how the kernel size depends on statistics.
- The kernel size must grow with the number of dimensions, so precision is lost if unnecessary variables are added.
- Big storage and computational requirements.
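A minimal kernel-likelihood sketch using a Gaussian product kernel (the fixed bandwidth, class structure and toy inputs are my illustrative choices; a real implementation would tune the kernel metric as discussed above):

```python
import numpy as np

class GaussianKDE:
    """Fixed-bandwidth Gaussian kernel density estimate in n dimensions."""
    def __init__(self, samples, bandwidth=0.3):
        self.x = np.asarray(samples, dtype=float)   # (N, n_dims) training events
        self.h = bandwidth
        n, d = self.x.shape
        self.norm = 1.0 / (n * (np.sqrt(2 * np.pi) * self.h) ** d)

    def __call__(self, point):
        # Sum of Gaussian kernels centred on every training event
        d2 = np.sum((self.x - point) ** 2, axis=1)
        return self.norm * np.exp(-0.5 * d2 / self.h ** 2).sum()

# Likelihood-ratio discriminant built from two KDEs
rng = np.random.default_rng(1)
sig_kde = GaussianKDE(rng.normal(+1.0, 1.0, size=(2000, 3)))
bkg_kde = GaussianKDE(rng.normal(-1.0, 1.0, size=(2000, 3)))
event = np.array([0.5, 0.2, 1.1])
print("L_s/L_b =", sig_kde(event) / bkg_kde(event))
```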

Conclusions
- The likelihood ratio underpins everything: consider using it.
- Neural nets are a good tool too, of course; boosted trees are very interesting?
- The cost of computing is becoming important, so optimal methods are not necessarily optimal; but the data are very expensive too.
- A general-purpose density estimator would be very useful. See next talk!