Signal/Background Discrimination in Particle Physics
Harrison B. Prosper
Florida State University
SAMSI, 8 March 2006

Outline
1. Particle Physics Data
2. Signal/Background Discrimination
3. Summary

Particle Physics Data
proton + anti-proton -> positron (e+) + neutrino (ν) + Jet1 + Jet2 + Jet3 + Jet4
This event is described by (at least) 4 x (E_T, η, φ) for the jets + (E_T, η, φ) for the positron + (E_T, φ) for the neutrino = 17 measured quantities.

Particle Physics Data
H0: Standard Model
H1: Model of the Week

Signal/Background Discrimination
To minimize the misclassification probability, compute

p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)]

Every signal/background discrimination method is ultimately an algorithm to approximate this function, or a mapping thereof. p(S)/p(B) is the prior signal-to-background ratio, that is, it is S/B before applying a cut to p(S|x).
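
A minimal numerical sketch of this rule. The 1-D Gaussian class densities here are illustrative assumptions, not anything from the talk:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D class densities for signal and background.
p_x_given_S = norm(loc=1.0, scale=0.5).pdf
p_x_given_B = norm(loc=0.0, scale=1.0).pdf

def p_S_given_x(x, p_S=0.5, p_B=0.5):
    """Bayes' rule: posterior signal probability at x."""
    num = p_x_given_S(x) * p_S
    return num / (num + p_x_given_B(x) * p_B)

x = np.linspace(-3, 3, 7)
print(p_S_given_x(x))  # rises toward 1 where the signal density dominates
```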

Signal/Background Discrimination
Given data D = (x, y), with x = {x_1, …, x_N} and y = {y_1, …, y_N}, of N training examples (events), infer a discriminant function f(x, w) with parameters w:

p(w|x, y) = p(x, y|w) p(w) / p(x, y)
          = p(y|x, w) p(x|w) p(w) / [p(y|x) p(x)]
          = p(y|x, w) p(w) / p(y|x),

assuming p(x|w) -> p(x).

Signal/Background Discrimination
A typical likelihood for classification:

p(y|x, w) = ∏_i f(x_i, w)^(y_i) [1 − f(x_i, w)]^(1 − y_i)

where y_i = 0 for background events and y_i = 1 for signal events.
If f(x, w) is flexible enough, then maximizing p(y|x, w) with respect to w yields f = p(S|x), asymptotically.
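
Maximizing this likelihood is the same as minimizing its negative logarithm, the binary cross-entropy; a minimal sketch, with the classifier outputs and labels as assumed inputs:

```python
import numpy as np

def neg_log_likelihood(f_vals, y):
    """-log p(y|x, w) for the Bernoulli likelihood above.
    f_vals: f(x_i, w) in (0, 1); y: labels (0 = background, 1 = signal)."""
    eps = 1e-12  # guard against log(0)
    f_vals = np.clip(f_vals, eps, 1 - eps)
    return -np.sum(y * np.log(f_vals) + (1 - y) * np.log(1 - f_vals))
```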

Signal/Background Discrimination
However, in a Bayesian calculation it is more natural to average:

y(x) = ∫ f(x, w) p(w|D) dw

Questions:
1. Do suitably flexible functions f(x, w) exist?
2. Is there a feasible way to do the integral?
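
In practice the integral is approximated by an average over a sample of parameter points {w_k} drawn from p(w|D); a minimal sketch, with f and the posterior sample assumed given:

```python
import numpy as np

def bayes_average(x, f, w_samples):
    """Monte Carlo approximation of y(x) = ∫ f(x, w) p(w|D) dw:
    y(x) ≈ (1/M) Σ_k f(x, w_k), with w_k ~ p(w|D)."""
    return np.mean([f(x, w) for w in w_samples])
```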

Answer 1: Yes!
Hilbert's 13th problem: prove a special case of the conjecture that the following is impossible, in general:

f(x_1, …, x_n) = F( g_1(x_1), …, g_n(x_n) )

In 1957, Kolmogorov proved the contrary: a function f: R^n -> R can be represented as

f(x_1, …, x_n) = Σ_{i=1}^{2n+1} Q_i( Σ_{j=1}^{n} G_ij(x_j) )

where the functions G_ij are independent of f(·).

Kolmogorov Functions
[Diagram: a feed-forward neural network n(x, w) with inputs x_1, x_2, hidden-layer parameters (u, a), and output-layer parameters (v, b).]
A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings f: R^n -> R.
The parameters w = (u, a, v, b) are called weights.
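
A minimal sketch of such a network, with one hidden layer of tanh units and a sigmoid output. Reading (u, a) as hidden-layer weights and biases and (v, b) as output-layer weights and bias is an assumption about the slide's notation:

```python
import numpy as np

def n(x, u, a, v, b):
    """Single-hidden-layer network mapping x to a value in (0, 1).
    u: (n_hidden, n_inputs) weights, a: (n_hidden,) biases,
    v: (n_hidden,) output weights, b: scalar output bias."""
    h = np.tanh(u @ x + a)                     # hidden-layer activations
    return 1.0 / (1.0 + np.exp(-(v @ h + b)))  # sigmoid output, usable as p(S|x)
```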

Answer 2: Yes!
Computational Method
Generate a Markov chain of N points {w} whose stationary density is p(w|D), and average over the last M points.
Map the problem into that of a particle moving in a spatially varying potential, and use the methods of statistical mechanics to generate states (p, w) with probability ~ exp(−H), where H is the Hamiltonian

H = −log p(w|D) + p²/2

with momentum p.

Hybrid Markov Chain Monte Carlo
Computational Method, continued
For fixed H, traverse the space (p, w) using Hamilton's equations, which guarantees that all points consistent with H will be visited with equal probability ~ exp(−H).
To allow exploration of states with differing values of H, one periodically introduces random changes to the momentum p.
Software: Flexible Bayesian Modeling (FBM) by Radford Neal.
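
A bare-bones sketch of one hybrid Monte Carlo step (not Neal's FBM implementation; the log-posterior, its gradient, step size, and trajectory length are all assumed inputs):

```python
import numpy as np

def hmc_step(w, log_post, grad_log_post, eps=0.01, n_leapfrog=50, rng=None):
    """One hybrid MC step: refresh the momentum, follow Hamilton's
    equations with a leapfrog integrator, then Metropolis-accept on H."""
    rng = rng or np.random.default_rng()
    p = rng.standard_normal(w.shape)      # periodic random momentum change
    H0 = -log_post(w) + 0.5 * p @ p       # H = -log p(w|D) + p^2/2
    w_new, p_new = w.copy(), p.copy()
    for _ in range(n_leapfrog):           # leapfrog approximately conserves H
        p_new += 0.5 * eps * grad_log_post(w_new)
        w_new += eps * p_new
        p_new += 0.5 * eps * grad_log_post(w_new)
    H1 = -log_post(w_new) + 0.5 * p_new @ p_new
    accept = rng.random() < np.exp(min(0.0, H0 - H1))  # prob. min(1, exp(-ΔH))
    return w_new if accept else w
```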

Example 1

Example 1: 1-D
Signal: p + pbar -> t q b
Background: p + pbar -> W b b
NN model class: (1, 15, 1)
MCMC: 500 tqb + Wbb events; use the last 20 points in a chain of 10,000, skipping every 20th.
[Figure: 1-D distributions of x for tqb and Wbb.]

Example 1: 1-D
Dots: p(S|x) = H_S / (H_S + H_B), where H_S and H_B are 1-D histograms.
Curves: individual NNs n(x, w_k).
Black curve: the average over the networks.

Example 2

Example 2: 14-D (Finding Susy!)
[Figure: transverse momentum spectra; the signal is the black curve.]
Signal/Noise ~ 1/25,000

Example 2: 14-D (Finding Susy!)
[Figure: missing transverse momentum spectrum (caused by the escape of neutrinos and Susy particles).]
Measured quantities: 4 x (E_T, η, φ) + (E_T, φ) = 14

Example 2: 14-D (Finding Susy!)
[Equation: posterior = likelihood × prior.]
Signal: 250 p + pbar -> gluino, gluino (Susy) events
Background: 250 p + pbar -> top, anti-top events
NN model class: (14, 40, 1) (w ∈ 641-D parameter space!)
MCMC: use the last 100 networks in a Markov chain of 10,000, skipping every 20.

Results
Network distribution beyond n(x) > 0.9, assuming L = 10 fb^-1.
[Table: Cut, S, B, and S/B; at n(x) > 0.90, S = 5x10^3; the remaining entries are not recoverable.]

But Does It Really Work?
Let d(x) = N p(x|S) + N p(x|B) be the density of the data, containing 2N events, assuming, for simplicity, p(S) = p(B).
A properly trained classifier y(x) approximates

p(S|x) = p(x|S) / [p(x|S) + p(x|B)]

Therefore, if the data (signal + background) are weighted with y(x), we should recover the signal density.
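
A quick numerical sketch of this check on toy 1-D data; the "trained classifier" is stood in for by the exact p(S|x) of the toy model, and all distributions are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 100_000
sig = rng.normal(1.0, 0.5, N)        # toy signal sample
bkg = rng.normal(0.0, 1.0, N)        # toy background sample
data = np.concatenate([sig, bkg])    # 2N events, p(S) = p(B)

def y(x):
    """Stand-in for a trained classifier: exact p(S|x) for these toys."""
    ps, pb = norm.pdf(x, 1.0, 0.5), norm.pdf(x, 0.0, 1.0)
    return ps / (ps + pb)

# Weighting the full data with y(x) should reproduce the signal histogram.
h_weighted, edges = np.histogram(data, bins=50, weights=y(data))
h_signal, _ = np.histogram(sig, bins=edges)
print(np.abs(h_weighted - h_signal).max())  # small relative to the bin counts
```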

But Does It Really Work?
It seems to!

Example 3

Particle Physics Data, Take 2
Two varieties of jet:
1. Tagged (Jet 1, Jet 4)
2. Untagged (Jet 2, Jet 3)
We are often interested in Pr(Tagged | Jet Variables).

Example 3: Tagging Jets
[Diagram: a tagged jet and an untagged jet emerging from the collision point.]
p(T|x) = p(x|T) p(T) / d(x)
d(x) = p(x|T) p(T) + p(x|U) p(U)
x = (P_T, η, φ)
[Figure: p(x|T) and d(x); the red curve is d(x).]

Probability Density Estimation
Approximate a density by a sum over kernels K(·), one placed at each of the N points x_i of the training sample:

p(x) ≈ (1/N) Σ_{i=1}^{N} K(x − x_i; h)

h is one or more smoothing parameters, adjusted to provide the best approximation to the true density p(x). If h is too small, the model will be very spiky; if h is too large, features of the density p(x) will be lost.

Probability Density Estimation
Why does this work? Consider the limit as N -> ∞ of the estimate (1/N) Σ_{i=1}^{N} K(x − x_i; h). Its expectation is ∫ K(x − x'; h) p(x') dx', which tends to p(x) as h -> 0.
In the limit N -> ∞, the true density p(x) will be recovered provided that h -> 0 in such a way that N h^d -> ∞ (d being the dimension of x), so that the number of points within each kernel still grows without bound.

Probability Density Estimation
As long as the kernel behaves sensibly in the N -> ∞ limit, any kernel will do. In practice, the most commonly used kernel is a product of 1-D Gaussians, one for each dimension j:

K(x − x_i; h) = ∏_{j=1}^{d} (1 / (√(2π) h_j)) exp( −(x_j − x_ij)² / (2 h_j²) )

One advantage of the PDE approximation is that it contains very few adjustable parameters: basically, the smoothing parameters h_j.
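
A compact sketch of this estimator; the per-dimension smoothing parameters and the toy sample are illustrative:

```python
import numpy as np

def pde(x, sample, h):
    """Product-Gaussian kernel density estimate at point x.
    sample: (N, d) array of training points; h: (d,) smoothing parameters."""
    z = (x - sample) / h                                 # (N, d) scaled offsets
    kernels = np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * h)
    return np.mean(np.prod(kernels, axis=1))             # average product kernel

# Example: estimate a 2-D standard normal density at the origin.
rng = np.random.default_rng(1)
sample = rng.normal(size=(1000, 2))
print(pde(np.zeros(2), sample, h=np.array([0.3, 0.3])))  # ≈ 1/(2π) ≈ 0.159
```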

Example 3: Tagging Jets
[Figure: projections of the estimated p(T|x) (black curve) onto the P_T, η, and φ axes. Blue points: ratio of blue to red histograms (see slide 25).]

Example 3: Tagging Jets
[Figure: projections of the data weighted by p(T|x); this recovers the tagged density p(x|T).]

But, How Well Does It Work?
How well do the n-D model and the n-D data agree?
A thought (JL, HBP), sketched in code below:
1. Project the model and the data onto the same set of randomly directed rays through the origin.
2. Compute some measure of discrepancy for each pair of projections.
3. Do something sensible with this set of numbers!!
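
A minimal sketch of steps 1 and 2, using random unit vectors for the rays and a two-sample Kolmogorov-Smirnov statistic as the (illustrative) measure of discrepancy:

```python
import numpy as np
from scipy.stats import ks_2samp

def ray_discrepancies(model_sample, data_sample, n_rays=3, rng=None):
    """Project both (N, d) samples onto random rays through the origin
    and compute a KS discrepancy for each pair of 1-D projections."""
    rng = rng or np.random.default_rng()
    d = model_sample.shape[1]
    stats = []
    for _ in range(n_rays):
        ray = rng.standard_normal(d)
        ray /= np.linalg.norm(ray)         # random unit direction
        proj_model = model_sample @ ray    # 1-D projections
        proj_data = data_sample @ ray
        stats.append(ks_2samp(proj_model, proj_data).statistic)
    return stats  # step 3: do something sensible with these numbers
```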

But, How Well Does It Work?
[Figure: projections of p(T|x) onto 3 randomly chosen rays through the origin.]

But, How Well Does It Work?
[Figure: projections of the weighted tagged + untagged data onto the 3 randomly selected rays.]

Summary
Multivariate methods have been applied with considerable success in particle physics, especially for classification. However, there is considerable room for improving our understanding of them, as well as for expanding their domain of application.
The main challenge is data/model comparison when each datum is a point in 1 to 20 dimensions. During the SAMSI workshop we hope to make some progress on the use of projections onto multiple rays. This may be an interesting area for collaboration between physicists and statisticians.