Bayesian Within the Gates: A View From Particle Physics


Bayesian Within the Gates: A View From Particle Physics
Harrison B. Prosper, Florida State University
SAMSI, 24 January 2006

Outline
- Measuring Zero as Precisely as Possible!
- Signal/Background Discrimination
  - 1-D Example
  - 14-D Example
- Some Open Issues
- Summary

Measuring Zero!
Diamonds may not be forever: neutron <-> antineutron transitions, CRISP Experiment (1982-1985), Institut Laue-Langevin, Grenoble, France.
Method: fire a gas of cold neutrons onto a graphite foil and look for annihilation of the antineutron component.

Measuring Zero!
Count the number of signal + background events, N. Suppress the putative signal and count background events, B, independently.
Results: N = 3, B = 7.

Measuring Zero!
Classic 2-parameter counting experiment:
N ~ Poisson(s + b)
B ~ Poisson(b)
Wanted: a statement like s < u(N, B) @ 90% CL.

Measuring Zero!
In 1984, no exact solution existed in the particle physics literature! But surely it must have been solved by statisticians. Alas, from Kendall and Stuart I learnt that calculating exact confidence intervals is "a matter of very considerable difficulty".

Measuring Zero!
Exact in what way? Over the ensemble of statements of the form s ∈ [0, u), at least 90% of them should be true whatever the true value of the signal s AND whatever the true value of the background parameter b. Blame... Neyman (1937).

"Keep it simple, but no simpler" (Albert Einstein)

Bayesian @ the Gate (1984)
Solution:
p(N, B | s, b) = Poisson(N; s + b) Poisson(B; b)   (the likelihood)
p(s, b) = uniform in s and b                       (the prior)
Compute the posterior density:
p(s, b | N, B) = p(N, B | s, b) p(s, b) / p(N, B)
Marginalize over b:
p(s | N, B) = ∫ p(s, b | N, B) db
This reasoning was compelling to me then, and is much more so now!
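As a sanity check, here is a minimal numerical sketch of this calculation in Python (not the original 1984 computation; the grid ranges and step sizes are arbitrary choices): compute the joint posterior on a grid under flat priors, marginalize over b, and read off the 90% credible upper limit.

    import numpy as np
    from scipy.stats import poisson

    # Observed counts from the slides: N = 3 (signal + background), B = 7 (background).
    N, B = 3, 7

    # Uniform grids for the signal and background means; flat priors on both.
    ds, db = 0.02, 0.02
    s = np.arange(0.0, 25.0, ds)
    b = np.arange(0.0, 30.0, db)
    S, Bg = np.meshgrid(s, b, indexing="ij")

    # Joint posterior (unnormalized): p(s, b | N, B) ∝ Poisson(N; s+b) Poisson(B; b).
    joint = poisson.pmf(N, S + Bg) * poisson.pmf(B, Bg)

    # Marginalize over the nuisance parameter b, then normalize in s.
    p_s = joint.sum(axis=1) * db
    p_s /= p_s.sum() * ds

    # 90% credible upper limit: smallest u with P(s < u | N, B) >= 0.90.
    cdf = np.cumsum(p_s) * ds
    u = s[np.searchsorted(cdf, 0.90)]
    print(f"90% upper limit on s: u = {u:.2f}")  # ≈ 3.8 for N = 3, B = 7

For these counts the marginal posterior is also available in closed form (the integral over b can be done term by term after expanding (s + b)^N), which provides a useful cross-check of the grid calculation.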

Particle Physics Data
proton + antiproton -> positron (e+) + neutrino (ν) + Jet1 + Jet2 + Jet3 + Jet4
This event "lives" in 3 + 2 + 3 × 4 = 17 dimensions: three momentum components for the positron, two for the neutrino (only its transverse momentum is measurable), and three for each of the four jets.

Particle Physics Data
CDF/Dzero discovery of the top quark (1995). [Figure: data in red, signal in green, backgrounds in blue and magenta.]
Dzero: 17-D -> 2-D.

But that was then, and now is now!
Today we have 2 GHz laptops with 2 GB of memory! It is fun to deploy huge, sometimes unreliable, computational resources, that is, brains, to reduce the dimensionality of data. But perhaps it is now feasible to work directly in the original high-dimensional space, using hardware!

Signal/Background Discrimination
The optimal solution is to compute the Bayes discriminant
p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)]
Every signal/background discrimination method is ultimately an algorithm to approximate this solution, or a mapping thereof. Therefore, if a method is already at the Bayes limit, no other method, however sophisticated, can do better!
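As a toy illustration of the Bayes discriminant, assume (hypothetically) 1-D Gaussian signal and background densities with equal priors; then p(S|x) can be computed exactly:

    import numpy as np
    from scipy.stats import norm

    # Hypothetical densities: p(x|S) = N(+1, 1), p(x|B) = N(-1, 1), p(S) = p(B) = 0.5.
    def p_S_given_x(x, pS=0.5):
        num = norm.pdf(x, loc=+1.0) * pS              # p(x|S) p(S)
        den = num + norm.pdf(x, loc=-1.0) * (1 - pS)  # ... + p(x|B) p(B)
        return num / den

    print(p_S_given_x(np.array([-2.0, 0.0, 2.0])))  # ≈ [0.018, 0.5, 0.982]

No classifier, however trained, can beat this function on data drawn from these densities; it is the Bayes limit referred to above.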

Signal/Background Discrimination
Given training data D = (x, y), with x = {x_1, ..., x_N} and y = {y_1, ..., y_N}, a set of N training examples, infer a discriminant function f(x, w) with parameters w:
p(w|x, y) = p(x, y|w) p(w) / p(x, y)
          = p(y|x, w) p(x|w) p(w) / [p(y|x) p(x)]
          = p(y|x, w) p(w) / p(y|x)
assuming p(x|w) -> p(x), that is, that the marginal density of x carries no information about w.

Signal/Background Discrimination
A typical likelihood for classification:
p(y|x, w) = ∏_i f(x_i, w)^{y_i} [1 − f(x_i, w)]^{1−y_i}
where y_i = 0 for background events and y_i = 1 for signal events.
If f(x, w) is flexible enough, then maximizing p(y|x, w) with respect to w yields f = p(S|x), asymptotically.
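In practice one minimizes the negative log of this likelihood, the binary cross-entropy. A generic sketch, where f is any parameterized classifier returning values in (0, 1):

    import numpy as np

    def neg_log_likelihood(f, w, x, y, eps=1e-12):
        # -log p(y|x, w) for the Bernoulli likelihood above;
        # y[i] = 1 for signal events, 0 for background events.
        p = np.clip(f(x, w), eps, 1.0 - eps)  # guard against log(0)
        return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))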

Signal/Background Discrimination
However, in a full Bayesian calculation one usually averages with respect to the posterior density:
y(x) = ∫ f(x, w) p(w|D) dw
Questions:
1. Do suitably flexible functions f(x, w) exist?
2. Is there a feasible way to do the integral?

Answer 1: Hilbert's 13th Problem!
Hilbert's 13th problem asked, in effect, to prove that the following is impossible:
y(x, y, z) = F( A(x), B(y), C(z) )
In 1957, Kolmogorov proved the contrary conjecture, schematically
y(x_1, ..., x_n) = F( f_1(x_1), ..., f_n(x_n) )
I'll call such functions, F, Kolmogorov functions.
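For reference, the precise statement (the Kolmogorov–Arnold superposition theorem) is that every continuous function f: [0,1]^n -> R can be written as

    f(x_1, ..., x_n) = Σ_{q=0}^{2n} Φ_q( Σ_{p=1}^{n} φ_{q,p}(x_p) )

with continuous one-variable functions Φ_q and φ_{q,p}; the form on the slide is a schematic version of this.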

Kolmogorov Functions
[Diagram: a feed-forward network n(x, w) with inputs x1, x2, hidden-layer parameters (u, a), and output parameters (v, b).]
A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings f: R^N -> U. The parameters w = (u, a, v, b) are called weights.
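A minimal sketch of such a network: a single hidden layer of tanh units and a sigmoid output, with parameter names following the diagram. The (1, 15, 1) model class used in the 1-D example below is of exactly this form.

    import numpy as np

    def n(x, w):
        # Single-hidden-layer network n(x, w) with w = (u, a, v, b):
        # u, a are hidden-layer weights and biases; v, b are output weight and bias.
        # Returns values in (0, 1), interpretable as estimates of p(S|x).
        u, a, v, b = w
        h = np.tanh(np.outer(x, u) + a)            # hidden layer, shape (len(x), H)
        return 1.0 / (1.0 + np.exp(-(h @ v + b)))  # sigmoid output

    # Example: a (1, 15, 1) network with random weights.
    rng = np.random.default_rng(0)
    H = 15
    w = (rng.normal(size=H), rng.normal(size=H), rng.normal(size=H), rng.normal())
    print(n(np.array([0.5, 1.0]), w))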

Answer 2: Use Hybrid MCMC Computational Method Generate a Markov chain (MC) of N points {w} drawn from the posterior density p(w|D) and average over the last M points. Each point corresponds to a network. Software Flexible Bayesian Modeling by Radford Neal http://www.cs.utoronto.ca/~radford/fbm.software.html Bayesian within the Gates Harrison B. Prosper SAMSI, 2006
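The sketch below substitutes a plain random-walk Metropolis sampler for Neal's hybrid (Hamiltonian) Monte Carlo, which is what FBM actually implements; it reuses n and neg_log_likelihood from the earlier sketches, assumes a Gaussian prior on the weights, and the step size and chain length are arbitrary choices.

    import numpy as np

    H = 15                   # hidden units; dim = 3*H + 1 = 46 for a (1, 15, 1) net
    def unpack(w_flat):      # flat parameter vector -> (u, a, v, b)
        return (w_flat[:H], w_flat[H:2*H], w_flat[2*H:3*H], w_flat[3*H])

    def log_posterior(w_flat, x, y, sigma=1.0):
        # log p(w|D) up to a constant: Bernoulli likelihood + Gaussian prior.
        return (-neg_log_likelihood(n, unpack(w_flat), x, y)
                - 0.5 * np.sum(w_flat**2) / sigma**2)

    def metropolis(x, y, dim, n_steps=5000, step=0.05, seed=1):
        rng = np.random.default_rng(seed)
        w = rng.normal(size=dim)
        lp = log_posterior(w, x, y)
        chain = []
        for _ in range(n_steps):
            w_new = w + step * rng.normal(size=dim)
            lp_new = log_posterior(w_new, x, y)
            if np.log(rng.uniform()) < lp_new - lp:  # Metropolis accept/reject
                w, lp = w_new, lp_new
            chain.append(w.copy())
        return np.array(chain)

    def y_avg(x, chain, M=20):
        # Posterior-averaged discriminant: y(x) ≈ (1/M) Σ_k n(x, w_k),
        # averaging over the last M networks in the chain.
        return np.mean([n(x, unpack(wk)) for wk in chain[-M:]], axis=0)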

A 1-D Example
Signal: p + pbar -> t q b
Background: p + pbar -> W b b
NN model class: (1, 15, 1)
MCMC: 500 tqb + Wbb events; use the last 20 networks in a Markov chain of 500.
[Figure: distributions of the variable x for Wbb (background) and tqb (signal).]

A 1-D Example
[Figure: dots show p(S|x) = H_S/(H_S + H_B), where H_S and H_B are 1-D histograms of signal and background; the light curves are individual networks n(x, w_k); the black curve is the average <n(x, w)>.]

A 14-D Example (Finding Susy!)
[Figure: transverse momentum spectra; the signal is the black curve.]
Signal/noise: 1/100,000.

A 14-D Example (Finding Susy!)
[Figure: missing transverse momentum spectrum, caused by the escape of neutrinos and Susy particles.]
Variable count: 4 × (ET, η, φ) + (ET, φ) = 14.

A 14-D Example (Finding Susy!)
Signal: 250 p + pbar -> top + anti-top (MC) events
Background: 250 p + pbar -> gluino gluino (MC) events
NN model class: (14, 40, 1), a 641-dimensional parameter space (14×40 + 40 + 40 + 1 = 641)!
MCMC: use the last 100 networks in a Markov chain of 10,000, skipping every 20.
[Figure: the likelihood and prior used.]

But does it Work?
Signal to noise can reach 1/1 with an acceptable signal strength.

But does it Work?
Let d(x) = N p(x|S) + N p(x|B) be the density of the data, containing 2N events and assuming, for simplicity, p(S) = p(B). A properly trained classifier y(x) approximates
p(S|x) = p(x|S) / [p(x|S) + p(x|B)]
Therefore, if the signal and background events are weighted with y(x), we should recover the signal density: y(x) d(x) = N p(x|S).
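A toy numerical check of this weighting argument, reusing the hypothetical Gaussian densities from the earlier sketch (the exact Bayes discriminant stands in for a trained y(x)):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    N = 100_000
    data = np.concatenate([rng.normal(+1.0, 1.0, N),    # N events from p(x|S)
                           rng.normal(-1.0, 1.0, N)])   # N events from p(x|B)

    def y(x):  # exact p(S|x) for these densities
        ps, pb = norm.pdf(x, +1.0), norm.pdf(x, -1.0)
        return ps / (ps + pb)

    # Histogram the data weighted by y(x) and compare with N p(x|S) per bin.
    hist, edges = np.histogram(data, bins=50, range=(-5, 5), weights=y(data))
    centers = 0.5 * (edges[:-1] + edges[1:])
    expected = N * norm.pdf(centers, +1.0) * (edges[1] - edges[0])
    core = (centers > -2) & (centers < 3)  # compare where statistics are adequate
    print(np.max(np.abs(hist[core] / expected[core] - 1.0)))  # per-cent level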

But does it Work?
Amazingly well!

Some Open Issues
Why does this insane function p(w_1, ..., w_641 | x_1, ..., x_500) behave so well? 641 parameters > 500 events!
How should one verify that an n-D (n ~ 14) swarm of simulated background events matches the n-D swarm of observed events (in the background region)?
How should one verify that y(x) is indeed a reasonable approximation to the Bayes discriminant p(S|x)?

Summary
Bayesian methods have been, and are being, used with considerable success by particle physicists. Happily, the frequentist/Bayesian Cold War is abating!
The application of Bayesian methods to highly flexible functions, e.g., neural networks, is very promising and should be broadly applicable.
Needed: a powerful way to compare high-dimensional swarms of points. Agree, or not agree, that is the question!