Bayesian Neural Networks Pushpa Bhat Fermilab Harrison Prosper Florida State University.

Presentation transcript:

9/14/05 PHYSTAT05 Oxford Bayesian Neural Networks Bhat/Prosper

Outline
- Introduction
- Bayesian Learning
- Simple Examples
- Summary

Multivariate Methods
- Since the early 1990s, we have used multivariate methods extensively in particle physics
- Some examples:
  - Particle ID and signal/background discrimination
  - Optimization of cuts for top quark discovery at DØ
  - Precision measurement of the top mass
  - Searches for leptoquarks, technicolor, ...
- Neural network methods have become popular due to ease of use, power and successful applications

Why Multivariate Methods?
Improve several aspects of analysis:
- Event selection: triggering, real-time filters, data streaming
- Event reconstruction: tracking/vertexing, particle ID
- Signal/background discrimination: Higgs discovery, SUSY discovery, single top, ...
- Functional approximation: jet energy corrections, tag rates, fake rates
- Parameter estimation: top quark mass, Higgs mass, SUSY model parameters
- Data exploration: knowledge discovery via data mining, data-driven extraction of information, latent structure analysis

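Multi Layer Perceptron
- A popular and powerful neural network model
- [Figure: network diagram with input nodes i, hidden nodes j and output nodes k; weights indexed ji connect inputs to hidden nodes, and weights indexed kj connect hidden nodes to outputs]
- Need to find both sets of weights, the free parameters of the model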
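As an illustration of the model on this slide, here is a minimal sketch of a one-hidden-layer perceptron evaluated with NumPy. The tanh hidden activation, the sigmoid output, and the names `u`, `a`, `v`, `b` are assumptions made for the example; the slide does not specify them.

```python
import numpy as np

def mlp_output(x, u, a, v, b):
    """One-hidden-layer perceptron (illustrative, not the authors' code).

    x: (n_events, n_inputs) inputs
    u: (n_hidden, n_inputs) input-to-hidden weights, a: (n_hidden,) hidden biases
    v: (n_hidden,) hidden-to-output weights, b: scalar output bias
    """
    hidden = np.tanh(x @ u.T + a)                     # hidden-node activations
    return 1.0 / (1.0 + np.exp(-(hidden @ v + b)))    # sigmoid output in (0, 1)

rng = np.random.default_rng(0)
n_inputs, n_hidden = 2, 5                             # hypothetical sizes
u = rng.normal(size=(n_hidden, n_inputs))
a = rng.normal(size=n_hidden)
v = rng.normal(size=n_hidden)
b = 0.0
x = rng.normal(size=(4, n_inputs))                    # four toy events
y = mlp_output(x, u, a, v, b)
```

Training, in the conventional sense, means choosing `u`, `a`, `v` and `b` so that `y` matches the targets on the training sample.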

The Bayesian Connection
- The output of a feed-forward neural network can approximate the posterior probability P(s|x_1, x_2).
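For reference, the posterior probability in question follows from Bayes' theorem. The expression below is a standard form, written here with s for signal, b for background and x = (x_1, x_2); the prior probabilities p(s) and p(b) are notation introduced for this sketch. A network trained with targets 1 for signal and 0 for background approximates this quantity, with the priors set by the signal/background mix of the training sample.

```latex
p(s \mid x) = \frac{p(x \mid s)\, p(s)}{p(x \mid s)\, p(s) + p(x \mid b)\, p(b)}
```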

The Top Quark: Post-Evidence, Pre-Discovery!
- Fisher analysis of the tt → eμ channel
  - One candidate event
  - (S/B)(m_t = 180 GeV) = 18 w.r.t. Z → ττ
  - (S/B)(m_t = 180 GeV) = 10 w.r.t. WW
- NN analysis of the tt → e+jets channel
  - [Figure: NN output distributions; legend: tt, W+jets, tt160, Data]
- P. Bhat, DPF94

Measuring the Top Quark Mass
- DØ lepton+jets
- [Figure: discriminant variables and the discriminants]
- Fit performed in 2-D: (D_LB/NN, m_fit)
- m_t = ± 5.6 (stat.) ± 6.2 (syst.) GeV/c^2

Higgs Discovery Reach
- The challenges are daunting! But using a NN provides the same reach with a factor of 2 less luminosity w.r.t. a conventional analysis
- Improved bb mass resolution and b-tag efficiency are crucial
- Run II Higgs study, hep-ph/ (Oct-2000)
- P.C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000)

Limitations of “Conventional NN”
- The training yields one set of weights or network parameters
  - Need to look for the “best” network, but avoid overfitting
- Heuristic decisions on network architecture
  - Inputs, number of hidden nodes, etc.
- No direct way to compute uncertainties

Ensembles of Networks
- [Figure: input X fed to networks NN_1, NN_2, NN_3, ..., NN_M with outputs y_1, y_2, y_3, ..., y_M]
- A decision made by averaging over many networks (a committee of networks) has an expected error no larger than the average error of the individual networks, and is typically lower.
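The committee idea can be checked numerically with a toy example. The sketch below stands in for M trained networks by adding independent Gaussian noise to a known target function, which is an assumption made purely for illustration. For squared error, the committee's error cannot exceed the average member error, and with independent member errors it is usually far smaller.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2 * np.pi * x)              # toy target function

M = 10                                     # hypothetical committee size
# Toy "networks": the truth plus independent noise for each member.
members = truth + rng.normal(scale=0.3, size=(M, x.size))
committee = members.mean(axis=0)           # average over the committee

member_mse = ((members - truth) ** 2).mean(axis=1)   # error of each member
mean_member_mse = member_mse.mean()
committee_mse = ((committee - truth) ** 2).mean()
```

Real networks trained on the same data have correlated errors, so the gain in practice is smaller than in this idealized case.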

Bayesian Learning
- The result of Bayesian training is a posterior density of the network weights, P(w|training data)
- Generate a sequence of weights (network parameters) in the network parameter space, i.e., a sequence of networks. The optimal network is approximated by averaging over the last K points.
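The averaging formula itself is not reproduced in this transcript. One form consistent with the description, offered as an assumption, writes w_1, ..., w_K for the last K points of the sequence and y(x, w) for the network output:

```latex
\bar{y}(x) \approx \frac{1}{K} \sum_{k=1}^{K} y(x, w_k)
```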

Bayesian Learning – 2
- Advantages
  - Less prone to over-fitting, because of Bayesian averaging.
  - Less need to optimize the size of the network. Can use a large network! Indeed, the number of weights can be greater than the number of training events!
  - In principle, provides the best estimate of p(t|x)
- Disadvantages
  - Computationally demanding!

Bayesian Learning – 3
- Computationally demanding because
  - The dimensionality of the parameter space is, typically, large.
  - There could be multiple maxima in the likelihood function p(t|x,w), or, equivalently, multiple minima in the error function E(x,w).

Bayesian Neural Networks – 1
- Basic idea
  - Compute the posterior density of the weights from the likelihood and the prior
  - Then estimate p(t|x_new) by averaging over NNs
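The slide's formula is not in the transcript. A standard way to write the two steps, given here as a sketch in the notation of the surrounding slides (t for targets, x for inputs, w for weights), is Bayes' theorem for the weights followed by an integral over the posterior, with the dependence on the training data left implicit on the left-hand side of the second expression:

```latex
p(w \mid t, x) = \frac{p(t \mid x, w)\, p(w)}{p(t \mid x)},
\qquad
p(t \mid x_{\mathrm{new}}) = \int p(t \mid x_{\mathrm{new}}, w)\, p(w \mid t, x)\, dw
```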

Bayesian Neural Networks – 2
- Likelihood, where t_i = 0 or 1 for background/signal
- Prior
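The likelihood and prior formulas are missing from the transcript. With targets t_i equal to 0 or 1, the usual choice is a Bernoulli likelihood in the network output; the zero-mean Gaussian prior below is an assumption for illustration, since the slide's actual prior is not recoverable. A minimal sketch in log form:

```python
import numpy as np

def log_likelihood(y, t):
    """Bernoulli log-likelihood: sum of t*log(y) + (1-t)*log(1-y)."""
    y = np.clip(y, 1e-12, 1 - 1e-12)       # guard against log(0)
    return np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def log_gaussian_prior(w, sigma=1.0):
    """Log of an independent zero-mean Gaussian prior (assumed form)."""
    return -0.5 * np.sum((w / sigma) ** 2) - w.size * np.log(sigma * np.sqrt(2 * np.pi))

t = np.array([1, 0, 1, 1, 0])              # toy targets: 1 = signal, 0 = background
y = np.array([0.9, 0.2, 0.7, 0.6, 0.1])    # toy network outputs
ll = log_likelihood(y, t)
lp = log_gaussian_prior(np.zeros(3))
```

The unnormalized log posterior is then `log_likelihood(...) + log_gaussian_prior(...)`, which is the quantity a sampler needs.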

Bayesian Neural Networks – 3
- Computational method
  - Generate a Markov chain (MC) of N points {w} from the posterior density p(w|x) and average over the last K
  - Markov Chain Monte Carlo software by Radford Neal

Bayesian Neural Networks – 4
- Treat sampling of the posterior density as a problem in Hamiltonian dynamics, in which the phase space (p, q) is explored using Markov techniques
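To make the Hamiltonian picture concrete, here is a minimal sketch of one Hamiltonian Monte Carlo update with leapfrog integration and a Metropolis accept/reject step. It is a generic textbook version, not the code in Radford Neal's package; the step size, the number of leapfrog steps and the 2-D Gaussian test target are all assumptions chosen for the example. Here q plays the role of the weights and p is an auxiliary momentum.

```python
import numpy as np

def hmc_step(q, log_post, grad_log_post, rng, eps=0.1, n_leapfrog=20):
    """One HMC update of position q with potential U(q) = -log_post(q)."""
    p = rng.normal(size=q.shape)                   # draw a fresh momentum
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * eps * grad_log_post(q_new)      # initial half step in momentum
    for i in range(n_leapfrog):
        q_new += eps * p_new                       # full step in position
        if i < n_leapfrog - 1:
            p_new += eps * grad_log_post(q_new)    # full step in momentum
    p_new += 0.5 * eps * grad_log_post(q_new)      # final half step in momentum
    h_old = -log_post(q) + 0.5 * p @ p             # Hamiltonian before
    h_new = -log_post(q_new) + 0.5 * p_new @ p_new # Hamiltonian after
    if np.log(rng.uniform()) < h_old - h_new:      # Metropolis acceptance
        return q_new
    return q

# Toy target: a 2-D standard normal standing in for the weight posterior.
log_post = lambda q: -0.5 * q @ q
grad_log_post = lambda q: -q

rng = np.random.default_rng(2)
q = np.zeros(2)
chain = []
for _ in range(2000):
    q = hmc_step(q, log_post, grad_log_post, rng)
    chain.append(q)
chain = np.array(chain)
```

For a network, `log_post` would be the log likelihood plus log prior of the weights, and `grad_log_post` its gradient, typically obtained by backpropagation.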

A Simple Example
- Signal: ppbar → tqb (μ channel)
- Background: ppbar → Wbb
- NN model: (1, 15, 1)
- MCMC
  - 5000 tqb + Wbb events
  - Use the last 20 networks in a MC chain of 500
- [Figure: distributions of HT_AllJets_MinusBestJets (scaled) for Wbb and tqb]

A Simple Example: Estimate of Prob(s|H_T)
- Blue dots: p(s|H_T) = H_tqb / (H_tqb + H_Wbb)
- Curves (individual NNs): y(H_T, w_n)
- Black curve: average of y(H_T, w_n) over the networks
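The histogram-based estimate shown as blue dots can be sketched as follows. The H_T values here are synthetic Gaussian samples invented for the example, not the tqb and Wbb samples used on the slide, and the binning is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic stand-ins for the scaled H_T variable (assumed shapes).
ht_tqb = rng.normal(loc=0.6, scale=0.2, size=5000)   # toy signal
ht_wbb = rng.normal(loc=0.4, scale=0.2, size=5000)   # toy background

edges = np.linspace(0.0, 1.0, 21)                    # 20 equal-width bins
h_tqb, _ = np.histogram(ht_tqb, bins=edges)
h_wbb, _ = np.histogram(ht_wbb, bins=edges)

# Bin-by-bin estimate p(s|H_T) = H_tqb / (H_tqb + H_Wbb); empty bins give NaN.
total = h_tqb + h_wbb
p_s = np.divide(h_tqb, total, out=np.full(total.shape, np.nan), where=total > 0)
```

With equal numbers of signal and background events, this ratio is the quantity that the averaged network output is expected to approximate.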

Example: Single Top Search
- Training data
  - 2000 events (1000 tqb-μ, Wbb-μ)
  - Standard set of 11 variables
- Network
  - (11, 30, 1) network (391 parameters!)
- Markov Chain Monte Carlo (MCMC)
  - 500 iterations, but use the last 100 iterations
  - 20 MCMC steps per iteration
  - NN parameters stored after each iteration
  - 10,000 steps
  - ~1000 steps/hour (on a 1 GHz Pentium III laptop)
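The numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes one bias per hidden node and per output node, which reproduces the quoted 391 parameters; the function name is hypothetical.

```python
def n_parameters(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of a one-hidden-layer network."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

n_params = n_parameters(11, 30, 1)           # (11, 30, 1) network

iterations = 500
steps_per_iteration = 20
total_steps = iterations * steps_per_iteration

steps_per_hour = 1000                        # approximate rate quoted on the slide
hours = total_steps / steps_per_hour         # rough running time in hours
```

At the quoted rate, the full chain corresponds to roughly ten hours of running time on that laptop.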

Signal/Bkgd. Distributions

Weighting with NN output
- Number of data events:
- Create weighted histograms of variables
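A weighted histogram of this kind can be produced with the `weights` argument of `numpy.histogram`. In the sketch below, the event sample is synthetic and the "network output" is a simple sigmoid of the variable, both assumptions standing in for real data and a trained network.

```python
import numpy as np

rng = np.random.default_rng(4)
n_sig, n_bkg = 300, 700                      # hypothetical sample composition
var = np.concatenate([
    rng.normal(0.65, 0.10, n_sig),           # toy signal-like events
    rng.normal(0.40, 0.15, n_bkg),           # toy background-like events
])

# Stand-in for the network output y(x) of each event (assumed form).
nn_out = 1.0 / (1.0 + np.exp(-12.0 * (var - 0.52)))

edges = np.linspace(-0.5, 1.5, 41)
unweighted, _ = np.histogram(var, bins=edges)
# Each event contributes its network output instead of 1.
weighted, _ = np.histogram(var, bins=edges, weights=nn_out)
```

Because each weight lies between 0 and 1, every bin of the weighted histogram is bounded by the corresponding unweighted bin, and background-dominated regions are suppressed.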

Weighted Distributions
- Magenta: weighting signal only
- Blue: weighting signal and background
- Black: un-weighted signal distribution

Summary
- Bayesian learning of neural networks takes us another step closer to realizing optimal results in classification (or density estimation) problems. It allows a fully probabilistic approach with proper treatment of uncertainties.
- We have started to explore Bayesian neural networks and the initial results are promising, though computationally challenging.