Bayesian Neural Networks
Pushpa Bhat, Fermilab
Harrison Prosper, Florida State University

9/14/05, PHYSTAT05, Oxford. Bayesian Neural Networks, Bhat/Prosper.

Outline
Introduction
Bayesian Learning
Simple Examples
Summary

Multivariate Methods
Since the early 1990s, we have used multivariate methods extensively in particle physics. Some examples:
Particle ID and signal/background discrimination
Optimization of cuts for the top quark discovery at DØ
Precision measurement of the top quark mass
Searches for leptoquarks, technicolor, and more
Neural network methods have become popular because of their ease of use, power, and successful applications.

Why Multivariate Methods?
They improve several aspects of analysis:
Event selection: triggering, real-time filters, data streaming
Event reconstruction: tracking/vertexing, particle ID
Signal/background discrimination: Higgs discovery, SUSY discovery, single top, ...
Functional approximation: jet energy corrections, tag rates, fake rates
Parameter estimation: top quark mass, Higgs mass, SUSY model parameters
Data exploration: knowledge discovery via data mining; data-driven extraction of information, latent structure analysis

Multi-Layer Perceptron
A popular and powerful neural network model. With inputs indexed by i, hidden nodes by j, and the output by k, the free parameters of the model are the input-to-hidden weights w_ji and the hidden-to-output weights w_kj, which must be found by training.

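A feed-forward pass for such a model can be sketched in a few lines of Python (the (2, 3, 1) architecture, the random weights, and the tanh/sigmoid activation choices below are illustrative assumptions, not the configuration used in the talk):

```python
import numpy as np

def mlp(x, u, v):
    """One-hidden-layer perceptron: tanh hidden units, sigmoid output.
    u holds the input-to-hidden weights (with a bias column),
    v the hidden-to-output weights (with a bias term)."""
    h = np.tanh(u @ np.append(x, 1.0))          # hidden activations
    z = v @ np.append(h, 1.0)                   # output pre-activation
    return 1.0 / (1.0 + np.exp(-z))             # output in (0, 1)

# a tiny (2, 3, 1) network with arbitrary weights
rng = np.random.default_rng(0)
u = rng.normal(size=(3, 3))    # 3 hidden nodes, 2 inputs + bias
v = rng.normal(size=4)         # 3 hidden units + bias
y = mlp(np.array([0.5, -1.2]), u, v)
```

Training would adjust u and v to minimize an error function; here they are just random draws.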
The Bayesian Connection
The output of a feed-forward neural network can approximate the posterior probability P(s|x_1, x_2).

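This connection can be made concrete for known class densities: with equal priors, the target posterior is s(x)/(s(x) + b(x)), which is what a well-trained network output approaches. A small numerical illustration (the two Gaussian class densities here are invented for this sketch):

```python
import numpy as np

def gauss(x, mu, sigma):
    """Normal density, used here as an assumed class density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-3.0, 5.0, 81)
s = gauss(x, 2.0, 1.0)        # signal density
b = gauss(x, 0.0, 1.0)        # background density
p_signal = s / (s + b)        # the posterior P(s|x) a trained NN approximates
```

For these two unit-width Gaussians the posterior is exactly a sigmoid in x, which is one reason a sigmoid output node is a natural choice.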
The Top Quark: Post-Evidence, Pre-Discovery!
Fisher analysis of the tt̄ e channel: one candidate event, with S/B (for m_t = 180 GeV) of 18 with respect to Z and 10 with respect to WW.
NN analysis of the tt̄ e+jets channel (figure: W+jets background, tt̄ with m_t = 160 GeV, and data). P. Bhat, DPF94.

Measuring the Top Quark Mass
Discriminant variables and the discriminants (DØ, lepton+jets). The fit is performed in 2D, in (D_LB/NN, m_fit):
m_t = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c²

Higgs Discovery Reach
The challenges are daunting! But using NNs provides the same reach with a factor of 2 less luminosity than conventional analyses. Improved bb̄ mass resolution and b-tagging efficiency are crucial.
Run II Higgs study, hep-ph/0010338 (Oct 2000); P. C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000) 074022.

Limitations of "Conventional NN"
The training yields a single set of weights (network parameters).
One must search for the "best" network while avoiding overfitting.
Network architecture choices (inputs, number of hidden nodes, etc.) are heuristic.
There is no direct way to compute uncertainties.

Ensembles of Networks
(Figure: an input X fed to a committee of networks NN_1, NN_2, ..., NN_M with outputs y_1, y_2, ..., y_M.)
A decision made by averaging over many networks (a committee of networks) has a lower error than that of any individual network.

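The committee effect can be checked numerically. In this toy sketch the M network outputs are modeled as noisy estimates of a common target; the committee's squared error is then never larger than the average member's squared error (a consequence of Jensen's inequality):

```python
import numpy as np

rng = np.random.default_rng(1)
truth = 0.7                                        # target value
members = truth + rng.normal(0.0, 0.1, size=50)    # 50 imperfect "networks"
committee = members.mean()                         # averaged decision

individual_mse = np.mean((members - truth) ** 2)   # average member error
committee_mse = (committee - truth) ** 2           # committee error
```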
Bayesian Learning
The result of Bayesian training is a posterior density over the network weights, p(w|training data). Training generates a sequence of weights (network parameters) in the network parameter space, i.e., a sequence of networks. The optimal network is approximated by averaging over the last K points of the sequence.

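The final averaging step might look like the following sketch (the one-parameter logistic "network" and the Gaussian stand-in for the sampled weight sequence are toys; a real chain would come from MCMC over the full weight space):

```python
import numpy as np

def net(x, w):
    """Toy one-parameter 'network': logistic in w * x."""
    return 1.0 / (1.0 + np.exp(-w * x))

rng = np.random.default_rng(2)
chain = rng.normal(1.0, 0.2, size=500)   # stand-in for sampled weights w_1..w_N
K = 100                                  # average over the last K networks
y_bayes = np.mean([net(0.8, w) for w in chain[-K:]])
```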
Bayesian Learning – 2
Advantages:
Less prone to overfitting, because of Bayesian averaging.
Less need to optimize the size of the network; one can use a large network. Indeed, the number of weights can be greater than the number of training events!
In principle, provides the best estimate of p(t|x).
Disadvantage: computationally demanding!

Bayesian Learning – 3
It is computationally demanding because:
The dimensionality of the parameter space is typically large.
There can be multiple maxima in the likelihood function p(t|x, w) or, equivalently, multiple minima in the error function E(x, w).

Bayesian Neural Networks – 1
Basic idea: compute the posterior density of the network weights,
p(w|x, t) ∝ p(t|x, w) × p(w), i.e., likelihood × prior,
then estimate p(t|x_new) by averaging over networks.

Bayesian Neural Networks – 2
Likelihood: p(t|x, w) = ∏_i y_i^t_i (1 − y_i)^(1 − t_i), where t_i = 0 or 1 for background/signal and y_i is the network output for event i.
Prior: p(w), a density over the network weights.

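Combining this likelihood with a prior gives the unnormalized log posterior that the sampler explores. A minimal sketch (the Gaussian prior with fixed width sigma, and all the numbers, are assumptions of this sketch; the slide leaves the prior's form to its formula):

```python
import numpy as np

def log_likelihood(t, y):
    """Bernoulli log likelihood: sum of t*log(y) + (1-t)*log(1-y),
    with t = 0 or 1 for background/signal and y the network outputs."""
    return float(np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y)))

def log_prior(w, sigma=1.0):
    """Assumed Gaussian prior on the weights, up to a constant."""
    return float(-0.5 * np.sum((w / sigma) ** 2))

t = np.array([1.0, 0.0, 1.0, 1.0])       # targets for four toy events
y = np.array([0.9, 0.2, 0.7, 0.6])       # network outputs for those events
w = np.array([0.3, -1.1])                # toy weight vector
log_post = log_likelihood(t, y) + log_prior(w)   # up to normalization
```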
Bayesian Neural Networks – 3
Computational method: generate a Markov chain of N points {w} from the posterior density p(w|x) and average over the last K.
Markov chain Monte Carlo software by Radford Neal: http://www.cs.toronto.edu/~radford/fbm.software.html

Bayesian Neural Networks – 4
Treat sampling of the posterior density as a problem in Hamiltonian dynamics, in which the phase space (p, q) is explored using Markov techniques.

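A minimal sketch of one such update, in the spirit of Hamiltonian Monte Carlo: draw a momentum p, follow leapfrog dynamics in phase space (p, q), then accept or reject with a Metropolis test. The one-dimensional standard-normal "posterior", step size, and trajectory length here are toy choices, not those of the actual sampler:

```python
import numpy as np

def hmc_step(q, grad_log_p, log_p, rng, eps=0.1, n_leapfrog=20):
    """One HMC update of position q under the Hamiltonian
    H(p, q) = -log p(q) + p^2/2, using leapfrog integration."""
    p = rng.normal(size=np.shape(q))
    q_new, p_new = q, p
    p_new = p_new + 0.5 * eps * grad_log_p(q_new)     # half step in momentum
    for _ in range(n_leapfrog - 1):
        q_new = q_new + eps * p_new                   # full step in position
        p_new = p_new + eps * grad_log_p(q_new)       # full step in momentum
    q_new = q_new + eps * p_new
    p_new = p_new + 0.5 * eps * grad_log_p(q_new)     # final half step
    h_old = -log_p(q) + 0.5 * np.sum(p ** 2)
    h_new = -log_p(q_new) + 0.5 * np.sum(p_new ** 2)
    # Metropolis acceptance test on the change in the Hamiltonian
    return q_new if rng.uniform() < np.exp(h_old - h_new) else q

# explore a standard-normal stand-in for the weight posterior
rng = np.random.default_rng(3)
log_p = lambda q: -0.5 * q ** 2
grad_log_p = lambda q: -q
q, samples = 0.0, []
for _ in range(2000):
    q = hmc_step(q, grad_log_p, log_p, rng)
    samples.append(float(q))
```

The collected samples should then be distributed according to the target density, which here means mean near 0 and standard deviation near 1.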
A Simple Example
Signal: pp̄ → tqb. Background: pp̄ → Wbb̄.
NN model: (1, 15, 1).
MCMC: 5000 tqb + Wbb̄ events; use the last 20 networks in a Markov chain of 500.
(Figure: the variable HT_AllJets_MinusBestJets (scaled) for Wbb̄ and tqb.)

A Simple Example: Estimate of Prob(s|H_T)
Blue dots: p(s|H_T) = H_tqb/(H_tqb + H_Wbb), the histogram ratio.
Curves: the individual networks y(H_T, w_n).
Black curve: the average over the networks.

Example: Single Top Search
Training data: 2000 events (1000 tqb and 1000 Wbb̄), using the standard set of 11 variables.
Network: (11, 30, 1), i.e., 391 parameters!
Markov chain Monte Carlo (MCMC): 500 iterations, of which the last 100 are used; 20 MCMC steps per iteration; NN parameters stored after each iteration. 10,000 steps in all, at ~1000 steps/hour on a 1 GHz Pentium III laptop.

Signal/Background Distributions (figure)

Weighting with NN Output
Estimate the number of data events by weighting each event with the network output, and create weighted histograms of the variables.

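Event weighting by the network output can be sketched as follows (the toy sample and the logistic stand-in for the network output are invented for illustration; the real analysis would weight with the trained Bayesian network's output):

```python
import numpy as np

rng = np.random.default_rng(4)
# toy sample: 500 signal-like and 500 background-like events in one variable
x = np.concatenate([rng.normal(2.0, 1.0, 500), rng.normal(0.0, 1.0, 500)])
y = 1.0 / (1.0 + np.exp(-(x - 1.0)))     # stand-in for the NN output p(s|x)

# weight each event by the network output when filling the histogram,
# so signal-like events dominate the weighted distribution
weighted, edges = np.histogram(x, bins=20, weights=y)
unweighted, _ = np.histogram(x, bins=edges)
```

The sum of the weights, which equals the weighted histogram's total, then serves as an estimate of the signal-like content of the sample.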
Weighted Distributions
Magenta: weighting signal only. Blue: weighting signal and background. Black: unweighted signal distribution.

Summary
Bayesian learning of neural networks takes us another step closer to realizing optimal results in classification (or density estimation) problems.
It allows a fully probabilistic approach with proper treatment of uncertainties.
We have started to explore Bayesian neural networks, and the initial results are promising, though computationally challenging.