Multivariate Methods in HEP
Pushpa Bhat, Fermilab
ACAT 2007, Apr 23-27, Amsterdam



Slide 2: Outline
- Introduction/History
- Physics Analysis Examples
- Popular Methods
  - Likelihood Discriminants
  - Neural Networks
  - Bayesian Learning
  - Decision Trees
- Future Issues and Concerns
- Summary

Slide 3: Some History
- In 1990, most of the HEP community was skeptical toward the use of multivariate methods, particularly neural networks (NN). The NN was seen as a black box:
  - Can't understand the weights
  - Nonlinear mapping; higher-order correlations
  - Though a mathematical function, it can't be explained in terms of physics
  - Can't calculate systematic errors reliably
- Univariate or "cut-based" analysis was the norm.
- Some were pursuing applications of neural network methods in HEP around 1990: Peterson, Lonnblad, Denby, Becks, Seixas, Lindsey, etc.
- The first AIHENP (Artificial Intelligence in High Energy & Nuclear Physics) workshop was held in 1990. Organizers included D. Perret-Gallix, K.H. Becks, R. Brun, and J. Vermaseren. AIHENP metamorphosed into ACAT ten years later, in 2000.
- Multivariate methods such as Fisher discriminants were in limited use.
- In 1990, I began to pursue the use of multivariate methods, especially NN, in top quark searches at DØ.

Slide 4: Mid-1990s
- LEP experiments had been using NN and likelihood discriminants for particle-ID applications and eventually for signal searches (Steinberger; tau-ID).
- H1 at HERA successfully implemented and used NN for triggering (Kiesling).
- A hardware NN was attempted at CDF at Fermilab.
- The Fermilab Advanced Analysis Methods Group brought CDF and DØ together to discuss these methods and their applications in physics analyses.

Slide 5: The Top Quark, Post-Evidence, Pre-Discovery!
- Fisher analysis of the tt → eμ channel: one candidate event; (S/B)(m_t = 180 GeV) = 18 w.r.t. Z, = 10 w.r.t. WW.
- NN analysis of the tt → e+jets channel. (Figure: NN output for tt (m_t = 160 GeV), W+jets, and data.)
- P. Bhat, DPF94

Slide 6: Cut Optimization for Top Discovery (Feb. '95)
- Neural network equi-probability contour cuts from a 2-variable analysis, compared with the conventional cuts used in Jan. '95 (Aspen meeting) and in the Observation paper (Mar. '95 discovery cut).
- (Figure: signal vs. background plane with the Jan. '95 and Mar. '95 conventional cuts and contours of possible NN cuts; the 2-variable NN analysis reaches a higher S/B than the Feb-Mar '95 discovery cut at similar signal efficiency.)
- P. Bhat, H. Prosper, E. Amidi; DØ Top Marathon, Feb. '95

Slide 7: Measurement of the Top Quark Mass
- DØ lepton+jets channel; discriminant variables combined into likelihood (LB) and NN discriminants.
- Fit performed in 2-D: (D_LB/NN, m_fit).
- Run I (1996) result with NN and likelihood: m_t = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c².
- Recent (CDF+DØ) measurement: m_t = 171.4 ± 2.1 GeV/c².
- First significant physics result using multivariate methods.

Slide 8: Higgs, the Holy Grail of HEP (Discovery Reach at the Tevatron)
- The challenges are daunting! But using NN provides the same reach with a factor of 2 less luminosity w.r.t. conventional analysis.
- Improved bb mass resolution and b-tag efficiency are crucial.
- Run II Higgs study: hep-ph/0010338 (Oct. 2000); P.C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62 (2000) 074022.

Slide 9: Then, it got easier
- One of the important steps in getting the NN accepted at the Tevatron experiments was to make the Bayesian connection.
- Another important message to drive home was "the maximal use of information in the event" for the job at hand.
- Developed a random grid search technique that can be used as a baseline for comparison.
- Neural network methods have now become popular due to their ease of use, power, and many successful applications.
- Maybe too easy?
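The random grid search mentioned above can be sketched as follows. Drawing cut candidates from the signal sample and scoring with S/sqrt(S+B) are illustrative assumptions for this sketch; the slide does not spell out the procedure.

```python
import random

def random_grid_search(signal, background, n_points=200, seed=1):
    """Random grid search for rectangular cuts (sketch).

    Candidate cut points are drawn from the signal events themselves, so
    the search concentrates where the signal lives.  For each candidate
    cut (x > x0 and y > y0) count the surviving signal S and background B
    and keep the cut that maximizes S/sqrt(S+B) (an assumed figure of
    merit for illustration).
    """
    rng = random.Random(seed)
    best = None
    for x0, y0 in rng.sample(signal, min(n_points, len(signal))):
        s = sum(1 for x, y in signal if x > x0 and y > y0)
        b = sum(1 for x, y in background if x > x0 and y > y0)
        if s + b == 0:
            continue
        significance = s / (s + b) ** 0.5
        if best is None or significance > best[0]:
            best = (significance, (x0, y0))
    return best  # (best significance, (x0, y0) cut point)
```

The resulting best cut gives a baseline any multivariate discriminant should beat.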

Slide 10: Optimal Event Selection
- In the feature space (x, y), contours of the likelihood ratio r(x, y) = constant define an optimal decision boundary.
- Conventional cuts, by contrast, select a rectangular region of the feature space.
- (Figure: feature space showing the signal (S) and background (B) densities.)
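A minimal sketch of such a boundary, with toy Gaussian densities standing in for the true signal and background distributions (all parameters are illustrative): cutting on D = s/(s+b) at a constant value selects the same region as cutting on r = s/b, since D = r/(1+r) is monotonic in r.

```python
import math

def gauss2d(x, y, mx, my, sx, sy):
    """Product of two 1-D Gaussians: a toy density (parameters illustrative)."""
    return (math.exp(-0.5 * ((x - mx) / sx) ** 2
                     - 0.5 * ((y - my) / sy) ** 2)
            / (2.0 * math.pi * sx * sy))

def discriminant(x, y):
    """D(x, y) = s/(s+b).  D = const traces the same optimal decision
    boundary as the likelihood-ratio contour r = s/b = const."""
    s = gauss2d(x, y, 1.0, 1.0, 0.5, 0.5)   # toy signal density
    b = gauss2d(x, y, 0.0, 0.0, 1.0, 1.0)   # toy background density
    return s / (s + b)
```

Events in the signal-rich region score near 1, background-like events near 0.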

Slide 11: The NN-Bayesian Connection
- The output of a feed-forward neural network can approximate the posterior probability P(s|x1, x2).
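A minimal illustration of this connection (the single sigmoid unit, the toy Gaussian data, and the learning-rate settings are assumptions for the sketch, not the original networks): for two equal-width Gaussian classes the exact posterior is itself a sigmoid, so even a one-unit "network" trained with the cross-entropy loss recovers P(s|x).

```python
import math, random

def train_sigmoid_unit(xs, ts, epochs=1500, lr=0.1):
    """Gradient descent on the cross-entropy loss for a single sigmoid
    unit y = 1/(1 + exp(-(w*x + b))), the simplest "network".
    Minimizing cross-entropy drives y(x) toward the posterior P(s|x)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, t in zip(xs, ts):
            y = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (y - t) * x        # dE/dw for the cross-entropy loss
            gb += (y - t)            # dE/db
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

# Toy 1-D problem: signal ~ N(+1,1), background ~ N(-1,1), equal priors.
# The exact posterior is P(s|x) = 1/(1 + exp(-2x)), i.e. w = 2, b = 0.
rng = random.Random(0)
xs = [rng.gauss(+1.0, 1.0) for _ in range(1000)] + [rng.gauss(-1.0, 1.0) for _ in range(1000)]
ts = [1.0] * 1000 + [0.0] * 1000
w, b = train_sigmoid_unit(xs, ts)
```

The learned (w, b) land near the analytic posterior parameters (2, 0), which is the Bayesian connection in miniature.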

Slide 12: Limitations of "Conventional" NN
- The training yields one set of weights or network parameters.
- Need to look for the "best" network, but avoid overfitting.
- Heuristic decisions on network architecture: inputs, number of hidden nodes, etc.
- No direct way to compute uncertainties.

Slide 13: Ensembles of Networks
- (Diagram: input X fed to networks NN_1 ... NN_M, producing outputs y_1 ... y_M.)
- A decision made by averaging over many networks (a committee of networks) has lower error than that of any individual network.
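The averaging itself is trivial to sketch. The toy "networks" below are constant-bias stand-ins (the bias values are illustrative) chosen so the error cancellation in the committee is visible:

```python
def committee_average(networks, x):
    """Committee decision: the average of the individual network outputs."""
    return sum(net(x) for net in networks) / len(networks)

# Toy committee: each "network" returns the true value plus its own bias.
# The biases partly cancel in the average, so the committee's error is
# lower than that of any single member.
true_value = 0.5
biases = [-0.20, 0.10, 0.05, 0.05]
networks = [lambda x, d=d: true_value + d for d in biases]
```

Here the committee's error is essentially zero while every member is off by at least 0.05, illustrating why averaging helps when the members' errors are not fully correlated.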

Slide 14: Bayesian Learning
- The result of Bayesian training is a posterior density over the network weights, P(w|training data).
- Generate a sequence of weights (network parameters) in the network-parameter space, i.e., a sequence of networks.
- The optimal network output is approximated by averaging over the last K points:
  y(x) ≈ (1/K) Σ_k y(x; w_k)
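A one-parameter sketch of the idea (the Gaussian toy posterior, the sigmoid "network", and the Metropolis settings are assumptions; real Bayesian NN training samples a much larger weight space with more sophisticated MCMC):

```python
import math, random

def metropolis(log_post, w0, n_steps, step, rng):
    """Random-walk Metropolis sketch: generates a sequence of weight values
    w_k distributed according to the posterior density exp(log_post(w))."""
    w, lp = w0, log_post(w0)
    chain = []
    for _ in range(n_steps):
        w_new = w + rng.gauss(0.0, step)
        lp_new = log_post(w_new)
        if math.log(rng.random()) < lp_new - lp:   # accept/reject step
            w, lp = w_new, lp_new
        chain.append(w)
    return chain

def predictive(x, chain, last_k):
    """y(x) ≈ (1/K) Σ_k y(x; w_k), averaging a toy sigmoid "network"
    over the last K stored weight samples."""
    kept = chain[-last_k:]
    return sum(1.0 / (1.0 + math.exp(-w * x)) for w in kept) / len(kept)

# Toy posterior over a single weight: Gaussian centred at w = 2.
rng = random.Random(0)
chain = metropolis(lambda w: -0.5 * (w - 2.0) ** 2, 0.0, 5000, 1.0, rng)
```

The predictive average is a function of x alone; the weight uncertainty has been integrated out, which is what a single trained network cannot provide.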

Slide 15: Bayesian Learning - 2
Advantages:
- Less prone to over-fitting.
- Less need to optimize the size of the network; can use a large network! Indeed, the number of weights can be greater than the number of training events!
- In principle, provides the best estimate of p(t|x).
Disadvantages:
- Computationally demanding! The dimensionality of the parameter space is typically large.
- There can be multiple maxima in the likelihood function p(t|x,w) or, equivalently, multiple minima in the error function E(x,w).

Slide 16: Example: Single Top Search
- Training data: 2000 events (1000 tqb + 1000 Wbb).
- Standard set of 11 variables.
- Network: (11, 30, 1), i.e., 391 parameters!
- Markov Chain Monte Carlo (MCMC): 500 iterations, but use the last 100; 20 MCMC steps per iteration; NN parameters stored after each iteration; 10,000 steps in total.
- ~1000 steps/hour (on a 1 GHz Pentium III laptop).

Slide 17: Example: Single Top Search
- Signal: tqb; background: Wbb. (Figure: distributions.)

Slide 18: (figure only; no text on slide)

Slide 19: Decision Trees
- Can recover events that fail criteria in cut-based analyses.
- Start at the first "node" with a fraction of the "training sample."
- Select the variable and cut value giving the best separation, producing two "branches" of events: (F)ailed and (P)assed the cut.
- Repeat recursively on successive nodes.
- Stop when the improvement stops or when too few events are left.
- A terminal node is called a "leaf," with purity = N_s/(N_s + N_b).
- Run the remaining events and the data through the tree to derive results.
Boosting decision trees:
- Boosting is a recently developed technique that improves any weak classifier (decision tree, neural network, etc.).
- Boosting averages the results of many trees; it dilutes the discrete nature of the output and improves the performance.
- Used in the DØ single top analysis.
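The per-node split search can be sketched as below. The slide only says "best separation," so the Gini impurity used here is an assumed (though common) criterion, and the exhaustive scan over observed values is one simple way to implement it:

```python
def best_split(events, labels):
    """Find the (variable, cut) pair with the best separation, scored by
    the event-weighted Gini impurity of the two branches (an assumed
    criterion; lower is better).  Labels are 1 = signal, 0 = background."""
    def gini(lab):
        if not lab:
            return 0.0
        p = sum(lab) / len(lab)          # purity N_s / (N_s + N_b)
        return p * (1.0 - p)
    n_vars = len(events[0])
    best = None
    for v in range(n_vars):
        for cut in sorted({e[v] for e in events}):
            passed = [l for e, l in zip(events, labels) if e[v] > cut]
            failed = [l for e, l in zip(events, labels) if e[v] <= cut]
            score = (len(passed) * gini(passed)
                     + len(failed) * gini(failed)) / len(labels)
            if best is None or score < best[0]:
                best = (score, v, cut)
    return best  # (weighted impurity, variable index, cut value)
```

Applying this recursively to each branch, and stopping when the impurity no longer improves or too few events remain, yields the tree described above.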

Slide 20: Matrix Element Method
- Example: top mass measurement.
- Maximal use of the information in each event, by calculating event-by-event signal and background probabilities based on the respective matrix elements.
- x: reconstructed kinematic variables of the final-state objects; JES: jet energy scale from the W-mass constraint.
- Signal and background probabilities come from the differential cross sections.
- Write the combined likelihood for all events; maximize the likelihood w.r.t. m_top and JES.
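The structure of the fit can be sketched with toy densities in place of the matrix-element probabilities. The Gaussian signal shape, its 15 GeV width, the flat background, and the signal fraction are all illustrative assumptions, not the real differential cross sections:

```python
import math, random

def neg_log_likelihood(m_top, events, f_sig=0.7):
    """-ln L(m_top) = -Σ_i ln[ f·P_sig(x_i; m_top) + (1-f)·P_bkg(x_i) ].

    P_sig is a toy Gaussian of width 15 GeV in a reconstructed-mass
    variable, standing in for the matrix-element probability; P_bkg is
    flat over a 200 GeV window."""
    nll = 0.0
    for x in events:
        p_sig = (math.exp(-0.5 * ((x - m_top) / 15.0) ** 2)
                 / (15.0 * math.sqrt(2.0 * math.pi)))
        p_bkg = 1.0 / 200.0
        nll -= math.log(f_sig * p_sig + (1.0 - f_sig) * p_bkg)
    return nll

def fit_mass(events, scan=range(150, 201)):
    """Maximize the combined likelihood by a 1-D scan over m_top
    (the real method also floats JES, making it a 2-D maximization)."""
    return min(scan, key=lambda m: neg_log_likelihood(m, events))

# Toy sample: 300 "signal" masses near 175 GeV plus 100 flat "background".
rng = random.Random(3)
events = ([rng.gauss(175.0, 15.0) for _ in range(300)]
          + [rng.uniform(100.0, 300.0) for _ in range(100)])
```

Because each event enters through its own probability rather than a yes/no cut, well-measured events automatically carry more weight in the fit.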

Slide 21: Summary
- Multivariate methods are now used extensively in HEP data analysis.
- Neural networks, because of their ease of use and power, are favorites for particle ID and signal/background discrimination.
- Bayesian neural networks take us one step closer to optimization.
- Likelihood discriminants and decision trees are becoming popular because they are easier to "defend" (no "black box" stigma).
- Many issues remain to be addressed as we get ready to deploy multivariate methods for discoveries in HEP.

Slide 22:
"Nothing tends so much to the advancement of knowledge as the application of a new instrument." - Humphry Davy
"No amount of experimentation can ever prove me right; a single experiment can prove me wrong." - Albert Einstein

Slide 23: Fermilab, World's Highest-Energy Laboratory (for now)
- (Aerial view: CDF, DØ, Booster.)

Slide 24: Our Fancy New Toys (The Large Hadron Collider)
- Circumference = 27 km
- Beam energy = 7 TeV (proton-proton)
- Luminosity = 1.65×10^34 cm^-2 s^-1
- Startup date: 2007
- (Images: LHC ring with the SPS and PS injectors and transfer lines TI 2 and TI 8; an LHC magnet; the LHC tunnel; CMS.)

Slide 25: LHC Environment
14 TeV proton colliding beams

  Parameter                           Value
  Bunch-crossing frequency            40 MHz
  Average # of collisions / crossing  20
  "Interaction rate"                  ~10^9
  Average # of charged tracks         1000
  Radiation field                     severe

CMS:
  Parameter                   Value
  Level-1 trigger rate        100 kHz
  Mean time between triggers  10 μs
  Trigger latency             3.2 μs
  Solenoid field              4 T

Slide 26: CMS Silicon Tracker Challenges

Slide 27: CMS Si Tracker
- Dimensions: 5.4 m long, 2.4 m in diameter.
- Components: pixels, inner barrel & disks (TIB & TID), outer barrel (TOB).

Slide 28: Lots of Silicon
- 214 m² of silicon sensors
- 11.4 million silicon strips
- 66 million pixels!

Slide 29: Si Tracker Challenges
- Large and complex system: 77.4 million total channels (out of 78.2 M for the whole experiment).
- Detector monitoring, data organization, data-quality monitoring, analysis, visualization, and interpretation are all daunting!
- Need to monitor every channel and make sure most of the detector is working at all times (the live fraction of the detector and the efficiencies are bound to decrease with time).
- Need to verify data integrity and data quality for physics.
- Diagnose and fix problems ASAP.
- Keep calibration and alignment parameters current.

Slide 30: Detector/Data Monitoring
Monitor:
- Environmental variables: temperatures, coolant flow rates, interlocks, radiation doses.
- Hardware status: voltages, currents.
- Channel data: readout states, errors, missing data/channels, bad IDs for channels/modules. Many kinds to be categorized, tracked, and displayed; should be able to find rare problems/errors (with low occurrence rates) that may corrupt data.
- Problems: rare problems may indicate a developing failure mode or hidden bad behavior. Correlate problem/noisy channels with history, temperature, currents, etc.

Slide 31: Data Quality Monitoring
Monitor:
- Raw data: pedestals, noise, ADC counts, occupancies, efficiencies.
- Processed high-level objects: clusters, tracks, etc.
Evaluate thousands of histograms:
- Can't visually examine them all.
- Automatically evaluate histograms by comparing to reference histograms; adaptive, efficient, finding evolving patterns over time. Quantiles? q-q plots/comparisons instead of a KS test?
- A variety of 2D "heat" maps: occupancies, # of bad channels/module, # of errors/module, etc.
- Typical occupancy is ~2% in the strip tracker: 200,000 channels written out 100 times/sec.
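The automated histogram comparison might look like the following sketch, using the maximum distance between normalized cumulative distributions (a binned KS-style statistic; the threshold value is an arbitrary illustration):

```python
def ks_distance(hist, ref):
    """KS-style statistic for two binned distributions: the maximum
    absolute difference between their normalized cumulative sums."""
    def cdf(h):
        total = float(sum(h))
        acc, out = 0.0, []
        for count in h:
            acc += count / total
            out.append(acc)
        return out
    return max(abs(a - b) for a, b in zip(cdf(hist), cdf(ref)))

def flag_histograms(named_hists, ref, threshold=0.1):
    """Return the names of the histograms that deviate from the reference
    by more than the threshold -- candidates for human inspection."""
    return [name for name, h in named_hists.items()
            if ks_distance(h, ref) > threshold]
```

With thousands of histograms per run, only the flagged ones need a shifter's attention.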

Slide 32: Module Assembly Precision
- (Example of a "heat" map.)

Slide 33: Need Smart Approaches
What are the best techniques for data mining?
- To organize data for analysis and data visualization: the complex geometry/addressing makes visualization difficult.
- For finding problematic channels quickly and efficiently: clustering, exploratory data mining.
- For finding anomalies, corrupt data, and patterns of behavior: feature-finding algorithms, superposing many events, time evolution, spatial and temporal correlations.
Noise correlations:
- Via correlation coefficients of defined groups.
- Correlate to history (time variations) and environmental variables.
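A sketch of the correlation-coefficient approach for flagging channel groups that fluctuate together (the channel names and the threshold are illustrative):

```python
import math

def corr(a, b):
    """Pearson correlation coefficient between two channel time series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def correlated_pairs(channels, threshold=0.8):
    """Flag pairs of channels whose noise readings move together,
    as candidates for a common cause (cabling, temperature, ...)."""
    names = list(channels)
    return [(p, q) for i, p in enumerate(names) for q in names[i + 1:]
            if abs(corr(channels[p], channels[q])) > threshold]
```

The same coefficients can be computed against environmental time series (temperature, currents) to correlate noisy channels with external conditions, as the slide suggests.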

Slide 34: Data Visualization
- Based on the hierarchical/geometrical structure of the tracker.
- Display every channel; attach objects/info to each.
- Substructures: layers/rings, modules, readout chips.

Slide 35: Multivariate Analysis Issues
- Dimensionality reduction: choosing variables optimally without losing information.
- Choosing the right method for the problem.
- Controlling model complexity.
- Testing convergence.
- Validation: given a limited sample, what is the best way?
- Computational efficiency.

Slide 36: Multivariate Analysis Issues: Correctness of Modeling
- How do we make sure the multivariate modeling is correct, i.e., that the data used for training or for building PDEs represent reality?
- Is it sufficient to check the modeling in the mapped variable? Pair-wise correlations? Higher-order correlations?
- How do we show that the background is modeled well? How do we quantify the correctness of the modeling?
- In conventional analysis, we normally look for variables that are well modeled in order to apply cuts. How well is the background modeled in the signal region?
- Worries about hidden bias; worries about underestimating errors.

Slide 37: Sociological Issues
- We have been conservative in the use of MV methods for discovery.
- We have been more aggressive in the use of MV methods for setting limits.
- But discovery is more important and needs all the power you can muster!
- This is expected to change at the LHC.

Slide 38: Summary
- The next generation of experiments will need to adopt advanced data-mining and data-analysis techniques.
- Conventional/routine tasks such as alignment, detector-performance and data-quality monitoring, and data visualization will be challenging and will require new approaches.
- Many issues regarding the use of multivariate methods of data analysis for discoveries and measurements need to be addressed to make optimal use of the data.

Slide 39: MV: Where can we use them?
Almost everywhere, since HEP events are multivariate. Improve several aspects of analysis:
- Event selection: triggering, real-time filters, data streaming.
- Event reconstruction: tracking/vertexing, particle ID.
- Signal/background discrimination: Higgs discovery, SUSY discovery, single top, ...
- Functional approximation: jet energy corrections, tag rates, fake rates.
- Parameter estimation: top quark mass, Higgs mass, SUSY model parameters.
- Data exploration: knowledge discovery via data mining; data-driven extraction of information, latent structure analysis.

