Advanced Analysis Algorithms for Top Analysis
Pushpa Bhat, Fermilab
Top Thinkshop 2, Fermilab, IL, November 10-12, 2000


A reasonable man adapts himself to the world. An unreasonable man persists in trying to adapt the world to himself. So, all progress depends on the unreasonable one.
- Bernard Shaw

What do we gain?
b-tag efficiency in Run I: DØ ~20%, CDF ~53%. But DØ was able to measure the top quark mass with a precision approaching that of CDF by using multivariate techniques to separate signal from background while minimizing the correlation of the selection with the top quark mass.

Optimal Analysis Methods
The new generation of experiments will be far more demanding than the previous one in data handling at all stages. The time-honored procedure of choosing and applying cuts on one event variable at a time is rarely optimal! Since the measurements are multivariate, the optimal methods of analysis are necessarily multivariate:
Discriminant Analysis: partition the multidimensional variable space and identify boundaries between classes of objects
Cluster Analysis: assign objects to groups based on similarity
Regression Analysis: functional approximation/fitting

Data Analysis Tasks
Particle Identification: e-ID, μ-ID, b-ID, τ, q/g
Signal/Background Event Classification: signals of new physics are rare and small (finding a "jewel" in a haystack)
Parameter Estimation: t mass, H mass, track parameters, for example
Function Approximation: correction functions, tag rates, fake rates
Data Exploration: data-driven extraction of information, latent structure analysis

Why Multivariate Methods? Because they are optimal!
[Figure: scatter of two event classes in the variables x1 and x2, with a linear discriminant boundary D(x1, x2) separating them.]

Optimal Event Selection
Optimal event selection defines decision boundaries that minimize the probability of misclassification. So the problem mathematically reduces to that of calculating the Bayes discriminant function r(x), the posterior probability built from the class probability densities:

r(x) = \frac{p(x|s)\,P(s)}{p(x|s)\,P(s) + p(x|b)\,P(b)} = p(s|x)
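A minimal sketch of this rule, assuming known (toy) Gaussian class-conditional densities and equal priors; in practice the densities must be estimated, which is what the following slides address:

```python
# A minimal sketch, assuming Gaussian class-conditional densities with
# illustrative (made-up) parameters and equal priors.
from scipy.stats import norm

P_s, P_b = 0.5, 0.5                  # assumed prior probabilities P(s), P(b)
p_s = norm(loc=2.0, scale=1.0).pdf   # p(x|s), assumed signal density
p_b = norm(loc=0.0, scale=1.0).pdf   # p(x|b), assumed background density

def bayes_discriminant(x):
    """Posterior probability r(x) = p(s|x)."""
    num = p_s(x) * P_s
    return num / (num + p_b(x) * P_b)

# Selecting events with r(x) > 0.5 minimizes the misclassification probability.
print(bayes_discriminant(1.0))       # 0.5: exactly halfway between the classes
```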

Probability Density Estimators
Histogramming: the basic method of non-parametric density estimation is very simple! Histogram the data in M bins in each of the d feature variables, giving M^d bins: the Curse of Dimensionality. In high dimensions we would either require a huge number of data points, or most of the bins would be empty, leading to an estimated density of zero. But the variables are generally correlated and hence tend to be restricted to a sub-space: the Intrinsic Dimensionality.
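A minimal numerical sketch of the blow-up, assuming toy Gaussian data (d = 3, M = 10):

```python
# A minimal sketch of the M**d blow-up, assuming toy Gaussian data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))    # N = 1000 points in d = 3 dimensions

M = 10                               # bins per feature variable
counts, edges = np.histogramdd(data, bins=M)
print(counts.size)                   # M**d = 1000 bins for only 1000 points
print((counts == 0).mean())          # fraction of empty bins is already large
```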

Kernel-Based Methods
Akin to histogramming, but adopts importance sampling: place in the d-dimensional space a hypercube of side h (the smoothing parameter) centered on each of the N data points x_n, and average their contributions at the point of interest:

\hat{p}(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^d}\, H\!\left(\frac{x - x_n}{h}\right), \qquad H(u) = \begin{cases} 1 & \text{if } x_n \text{ lies in the hypercube centered at } x \\ 0 & \text{otherwise} \end{cases}

The estimate will have discontinuities. These can be smoothed out using different forms for the kernel function H(u); a common choice is a multivariate Gaussian kernel.
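A minimal one-dimensional sketch with a Gaussian kernel, on assumed toy data:

```python
# A minimal sketch of a one-dimensional Gaussian-kernel density estimate,
# assuming toy data; h is the smoothing parameter.
import numpy as np

def kde(x, data, h):
    """Average Gaussian kernel of width h centered on each data point."""
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)) / h

data = np.random.default_rng(1).normal(size=500)
print(kde(0.0, data, h=0.3))         # close to the true N(0,1) density ~0.40
```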

K Nearest-Neighbor Method
Place a hyper-sphere centered at each data point x and allow the radius to grow to a volume V until it contains K data points. Then the density at x is

\hat{p}(x) = \frac{K}{N V},

where N is the total number of data points. If the data set contains N_k points in class C_k and N points in total, and K_k is the number of points of class C_k inside the volume V, then

\hat{p}(x|C_k) = \frac{K_k}{N_k V}.
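A minimal one-dimensional sketch on assumed toy data:

```python
# A minimal sketch of the K-nearest-neighbor density estimate in one
# dimension, assuming toy data; the "volume" V is the interval that just
# reaches the K-th nearest neighbor.
import numpy as np

def knn_density(x, data, K):
    dists = np.sort(np.abs(data - x))
    V = 2.0 * dists[K - 1]           # 1D volume containing K points
    return K / (len(data) * V)

data = np.random.default_rng(2).normal(size=500)
print(knn_density(0.0, data, K=20))  # close to the true N(0,1) density ~0.40
```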

Discriminant Approximation with Neural Networks
The output of a feed-forward neural network can approximate the Bayesian posterior probability p(s|x,y) directly, without estimating the class-conditional probabilities.

Calculating the Discriminant
Consider the sum

E(\omega) = \sum_{i} \left[ d_i - n(x_i, y_i, \omega) \right]^2,

where d_i = 1 for signal, d_i = 0 for background, and \omega is the vector of network parameters. Then

n(x, y, \omega) \to p(s|x, y)

in the limit of large data samples, provided that the function n(x,y,\omega) is flexible enough.
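A minimal numerical illustration of this result, assuming toy Gaussian signal and background samples and a small hand-rolled network (none of this is the original DØ setup): minimizing E on 0/1 targets drives the network output toward p(s|x).

```python
# A minimal sketch: train a one-hidden-layer network on 0/1 targets with a
# squared-error loss; its output then approximates the posterior p(s|x).
import numpy as np

rng = np.random.default_rng(3)
x = np.vstack([rng.normal(0.0, 1.0, (500, 2)),   # background sample
               rng.normal(1.5, 1.0, (500, 2))])  # signal sample
d = np.repeat([0.0, 1.0], 500)                   # targets d_i

H = 8                                            # number of hidden nodes
W1, b1 = rng.normal(0.0, 0.5, (2, H)), np.zeros(H)
W2, b2 = rng.normal(0.0, 0.5, H), 0.0

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.5
for _ in range(2000):                            # plain gradient descent on E
    g = sig(x @ W1 + b1)                         # hidden-layer outputs
    out = sig(g @ W2 + b2)                       # network output n(x, w)
    delta2 = (out - d) * out * (1.0 - out)       # output-layer error signal
    delta1 = np.outer(delta2, W2) * g * (1.0 - g)
    W2 -= eta * g.T @ delta2 / len(x)
    b2 -= eta * delta2.mean()
    W1 -= eta * x.T @ delta1 / len(x)
    b1 -= eta * delta1.mean(axis=0)

# For a clearly signal-like point the output should approach p(s|x) ~ 1.
print(sig(sig(np.array([2.0, 2.0]) @ W1 + b1) @ W2 + b2))
```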

Neural Networks
An NN estimates a mapping function without requiring a mathematical description of how the output formally depends on the inputs. The "hidden" transformation functions g adapt themselves to the data as part of the training process, and the number of such functions needs to grow only as the complexity of the problem grows.
[Figure: feed-forward network mapping inputs x1, x2, x3, x4 through a hidden layer to a single discriminant output D_NN.]

Why are NN models powerful?
Neural networks are universal approximators: with a sufficiently large NN, you can approximate a function to arbitrary accuracy, and convergence of the approximation is rapid. High dimensionality is not a curse any more! Model complexity can be controlled by regularization, and the models extrapolate gracefully.

Also, they need to have optimal flexibility/complexity.
[Figure: Mth-order polynomial fits to the same data in (x1, x2) for M = 1 (simple), M = 3 (flexible), and M = 10 (highly flexible).]
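A minimal sketch of this flexibility trade-off, assuming noisy toy data around a sine curve rather than the slide's example:

```python
# A minimal sketch of under- vs over-fitting with Mth-order polynomials,
# assuming noisy toy data drawn around a sine curve.
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.2, x.size)

x_test = np.linspace(0.0, 1.0, 100)
y_true = np.sin(2.0 * np.pi * x_test)
for M in (1, 3, 10):
    coeffs = np.polyfit(x, y, M)                 # least-squares polynomial fit
    rmse = np.sqrt(np.mean((np.polyval(coeffs, x_test) - y_true) ** 2))
    # M = 1 underfits; large M tends to fit the noise instead of the curve.
    print(f"M = {M:2d}   test RMSE = {rmse:.3f}")
```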

The Golden Rule
Keep it simple. As simple as possible. Not any simpler.
- Einstein

Measuring the Top Quark Mass: The Discriminants
[Figure: DØ distributions of the discriminant variables; shaded histograms show the top signal.]

Measuring the Top Quark Mass
[Figure: DØ lepton+jets events in the plane of discriminant versus fitted mass, showing background-rich and signal-rich regions.]
m_t = 173.3 ± 5.6 (stat.) ± 6.2 (syst.) GeV/c² (DØ, lepton+jets)

Strategy for Discovering the Higgs Boson at the Tevatron
P.C. Bhat, R. Gilmartin, H. Prosper, Phys. Rev. D 62, 074022 (2000), hep-ph/0001152.

WH Results from NN Analysis
[Figure: NN output for WH signal (M_H = 100 GeV/c²) versus the Wbb background.]

WH (110 GeV/c²) NN Distributions
[Figure: NN output distributions for WH signal at M_H = 110 GeV/c² and the backgrounds.]

Results, Standard vs. NN
A good chance of discovery up to M_H = 130 GeV/c² with 20-30 fb⁻¹.

Improving the Higgs Mass Resolution
Use m_jj and H_T (= Σ E_T^jets) to train NNs to predict the Higgs boson mass.
[Figure: reconstructed-mass distributions; the quoted fractional resolutions improve from 13.8%, 13.1%, and 13% to 12.2%, 11.3%, and 11%.]
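The NN regression itself is not reproduced here, but a minimal sketch of the idea, with a linear least-squares fit standing in for the network and invented toy resolutions, shows how combining m_jj with a correlated variable sharpens the mass estimate:

```python
# A minimal sketch of improving the mass estimate by combining m_jj with
# H_T; a linear least-squares fit stands in for the NN regression, and the
# toy resolutions (14% and 10%) are assumptions, not the DØ simulation.
import numpy as np

rng = np.random.default_rng(5)
m_true = rng.uniform(90.0, 130.0, 2000)              # true mass, GeV/c^2
m_jj = m_true * rng.normal(1.0, 0.14, 2000)          # ~14% dijet resolution
h_t = 1.8 * m_true * rng.normal(1.0, 0.10, 2000)     # correlated activity

A = np.column_stack([m_jj, h_t, np.ones_like(m_jj)])
w, *_ = np.linalg.lstsq(A, m_true, rcond=None)       # fit the mass estimator
resid = A @ w - m_true
print(np.std(m_jj - m_true) / m_true.mean())         # single-variable spread
print(np.std(resid) / m_true.mean())                 # combined estimate: smaller
```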

Newer Approaches: Ensembles of Networks
Committees of Networks: performance can be better than that of the best single network (see the sketch below)
Stacks of Networks: control both bias and variance
Mixtures of Experts: decompose complex problems
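A minimal sketch of a committee, assuming toy Gaussian data; simple logistic units stand in for full networks, each trained on a bootstrap replica of the sample, and their outputs are averaged:

```python
# A minimal committee sketch: average the outputs of several members, each
# a logistic unit (standing in for a network) trained on a bootstrap replica.
import numpy as np

rng = np.random.default_rng(6)
x = np.vstack([rng.normal(0.0, 1.0, (300, 2)),   # background
               rng.normal(1.5, 1.0, (300, 2))])  # signal
d = np.repeat([0.0, 1.0], 300)

def train_member(xb, db, steps=500, eta=0.1):
    """Logistic-regression member trained by gradient descent."""
    w, b = np.zeros(2), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(xb @ w + b)))
        w -= eta * xb.T @ (p - db) / len(xb)
        b -= eta * (p - db).mean()
    return w, b

members = []
for _ in range(10):                              # bootstrap replicas
    idx = rng.integers(0, len(x), len(x))
    members.append(train_member(x[idx], d[idx]))

def committee(xq):
    """Average the discriminant outputs of all members."""
    return np.mean([1.0 / (1.0 + np.exp(-(xq @ w + b))) for w, b in members])

print(committee(np.array([1.5, 1.5])))           # near 1 for a signal-like point
```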

Bayesian Reasoning
The Bayesian approach provides a well-founded mathematical procedure for making straightforward and meaningful model comparisons. It also allows treatment of all uncertainties in a consistent manner. Examples of useful applications:
Fitting binned data to multi-source models, PLB 407 (1997) 73 (see the sketch below)
Extraction of the solar neutrino survival probability, PRL 81 (1998) 5056
Bayesian reasoning is mathematically linked to adaptive algorithms such as Neural Networks (NN); hybrid methods involving NNs for probability density estimation together with a Bayesian treatment can be very powerful.
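As a toy illustration of the first application (all bin contents and source shapes invented), a minimal sketch that scans the posterior of a signal fraction in a two-source binned fit under a flat prior:

```python
# A minimal sketch of fitting binned data to a two-source (signal plus
# background) model: scan the posterior of the signal fraction f under a
# flat prior; bin contents and shapes are toy assumptions.
import numpy as np

counts = np.array([12, 20, 35, 30, 18])            # observed bin contents (toy)
sig = np.array([0.05, 0.15, 0.40, 0.30, 0.10])     # signal shape, normalized
bkg = np.array([0.30, 0.25, 0.20, 0.15, 0.10])     # background shape, normalized

fractions = np.linspace(0.001, 0.999, 999)         # flat prior on the fraction
log_post = np.array([np.sum(counts * np.log(f * sig + (1.0 - f) * bkg))
                     for f in fractions])          # multinomial log-likelihood
post = np.exp(log_post - log_post.max())
post /= post.sum()
print(fractions[post.argmax()])                    # posterior mode of the fraction
```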

Summary
Multivariate methods have already made an impact on discoveries and precision measurements, and they will be the methods of choice in future analyses. We have only scratched the surface in our use of advanced analysis algorithms. Hybrid methods combining "intelligent" algorithms and a probabilistic approach will be the wave of the future!