Exploring SUSY Parameter Space: A New Bayesian Approach


Exploring SUSY Parameter Space: A New Bayesian Approach
J. Lykken (FNAL), M. Pierini (CERN), H.B. Prosper (FSU), C. Rogan (CERN), S. Sekmen (FSU), M. Spiropulu (Caltech, CERN)
Characterization of New Physics at the LHC, II

Outline: Statement of Problem, General Strategy, An Example, Summary

Statement of Problem
“the most pressing question at the LHC will be to figure out whether there is any evidence for physics beyond the standard model, and then most broadly what theoretical framework best describes the new physics”
Supersymmetry and the LHC Inverse Problem, N. Arkani-Hamed, G.L. Kane, J. Thaler, L. Wang, JHEP 0608, 070 (2006)

Statement of Problem
Basic problem: all interesting theories are multi-parameter models. The SM alone has the parameters me, mμ, mτ; mu, md, ms, mc, mb, mt; θ12, θ23, θ13, δ; g1, g2, g3; θQCD; μ, λ, and the same data must also be confronted with the MSSM and with the theory of the week.
Basic questions: Which theories are preferred, given the data? And which parameter sub-spaces?

Statement of Problem
Model degeneracy: the map from LHC signatures to parameter sub-spaces (of the MSSM, the theory of the week, ...) is not one-to-one. Therefore, the most likely outcome of a discovery and its initial characterization will be a set of degenerate models. (A model = a parameter sub-space.)

Some Remarks
It is not possible to make progress without making some assumption about the nature of potential signals. For example, to model backgrounds using data one typically assumes that signal << background in some control region. But, a priori, we do not know whether the control region we have chosen enjoys this property. Therefore, as was the case with the characterization of the SM, the inverse problem will be solved iteratively.

General Strategy

General Strategy
What do we wish to do over the next decade or so?
- Answer the question: is there any evidence of new physics?
- Rank any collection of new-physics models.
- Characterize the parameters of all viable models.
- Iteratively converge to the most viable new theory of physics.
We propose using Bayesian methods to guide this research program. But this requires imposing probability densities p(θ) on parameter spaces.

General Strategy
p(θ) is controversial (witness JE). But p(θ) is also extremely useful. Example: suppose we wish to distinguish the xMSSM from the SM. After a huge amount of work, the SM has been confined to a small sub-space of its parameter space (the blue spot in the slide's sketch); this is not the case for the xMSSM (the green blob). Suppose you have p(θ) for the xMSSM; then, from a Bayesian perspective, the way forward is clear: maximize the “separation” between the blue spot and the green blob, using p(θ) as a weighting function.

Bayes in 60 Seconds
Given a model with parameters θ, in principle any inverse problem can be solved using Bayes' theorem:
p(θ | x) = p(x | θ) π(θ) / ∫ p(x | θ) π(θ) dθ,
where p(θ | x) is the posterior density of the model given the data x, p(x | θ) is the likelihood of the data, and π(θ) is the prior density of the model.
Appealing features:
- General and conceptually straightforward
- Systematic learning from data through a recursive algorithm (see the small example below)
- Coherent way to incorporate uncertainties regardless of their origin
- The posterior density is the complete inference for a given model
- Models can be ranked according to their concordance with observations
Principal difficulty: the construction of priors.
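As an aside (our illustration, not part of the slides), the recursive-learning point can be made concrete with a one-parameter toy model: the posterior from one data set, used as the prior for the next, gives the same answer as analysing all of the data at once.

```python
import numpy as np
from scipy.stats import binom

# Grid over a single model parameter theta (e.g. an efficiency between 0 and 1)
theta = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(theta)                   # flat starting prior (illustrative choice)

def update(prior, k, n):
    """One Bayes-theorem update: posterior ~ likelihood * prior, normalized on the grid."""
    post = binom.pmf(k, n, theta) * prior
    return post / np.trapz(post, theta)

# Recursive learning: yesterday's posterior is today's prior
post1 = update(prior, k=3, n=10)      # first data set: 3 successes in 10 trials
post2 = update(post1, k=45, n=100)    # second data set: 45 successes in 100 trials

# Same answer as processing all of the data at once
post_all = update(prior, k=48, n=110)
print("max difference:", np.max(np.abs(post2 - post_all)))
```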

The Issue with Priors
The construction of p(θ) is controversial because answers depend on it. Example: Lopez-Fogliani et al., arXiv:0906.4911v1. The left plot is based on a prior flat in m0, m1/2; the right plot on a prior flat in log m0, m1/2. However, the issue is not that one gets different answers; rather, the issue is how the priors are chosen.

Reference Priors
In 1979, J. Bernardo introduced a method for constructing priors specifically designed to minimize their influence relative to the likelihood; he called these reference priors. By definition, a reference prior π(θ) maximizes the separation between itself and the posterior density. The separation measure used is the Kullback-Leibler divergence between the posterior density p(θ|x) and the prior π(θ). In practice, one averages D[π, p] over all possible data-sets from K replications of the experiment and lets K go to infinity, as sketched below.
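In symbols, the construction just described can be written schematically as follows (our rendering of the definition given on the slide; x^(K) denotes data from K replications of the experiment):

```latex
\pi(\theta) \;=\; \arg\max_{\pi}\;\lim_{K\to\infty}
\int p\!\left(x^{(K)}\right) D\!\left[\pi, p\right]\, dx^{(K)},
\qquad
D\!\left[\pi, p\right] \;=\;
\int p\!\left(\theta \mid x^{(K)}\right)
\ln\frac{p\!\left(\theta \mid x^{(K)}\right)}{\pi(\theta)}\, d\theta .
```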

Reference Priors
If the posterior density p(θ|x) is asymptotically normal, the reference prior π(θ) for a model with one continuous parameter reduces to the Jeffreys prior, the square root of the Fisher information F:
π(θ) ∝ √F(θ),  with  F(θ) = −E[∂² ln p(x|θ) / ∂θ²].
This simplifies calculations considerably. The reference prior construction can be extended to more than one parameter; however, the single-parameter algorithm is sufficient for our purposes.
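As a concrete check of the Jeffreys formula (our illustration, not from the slides), the short sketch below computes the Fisher information numerically for a Poisson model with mean θ and compares √F(θ) with the known closed form 1/√θ:

```python
import numpy as np

def fisher_information_poisson(theta, n_max=200):
    """Numerical Fisher information F(theta) for a Poisson model
    p(n | theta) = theta^n exp(-theta) / n!."""
    n = np.arange(n_max)
    log_factorial = np.array([np.sum(np.log(np.arange(1, k + 1))) for k in n])
    pmf = np.exp(n * np.log(theta) - theta - log_factorial)
    score = n / theta - 1.0             # d/dtheta ln p(n | theta)
    return np.sum(pmf * score ** 2)     # E[score^2]

theta = 5.0
F = fisher_information_poisson(theta)
# Jeffreys prior (unnormalized): pi(theta) ~ sqrt(F(theta)), here 1/sqrt(theta)
print("sqrt(F) =", np.sqrt(F), "  1/sqrt(theta) =", 1.0 / np.sqrt(theta))
```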

An Example

The xMSSM
[Cartoon slide: a “WARNING: SUSY PARAMETER SPACE” sign, captioned “Alas, poor SUSY sapiens… I knew them well…”]

Is there Evidence of a Signal?
- Construct a likelihood p(n | s, μ), where n = observed count, s = expected signal, μ = expected background.
- Construct a prior π(s, μ) = π(s | μ) π(μ): use a reference prior for π(s | μ)* and, for example, a gamma prior for π(μ).
- Compute the posterior density p(s, μ | n) ∝ p(n | s, μ) π(s, μ), as sketched below.
* “Reference priors for high energy physics,” L. Demortier, S. Jain, and H.B. Prosper, Phys. Rev. D 82, 034002 (2010)
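A minimal numerical sketch of this construction follows. The observed count, the gamma hyperparameters, and the use of a Jeffreys-style π(s | μ) ∝ 1/√(s + μ) as a stand-in for the full reference prior of Demortier, Jain, and Prosper are all illustrative choices of ours, not values from the talk:

```python
import numpy as np
from scipy.stats import poisson, gamma

# Illustrative inputs: observed count and a gamma prior for the background
n_obs = 12
mu_prior = gamma(a=8.0, scale=0.5)        # background prior with mean 4.0 events

# Grid over expected signal s and expected background mu
s = np.linspace(0.0, 30.0, 601)
mu = np.linspace(0.01, 20.0, 400)
S, M = np.meshgrid(s, mu, indexing="ij")

likelihood = poisson.pmf(n_obs, S + M)    # p(n | s, mu)
prior_s_given_mu = 1.0 / np.sqrt(S + M)   # Jeffreys-style stand-in for pi(s | mu)
prior_mu = mu_prior.pdf(M)                # pi(mu)

posterior = likelihood * prior_s_given_mu * prior_mu   # p(s, mu | n), up to normalization
posterior /= np.trapz(np.trapz(posterior, mu, axis=1), s)
```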

Is there Evidence of a Signal?
Given the posterior density p(s, μ | n) and a measure of the separation between the s > 0 models and the s = 0 model, this question can be answered. We again use the Kullback-Leibler divergence, KL, to quantify the separation between models. But since we know neither s nor μ, it is necessary to average over all possible values of s and μ:
B(n) = ∫∫ KL(s, μ) p(s, μ | n) ds dμ,
where KL(s, μ) = −s + (s + μ) ln(1 + s/μ) is the KL divergence of a given s > 0 model from the s = 0 model. We call B(n) the Bayes signal significance.
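Once the posterior is available on a grid, B(n) is a straightforward double integral. The sketch below evaluates it for the same toy posterior as in the previous example (illustrative numbers only, not the analysis shown in the talk):

```python
import numpy as np
from scipy.stats import poisson, gamma

# Toy posterior p(s, mu | n), as in the previous sketch (illustrative numbers)
n_obs = 12
s = np.linspace(0.0, 30.0, 601)
mu = np.linspace(0.01, 20.0, 400)
S, M = np.meshgrid(s, mu, indexing="ij")
post = poisson.pmf(n_obs, S + M) / np.sqrt(S + M) * gamma.pdf(M, a=8.0, scale=0.5)
post /= np.trapz(np.trapz(post, mu, axis=1), s)

# KL divergence of the (s, mu) model from the background-only (s = 0) model
kl = -S + (S + M) * np.log1p(S / M)

# Bayes signal significance: posterior average of KL(s, mu)
B = np.trapz(np.trapz(kl * post, mu, axis=1), s)
print("Bayes signal significance B(n) =", round(float(B), 3))
```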

Initial Characterization
From the posterior density p(s, μ | n) we can compute the marginal density
p(s | n) = ∫ p(s, μ | n) dμ,
which encapsulates what we know about the signal, given the count n. Since s = f(θ), p(s | n) also contains information about the xMSSM parameter space. To proceed further, we make the simplest possible assumption: points that yield the same signal are equi-probable. (A sketch of this weighting step is given below.)
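The sketch below illustrates this step with the same toy posterior: marginalize over μ to obtain p(s | n), then weight a few hypothetical parameter points by the density at their predicted signals s(θ). The s(θ) values are placeholders, not CMSSM predictions:

```python
import numpy as np
from scipy.stats import poisson, gamma

# Toy posterior p(s, mu | n), as in the previous sketches (illustrative numbers)
n_obs = 12
s = np.linspace(0.0, 30.0, 601)
mu = np.linspace(0.01, 20.0, 400)
S, M = np.meshgrid(s, mu, indexing="ij")
post = poisson.pmf(n_obs, S + M) / np.sqrt(S + M) * gamma.pdf(M, a=8.0, scale=0.5)
post /= np.trapz(np.trapz(post, mu, axis=1), s)

# Marginal signal density p(s | n) = integral of p(s, mu | n) over mu
p_s = np.trapz(post, mu, axis=1)

# Hypothetical parameter points: each theta predicts an expected signal s(theta).
# Under the equi-probability assumption, a point's weight is just p(s(theta) | n).
s_of_theta = np.array([2.0, 5.0, 9.0, 14.0, 22.0])   # placeholder predictions
weights = np.interp(s_of_theta, s, p_s)
weights /= weights.sum()
for sv, w in zip(s_of_theta, weights):
    print(f"s(theta) = {sv:5.1f}  ->  relative weight {w:.3f}")
```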

Example – CMSSM
We illustrate this approach with a simple class of models, the CMSSM:
- free parameters: 150 < m0 < 600 and 0 < m1/2 < 1500
- fixed parameters: A0 = 0, tanβ = 10 and μ > 0
We use the CMS SUSY benchmark point LM1, with m0 = 60, m1/2 = 250, A0 = 0, tanβ = 10, μ > 0, as the “true state of nature”, which provides the observed count n. For LM1 and for each point of a grid in the m0–m1/2 plane, we generate 1000 LHC events at 7 TeV (using PYTHIA and PGS), perform a simple analysis, and quote results for 1 pb-1, 100 pb-1 and 500 pb-1.

Example – CMSSM
[Results plots for the counting observable.]

Example – CMSSM
Now include EW/flavor observables: BR(b -> sγ), R(BR(b -> τν)), BR(b -> Dτν), BR(b -> Dτν)/BR(b -> eτν), Rl23, BR(Ds -> τν), BR(Ds -> μν) and Δρ. Since the state of nature is LM1, we use the LM1 values for these observables, along with the measured uncertainties, in this example.

Summary
Our goal over the next decade or so is to converge to the NSM (the New Standard Model). To do so, we need a well-defined way to rank models. This requires assigning to each model a number that permits comparison. The peak of the likelihood function cannot do this, and likelihood ratios are only a partial answer. A Bayesian strategy using reference analysis, of which the reference prior is the key element, provides a well-founded way forward. We are currently working on its application to realistic models.