Data Analysis Exercises - Day 3 (Wouter Verkerke, NIKHEF)

Exercise 13 – A Bayesian interval

In this exercise we will construct a Bayesian interval for an example measurement.

Copy the input file ex13.C, have a look at the first block of code and run it. You will see a plot of a Voigtian distribution, which is the convolution of a Breit-Wigner shape (= 1/(x² + Γ²)), typical for mass resonances, with a Gaussian, typical for the detector resolution. The convolution of these two pdfs describes what a mass resonance looks like after detection, and is calculated analytically in the class RooVoigtian. The plot shows three cases: the pure Breit-Wigner distribution (red), the Voigtian distribution with Γ=3 and σ=1 (blue), and with Γ=3 and σ=3 (dashed).

Now uncomment BLOCK 1 and rerun. We generate a toy dataset at Γ=3 and σ=1 and fit it with our model. Note the strong correlation between Γ and σ in the fit, which is typical when σ and Γ are of similar magnitude.

A simple Bayesian interval

In this exercise Γ (the width) is the parameter of interest and σ is a nuisance parameter. We first run a scenario in which we assume that σ is known exactly, i.e. there are no nuisance parameters. Now uncomment BLOCK 2 and rerun. Here the −log(L) function is constructed and plotted versus Γ. A sketch of what these blocks might contain follows below.
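For reference, a minimal sketch of what the first blocks of ex13.C might look like; the file itself is not reproduced here, so the variable names, ranges and event counts are assumptions. Later sketches in this document continue from this setup.

    // Hypothetical sketch of the ex13.C setup (names and ranges assumed),
    // to be run as a ROOT macro
    #include "RooWorkspace.h"
    #include "RooDataSet.h"
    #include "RooAbsReal.h"
    #include "RooFitResult.h"
    #include "RooPlot.h"
    using namespace RooFit;

    RooWorkspace w("w");
    // Voigtian = Breit-Wigner convolved with a Gaussian, analytic in RooVoigtian
    w.factory("Voigtian::model(x[-10,10], mean[0], width[3,0.1,10], sigma[1,0.1,3])");

    // BLOCK 1 (sketch): generate a toy dataset at width=3, sigma=1 and fit it back
    RooDataSet* data = w.pdf("model")->generate(*w.var("x"), 1000);
    RooFitResult* r = w.pdf("model")->fitTo(*data, Save());
    r->correlationMatrix().Print();      // note the strong width-sigma correlation

    // BLOCK 2 (sketch): -log(L) versus width, with sigma fixed (known exactly)
    w.var("sigma")->setVal(1);
    w.var("sigma")->setConstant(true);
    RooAbsReal* nll = w.pdf("model")->createNLL(*data);
    RooPlot* frame = w.var("width")->frame();
    nll->plotOn(frame, ShiftToZero());
    frame->Draw();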

Now uncomment Block 3 and rerun. This code constructs the likelihood function from the −log(L) function (through exponentiation) and builds a Bayesian posterior from the likelihood by multiplying it with a flat prior in Γ. The posterior is then plotted. A 68% central Bayesian interval can now be defined as the region that integrates 68% of the posterior and leaves 16% on each side.

Now uncomment Block 4. The additional code creates the cumulative distribution function (cdf) of the Bayesian posterior, i.e. ∫_Γ^∞ P(Γ′) dΓ′, which simplifies the calculation of the interval: the interval is now delimited by the values of Γ where the cdf crosses 0.16 and 0.84. Determine what the Bayesian interval is (you can select cross-hairs on the canvas to simplify this task: click the right mouse button in the area above the plot frame and select the item SetCrosshairs). Write down the interval for future comparison.

A Bayesian interval with a nuisance parameter

We now consider the case where we do not know σ a priori and need to constrain it from the data. Thus σ is now a nuisance parameter.
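Continuing the sketch above, Blocks 3 and 4 might look roughly like this. Wrapping exp(−NLL) in a RooGenericPdf is one possible implementation (RooFit then normalizes it over Γ, which is equivalent to multiplying by a flat prior), not necessarily the one used in ex13.C:

    // BLOCK 3 (sketch): posterior proportional to exp(-NLL) x flat prior in width.
    // Subtract the minimum NLL in the exponent to avoid numerical underflow.
    double nllMin = nll->getVal();       // parameters are still at the fitted minimum
    RooGenericPdf posterior("posterior", "posterior",
                            Form("exp(-(@0-%g))", nllMin), RooArgList(*nll));
    RooPlot* pframe = w.var("width")->frame();
    posterior.plotOn(pframe);
    pframe->Draw();

    // BLOCK 4 (sketch): cumulative distribution of the posterior in width;
    // the 68% central interval lies between the crossings at 0.16 and 0.84
    RooAbsReal* cdf = posterior.createCdf(RooArgSet(*w.var("width")));
    RooPlot* cframe = w.var("width")->frame();
    cdf->plotOn(cframe);
    cframe->Draw();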

Uncomment Block 5 and rerun. The added code shows the −log(L) and L distributions versus (Γ,σ). (You can again see from the L distribution that there is a significant correlation between Γ and σ.)

Uncomment Block 6 and rerun. We now follow the Bayesian procedure for the elimination of nuisance parameters: we integrate the posterior distribution (taken as the likelihood times a flat prior in both σ and Γ) over the nuisance parameter σ, and plot the resulting posterior in Γ as well as the corresponding cumulative distribution (the required integrations may take a few minutes). Compare the distribution of this posterior to that of the previous scenario (with fixed σ). Calculate the 68% central interval from the c.d.f. and write down the values for future reference.

A Bayesian interval with a nuisance parameter and a non-uniform prior

For illustration we can also choose a non-uniform prior for the σ parameter, e.g. a Gaussian with mean 1 and width 0.1. This represents knowledge of the value of σ that is about three times as precise as what can be inferred from the data. Uncomment Block 7, rerun (this will again take a few minutes) and calculate the 68% central interval for this case.
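In the same hypothetical notation, the marginalizations of Blocks 6 and 7 could look like this; createProjection integrates the pdf over σ numerically, which is consistent with these steps being slow:

    // BLOCK 6 (sketch): let sigma float again and marginalize the 2D posterior over it
    w.var("sigma")->setConstant(false);
    RooGenericPdf post2d("post2d", "post2d",
                         Form("exp(-(@0-%g))", nllMin), RooArgList(*nll));
    RooAbsPdf* marg = post2d.createProjection(RooArgSet(*w.var("sigma")));
    RooPlot* mframe = w.var("width")->frame();
    marg->plotOn(mframe);                  // marginal posterior in width
    mframe->Draw();

    // BLOCK 7 (sketch): the same, but with a Gaussian(1, 0.1) prior on sigma
    w.factory("Gaussian::sigmaPrior(sigma, 1, 0.1)");
    RooProdPdf post2dp("post2dp", "post2dp",
                       RooArgList(post2d, *w.pdf("sigmaPrior")));
    RooAbsPdf* margp = post2dp.createProjection(RooArgSet(*w.var("sigma")));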

Exercise 14 – Profile Likelihood interval

This exercise calculates an interval on Γ for the same problem as Exercise 13, but now using the profile likelihood method.

Copy file ex14.C, look at the code and run it. The code in this exercise sets up the same model and data as Exercise 13, constructs the likelihood and plots it as a function of Γ, assuming a known fixed value of σ=1. Calculate the 68% likelihood interval by finding the values of Γ at which the −log(LR) curve crosses +0.5. Write down the values and compare them to the corresponding 68% Bayesian interval.

Uncomment Block 1 and rerun. Now we move to the scenario where we must constrain σ from the data and therefore consider it a nuisance parameter. The procedure to eliminate nuisance parameters in the profile likelihood method is to scan the likelihood in Γ, plotting for each point in Γ the best (i.e. lowest) value of the LR for any value of σ (instead of the value of the LR at σ=1); a sketch follows below. [This will invariably widen the distribution. Try to understand why that is the case.]
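A minimal sketch of the profile likelihood construction, continuing the hypothetical setup from Exercise 13 (createProfile internally calls MINUIT to minimize the NLL over the nuisance parameters at each scan point):

    // Let sigma float again, then profile the NLL in width
    w.var("sigma")->setConstant(false);
    RooAbsReal* pll = nll->createProfile(RooArgSet(*w.var("width")));
    RooPlot* plframe = w.var("width")->frame();
    nll->plotOn(plframe, ShiftToZero(), LineColor(kRed)); // NLL at the current sigma (=1)
    pll->plotOn(plframe);    // profiled curve: wider than the fixed-sigma one
    plframe->Draw();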

The newly added code creates the profile likelihood function, which is represented by a function object in RooFit as well, and which internally calls MINUIT to minimize the likelihood with respect to the nuisance parameters every time it is evaluated. Calculate the 68% profile likelihood interval by finding the values of Γ at which the profile likelihood ratio curve crosses +0.5. Write down the values and compare them to the corresponding 68% Bayesian interval.

Finally, uncomment Block 3 and run again. The newly added code visualizes how the model shape in the profile likelihood changes as a function of the parameter of interest Γ. A canvas with 9 pads is created, corresponding to the situation at Γ=0.5, 1, ..., 4.5. Each pad shows the data (which is always the same), in red the model with σ=1, and in blue the model with the value of σ that gives the best fit (the likelihood of this best fit is what enters the profile likelihood).
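A sketch of the Block 3 visualization; the pad layout and scan points follow the text above, the names are again assumptions:

    TCanvas* c = new TCanvas("c", "profile scan", 900, 900);
    c->Divide(3, 3);
    for (int i = 0; i < 9; i++) {
      c->cd(i + 1);
      w.var("width")->setVal(0.5 + 0.5 * i);   // width = 0.5, 1.0, ..., 4.5
      w.var("width")->setConstant(true);       // only sigma floats in the fit below
      RooPlot* f = w.var("x")->frame();
      data->plotOn(f);
      w.var("sigma")->setVal(1);
      w.pdf("model")->plotOn(f, LineColor(kRed));    // model at the nominal sigma=1
      w.pdf("model")->fitTo(*data, PrintLevel(-1));  // best-fit sigma at this width
      w.pdf("model")->plotOn(f, LineColor(kBlue));   // the profiled model
      f->Draw();
    }
    w.var("width")->setConstant(false);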

Exercise 15 – Testing Wilks' theorem

This exercise tests the validity of Wilks' theorem for the example analysis used in Exercises 13 and 14. Wilks' theorem states that, in the asymptotic limit (i.e. N → ∞), twice the log-likelihood ratio follows a chi-squared distribution. In this exercise we generate a series of toy Monte Carlo samples, calculate the likelihood ratio for each of them, plot the distribution, and compare it to the asymptotic chi-squared distribution.

Copy file ex15.C, look at the code and run it (this takes 1-2 minutes). Does the data look like the asymptotic distribution? You will have to put the y-axis on a log scale; to do so, right-click just above the plot frame and select the option SetLogY. Up to what value of the LLR do you have enough statistics to claim agreement? What is the corresponding level of significance? (Remember LLR = 0.5·Z².)

The provided code generates the LLR distribution for L(width=3)/L(width=best). Change the code so that it generates the distribution for L(width=1)/L(width=best) and rerun. Does the distribution look different?
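A sketch of the kind of toy loop ex15.C presumably contains (histogram binning, toy size and all names are assumptions), continuing the setup above:

    // LLR = -log[ L(width=3) / L(width=best) ] = NLL(3) - NLL(best) >= 0
    w.var("width")->setConstant(false);
    w.var("sigma")->setVal(1);
    w.var("sigma")->setConstant(true);             // sigma fixed to 1 for now
    TH1F* h = new TH1F("h", "LLR distribution", 50, 0, 15);
    for (int i = 0; i < 1000; i++) {
      w.var("width")->setVal(3);                   // generate at the true width
      RooDataSet* toy = w.pdf("model")->generate(*w.var("x"), 1000);
      RooAbsReal* tnll = w.pdf("model")->createNLL(*toy);
      double nllFixed = tnll->getVal();            // NLL at width=3
      RooFitResult* fr = w.pdf("model")->fitTo(*toy, Save(), PrintLevel(-1));
      h->Fill(nllFixed - fr->minNll());            // LLR for this toy
      delete fr; delete tnll; delete toy;
    }
    h->Draw();
    gPad->SetLogy();
    // Wilks: 2*LLR ~ chi2(1 dof), so LLR itself has density exp(-t)/sqrt(pi*t)
    TF1* fw = new TF1("fw", "[0]*exp(-x)/sqrt(TMath::Pi()*x)", 0.01, 15);
    fw->SetParameter(0, 1000 * h->GetBinWidth(1)); // toys x bin width
    fw->Draw("same");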

So far we have checked the behavior of the likelihood ratio with σ fixed to 1. Next we check whether the profile likelihood also follows the prediction of Wilks' theorem. To do so we need to make σ a floating parameter of the model: change ‘sigma[1]’ into ‘sigma[1,0.1,3]’ in the factory string and rerun. How many toy data samples do you need to generate and fit to validate Wilks' theorem up to Z=5? (First calculate the corresponding probability, then take 100/prob as a rough estimate of the number of toys you need; a worked version of this estimate follows below.) How long would it take you to do that check, based on the time it took you to run 1000 toys? What if you had 1000 CPUs available?
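A back-of-the-envelope version of the toy-count estimate; the 100-entries-in-the-tail rule of thumb comes from the exercise text, and the per-toy timing is an assumption:

    // Z=5 corresponds to LLR = 0.5*Z^2 = 12.5, i.e. 2*LLR = 25 in chi2(1) terms
    double p = TMath::Prob(25, 1);   // P(chi2(1) > 25) = 2*(1-Phi(5)) ~ 5.7e-7
    double nToys = 100.0 / p;        // ~ 1.8e8 toys for ~100 entries beyond Z=5
    // at e.g. 0.1 s per toy fit: ~200 days on 1 CPU, ~5 hours on 1000 CPUs
    printf("p = %g  ->  need about %g toys\n", p, nToys);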