Physics-Informatics Looking for Higgs Particle Counting Errors (Finished) January 30 2013 Geoffrey Fox


Physics-Informatics Looking for Higgs Particle Counting Errors (Finished)
January 30 2013
Geoffrey Fox
Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington

Random Numbers I

Numbers would be really random if we observed nuclei decaying, but in fact nearly all random numbers are “pseudo-random numbers”, generated on the “principle” that complicated, messy computations end up with effectively random results
– In particular, the low-order digits of integer arithmetic are essentially random
Python uses the Mersenne Twister method, built around Mersenne primes
Pseudo-random numbers are deterministic; they are generated in order, and for a given starting point – the seed – the numbers are completely determined
Some computers always use the same seed and so always return the same sequence by default
Python takes its seed from the time of day (pretty random) and generates a different sequence on each run
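As a quick check of the determinism described above, a minimal sketch using Python's standard random module (the seed value 2013 is just an illustration):

```python
# Python's Mersenne Twister generator is deterministic given a seed
import random

random.seed(2013)          # illustrative seed
first = [random.random() for _ in range(3)]

random.seed(2013)          # same seed -> the sequence repeats exactly
second = [random.random() for _ in range(3)]

print(first == second)
```

Omitting the seed call gives a different sequence on each run.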

Random Numbers II

Sometimes it’s good to get random numbers but always have the same ones, e.g. if you are debugging something
Use random.seed(seed=some integer) to guarantee the same start

random.seed(seed= )
figure("On Top of Each Other")
Base = * np.random.rand(42000)
plt.hist(Base, bins=15, range=(110,140), alpha=0.5, color="blue")
# random.seed(seed= )
Base2 = * np.random.rand(42000)
plt.hist(Base2, bins=15, range=(110,140), alpha=0.5, color="yellow")
plt.title("Two histograms on top of each other as identical random numbers")
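The seed value and the scale factor in the slide's code did not survive transcription; this numpy-only sketch uses hypothetical stand-ins (seed 1234, values spread over 110–140 to match the histogram range) to show why the two histograms land exactly on top of each other:

```python
import numpy as np

np.random.seed(1234)                      # hypothetical seed (value not in the original)
Base = 110 + 30 * np.random.rand(42000)   # hypothetical scale into the 110-140 range

np.random.seed(1234)                      # reseed with the same value...
Base2 = 110 + 30 * np.random.rand(42000)  # ...so the stream repeats exactly

counts, _ = np.histogram(Base, bins=15, range=(110, 140))
counts2, _ = np.histogram(Base2, bins=15, range=(110, 140))
print((counts == counts2).all())  # identical histograms
```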

More on Seeds

figure("Different")
# random.seed(seed= )
Base4 = * np.random.rand(42000)
plt.hist(Base4, bins=15, range=(110,140), alpha=0.5, color="blue")
Base1 = * np.random.rand(42000)
# Note Base1 starts where Base4 ends so is a different set of random numbers
plt.hist(Base1, bins=15, range=(110,140), alpha=0.5, color="green")
# random.seed(seed= )
Base3 = * np.random.rand(42000)
plt.hist(Base3, bins=15, range=(110,140), alpha=0.5, color="red")
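The comment about Base1 can be checked directly; a sketch with an illustrative seed (42, standing in for the slide's missing value):

```python
import numpy as np

np.random.seed(42)             # illustrative seed
Base4 = np.random.rand(42000)
Base1 = np.random.rand(42000)  # continues the stream where Base4 ended: new numbers

np.random.seed(42)             # reseeding restarts the stream...
Base3 = np.random.rand(42000)  # ...so Base3 duplicates Base4, not Base1

print(np.array_equal(Base3, Base4), np.array_equal(Base1, Base4))
```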

Look at Numbers

random.seed(seed= )
Base = * np.random.rand(42000)
array([ , , , ..., , , ])
random.seed(seed= )
Base2 = * np.random.rand(42000)
array([ , , , ..., , , ])
Base1 = * np.random.rand(42000)
array([ , , , ..., , , ])
random.seed(seed= )
Base3 = * np.random.rand(42000)
array([ , , , ..., , , ])

√100 = 10
Funny dip in the middle

Looks better but just luck

Counting and Binomial Distribution I

Suppose a random variable X = 1 (event in a particular bin of the histogram) and X = 0 (event not in the bin)
Let π be the probability that a particular event lies in the bin
This is the coin-tossing problem with π the probability of heads
If a random variable takes just two values, with probability π (for 1) and 1-π (for 0), then
Average of X = {0·(1-π) + 1·π}/{(1-π) + π} = π
Average of (X-π)² = (-π)²·(1-π) + (1-π)²·π = π(1-π)
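The two averages can be checked numerically; a sketch with an arbitrary illustrative bin probability π = 0.05:

```python
import numpy as np

pi = 0.05                                  # illustrative bin probability
rng = np.random.RandomState(0)
X = (rng.rand(200000) < pi).astype(float)  # X = 1 with probability pi, else 0

print(abs(X.mean() - pi) < 0.01)            # average of X is close to pi
print(abs(X.var() - pi * (1 - pi)) < 0.005) # average of (X-pi)^2 is close to pi(1-pi)
```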

Counting and Binomial Distribution II

Now in our application π is tiny
So Average of X = Average of (X-π)² = π, i.e. the standard deviation of X is the square root of its mean
If we have a sum of N such random variables, which has a binomial distribution,
O = Σi=1..N Xi
then O has mean Nπ and standard deviation √N·√π = √mean
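A numerical sketch of the √mean rule, with illustrative values of N and π (not from the slides):

```python
import numpy as np

N, pi = 42000, 0.001          # illustrative: N events, tiny bin probability
rng = np.random.RandomState(1)
counts = rng.binomial(N, pi, size=100000)  # many repeats of the counting experiment

mean = counts.mean()
print(abs(mean - N * pi) < 0.5)                   # mean ~ N*pi = 42
print(abs(counts.std() - np.sqrt(N * pi)) < 0.2)  # std ~ sqrt(mean) ~ 6.5
```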

Here is a histogram with 140 bins. We have a random variable for each bin. There is an excess of events in the bin from 125 to 130 GeV

Comments

Note that some measurements have a small signal but also a small background
Others have a larger signal but also a larger background
If you have signal N_S and background N_B, then the statistical error is √(N_S + N_B), and one needs √(N_S + N_B) much smaller than N_S, which is harder than √N_S much smaller than N_S
Typically one quotes “systematic errors” as well. These reflect model-based uncertainties in the analysis and will not decrease like the square root of the total event sample
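For concreteness, a small sketch with made-up counts (N_S = 300 signal events on N_B = 10000 background, values chosen only for illustration):

```python
import math

N_S, N_B = 300, 10000             # illustrative signal and background counts
error = math.sqrt(N_S + N_B)      # statistical (counting) error on the signal-region total
significance = N_S / error        # excess measured in units of the counting error

print(round(error, 2))            # 101.49
print(round(significance, 2))     # 2.96
```

With no background at all, the same 300-event signal would need only √300 ≈ 17 events to be a one-sigma effect, which is why a large background makes a signal harder to establish.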

“Monte-Carlo Events”

The events are very complicated, with ~100 particles produced, and the apparatus is also complex, with multiple detection devices measuring energy, momentum and particle type
To understand how an event “should look” and to estimate detection efficiencies, Monte Carlo events are produced
These are “random” events produced by models that build in the expected physics
– One both generates the “fundamental collision” and tracks the particles through the apparatus
Although they are motivated by physics, they have many adjustable parameters that are fitted to other data to describe aspects of the “theory” that are not predicted “from first principles”
One analyses Monte Carlo data with EXACTLY the same process used for real data
Often the amount of Monte Carlo data exceeds that of real data

Poisson Distribution

The Poisson distribution is what you get when you take the binomial distribution and let π get small while keeping the product πN fixed
– This is actually what I used in discussing histograms
The Poisson distribution describes exactly nuclear decays and can be defined as the distribution of the number of events by time T where, in a small time interval δt, the probability of an event arriving is λδt
Probability(k events) = (λT)^k exp(-λT) / k!
(Wikipedia writes this for the case T = 1 and varies λ, rather than fixing λ and varying T)
For Poisson: Mean = σ² = λT
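A sketch checking the formula's normalization and the Mean = σ² = λT property, with λT = 5 as an illustrative value:

```python
import math

lam_T = 5.0   # illustrative value of lambda*T

def poisson_pmf(k, mu):
    """Probability(k events) = mu**k * exp(-mu) / k!"""
    return mu**k * math.exp(-mu) / math.factorial(k)

ks = range(60)  # k = 0..59 carries essentially all the probability for mu = 5
total = sum(poisson_pmf(k, lam_T) for k in ks)
mean = sum(k * poisson_pmf(k, lam_T) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam_T) for k in ks)

print(abs(total - 1) < 1e-9, abs(mean - lam_T) < 1e-9, abs(var - lam_T) < 1e-9)
```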

Comparison of the Poisson distribution (black lines) and the binomial distribution with n=10 (red circles), n=20 (blue circles), n=1000 (green circles). All distributions have a mean of 5. The horizontal axis shows the number of events k. Notice that as n gets larger, the Poisson distribution becomes an increasingly better approximation for the binomial distribution with the same mean. (Wikipedia)

Poisson Examples I (Wikipedia)

Applications of the Poisson distribution can be found in many fields related to counting:
Electrical systems: telephone calls arriving in a system
Astronomy: photons arriving at a telescope
Biology: the number of mutations on a strand of DNA per unit length
Management: customers arriving at a counter or call centre
Civil engineering: cars arriving at a traffic light
Finance and insurance: the number of losses/claims occurring in a given period of time
Earthquake seismology: an asymptotic Poisson model of seismic risk for large earthquakes

Poisson Examples II (Wikipedia)

The Poisson distribution arises in connection with Poisson processes. It applies to various phenomena of discrete properties (that is, those that may happen 0, 1, 2, 3, ... times during a given period of time or in a given area) whenever the probability of the phenomenon happening is constant in time or space. Examples of events that may be modelled as a Poisson distribution include:
The number of soldiers killed by horse-kicks each year in each corps of the Prussian cavalry. This example was made famous by a book of Ladislaus Josephovich Bortkiewicz (1868–1931).
The number of yeast cells used when brewing Guinness beer. This example was made famous by William Sealy Gosset (1876–1937).
The number of phone calls arriving at a call centre per minute.
The number of goals in sports involving two competing teams.
The number of deaths per year in a given age group.
The number of jumps in a stock price in a given time interval.
Under an assumption of homogeneity, the number of times a web server is accessed per minute.
The number of mutations in a given stretch of DNA after a certain amount of radiation.
The proportion of cells that will be infected at a given multiplicity of infection.
The targeting of V-1 rockets on London during World War II.
These are often called “birth processes”

Central Limit Theorem

The law of large numbers (which is usually most important operationally) tells you about the mean and standard deviation (error) of the sum of many IID random variables
The Central Limit Theorem extends this to tell you about the shape of the probability distribution: for well behaved cases the shape is that of the Gaussian or normal distribution WHATEVER the original distribution of X was, i.e. the probability of values far from the mean falls off exponentially
Pr(O = k) ∝ exp{-(k-μ)²/(2σ²)}
where O (the sum of the IIDs) has mean μ and standard deviation σ
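A sketch of the theorem in action: summing n = 48 uniform variables (an arbitrary choice) and checking the Gaussian one-sigma fraction:

```python
import numpy as np

rng = np.random.RandomState(7)
n = 48                                   # number of IID uniform terms per sum
sums = rng.rand(100000, n).sum(axis=1)   # 100000 independent sums

mu = n * 0.5                    # mean of each sum
sigma = np.sqrt(n / 12.0)       # std of each sum (a uniform [0,1) has variance 1/12)

# For a Gaussian, about 68.3% of values lie within one sigma of the mean
frac = np.mean(np.abs(sums - mu) < sigma)
print(abs(frac - 0.683) < 0.01)
```

The uniform distribution looks nothing like a Gaussian, yet the one-sigma fraction of the sums already matches the normal value, which is the point of the theorem.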

Comparison of probability density functions p(k) for the sum of n fair 6-sided dice, showing their convergence to a normal distribution with increasing n, in accordance with the central limit theorem. In the bottom-right graph, smoothed profiles of the previous graphs are rescaled, superimposed and compared with a normal distribution (black curve).

This figure demonstrates the central limit theorem. The sample means are generated using a random number generator, which draws numbers between 1 and 100 from a uniform probability distribution. It illustrates that increasing sample sizes result in the 500 measured sample means being more closely distributed about the population mean (50 in this case). It also compares the observed distributions with the distributions that would be expected for a Gaussian distribution, and shows the chi-squared values that quantify the goodness of the fit