Stick Tossing and Confidence Intervals Asilomar - December 2006 Bruce Cohen Lowell High School, SFUSD

Presentation transcript:

Stick Tossing and Confidence Intervals Asilomar - December 2006 Bruce Cohen Lowell High School, SFUSD David Sklar San Francisco State University Ver. 0.5

Estimating a Probability An Old Problem: When a thin stick of unit length is “randomly” tossed onto a grid of parallel lines spaced one unit apart, what is the probability that the stick lands crossing a grid line? We would like to take a purely experimental and statistical approach to the problem of finding, or at least estimating, the desired probability. Our experiments will consist of tossing a stick some fixed number of times, keeping track of how many times the stick lands crossing a grid line (the data), and computing the percentage of times this event occurs (a statistic). Basic statistical theory will help us understand how to interpret these results.
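The experiment can also be imitated on a computer. The following is a minimal sketch in Python and is not part of the original talk. It assumes one particular meaning of “randomly”: the center of the stick falls uniformly between two adjacent lines and the angle of the stick is uniform. The function names are made up for illustration.

```python
import math
import random


def toss_crosses_line(rng):
    """Simulate one toss of a unit stick onto lines one unit apart.

    Assumption (not from the talk): the stick's center is uniform between
    two adjacent lines (at heights 0 and 1) and its angle is uniform.
    """
    center = rng.random()                 # height of the stick's center
    angle = rng.uniform(0.0, math.pi)     # angle between stick and lines
    half_height = 0.5 * math.sin(angle)   # vertical reach from the center
    # The stick crosses a line if it reaches height 0 or height 1.
    return center - half_height <= 0.0 or center + half_height >= 1.0


def run_experiment(n_tosses, seed=0):
    """Return (number of crossings, fraction of crossings) for n_tosses."""
    rng = random.Random(seed)
    crossings = sum(toss_crosses_line(rng) for _ in range(n_tosses))
    return crossings, crossings / n_tosses


crossings, fraction = run_experiment(36, seed=1)
print(f"{crossings} crossings in 36 tosses ({fraction:.1%})")
```

The count of crossings plays the role of the data and the fraction plays the role of the statistic; a different seed gives a different simulated experiment.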

Plan 1. Estimating a simple probability: toss sticks and gather data; estimating the probability; estimating the uncertainty in the estimate of the probability; confidence intervals and what they mean. 2. Where does the procedure for finding confidence intervals come from? Why does it work? A mathematical model for the data. The mathematics of the model. Background material: the average and standard deviation of a list of numbers; histograms, what they are and what they aren’t; the average and standard deviation of a histogram; the normal curve; box models and histograms for the sum of the draws; the Central Limit Theorem. 3. Sketch of a proof of a special case of the Central Limit Theorem.

Estimating the Probability: A Sample Calculation Result: 20 line crossings in 36 tosses. Conclusions: Based on this data an approximate 68% confidence interval for the probability that the stick lands crossing a line is 47.3% to 63.9%, and an approximate 95% confidence interval is 39.0% to 72.2%.
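The arithmetic behind these intervals can be sketched in a few lines of Python. This is an illustration, not the presenters' code. It assumes the usual recipe of estimate plus or minus 1 SE (about 68%) or 2 SEs (about 95%), with the SE estimated as the square root of p(1 - p)/n from the sample fraction p. The function name is hypothetical.

```python
import math


def proportion_interval(successes, trials, n_se=1.0):
    """Return (low, high) for estimate +/- n_se standard errors."""
    p_hat = successes / trials
    se = math.sqrt(p_hat * (1.0 - p_hat) / trials)  # estimated SE
    return p_hat - n_se * se, p_hat + n_se * se


low68, high68 = proportion_interval(20, 36, n_se=1)
low95, high95 = proportion_interval(20, 36, n_se=2)
print(f"approx. 68% interval: {low68:.1%} to {high68:.1%}")
print(f"approx. 95% interval: {low95:.1%} to {high95:.1%}")
```

The endpoints agree with the slide to within 0.1 percentage point; the small differences arise because the slide rounds the estimate (55.6%) and the SE (8.3%) before adding and subtracting.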

68% Confidence Intervals for 10 Experiments (36 tosses per experiment)

Experiment  Crossings  Estimated prob.  SE    68% interval
1           20         55.6%            8.3%  47.3% to 63.9%
2           24         66.7%            7.9%  58.8% to 74.6%
3           19         52.8%            8.3%  44.5% to 61.1%
4           28         77.8%            6.9%  70.9% to 84.7%
5           23         63.9%            8.0%  55.9% to 71.9%
6           25         69.4%            7.7%  61.7% to 77.1%
7           24         66.7%            7.9%  58.8% to 74.6%
8           27         75.0%            7.2%  67.8% to 82.2%
9           23         63.9%            8.0%  55.9% to 71.9%
10          21         58.3%            8.2%  50.1% to 66.5%

Pooling the data Result: 234 line crossings in 360 (independent) tosses. Conclusions: Based on this data an approximate 68% confidence interval for the probability that the stick lands crossing a line is 62.5% to 67.5%, and an approximate 95% confidence interval is 60.0% to 70.0%.
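As a worked check of the pooled figures, here is the calculation written out. It assumes the SE is estimated by the usual formula for a sample fraction, with \hat{p} the pooled fraction of crossings and n the number of tosses (the symbols are ours, not the slide's).

```latex
\hat{p} = \frac{234}{360} = 0.650, \qquad
\mathrm{SE} = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
            = \sqrt{\frac{0.650 \times 0.350}{360}} \approx 0.025

\text{68\% interval: } 0.650 \pm 0.025 = 62.5\% \text{ to } 67.5\%, \qquad
\text{95\% interval: } 0.650 \pm 2(0.025) = 60.0\% \text{ to } 70.0\%
```

With ten times as many tosses the SE is smaller by a factor of about the square root of 10, which is why the pooled interval is roughly a third as wide as the interval from a single 36-toss experiment.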

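68% Confidence Intervals for 10 Experiments (36 tosses per experiment), with the pooled result

Experiment  Crossings  Estimated prob.  SE    68% interval
1           20         55.6%            8.3%  47.3% to 63.9%
2           24         66.7%            7.9%  58.8% to 74.6%
3           19         52.8%            8.3%  44.5% to 61.1%
4           28         77.8%            6.9%  70.9% to 84.7%
5           23         63.9%            8.0%  55.9% to 71.9%
6           25         69.4%            7.7%  61.7% to 77.1%
7           24         66.7%            7.9%  58.8% to 74.6%
8           27         75.0%            7.2%  67.8% to 82.2%
9           23         63.9%            8.0%  55.9% to 71.9%
10          21         58.3%            8.2%  50.1% to 66.5%
Pooled      234        65.0%            2.5%  62.5% to 67.5%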

Some 95% Confidence Intervals

Where Does the Procedure for Finding Confidence Intervals Come From? As with all “real world” applications of mathematics, we begin with a mathematical model. Box Model [Figure: a box holding tickets numbered 1 and 0 in unknown proportions.] The number of line crossings in n tosses of the stick is like the sum of the values of n draws at random with replacement from a box with two kinds of numbered tickets. Those numbered 1 correspond to the stick landing crossing a line, and those numbered 0 to not crossing. The percentage of tickets numbered 1 in the box is not known. This unknown percentage corresponds to the probability that a stick lands crossing a line. The set of tickets in the box is called the population, and the (unknown) % of 1’s in the population is a parameter. The n drawn tickets are a sample, and the % of 1’s in the sample is a statistic. Note: this kind of box is called a zero–one box.
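The box model translates directly into code. The sketch below is illustrative only: in the talk the composition of the box is unknown, so the box used here (13 tickets numbered 1 and 7 numbered 0) is a made-up stand-in, and the names are hypothetical.

```python
import random


def draw_sample(box, n_draws, rng):
    """Draw n_draws tickets at random with replacement from the box."""
    return [rng.choice(box) for _ in range(n_draws)]


# Hypothetical zero-one box; the real composition is what we want to estimate.
box = [1] * 13 + [0] * 7
rng = random.Random(2006)

sample = draw_sample(box, 36, rng)
sum_of_draws = sum(sample)                 # plays the role of the crossing count
parameter = 100 * sum(box) / len(box)      # % of 1's in the population
statistic = 100 * sum_of_draws / len(sample)  # % of 1's in the sample

print(f"parameter: {parameter:.1f}%, statistic: {statistic:.1f}%")
```

Rerunning with a different seed changes the statistic but never the parameter, which is the distinction the slide draws between sample and population.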

The Mathematics of the Model The goal for the rest of the talk is to develop the mathematics of the box model. We first review some basic background material which we then use to understand the behavior of the sum of the draws from a box of known composition. Finally we use this understanding to see why the confidence levels come from areas under the normal curve.

The Average and Standard Deviation of a List of Numbers Example list: 21, 28, 30, 30, 34, 37, with average 30 and SD 5. The average measures the “center” of the list; it is the balance point. The SD is the root-mean-square of the deviations from the average and measures the spread of the list about the mean. It has the same units as the values in the list. It is a natural scale for the list: we are often more interested in how many SD’s a value is from the mean than in the value itself.
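For readers who want to check the example list by machine, here is a short Python sketch. It assumes the SD is the root-mean-square deviation from the average (dividing by the number of entries, not by one less), which is the convention that gives the SD of 5 quoted later in the talk.

```python
import math


def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sd(values):
    """Root-mean-square deviation from the average."""
    avg = average(values)
    return math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))


values = [21, 28, 30, 30, 34, 37]
print(average(values), sd(values))  # prints 30.0 5.0
```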

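The Average and Standard Deviation of a List of Numbers For a list consisting of just 0’s and 1’s we have: $\text{average} = \dfrac{\text{number of 1's}}{\text{number of entries}}$, which is the fraction of 1’s, and with some algebra we can show that $\text{SD} = \sqrt{(\text{fraction of 1's}) \times (\text{fraction of 0's})}$. We can now re-interpret the procedure for estimating our probability: the estimate is the average of the zero-one list of toss results, and the uncertainty in the estimate is computed from the SD of that list.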
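A quick numerical check of the zero-one shortcut, as a sketch. It uses the sample result from earlier in the talk (20 crossings in 36 tosses) coded as a list of 1's and 0's, and assumes the SE for the fraction is the SD of the list divided by the square root of the number of tosses.

```python
import math


def sd(values):
    """Root-mean-square deviation from the average."""
    avg = sum(values) / len(values)
    return math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))


tosses = [1] * 20 + [0] * 16          # 20 crossings, 16 non-crossings
frac_ones = sum(tosses) / len(tosses)

shortcut_sd = math.sqrt(frac_ones * (1 - frac_ones))
direct_sd = sd(tosses)
se_percent = 100 * shortcut_sd / math.sqrt(len(tosses))

print(f"shortcut SD {shortcut_sd:.4f}, direct SD {direct_sd:.4f}")
print(f"SE for the percentage: {se_percent:.1f}%")
```

The two SD values coincide, and the SE comes out at about 8.3%, the figure used in the sample calculation.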

Properties of the Average and Standard Deviation 1. If we add a constant, B, to each element of a list the average of the new list is the old average + B. 2. If we multiply each element of a list by a constant, A, the average of the new list is A times the old average. 3. If we add a constant, B, to each element of a list the SD of the new list is the old SD. 4. If we multiply each element of a list by a constant, A, the SD of the new list is |A| times the old SD.
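The four properties are easy to confirm numerically. The sketch below uses the talk's example list with arbitrarily chosen constants A = -2 and B = 10 (our choice, picked so that the absolute value in property 4 matters), and Python's `statistics.pstdev`, which divides by the number of entries.

```python
import statistics

values = [21, 28, 30, 30, 34, 37]
A, B = -2, 10  # arbitrary illustrative constants

shifted = [v + B for v in values]
scaled = [A * v for v in values]

old_avg = statistics.mean(values)
old_sd = statistics.pstdev(values)

print(statistics.mean(shifted), old_avg + B)       # property 1
print(statistics.mean(scaled), A * old_avg)        # property 2
print(statistics.pstdev(shifted), old_sd)          # property 3
print(statistics.pstdev(scaled), abs(A) * old_sd)  # property 4
```

Each printed pair should agree, up to floating-point rounding.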

Standard Units We are often more interested in how many SD’s a value is from the mean than in the value itself. The value of an element in standard units is the number of SD’s it is above (positive) or below (negative) the mean. To convert a value to standard units use $\text{value in standard units} = \dfrac{\text{value} - \text{average}}{\text{SD}}$. Example: the list 21, 28, 30, 30, 34, 37, with average 30 and SD 5, becomes -1.8, -0.4, 0, 0, 0.8, 1.4 in standard units. For example, 37 is 1.4 SD’s above the average and 28 is 0.4 SD’s below the average. A list in standard units will have mean 0 and SD 1. Adding a constant to each element of a list or multiplying each element by a positive constant will not change the values of the elements in standard units. For many lists roughly 68% of the values lie within 1 SD of the mean and 95% lie within 2 SD’s.
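The conversion to standard units can be sketched as a small Python function. The function name is ours, and the rescaling constants in the second half (multiply by 3, add 7) are arbitrary choices used only to illustrate that standard units are unchanged by adding a constant or multiplying by a positive one.

```python
import statistics


def to_standard_units(values):
    """Convert each value to (value - average) / SD."""
    avg = statistics.mean(values)
    sd = statistics.pstdev(values)  # SD with divisor n
    return [(v - avg) / sd for v in values]


values = [21, 28, 30, 30, 34, 37]
z = to_standard_units(values)
print([round(x, 1) for x in z])

# Rescaling the list (positive multiplier) leaves standard units unchanged.
rescaled = [3 * v + 7 for v in values]
z_rescaled = to_standard_units(rescaled)
print([round(x, 1) for x in z_rescaled])
```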

From Lists to Histograms Example: 36 exam scores: 23, 29, 30, 31, 35, 38, 40, 41, 42, 45, 46, 51, 52, 54, 55, 55, 57, 58, 59, 60, 61, 63, 69, 70, 70, 71, 71, 74, 75, 75, 82, 85, 86, 91, 91, 93. Av = 59.1, SD = 18.9.

Class interval  Count  Percent  Density (% per point)
20 to 38        5      13.9%    0.8
38 to 50        6      16.7%    1.4
50 to 74        16     44.4%    1.9
74 to 90        6      16.7%    1.0
90 to 100       3      8.3%     0.8

Endpoint convention: class intervals contain left endpoints, but not right endpoints. Note: A histogram represents the percentages by areas (not by heights). A histogram is not a bar chart.
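The percentages and densities for this example can be recomputed from the raw scores. The sketch below is an illustration under the stated endpoint convention (left endpoint included, right endpoint excluded); the class intervals are the ones listed later in the talk, and the function name is hypothetical.

```python
scores = [23, 29, 30, 31, 35, 38, 40, 41, 42, 45, 46, 51,
          52, 54, 55, 55, 57, 58, 59, 60, 61, 63, 69, 70,
          70, 71, 71, 74, 75, 75, 82, 85, 86, 91, 91, 93]
edges = [20, 38, 50, 74, 90, 100]  # class interval boundaries


def histogram_table(values, edges):
    """Return (left, right, count, percent, density) for each class interval."""
    rows = []
    for left, right in zip(edges, edges[1:]):
        count = sum(left <= v < right for v in values)  # left endpoint included
        percent = 100 * count / len(values)             # area of the block
        density = percent / (right - left)              # height, % per point
        rows.append((left, right, count, percent, density))
    return rows


for left, right, count, percent, density in histogram_table(scores, edges):
    print(f"{left:3d} to {right:3d}: {count:2d} scores, "
          f"{percent:4.1f}%, {density:.1f}% per point")
```

The height of each block is its area divided by its width, which is why the widest interval (50 to 74) does not tower over the others in proportion to its 44.4% share.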

A Histogram is Not a Bar Chart A histogram represents the percentages by areas (not by heights). [Figure, two panels. “Histogram of Scores”: density (% per point) against scores, with block areas 13.9%, 16.7%, 44.4%, 16.7%, 8.3% and block heights 0.8, 1.4, 1.9, 1.0, 0.8. “Bar Chart of Scores”: % of total papers against scores, with bar heights 13.9%, 16.7%, 44.4%, 16.7%, 8.3%.]

The Average and Standard Deviation of a Histogram Class intervals: 20 to 38, 38 to 50, 50 to 74, 74 to 90, 90 to 100. List of midpoints: 29, 44, 62, 82, 95. Block areas: 13.9%, 16.7%, 44.4%, 16.7%, 8.3%. To find the mean or average of a histogram, first list the center of each class interval, then multiply each by the area of the block above it, and finally sum: Av = 60.5. To find the standard deviation of a histogram, find the squared deviation of the center of each class interval from the average, then multiply each by the area of its corresponding block, then sum, and finally take the square root: SD = 19. [Note for the original data: Av = 59.1, SD = 18.9] For many histograms roughly 68% of the area lies within 1 SD of the mean and 95% lies within 2 SD’s.
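The recipe for the average and SD of a histogram can be followed step by step in Python. This sketch uses the midpoints and block areas quoted on the slide; the variable names are ours.

```python
import math

midpoints = [29, 44, 62, 82, 95]
areas = [0.139, 0.167, 0.444, 0.167, 0.083]  # block areas as fractions

# Average: weight each midpoint by the area of its block, then sum.
hist_avg = sum(m * a for m, a in zip(midpoints, areas))

# SD: weight each squared deviation from the average by area, sum, take the root.
hist_sd = math.sqrt(
    sum((m - hist_avg) ** 2 * a for m, a in zip(midpoints, areas))
)

print(f"Av = {hist_avg:.1f}, SD = {hist_sd:.0f}")
```

The results differ slightly from the average and SD of the raw scores because the histogram treats every score in a class interval as if it sat at the midpoint.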

Histograms and Standard Units [Figure: the histogram of scores, density (% per point) against scores, with a second horizontal axis in standard units; Av = 60.5, SD = 19.]

The Normal Curve The normal curve was discovered by Abraham De Moivre. Around 1870 Adolph Quetelet had the idea of using it as an ideal histogram to which histograms for data could be compared. Many histograms follow the normal curve and many do not. [Figure: the normal curve, with height (% per standard unit) and area (percent). From: Freedman, Pisani, and Purves, Statistics, 3rd Ed.]

Histograms, Standard Units, and the Normal Curve [Figure: the histogram of scores, density (% per point) against scores and standard units, compared with the normal curve; Av = 60.5, SD = 19.]

Data Histograms and Probability Histograms [Figure illustrating the discrete data convention. From: Freedman, Pisani, and Purves, Statistics, 3rd Ed.]

Data Histograms and Probability Histograms for the Sum of the Draws

The Central Limit Theorem There are many Central Limit Theorems. We state two in terms of box models. The second is a special case of the first and it covers the model we are dealing with in our stick tossing problem. It goes back to the early eighteenth century. General version: When drawing at random with replacement from a box of numbered tickets (with bounded range), the probability histogram for the sum of the draws will follow the standard normal curve, even if the contents of the box do not. The histogram must be put into standard units, and the number of draws must be reasonably large. De Moivre-Laplace version: When drawing at random with replacement from a zero-one box, the probability histogram for the sum of the draws will follow the standard normal curve, even if the contents of the box do not. The histogram must be put into standard units, and the number of draws must be reasonably large.
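One way to see the zero-one version at work is to compare an exact chance for the sum of the draws with the corresponding area under the normal curve. The sketch below is an illustration, not taken from the talk. It computes the chance that the sum lands within 1 SE of its expected value, and compares it with the normal area over the same blocks of the probability histogram (each block for a whole number k runs from k - 0.5 to k + 0.5). The box (half 1's, 100 draws) and the function names are our assumptions.

```python
import math


def binomial_prob(n, p, k):
    """Chance of exactly k ones in n draws from a zero-one box."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_area(z_low, z_high):
    """Area under the standard normal curve between z_low and z_high."""
    def cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return cdf(z_high) - cdf(z_low)


def chance_within_one_se(n, p):
    """Return (exact chance, normal approximation) for sum within 1 SE."""
    expected = n * p
    se = math.sqrt(n * p * (1 - p))
    ks = [k for k in range(n + 1) if abs(k - expected) <= se]
    exact = sum(binomial_prob(n, p, k) for k in ks)
    # Convert the edges of the outermost blocks to standard units.
    z_low = (min(ks) - 0.5 - expected) / se
    z_high = (max(ks) + 0.5 - expected) / se
    return exact, normal_area(z_low, z_high)


exact, approx = chance_within_one_se(100, 0.5)
print(f"exact {exact:.4f}, normal curve {approx:.4f}")
```

Trying smaller numbers of draws, or a box with very few 1's, shows why the theorem asks for the number of draws to be reasonably large.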

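The Normal Curve and Probability Histograms for the Sum of the Draws [Figure: histogram for the box and probability histograms for the sum of the draws. From: Freedman, Pisani, and Purves.] The box with one ticket numbered 1 and one ticket numbered 0 provides a box model for counting the number of heads in n tosses of a fair coin.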

The Normal Curve and Probability Histograms for the Sum of the Draws From: Freedman, …

The Normal Curve and Probability Histograms for the Sum of the Draws From: Freedman, … Histogram for the box with tickets 1, 2, 9

The Central Limit Theorems General version: When drawing at random with replacement from a box of numbered tickets (with bounded range), the probability histogram for the sum (and average) of the draws will follow the standard normal curve, even if the contents of the box do not. The histogram must be put into standard units, and the number of draws must be reasonably large. De Moivre-Laplace version: When drawing at random with replacement from a zero-one box, the probability histogram for the sum (and average) of the draws will follow the standard normal curve, even if the contents of the box do not. The histogram must be put into standard units, and the number of draws must be reasonably large. The probability histogram for the average of the draws, when put in standard units, is the same as for the sum because multiplying each value of the sum by 1/(# of draws) won’t change the corresponding values in standard units.

Where Does the 68% Confidence Level Come From? [Figure: the normal curve in standard units, marking the true population average, the sample average, the true SE for the average of the draws, and the estimated SE for the average.] Since the estimated SE for the average computed from the sample is, on average, about equal to the true SE, a 68% confidence interval will cover the true population mean whenever the sample mean is within 1 SE of the true mean. The probability of this happening is, by the central limit theorem, the area within 1 standard unit of 0 under the normal curve, and this area is about 68%.
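The coverage claim can be tried out by simulation. The sketch below is illustrative and rests on assumptions that are not in the talk: it uses a zero-one box whose fraction of 1's is known to be 0.65 (a made-up value), repeats a 36-draw experiment many times, and counts how often the interval "sample fraction plus or minus one estimated SE" covers the true fraction.

```python
import math
import random


def interval_covers(p_true, n, rng):
    """Run one experiment; report whether the 1-SE interval covers p_true."""
    ones = sum(rng.random() < p_true for _ in range(n))
    p_hat = ones / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE estimated from the sample
    return abs(p_hat - p_true) <= se


def coverage(p_true, n, reps, seed=0):
    """Fraction of repeated experiments whose interval covers p_true."""
    rng = random.Random(seed)
    hits = sum(interval_covers(p_true, n, rng) for _ in range(reps))
    return hits / reps


print(f"coverage: {coverage(0.65, 36, 10000):.1%}")
```

The simulated coverage should land in the neighborhood of 68%, though not exactly on it: with only 36 draws the sample fraction can take only a few values, and the normal curve is an approximation.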

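How to Prove the De Moivre-Laplace Version of the Central Limit Theorem Show that the probability that the sum of n draws at random with replacement from a zero-one box is exactly k is given by the binomial formula $P(S_n = k) = \binom{n}{k} p^k q^{n-k}$, where $S_n$ is the sum of the draws, p is the fraction of 1’s in the box, and q = 1 - p. Then use “Stirling’s Formula”, $n! \approx \sqrt{2\pi n}\, n^n e^{-n}$, to approximate the factorials in the binomial coefficient.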

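How to Prove the De Moivre-Laplace Version of the Central Limit Theorem, continued Stirling’s formula implies that the probability of exactly k ones in n draws is approximately $\sqrt{\dfrac{n}{2\pi k(n-k)}} \left(\dfrac{np}{k}\right)^{k} \left(\dfrac{nq}{n-k}\right)^{n-k}$, where p is the fraction of 1’s in the box and q = 1 - p. Use the series for the log to show that, with x = k - np, this is approximately $\dfrac{1}{\sqrt{2\pi npq}}\, e^{-x^2/(2npq)}$. The limiting processes in these steps require some care. Both k and n must go to infinity together in a fixed relationship to each other, and we need to understand why values of x for which |x|>npq are unimportant.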
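The step that uses the series for the log can be written out as follows. This is a sketch of the standard argument rather than a transcription of the slide. It assumes p is the fraction of 1's in the box, q = 1 - p, and x = k - np (so that n - k = nq - x), and it keeps only terms up to second order in x.

```latex
\begin{align*}
\log\left[\left(\frac{np}{k}\right)^{k}\left(\frac{nq}{n-k}\right)^{n-k}\right]
  &= -(np + x)\,\log\left(1 + \frac{x}{np}\right)
     - (nq - x)\,\log\left(1 - \frac{x}{nq}\right) \\
  &\approx -\left(x + \frac{x^{2}}{2np}\right)
           - \left(-x + \frac{x^{2}}{2nq}\right)
     \qquad \text{using } \log(1+t) \approx t - \tfrac{1}{2}t^{2} \\
  &= -\frac{x^{2}}{2n}\left(\frac{1}{p} + \frac{1}{q}\right)
   = -\frac{x^{2}}{2npq}.
\end{align*}
```

Exponentiating gives the factor e^{-x^2/(2npq)}, and replacing k by np and n - k by nq in the square-root factor that Stirling's formula produces gives the constant 1/\sqrt{2\pi npq}. The neglected terms are of order x^3/n^2, which is one reason the limiting process needs care.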

Bibliography
1. Freedman, Pisani, & Purves, Statistics, 3rd Ed., W.W. Norton, New York.
2. W. Feller, An Introduction to Probability Theory and Its Applications, Volume I, 2nd Ed., John Wiley & Sons, New York, London, Sydney.
3. F. Mosteller, Fifty Challenging Problems in Probability with Solutions, Addison-Wesley, Palo Alto.
4. R Development Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, 2006.