SESSION 39 & 40 Last Update 11th May 2011 Continuous Probability Distributions
Lecturer: Florian Boehlandt University: University of Stellenbosch Business School Domain: analysis.net/pages/vega.php
Learning Objectives
1. Population and Samples
2. Point Estimates vs. Confidence Interval Estimates
3. Calculating Confidence Intervals
Normal Probabilities Often it is prohibitively expensive to obtain information on all members of a population. Thus, market researchers usually collect information from a sample or sub-set of the population. The sample statistics (e.g. the sample mean) are calculated and used to estimate the population parameters (e.g. the population mean). This process is known as statistical inference.
Notation The notation for sample statistics and population parameters is given in the table below:
                        Size   Mean    Standard Deviation   Proportion
Population Parameters   N      μ       σ                    P
Sample Statistics       n      x-bar   s                    p-hat
Inference A sample statistic is used to estimate an unknown population parameter in one of two ways: as a point estimate (point estimate = sample statistic) or as a confidence interval estimate.
A point estimator draws inferences about the population by estimating the value of an unknown parameter using a single value or point.
An interval estimator draws inferences about the population by estimating the value of an unknown parameter using an interval.
Common confidence levels include:
- 90%: Weak statistical evidence
- 95%: Strong statistical evidence
- 99%: Overwhelming statistical evidence
Central Limit Theorem The sampling distribution of the mean of a random sample drawn from any population is approximately normal for sufficiently large sample sizes. The larger the sample size, the more closely the sampling distribution of x-bar will resemble the normal distribution. This is an important notion since it allows for using the normal distribution to describe the dispersion of sample means. Example: Tossing n dice and recording the average result.
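The dice example can be tried out numerically. The following is a minimal sketch, not part of the original slides: the function names, the number of trials and the fixed seed are assumptions made for illustration. It tosses n fair dice repeatedly and reports the mean and standard deviation of the recorded averages, which should settle near 3.5 and near the population standard deviation of one die divided by √n.

```python
import random
import statistics

def mean_of_dice(n_dice, rng):
    """Average face value of n_dice fair six-sided dice."""
    return sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice

def simulate_sample_means(n_dice, n_trials, seed=0):
    """Repeat the n_dice toss n_trials times and collect the averages."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    return [mean_of_dice(n_dice, rng) for _ in range(n_trials)]

# Standard deviation of a single fair die: sqrt(35/12)
SIGMA_DIE = (35 / 12) ** 0.5

for n_dice in (1, 2, 10, 30):
    means = simulate_sample_means(n_dice, n_trials=5000)
    print(
        n_dice,
        round(statistics.mean(means), 2),    # centre of the sample means
        round(statistics.pstdev(means), 2),  # spread of the sample means
        round(SIGMA_DIE / n_dice ** 0.5, 2), # theoretical sigma / sqrt(n)
    )
```

As n grows, the spread of the averages shrinks and a histogram of `means` looks increasingly bell-shaped, even though a single die is uniformly distributed.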
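Sampling Distribution It can be shown that the sampling distribution of X-bar is described as follows:
$$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
If X is normal, X-bar is normal. If X is nonnormal, X-bar is approximately normal for sufficiently large sample sizes. So for the sampling distribution, the standardization
$$Z = \frac{X - \mu}{\sigma}$$
changes to:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$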
Example Suppose that the amount of time to assemble a computer is normally distributed with a mean μ = 50 minutes and a standard deviation σ = 10 minutes.
a) What is the probability that one randomly selected computer is assembled in a time less than 60 minutes?
b) What is the probability that four randomly selected computers have a mean assembly time of less than 60 minutes?
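Solution
a) $$Z = \frac{X - \mu}{\sigma} = \frac{60 - 50}{10} = 1$$
b) $$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{60 - 50}{10 / \sqrt{4}} = 2$$
The associated probabilities are P(Z < 1) = 0.8413 and P(Z < 2) = 0.9772 respectively.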
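Both probabilities can be checked without a table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are chosen here for illustration and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 50.0, 10.0  # population mean and standard deviation (minutes)

# a) One computer: X ~ N(50, 10)
single = NormalDist(mu, sigma)
p_single = single.cdf(60)

# b) Mean of four computers: X-bar ~ N(50, 10 / sqrt(4))
n = 4
mean_of_four = NormalDist(mu, sigma / sqrt(n))
p_mean = mean_of_four.cdf(60)

print(round(p_single, 4))  # 0.8413
print(round(p_mean, 4))    # 0.9772
```

The only difference between the two parts is the standard deviation passed to `NormalDist`: σ for a single observation, σ / √n for the mean of n observations.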
Sampling Distribution and Inference The 95% confidence interval (i.e. the area underneath the graph) for the standard normal distribution is expressed algebraically:
$$P(-1.96 < Z < 1.96) = 0.95$$
With the definition of Z for the sampling distribution:
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = 0.95$$
Rearrangement yields:
$$P\left(\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95$$
Or for the general case:
$$P\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
The smaller-than term is referred to as the Lower Confidence Limit (LCL) and the larger-than term as the Upper Confidence Limit (UCL).
Example Suppose that the average assembly time across n = 25 computers is X-bar = 50 minutes. In addition, we assume that the population standard deviation is known and is equal to σ = 10 minutes. What is the 95% confidence interval? Comment: α = 1 – CL. Here, α = 1 – 0.95 = 0.05 (or 5%). Thus, α/2 = 0.025.
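Solution LCL and UCL
$$LCL = \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 50 - 1.96 \cdot \frac{10}{\sqrt{25}} = 46.08$$
$$UCL = \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 50 + 1.96 \cdot \frac{10}{\sqrt{25}} = 53.92$$
Thus, the LCL = 46.08 and the UCL = 53.92. The interpretation is straightforward: for n = 25 with σ = 10, we can be 95% confident that the true population mean μ lies between the LCL = 46.08 and the UCL = 53.92.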
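The limits for this example can be reproduced in a few lines. This is a minimal sketch assuming a known population standard deviation, as in the example; the function name `mean_confidence_interval` is hypothetical and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (LCL, UCL) for the population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / sqrt(n)             # z_{alpha/2} * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

lcl, ucl = mean_confidence_interval(x_bar=50, sigma=10, n=25)
print(round(lcl, 2), round(ucl, 2))  # 46.08 53.92
```

Quadrupling the sample size to n = 100 halves the margin, since the standard error shrinks with √n.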
Finding z_{α/2} Since CL = 0.95, α = 1 – 0.95 = 0.05. Then α/2 = 0.025. For one half of the standard normal distribution table, this corresponds to 0.5 – 0.025 = 0.475 = P(0 < Z < 1.96). Thus, z_{α/2} = 1.96. In the chart, the tail to the right of z_{α/2} represents 2.5% of the area underneath the curve.
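The table lookup can also be done with an inverse cumulative distribution function. The sketch below is one way to do this in Python's standard library; the helper name `z_critical` is an assumption introduced for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}: the point with area alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    # Approximately 1.645, 1.960 and 2.576 respectively
    print(cl, round(z_critical(cl), 3))

# Area between 0 and z for the 95% level, as read from a half table
half_area = NormalDist().cdf(z_critical(0.95)) - 0.5
print(round(half_area, 3))  # 0.475
```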
Normal Approximation of the Binomial Distribution The binomial distribution may be approximated using the normal distribution. A graphical derivation of this is included in most statistics textbooks and is omitted here. The upside is that the normal approximation allows us to calculate confidence intervals for the binomial distribution. It can be shown that the sampling distribution of p-hat is approximately normal and is described as follows:
$$E(\hat{p}) = P, \qquad \sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}}$$
where P is the population proportion and p-hat is the proportion of successes in a Bernoulli trial process estimated from the statistical sample.
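In place of the omitted graphical derivation, the quality of the approximation can be checked numerically. The sketch below compares an exact binomial probability with its normal counterpart. The values n = 100, p = 0.5 and k = 55 are hypothetical, and the continuity correction of 0.5 is a standard refinement that the slides do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), mean n*p and sd sqrt(n*p*(1-p)).

    The +0.5 continuity correction is an assumption added here.
    """
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)

n, p, k = 100, 0.5, 55  # hypothetical illustration values
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(round(exact, 4), round(approx, 4))
```

The two numbers agree closely for these values; the agreement generally worsens when n is small or p is close to 0 or 1.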
Confidence Interval Binomial Distribution Substituting E(p-hat) for μ and the standard error of the proportion for the standard error σ / √n in the formula for the confidence interval yields:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Example In a survey including 1000 people, a political candidate received 52% of the votes cast. What is the 95% confidence interval associated with this result?
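Solution LCL and UCL
$$LCL = \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.52 - 1.96 \sqrt{\frac{0.52 \cdot 0.48}{1000}} \approx 0.489$$
$$UCL = \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.52 + 1.96 \sqrt{\frac{0.52 \cdot 0.48}{1000}} \approx 0.551$$
Thus, the LCL ≈ 0.489 and the UCL ≈ 0.551. Note that the LCL is below 0.5 (i.e. although the candidate leads in the sample, a survey of 1000 people does not provide strong evidence at the 95% level to infer that the candidate will win the election).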
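The survey example can be verified with a short calculation. This is a minimal sketch under the normal approximation; the function name `proportion_confidence_interval` is hypothetical and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Return (LCL, UCL) for a proportion using the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    std_error = sqrt(p_hat * (1 - p_hat) / n)           # standard error of p-hat
    return p_hat - z * std_error, p_hat + z * std_error

lcl, ucl = proportion_confidence_interval(p_hat=0.52, n=1000)
print(round(lcl, 3), round(ucl, 3))  # 0.489 0.551
```

With p-hat = 0.52, the margin z_{α/2} times the standard error only drops below 0.02 once n exceeds roughly 2400, which is the sample size at which the lower limit would clear 0.5.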