Statistics for Business and Economics Random Variables & Probability Distributions
Learning Objectives As a result of this class, you will be able to: Distinguish Between the Two Types of Random Variables; Describe Discrete Probability Distributions; Describe the Binomial and Poisson Distributions; Describe the Uniform and Normal Distributions
Learning Objectives (continued) Approximate the Binomial Distribution Using the Normal Distribution Explain Sampling Distributions Solve Probability Problems Involving Sampling Distributions
Thinking Challenge You’re taking a 33 question multiple choice test. Each question has 4 choices. Clueless on 1 question, you decide to guess. What’s the chance you’ll get it right? If you guessed on all 33 questions, what would be your grade? Would you pass? The ‘pass’ question is meant to be a ‘teaser’ and not answered.
Types of Random Variables
Data Types Data Quantitative Qualitative Continuous Discrete
Discrete Random Variables
Discrete Random Variable A random variable is a numerical outcome of an experiment. Example: number of tails in 2 coin tosses. A discrete random variable takes whole-number values (0, 1, 2, 3, etc.), is obtained by counting, and usually has a finite number of values. The Poisson random variable is an exception: its possible values run from 0 to ∞.
Discrete Random Variable Examples
Experiment | Random Variable | Possible Values
Make 100 sales calls | # Sales | 0, 1, 2, ..., 100
Inspect 70 radios | # Defective | 0, 1, 2, ..., 70
Answer 33 questions | # Correct | 0, 1, 2, ..., 33
Count cars at toll between 11:00 & 1:00 | # Cars arriving | 0, 1, 2, ..., ∞
Continuous Random Variables
Continuous Random Variable A random variable is a numerical outcome of an experiment. Example: weight of a student (e.g., 115, 156.8, etc.). A continuous random variable takes whole or fractional values, is obtained by measuring, and has an infinite number of values in an interval: too many to list, unlike a discrete random variable.
Continuous Random Variable Examples
Experiment | Random Variable | Possible Values
Weigh 100 people | Weight | 45.1, 78, ...
Measure part life | Hours | 900, 875.9, ...
Amount spent on food | $ amount | 54.12, 42, ...
Measure time between arrivals | Inter-arrival time | 0, 1.3, 2.78, ...
Probability Distributions for Discrete Random Variables
Discrete Probability Distribution List of all possible [x, p(x)] pairs, where x = value of random variable (outcome) and p(x) = probability associated with value. The values are mutually exclusive (no overlap) and collectively exhaustive (nothing left out). $0 \le p(x) \le 1$ for all x, and $\sum p(x) = 1$.
Discrete Probability Distribution Example Experiment: Toss 2 coins. Count number of tails. © 1984-1994 T/Maker Co.
Values, x | Probabilities, p(x)
0 | 1/4 = .25
1 | 2/4 = .50
2 | 1/4 = .25
Visualizing Discrete Probability Distributions (Experiment is tossing 1 coin twice.)
Listing: { (0, .25), (1, .50), (2, .25) }
Table:
# Tails | f(x) Count | p(x)
0 | 1 | .25
1 | 2 | .50
2 | 1 | .25
Graph: bar chart of p(x) against x = 0, 1, 2
Formula: $p(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$
Summary Measures
1. Expected Value (mean of probability distribution): weighted average of all possible values, $\mu = E(x) = \sum x\,p(x)$
2. Variance: weighted average of squared deviation about the mean, $\sigma^2 = E[(x-\mu)^2] = \sum (x-\mu)^2 p(x)$
3. Standard Deviation: $\sigma = \sqrt{\sigma^2}$
Population notation is used since all values are specified.
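Summary Measures Calculation Table
Columns: $x$ | $p(x)$ | $x\,p(x)$ | $x-\mu$ | $(x-\mu)^2$ | $(x-\mu)^2 p(x)$
Totals: $\sum x\,p(x)$ under the $x\,p(x)$ column and $\sum (x-\mu)^2 p(x)$ under the $(x-\mu)^2 p(x)$ column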
Thinking Challenge You toss 2 coins. You’re interested in the number of tails. What are the expected value, variance, and standard deviation of this random variable, number of tails?
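Expected Value & Variance Solution*
$x$ | $p(x)$ | $x\,p(x)$ | $x-\mu$ | $(x-\mu)^2$ | $(x-\mu)^2 p(x)$
0 | .25 | 0 | -1.00 | 1.00 | .25
1 | .50 | .50 | 0 | 0 | 0
2 | .25 | .50 | 1.00 | 1.00 | .25
Totals: $\mu = 1.0$, $\sigma^2 = .50$, $\sigma = .71$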
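The calculation table can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the helper name `summarize` and the dictionary layout are illustrative assumptions.

```python
from math import sqrt

def summarize(dist):
    """Return (mean, variance, standard deviation) for a discrete
    distribution given as {x: p(x)}. Hypothetical helper for illustration."""
    mean = sum(x * p for x, p in dist.items())               # sum of x p(x)
    var = sum((x - mean) ** 2 * p for x, p in dist.items())  # sum of (x - mu)^2 p(x)
    return mean, var, sqrt(var)

# Number of tails in 2 coin tosses, from the slide's distribution
tails = {0: 0.25, 1: 0.50, 2: 0.25}
mu, var, sigma = summarize(tails)
print(mu, var, round(sigma, 2))
```

The same function works for any table of [x, p(x)] pairs whose probabilities sum to 1.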
Discrete Probability Distributions Binomial Poisson
Binomial Distribution
Binomial Distribution Number of ‘successes’ in a sample of n observations (trials) Number of reds in 15 spins of roulette wheel Number of defective items in a batch of 5 items Number correct on a 33 question exam Number of customers who purchase out of 100 customers who enter store
Binomial Distribution Properties Two different sampling methods Infinite population without replacement Finite population with replacement Sequence of n identical trials Each trial has 2 outcomes ‘Success’ (desired outcome) or ‘Failure’ Constant trial probability Trials are independent
Binomial Probability Distribution Function $p(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$, where p(x) = Probability of x ‘Successes’, n = Sample Size, p = Probability of ‘Success’, x = Number of ‘Successes’ in Sample (x = 0, 1, 2, ..., n)
Binomial Probability Distribution Example Experiment: Toss 1 coin 5 times in a row. Note number of tails. What’s the probability of 3 tails?
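A worked solution, sketched with the binomial formula $p(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$ and assuming a fair coin (p = .5, which the slide does not state explicitly), with n = 5 and x = 3:

$$p(3) = \frac{5!}{3!\,(5-3)!}\,(.5)^3 (1-.5)^{5-3} = 10\,(.125)(.25) = .3125$$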
Binomial Distribution Characteristics Mean: $\mu = E(x) = np$. Standard deviation: $\sigma = \sqrt{np(1-p)}$. The distribution has different shapes depending on n and p. 1st graph (n = 5, p = 0.1): if inspecting 5 items and the probability of a defect is 0.1 (10%), the probability of finding 0 defective items is about 0.6 (60%) and the probability of finding 1 defective item is about .33 (33%). 2nd graph (n = 5, p = 0.5): if the probability of a defect is 0.5 (50%), the probability of finding 1 defective item is about .16 (16%). Note: could use formula or tables at end of text to get probabilities.
Binomial Distribution Thinking Challenge You’re a telemarketer selling service contracts for Macy’s. You’ve sold 20 in your last 100 calls (p = .20). If you call 12 people tonight, what’s the probability of A. No sales? B. Exactly 2 sales? C. At most 2 sales? D. At least 2 sales? Let’s conclude this section on the binomial with the following Thinking Challenge.
Binomial Distribution Solution* n = 12, p = .20. From the binomial tables:
A. p(0) = .0687
B. p(2) = .2835
C. p(at most 2) = p(0) + p(1) + p(2) = .0687 + .2062 + .2835 = .5584
D. p(at least 2) = p(2) + p(3) + ... + p(12) = 1 – [p(0) + p(1)] = 1 – .0687 – .2062 = .7251
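The table lookups can be reproduced from the binomial formula. The sketch below is illustrative only; `binomial_pmf` is a hypothetical helper name, and it assumes the trials are independent with constant p = .20.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials, success probability p).
    Hypothetical helper for illustration."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 12, 0.20
p_none = binomial_pmf(0, n, p)                                    # A. no sales
p_two = binomial_pmf(2, n, p)                                     # B. exactly 2 sales
p_at_most_2 = sum(binomial_pmf(k, n, p) for k in range(3))        # C. p(0) + p(1) + p(2)
p_at_least_2 = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)  # D. complement rule

for label, value in [("A", p_none), ("B", p_two), ("C", p_at_most_2), ("D", p_at_least_2)]:
    print(label, round(value, 4))
```

Part C may differ from the table-based answer in the fourth decimal place, because the tables round each term before summing.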
Poisson Distribution
Poisson Distribution Number of events that occur in an interval (events per unit: time, length, area, space). Examples: number of customers arriving in 20 minutes; number of strikes per year in the U.S.; number of defects per lot (group) of DVDs. Other examples: number of machines that break down in a day; number of units sold in a week; number of people arriving at a bank teller per hour; number of telephone calls to customer support per hour.
Poisson Process Constant event probability: an average of 60/hr is 1/min for 60 1-minute intervals. One event per interval: arrivals don’t occur together. Independent events: arrival of 1 person does not affect another’s arrival.
Poisson Probability Distribution Function $p(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, where p(x) = Probability of x given $\lambda$, $\lambda$ = Expected (mean) number of ‘successes’, e = 2.71828 (base of natural logarithm), x = Number of ‘successes’ per unit
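Poisson Distribution Characteristics Mean: $\mu = E(x) = \lambda$. Standard deviation: $\sigma = \sqrt{\lambda}$. [Figure: Poisson distributions for $\lambda = 0.5$ and $\lambda = 6$]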
Poisson Distribution Example Customers arrive at a rate of 72 per hour. What is the probability of 4 customers arriving in 3 minutes? © 1995 Corel Corp.
Poisson Distribution Solution 72 per hr. = 1.2 per min. = 3.6 per 3-min. interval, so $\lambda = 3.6$ and $p(4) = \frac{3.6^4 e^{-3.6}}{4!} = .1912$
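The rate conversion and the Poisson formula $p(x) = \lambda^x e^{-\lambda} / x!$ can be sketched in Python as follows. This is an illustration under the slide's assumptions (constant arrival rate, independent arrivals); `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(exactly x events) when the mean number of events per interval is lam.
    Hypothetical helper for illustration."""
    return lam**x * exp(-lam) / factorial(x)

rate_per_hour = 72
interval_minutes = 3
lam = rate_per_hour / 60 * interval_minutes   # 1.2 per minute -> 3.6 per 3 minutes
p4 = poisson_pmf(4, lam)                      # probability of 4 arrivals in 3 minutes
print(round(lam, 1), round(p4, 4))
```

The key step is rescaling the rate to the interval of interest before applying the formula.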
Thinking Challenge You work in Quality Assurance for an investment firm. A clerk enters 75 words per minute with 6 errors per hour. What is the probability of 0 errors in a 255-word bond transaction?
Poisson Distribution Solution: Finding $\lambda$* 75 words/min = (75 words/min)(60 min/hr) = 4500 words/hr. 6 errors/hr = 6 errors/4500 words = .00133 errors/word. In a 255-word transaction (interval): $\lambda$ = (.00133 errors/word)(255 words) = .34 errors/255-word transaction
Poisson Distribution Solution: Finding p(0)* With $\lambda = .34$: $p(0) = \frac{\lambda^0 e^{-\lambda}}{0!} = e^{-.34} = .7118$
Probability Distributions for Continuous Random Variables
Continuous Probability Density Function Mathematical formula that shows all values, x, and frequencies, f(x). f(x) is not probability. Properties: $\int_{\text{all } x} f(x)\,dx = 1$ (area under curve) and $f(x) \ge 0$ for $a \le x \le b$. [Figure: curve of frequency f(x) against value x over the interval a to b]
Continuous Random Variable Probability Probability is area under curve! $P(a \le x \le b) = \int_a^b f(x)\,dx$ [Figure: area under f(x) between a and b]
Continuous Probability Distributions Uniform Normal
Uniform Distribution
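Uniform Distribution 1. Equally likely outcomes. 2. Probability density function: $f(x) = \frac{1}{d-c}$ for $c \le x \le d$. 3. Mean and standard deviation: $\mu = \frac{c+d}{2}$, $\sigma = \frac{d-c}{\sqrt{12}}$. [Figure: rectangle of height f(x) between c and d, with the interval a to b shaded]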
Uniform Distribution Example You’re production manager of a soft drink bottling company. You believe that when a machine is set to dispense 12 oz., it really dispenses 11.5 to 12.5 oz. inclusive. Suppose the amount dispensed has a uniform distribution. What is the probability that less than 11.8 oz. is dispensed?
Uniform Distribution Solution Here c = 11.5 and d = 12.5, so the height is f(x) = 1/(12.5 – 11.5) = 1.0. $P(11.5 \le x \le 11.8)$ = (Base)(Height) = (11.8 – 11.5)(1) = .30
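The base-times-height rule generalizes to any interval inside the range of the uniform distribution. A minimal Python sketch, assuming a uniform distribution on [c, d]; the function name `uniform_prob` is hypothetical.

```python
from math import sqrt

def uniform_prob(lo, hi, c, d):
    """P(lo <= x <= hi) for x uniform on [c, d]: base times height.
    Hypothetical helper for illustration."""
    lo, hi = max(lo, c), min(hi, d)   # clip the interval to the range [c, d]
    if hi <= lo:
        return 0.0
    return (hi - lo) * (1 / (d - c))  # base * height

c, d = 11.5, 12.5
p_less_than_11_8 = uniform_prob(c, 11.8, c, d)
mean = (c + d) / 2
std_dev = (d - c) / sqrt(12)
print(round(p_less_than_11_8, 2), mean, round(std_dev, 4))
```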
Normal Distribution
Importance of Normal Distribution Describes many random processes or continuous phenomena Can be used to approximate discrete probability distributions Example: binomial Basis for classical statistical inference
Normal Distribution ‘Bell-shaped’ & symmetrical. Mean, median, mode are equal. ‘Middle spread’ (interquartile range) is $1.33\sigma$. Random variable has infinite range. [Figure: bell curve f(x) against x, with mean, median, and mode at the center]
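Probability Density Function $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, where f(x) = Frequency of random variable x, $\sigma$ = Population standard deviation, $\pi$ = 3.14159, e = 2.71828, x = Value of random variable ($-\infty < x < \infty$), $\mu$ = Population mean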
Effect of Varying Parameters ($\mu$ & $\sigma$) [Figure: three normal curves A, B, and C, f(X) against X]
Normal Distribution Probability Probability is area under curve! f ( x ) x c d
The Standard Normal Table: P(0 < z < 1.96) Standardized Normal Probability Table (Portion), $\mu = 0$, $\sigma = 1$
Z | .04 | .05 | .06
1.8 | .4671 | .4678 | .4686
1.9 | .4738 | .4744 | .4750
2.0 | .4793 | .4798 | .4803
2.1 | .4838 | .4842 | .4846
Row 1.9 and column .06 give P(0 < z < 1.96) = .4750. (Shaded area exaggerated in figure.)
The Standard Normal Table: P(–1.26 ≤ z ≤ 1.26) Standardized Normal Distribution ($\mu = 0$, $\sigma = 1$): P(–1.26 ≤ z ≤ 1.26) = .3962 + .3962 = .7924
The Standard Normal Table: P(z > 1.26) Standardized Normal Distribution ($\mu = 0$, $\sigma = 1$): P(z > 1.26) = .5000 – .3962 = .1038
The Standard Normal Table: P(–2.78 ≤ z ≤ –2.00) Standardized Normal Distribution ($\mu = 0$, $\sigma = 1$): P(–2.78 ≤ z ≤ –2.00) = .4973 – .4772 = .0201
The Standard Normal Table: P(z > –2.13) Standardized Normal Distribution ($\mu = 0$, $\sigma = 1$): P(z > –2.13) = .4834 + .5000 = .9834
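The table areas can be cross-checked numerically. The sketch below is an illustration, not the slides' method: it builds the standard normal cumulative probability from `math.erf`, and the helper names `phi` and `table_area` are hypothetical.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def table_area(z):
    """Area between 0 and z, which is what the slides' table lists."""
    return phi(abs(z)) - 0.5

p_a = phi(1.26) - phi(-1.26)    # P(-1.26 <= z <= 1.26)
p_b = 1 - phi(1.26)             # P(z > 1.26)
p_c = phi(-2.00) - phi(-2.78)   # P(-2.78 <= z <= -2.00)
p_d = 1 - phi(-2.13)            # P(z > -2.13)

print(round(table_area(1.96), 4))
print([round(p, 4) for p in (p_a, p_b, p_c, p_d)])
```

Small differences from the table answers in the fourth decimal place are expected, since the table entries are rounded before they are added or subtracted.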
Non-standard Normal Distribution Normal distributions differ by mean & standard deviation. Each distribution would require its own table. That’s an infinite number of tables! X f(X)
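Standardize the Normal Distribution $Z = \frac{X - \mu}{\sigma}$ converts any normal distribution (mean $\mu$, standard deviation $\sigma$) to the standardized normal distribution ($\mu = 0$, $\sigma = 1$). One table!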
Non-standard Normal μ = 5, σ = 10: P(5 < X < 6.2) $Z = \frac{6.2 - 5}{10} = .12$, so P(5 < X < 6.2) = P(0 < Z < .12) = .0478
Non-standard Normal μ = 5, σ = 10: P(3.8 ≤ X ≤ 5) $Z = \frac{3.8 - 5}{10} = -.12$, so P(3.8 ≤ X ≤ 5) = P(–.12 ≤ Z ≤ 0) = .0478
Non-standard Normal μ = 5, σ = 10: P(2.9 ≤ X ≤ 7.1) $Z = \frac{2.9 - 5}{10} = -.21$ and $Z = \frac{7.1 - 5}{10} = .21$, so P(2.9 ≤ X ≤ 7.1) = .0832 + .0832 = .1664
Non-standard Normal μ = 5, σ = 10: P(X ≥ 8) $Z = \frac{8 - 5}{10} = .30$, so P(X ≥ 8) = P(Z ≥ .30) = .5000 – .1179 = .3821
Non-standard Normal μ = 5, σ = 10: P(7.1 ≤ X ≤ 8) $Z = .21$ and $Z = .30$, so P(7.1 ≤ X ≤ 8) = .1179 – .0832 = .0347
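These five probabilities can also be computed without standardizing by hand. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# Normal distribution with mean 5 and standard deviation 10, as on the slides
X = NormalDist(mu=5, sigma=10)

p1 = X.cdf(6.2) - X.cdf(5)      # P(5 < X < 6.2)
p2 = X.cdf(5) - X.cdf(3.8)      # P(3.8 <= X <= 5)
p3 = X.cdf(7.1) - X.cdf(2.9)    # P(2.9 <= X <= 7.1)
p4 = 1 - X.cdf(8)               # P(X >= 8)
p5 = X.cdf(8) - X.cdf(7.1)      # P(7.1 <= X <= 8)

print([round(p, 4) for p in (p1, p2, p3, p4, p5)])
```

Internally this performs the same standardization, $Z = (X - \mu)/\sigma$, so the results agree with the table answers up to rounding.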
Normal Distribution Thinking Challenge You work in Quality Control for GE. Light bulb life has a normal distribution with $\mu$ = 2000 hours and $\sigma$ = 200 hours. What’s the probability that a bulb will last A. between 2000 and 2400 hours? B. less than 1470 hours? Allow students about 10-15 minutes to solve this.
Solution* P(2000 ≤ X ≤ 2400) $Z = \frac{X - \mu}{\sigma} = \frac{2400 - 2000}{200} = 2.0$, so P(2000 ≤ X ≤ 2400) = P(0 ≤ Z ≤ 2.0) = .4772
Solution* P(X ≤ 1470) $Z = \frac{X - \mu}{\sigma} = \frac{1470 - 2000}{200} = -2.65$, so P(X ≤ 1470) = P(Z ≤ –2.65) = .5000 – .4960 = .0040
Finding Z Values for Known Probabilities What is Z, given P(Z) = .1217 (the area between 0 and Z)? Standardized Normal Probability Table (Portion), $\mu = 0$, $\sigma = 1$
Z | .00 | .01 | .02
0.0 | .0000 | .0040 | .0080
0.1 | .0398 | .0438 | .0478
0.2 | .0793 | .0832 | .0871
0.3 | .1179 | .1217 | .1255
The area .1217 sits in row 0.3, column .01, so Z = .31.
Finding X Values for Known Probabilities With $\mu = 5$, $\sigma = 10$, and Z = .31 (area .1217): $X = \mu + Z\sigma = 5 + (.31)(10) = 8.1$
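The reverse lookup (area to Z, then Z to X) can be sketched with the inverse cumulative distribution function. This assumes, as the figure suggests, that .1217 is the area between the mean and the unknown value; `statistics.NormalDist.inv_cdf` is standard-library Python (3.8 or later).

```python
from statistics import NormalDist

area_from_mean = 0.1217   # area between 0 and Z, as the table reports it
mu, sigma = 5, 10         # parameters of the non-standard normal on the slide

# The table area starts at the mean, so add .5 to get the cumulative probability.
z = NormalDist().inv_cdf(0.5 + area_from_mean)
x = mu + z * sigma        # undo the standardization: X = mu + Z * sigma

print(round(z, 2), round(x, 1))
```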
Assessing Normality
Assessing Normality 1. Draw a histogram or stem-and-leaf display and note the shape. 2. Compute the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$ and compare the percentage of data in these intervals to the Empirical Rule (68%, 95%, 99.7%). 3. Calculate IQR/s. If the ratio is close to 1.3, the data are approximately normal.
Assessing Normality Continued 4. Draw a normal probability plot (observed value against expected Z-score).
Normal Approximation of Binomial Distribution
Normal Approximation of Binomial Distribution Not all binomial tables exist. Requires large sample size. Gives approximate probability only. Need correction for continuity. [Figure: binomial distribution P(x) against x for n = 10, p = 0.50]
Why Probability Is Approximate Binomial probability: bar height. Normal probability: area under curve from 3.5 to 4.5. Relative to each bar, the normal curve adds some probability and loses some. As the number of vertical bars (n) increases, the errors due to approximating with the normal decrease. [Figure: binomial bars P(x) against x with normal curve overlaid]
Correction for Continuity A 1/2 unit adjustment to discrete variable. Used when approximating a discrete distribution with a continuous distribution. Improves accuracy. Example: x = 4 becomes the interval from 3.5 (4 – .5) to 4.5 (4 + .5).
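Normal Approximation Procedure 1. Calculate the interval: $\mu \pm 3\sigma = np \pm 3\sqrt{np(1-p)}$. If interval lies in range 0 to n, normal approximation can be used. 2. Express binomial probability in form $P(x \le a)$ or $P(x \le b) - P(x \le a)$. 3. For each value of interest, a, use: $z = \frac{(a + .5) - np}{\sqrt{np(1-p)}}$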
Normal Approximation Example What is the normal approximation of p(x = 4) given n = 10, and p = 0.5? [Figure: binomial bars P(x) against x, with the bar at x = 4 spanning 3.5 to 4.5]
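Normal Approximation Solution 1. Calculate the interval: $np \pm 3\sqrt{np(1-p)} = 10(.5) \pm 3\sqrt{10(.5)(1-.5)} = 5 \pm 4.74 = (0.26, 9.74)$. Interval lies in range 0 to 10, so normal approximation can be used. 2. Express binomial probability in form: $P(x = 4) = P(x \le 4) - P(x \le 3)$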
Normal Approximation Solution 3. Compute standard normal z values, with a = 3 and b = 4: $z = \frac{(a + .5) - np}{\sqrt{np(1-p)}} = \frac{3.5 - 10(.5)}{\sqrt{10(.5)(1-.5)}} = -.95$ and $z = \frac{(b + .5) - np}{\sqrt{np(1-p)}} = \frac{4.5 - 10(.5)}{\sqrt{10(.5)(1-.5)}} = -.32$
Normal Approximation Solution 4. Sketch the approximate normal distribution ($\mu = 0$, $\sigma = 1$): the area between Z = –.95 and Z = –.32 is .3289 – .1255 = .2034
Normal Approximation Solution 5. The exact probability from the binomial formula is $\frac{10!}{4!\,6!}(.5)^{10} = .2051$ (versus .2034 from the normal approximation)
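The whole procedure (interval check, continuity correction, comparison with the exact value) can be sketched in Python. This is an illustrative sketch, not the slides' own computation; it uses unrounded z values, so the approximation comes out slightly different from the table-based .2034.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 10, 0.5, 4
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Step 1: the interval mu +/- 3 sigma should lie within 0 to n
low, high = mu - 3 * sigma, mu + 3 * sigma
usable = 0 <= low and high <= n

# Steps 2-3: P(x = 4) = P(x <= 4) - P(x <= 3), each with a +.5 continuity correction
Z = NormalDist()
z_upper = (x + 0.5 - mu) / sigma        # (4 + .5 - np) / sigma
z_lower = (x - 1 + 0.5 - mu) / sigma    # (3 + .5 - np) / sigma
approx = Z.cdf(z_upper) - Z.cdf(z_lower)

# Exact binomial probability for comparison
exact = comb(n, x) * p**x * (1 - p) ** (n - x)

print(usable, round(z_lower, 2), round(z_upper, 2), round(approx, 4), round(exact, 4))
```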
Sampling Distributions
Parameter & Statistic Population Parameter: summary measure about population. Sample Statistic: summary measure about sample. Memory aid: P in Population & Parameter, S in Sample & Statistic.
Common Statistics & Parameters
Measure | Sample Statistic | Population Parameter
Mean | $\bar{x}$ | $\mu$
Standard Deviation | $s$ | $\sigma$
Variance | $s^2$ | $\sigma^2$
Binomial Proportion | $\hat{p}$ | $p$
Sampling Distribution 1. Theoretical probability distribution. 2. Random variable is sample statistic (sample mean, sample proportion, etc.). 3. Results from drawing all possible samples of a fixed size. 4. List of all possible $[\bar{x}, p(\bar{x})]$ pairs: the sampling distribution of the sample mean.
Developing Sampling Distributions Suppose There’s a Population ... Population size, N = 4 Random variable, x Values of x: 1, 2, 3, 4 Uniform distribution
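Population Characteristics Summary measures: $\mu = \frac{\sum x_i}{N} = \frac{1 + 2 + 3 + 4}{4} = 2.5$ and $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{1.25} = 1.12$. Have students verify these numbers. [Figure: population distribution, P(x) = .25 at each of x = 1, 2, 3, 4]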
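All Possible Samples of Size n = 2 (sample with replacement)
16 samples (rows: 1st observation; columns: 2nd observation)
1st Obs | 1 | 2 | 3 | 4
1 | 1,1 | 1,2 | 1,3 | 1,4
2 | 2,1 | 2,2 | 2,3 | 2,4
3 | 3,1 | 3,2 | 3,3 | 3,4
4 | 4,1 | 4,2 | 4,3 | 4,4
16 sample means (rows: 1st observation; columns: 2nd observation)
1st Obs | 1 | 2 | 3 | 4
1 | 1.0 | 1.5 | 2.0 | 2.5
2 | 1.5 | 2.0 | 2.5 | 3.0
3 | 2.0 | 2.5 | 3.0 | 3.5
4 | 2.5 | 3.0 | 3.5 | 4.0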
Sampling Distribution of All Sample Means Tallying the 16 sample means (1.0, 1.5, 2.0, 2.5; 1.5, 2.0, 2.5, 3.0; 2.0, 2.5, 3.0, 3.5; 2.5, 3.0, 3.5, 4.0) gives the sampling distribution of the sample mean:
$\bar{x}$ | $P(\bar{x})$
1.0 | 1/16
1.5 | 2/16
2.0 | 3/16
2.5 | 4/16
3.0 | 3/16
3.5 | 2/16
4.0 | 1/16
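Summary Measures of All Sample Means $\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{16} = \frac{1.0 + 1.5 + \cdots + 4.0}{16} = 2.5$ and $\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{16}} = \sqrt{.625} = .79$. Have students verify these numbers.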
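One way to verify these numbers is to enumerate all 16 samples directly. A minimal Python sketch, assuming sampling with replacement from the population 1, 2, 3, 4; the helper names `mean` and `pop_sd` are illustrative.

```python
from itertools import product
from math import sqrt

population = [1, 2, 3, 4]
n = 2  # sample size

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    """Standard deviation treating the values as a complete population (divide by N)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

# All ordered samples of size n drawn with replacement, and their means
sample_means = [mean(s) for s in product(population, repeat=n)]

mu, sigma = mean(population), pop_sd(population)
mu_xbar, sigma_xbar = mean(sample_means), pop_sd(sample_means)

print(len(sample_means), mu, round(sigma, 2), mu_xbar, round(sigma_xbar, 2))
```

The standard deviation of the sample means matches the population standard deviation divided by the square root of n, which is the standard error of the mean.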
Sampling Distribution Comparison [Figure: population distribution, P(x) against x = 1, 2, 3, 4, beside the sampling distribution of the sample mean, $P(\bar{x})$ against $\bar{x}$ = 1.0, 1.5, ..., 4.0]
Standard Error of the Mean 1. Standard deviation of all possible sample means, $\bar{x}$: measures scatter in all sample means. 2. Less than population standard deviation. 3. Formula (sampling with replacement): $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Properties of the Sampling Distribution of $\bar{x}$
Properties of the Sampling Distribution of $\bar{x}$ Regardless of the sample size: the mean of the sampling distribution equals the population mean, $\mu_{\bar{x}} = \mu$, and the standard deviation of the sampling distribution equals $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. Notes: An estimator is a random variable used to estimate a population parameter (characteristic). Unbiasedness: an estimator is unbiased if the mean of its sampling distribution is equal to the population parameter. Efficiency: the efficiency of an unbiased estimator is measured by the variance of its sampling distribution. If two estimators, with the same sample size, are both unbiased, then the one with the smaller variance has greater relative efficiency. Consistency: an estimator is a consistent estimator of a population parameter if the larger the sample size, the more likely it is that the estimate will come close to the parameter.
Sampling from Normal Populations
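Sampling from Normal Populations (sampling with replacement) Central tendency: $\mu_{\bar{x}} = \mu$. Dispersion: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. Example: for a population with $\mu = 50$ and $\sigma = 10$, the sampling distribution has $\mu_{\bar{x}} = 50$, with $\sigma_{\bar{x}} = 5$ for n = 4 and $\sigma_{\bar{x}} = 2.5$ for n = 16.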
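Standardizing the Sampling Distribution of $\bar{x}$ $Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ converts the sampling distribution to the standardized normal distribution ($\mu = 0$, $\sigma = 1$).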
Thinking Challenge You’re an operations analyst for AT&T. Long-distance telephone calls are normally distributed with $\mu$ = 8 min. and $\sigma$ = 2 min. If you select random samples of 25 calls, what percentage of the sample means would be between 7.8 & 8.2 minutes?
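Sampling Distribution Solution* $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{25}} = .4$. $Z = \frac{7.8 - 8}{.4} = -.50$ and $Z = \frac{8.2 - 8}{.4} = .50$, so $P(7.8 \le \bar{x} \le 8.2)$ = P(–.50 ≤ Z ≤ .50) = .1915 + .1915 = .3830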
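The same answer follows from treating the sample mean as a normal random variable whose standard deviation is the standard error. A minimal sketch using the standard-library `statistics.NormalDist`; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 2, 25
standard_error = sigma / sqrt(n)            # sigma / sqrt(n) = .4

# Sampling distribution of the sample mean for a normal population
xbar = NormalDist(mu, standard_error)
share = xbar.cdf(8.2) - xbar.cdf(7.8)       # P(7.8 <= sample mean <= 8.2)

print(standard_error, round(share, 4))
```

Note how much narrower this distribution is than the population itself: a single call falls between 7.8 and 8.2 minutes far less often than a mean of 25 calls does.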
Sampling from Non-Normal Populations
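Sampling from Non-Normal Populations (sampling with replacement) Central tendency: $\mu_{\bar{x}} = \mu$. Dispersion: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. Example: for a population with $\mu = 50$ and $\sigma = 10$, the sampling distribution has $\mu_{\bar{x}} = 50$, with $\sigma_{\bar{x}} = 5$ for n = 4 and $\sigma_{\bar{x}} = 1.8$ for n = 30.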
Central Limit Theorem As sample size gets large enough ($n \ge 30$), the sampling distribution of $\bar{x}$ becomes almost normal.
Central Limit Theorem Example The amount of soda in cans of a particular brand has a mean of 12 oz and a standard deviation of .2 oz. If you select random samples of 50 cans, what percentage of the sample means would be less than 11.95 oz?
Central Limit Theorem Solution* $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.2}{\sqrt{50}} = .03$. $Z = \frac{11.95 - 12}{.2/\sqrt{50}} = -1.77$, so $P(\bar{x} < 11.95)$ = P(Z < –1.77) = .5000 – .4616 = .0384
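The Central Limit Theorem itself can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the slides: it draws samples of size 30 from a skewed (exponential) population with mean 1 and standard deviation 1, chosen here only as an example of a non-normal population, and checks that the sample means behave as the theorem predicts.

```python
import random
from statistics import mean, pstdev

random.seed(0)            # fixed seed so the run is repeatable
n, reps = 30, 2000        # sample size and number of samples (illustrative choices)

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = mean(sample_means)      # should be close to the population mean, 1
spread = pstdev(sample_means)    # should be close to sigma / sqrt(n) = 1 / sqrt(30)

# Share of sample means within one standard error of the center (near 68% if normal)
within_one = sum(abs(m - center) <= spread for m in sample_means) / reps

print(round(center, 3), round(spread, 3), round(within_one, 3))
```

Exact output depends on the random draws, so only the approximate agreement with 1, 1/√30 ≈ .18, and 68% should be expected.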
Conclusion Distinguished Between the Two Types of Random Variables; Described Discrete Probability Distributions; Described the Binomial and Poisson Distributions; Described the Uniform and Normal Distributions; Approximated the Binomial Distribution Using the Normal Distribution
Conclusion (continued) Explained Sampling Distributions Solved Probability Problems Involving Sampling Distributions