Combining Random Variables

Slides:



Advertisements
Similar presentations
Chapter 18 Sampling distribution models
Advertisements

Chapter 6 Sampling and Sampling Distributions
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
Chapter 7 Introduction to Sampling Distributions
Sampling Distributions
Chapter 6 Introduction to Sampling Distributions
Chapter 7 Sampling and Sampling Distributions
Fall 2006 – Fundamentals of Business Statistics 1 Chapter 6 Introduction to Sampling Distributions.
Evaluating Hypotheses
Sampling Distributions
Part III: Inference Topic 6 Sampling and Sampling Distributions
Chapter 7 ~ Sample Variability
Sampling Theory Determining the distribution of Sample statistics.
Sampling Theory Determining the distribution of Sample statistics.
Sampling Distributions
Population All members of a set which have a given characteristic. Population Data Data associated with a certain population. Population Parameter A measure.
Copyright ©2011 Nelson Education Limited The Normal Probability Distribution CHAPTER 6.
Agresti/Franklin Statistics, 1e, 1 of 139  Section 6.4 How Likely Are the Possible Values of a Statistic? The Sampling Distribution.
The Normal Probability Distribution
Sampling Distributions Chapter 7. The Concept of a Sampling Distribution Repeated samples of the same size are selected from the same population. Repeated.
HAWKES LEARNING SYSTEMS math courseware specialists Copyright © 2010 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Chapter 9 Samples.
Lab 3b: Distribution of the mean
Chapter 7: Sample Variability Empirical Distribution of Sample Means.
Chapter 7 Sample Variability. Those who jump off a bridge in Paris are in Seine. A backward poet writes inverse. A man's home is his castle, in a manor.
Chapter 7 Sampling and Sampling Distributions ©. Simple Random Sample simple random sample Suppose that we want to select a sample of n objects from a.
February 2012 Sampling Distribution Models. Drawing Normal Models For cars on I-10 between Kerrville and Junction, it is estimated that 80% are speeding.
Chapter 7: Sampling Distributions Section 7.1 How Likely Are the Possible Values of a Statistic? The Sampling Distribution.
1 ES Chapter 11: Goals Investigate the variability in sample statistics from sample to sample Find measures of central tendency for sample statistics Find.
Random Variables Numerical Quantities whose values are determine by the outcome of a random experiment.
Chapter 18 Sampling distribution models math2200.
Introduction to Inference Sampling Distributions.
© 2010 Pearson Prentice Hall. All rights reserved Chapter Sampling Distributions 8.
Sampling Theory Determining the distribution of Sample statistics.
Sampling Distributions Chapter 18. Sampling Distributions A parameter is a number that describes the population. In statistical practice, the value of.
Chapter 6 Sampling and Sampling Distributions
Sampling Distributions
Virtual University of Pakistan
Sampling and Sampling Distributions
Introduction to Inference
THE CENTRAL LIMIT THEOREM
Chapter 7 Review.
FINAL EXAMINATION STUDY MATERIAL PART I
Understanding Sampling Distributions: Statistics as Random Variables
Chapter 7 Sampling and Sampling Distributions
Some useful results re Normal r.v
Basic Business Statistics (8th Edition)
Introduction to Sampling Distributions
Chapter 6: Sampling Distributions
Hypothesis Testing and Confidence Intervals (Part 1): Using the Standard Normal Lecture 8 Justin Kern October 10 and 12, 2017.
Distribution of the Sample Means
Sampling Distributions
Chapter 5 Sampling Distributions
THE CENTRAL LIMIT THEOREM
Lecture 13 Sections 5.4 – 5.6 Objectives:
Sampling Distributions and The Central Limit Theorem
Chapter 5 Sampling Distributions
Determining the distribution of Sample statistics
Chapter 9 Hypothesis Testing.
Sampling Distribution Models
Chapter 5 Sampling Distributions
Using the Tables for the standard normal distribution
Chapter 5 Sampling Distributions
The normal distribution
Sampling Distributions
Lecture 7 Sampling and Sampling Distributions
Essential Statistics Sampling Distributions
Sampling Distributions (§ )
Essential Statistics Sampling Distributions
Sampling Distributions and The Central Limit Theorem
Chapter 5: Sampling Distributions
Presentation transcript:

Combining Random Variables Quite often we have two or more random variables X, Y, Z etc We combine these random variables using a mathematical expression. Important question What is the distribution of the new random variable?

An Example Suppose that a student will take three tests in the next three days Mathematics (X is the score he will receive on this test.) English Literature (Y is the score he will receive on this test.) Social Studies (Z is the score he will receive on this test.)

Assume that X (Mathematics) has a Normal distribution with mean m = 90 and standard deviation s = 3. Y (English Literature) has a Normal distribution with mean m = 60 and standard deviation s = 10. Z (Social Studies) has a Normal distribution with mean m = 70 and standard deviation s = 7.

Graphs X (Mathematics) m = 90, s = 3. Z (Social Studies) m = 70 , s = 7. Y (English Literature) m = 60, s = 10.

Suppose that after the tests have been written an overall score, S, will be computed as follows: S (Overall score) = 0.50 X (Mathematics) + 0.30 Y (English Literature) + 0.20 Z (Social Studies) + 10 (Bonus marks) What is the distribution of the overall score, S?

Sums, Differences, Linear Combinations of R.V.’s A linear combination of random variables, X, Y, . . . is a combination of the form: L = aX + bY + … + c (a constant) where a, b, etc. are numbers – positive or negative. Most common: Sum = X + Y Difference = X – Y Others Averages = 1/3 X + 1/3 Y + 1/3 Z Weighted averages = 0.40 X + 0.25 Y + 0.35 Z

Means of Linear Combinations If L = aX + bY + … + c The mean of L is: Mean(L) = a Mean(X) + b Mean(Y) + … + c mL = a mX + b mY + … + c Most common: Mean( X + Y) = Mean(X) + Mean(Y) Mean(X – Y) = Mean(X) – Mean(Y)

Variances of Linear Combinations If X, Y, . . . are independent random variables and L = aX + bY + … + c then Variance(L) = a2 Variance(X) + b2 Variance(Y) + … Most common: Variance( X + Y) = Variance(X) + Variance(Y) Variance(X – Y) = Variance(X) + Variance(Y) The constant c has no effect on the variance

Combining Independent Normal Random Variables If X, Y, . . . are independent normal random variables, then L = aX + bY + … is normally distributed. In particular: X + Y is normal with X – Y is normal with

Example: Suppose that one performs two independent tasks (A and B): X = time to perform task A (normal with mean 25 minutes and standard deviation of 3 minutes.) Y = time to perform task B (normal with mean 15 minutes and std dev 2 minutes.) X and Y independent so T = X + Y = total time is normal with What is the probability that the two tasks take more than 45 minutes to perform?

Example 2: A student will take three tests in the next three days X (Mathematics) has a Normal distribution with mean m = 90 and standard deviation s = 3. Y (English Literature) has a Normal distribution with mean m = 60 and standard deviation s = 10. Z (Social Studies) has a Normal distribution with mean m = 70 and standard deviation s = 7. Overall score, S = 0.50 X (Mathematics) + 0.30 Y (English Literature) + 0.20 Z (Social Studies) + 10 (Bonus marks)

Graphs X (Mathematics) m = 90, s = 3. Z (Social Studies) m = 70 , s = 7. Y (English Literature) m = 60, s = 10.

S has a normal distribution with Determine the distribution of S = 0.50 X + 0.30 Y + 0.20 Z + 10 S has a normal distribution with Mean mS = 0.50 mX + 0.30 mY + 0.20 mZ + 10 = 0.50(90) + 0.30(60) + 0.20(70) + 10 = 45 + 18 + 14 +10 = 87

Graph distribution of S = 0.50 X + 0.30 Y + 0.20 Z + 10

The distribution of averages (the mean) Let x1, x2, … , xn denote n independent random variables each coming from the same Normal distribution with mean m and standard deviation s. Let What is the distribution of

The distribution of averages (the mean) Because the mean is a “linear combination” and

Thus if x1, x2, … , xn denote n independent random variables each coming from the same Normal distribution with mean m and standard deviation s. Then has Normal distribution with

Example Suppose we are measuring the cholesterol level of men age 60-65 This measurement has a Normal distribution with mean m = 220 and standard deviation s = 17. A sample of n = 10 males age 60-65 are selected and the cholesterol level is measured for those 10 males. x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, are those 10 measurements Find the probability distribution of Compute the probability that is between 215 and 225

Example Suppose we are measuring the cholesterol level of men age 60-65 This measurement has a Normal distribution with mean m = 220 and standard deviation s = 17. A sample of n = 10 males age 60-65 are selected and the cholesterol level is measured for those 10 males. x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, are those 10 measurements Find the probability distribution of Compute the probability that is between 215 and 225

Solution Find the probability distribution of

Graphs The probability distribution of the mean The probability distribution of individual observations

An Excel file illustrating the sampling distribution of mean.xls

Normal approximation to the Binomial distribution Using the Normal distribution to calculate Binomial probabilities

Binomial distribution n = 20, p = 0.70 Approximating Normal distribution Binomial distribution

Normal Approximation to the Binomial distribution X has a Binomial distribution with parameters n and p Y has a Normal distribution

Approximating Normal distribution P[X = a] Binomial distribution

P[X = a]

Example X has a Binomial distribution with parameters n = 20 and p = 0.70

Where Y has a Normal distribution with: Using the Normal approximation to the Binomial distribution Where Y has a Normal distribution with:

Hence = 0.4052 - 0.2327 = 0.1725 Compare with 0.1643

Normal Approximation to the Binomial distribution X has a Binomial distribution with parameters n and p Y has a Normal distribution

Example X has a Binomial distribution with parameters n = 20 and p = 0.70

Where Y has a Normal distribution with: Using the Normal approximation to the Binomial distribution Where Y has a Normal distribution with:

Hence = 0.5948 - 0.0436 = 0.5512 Compare with 0.5357

Comment: The accuracy of the normal appoximation to the binomial increases with increasing values of n

Example The success rate for an Eye operation is 85% The operation is performed n = 2000 times Find the probability that: The number of successful operations is between 1650 and 1750. The number of successful operations is at most 1800.

where Y has a Normal distribution with: Solution X has a Binomial distribution with parameters n = 2000 and p = 0.85 where Y has a Normal distribution with:

= 0.9004 - 0.0436 = 0.8008

Solution – part 2. = 1.000

Sampling Theory sampling distributions Note:It is important to recognize the dissimilarity (variability) we should expect to see in various samples from the same population. It is important that we model this and use it to assess accuracy of decisions made from samples. A sample is a subset of the population. In many instances it is to costly to collect data from the entire population.

Statistics and Parameters A statistic is a numerical value computed from a sample. Its value may differ for different samples. e.g. sample mean , sample standard deviation s, and sample proportion . A parameter is a numerical value associated with a population. Considered fixed and unchanging. e.g. population mean m, population standard deviation s, and population proportion p.

Observations on a measurement X x1, x2, x3, … , xn taken on individuals (cases) selected at random from a population are random variables prior to their observation. The observations are numerical quantities whose values are determined by the outcome of a random experiment (the choosing of a random sample from the population).

The probability distribution of the observations x1, x2, x3, … , xn is sometimes called the population. This distribution is the smooth histogram of the the variable X for the entire population

the population is unobserved (unless all observations in the population have been observed)

A histogram computed from the observations x1, x2, x3, … , xn Gives an estimate of the population.

A statistic computed from the observations x1, x2, x3, … , xn Is also a random variable prior to observation of the sample. A statistic is also a numerical quantity whose value is determined by the outcome of a random experiment (the choosing of a random sample from the population).

The probability distribution of statistic computed from the observations x1, x2, x3, … , xn is sometimes called its sampling distribution.

It is important to determine the sampling distribution of a statistic. It will describe its sampling behaviour. The sampling distribution will be used the asses the accuracy of the statistic when used for the purpose of estimation. Sampling theory is the area of Mathematical Statistics that is interested in determining the sampling distribution of various statistics

Many statistics have a normal distribution. This quite often is true if the population is Normal It is also sometimes true if the sample size is reasonably large. (reason – the Central limit theorem, to be mentioned later)

Two important statistics that have a normal distribution The sample mean The sample proportion: X is the number of successes in a Binomial experiment

The sampling distribution of the sample mean Let x1, x2, … , xn denote n independent random variables each coming from the same Normal distribution with mean m and standard deviation s. Let What is the distribution of

The distribution of averages (the mean) Because the mean is a “linear combination” and

Thus if x1, x2, … , xn denote n independent random variables each coming from the same Normal distribution with mean m and standard deviation s. Then has Normal distribution with

Example Suppose we are measuring the cholesterol level of men age 60-65 This measurement has a Normal distribution with mean m = 220 and standard deviation s = 17. A sample of n = 10 males age 60-65 are selected and the cholesterol level is measured for those 10 males. x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, are those 10 measurements Find the probability distribution of Compute the probability that is between 215 and 225

Solution Find the probability distribution of

The sampling distribution of the mean Graphs The sampling distribution of the mean The probability distribution of individual observations

Standard Error of the Mean In practice, the population standard deviation s is rarely known, so we cannot compute the standard deviation of , s.d.( ) = . In practice, we only take one random sample, so we only have the sample mean and the sample standard deviation s. Replacing s with s in the standard deviation expression gives us an estimate that is called the standard error of . s.e.( ) = . For a sample of n = 25 weight losses, the standard deviation is s = 4.74 pounds. So the standard error of the mean is 0.948 pounds.

The Central Limit Theorem The Central Limit Theorem (C.L.T.) states that if n is sufficiently large, the sample means of random samples from a population with mean m and finite standard deviation s are approximately normally distributed with mean m and standard deviation . Technical Note: The mean and standard deviation given in the CLT hold for any sample size; it is only the “approximately normal” shape that requires n to be sufficiently large.

Graphical Illustration of the Central Limit Theorem Original Population 30 Distribution of x: n = 2 30 Distribution of x: n = 10 Distribution of x: n = 30

Applications of the sampling distribution of the sample mean is and the Central Limit Theorem When the sampling distribution of the sample mean is (exactly) normally distributed, or approximately normally distributed (by the CLT), we can answer probability questions related to the sample mean.

Example 1) 2) 1) 2) m = 50 s n = 15 9 3 5 P x ( ) 45 60 £ . 47 5 Example: Consider a normal population with m = 50 and s = 15. Suppose a sample of size 9 is selected at random. Find: P x ( ) 45 60 £ . 47 5 1) 2) Solutions: Since the original population is normal, the distribution of the sample mean is also (exactly) normal 1) m x = 50 s x n = 15 9 3 5 2)

Example P x z ( ) . 45 60 50 5 1.00 2.00) 8413 0228 8185 £ = - æ è ç ö 2.00 P x z ( ) . 45 60 50 5 1.00 2.00) 8413 0228 8185 £ = - æ è ç ö ø ÷ z = ; x -  s n

Example P x z ( . ) £ = - æ è ç ö ø ÷ 47 5 50 5000 1915 3085 z = ; 47.5 50 x -0.50 P x z ( . ) £ = - æ è ç ö ø ÷ 47 5 50 5000 1915 3085 z = ; x -  s n

Example Example: A recent report stated that the day-care cost per week in Boston is $109. Suppose this figure is taken as the mean cost per week and that the standard deviation is known to be $20. 1) Find the probability that a sample of 50 day-care centers would show a mean cost of $105 or less per week. 2) Suppose the actual sample mean cost for the sample of 50 day-care centers is $120. Is there any evidence to refute the claim of $109 presented in the report? Solutions: The shape of the original distribution is unknown, but the sample size, n, is large. The CLT applies. The distribution of is approximately normal x m = m = 109 s = s n = 20 50 » 2 . 83 x x

Example 1) x P z ( ) . £ = - æ è ç ö ø ÷ 105 109 2 83 1 41 0793 z = ; 0793 z = ; x -  s n

Example 2) z = ; x -  s n P x z ( ) . ³ = - æ è ç ö ø ÷ 120 109 2 83 To investigate the claim, we need to examine how likely an observation is the sample mean of $120 Consider how far out in the tail of the distribution of the sample mean is $120 P x z ( ) . ³ = - æ è ç ö ø ÷ 120 109 2 83 3 89 1.0000 - 0.9999 = 0.0001 z = ; x -  s n Since the probability is so small, this suggests the observation of $120 is very rare (if the mean cost is really $109) There is evidence (the sample) to suggest the claim of  = $109 is likely wrong

Summary s = s n x m = m x x x x x The mean of the sampling distribution of is equal to the mean of the original population: x m = m x The standard deviation of the sampling distribution of (also called the standard error of the mean) is equal to the standard deviation of the original population divided by the square root of the sample size: Notes: The distribution of becomes more compact as n increases. (Why?) The variance of : x s = s n x x x s 2 = s 2 n x The distribution of is (exactly) normal when the original population is normal x The CLT says: the distribution of is approximately normal regardless of the shape of the original distribution, when the sample size is large enough! x

Sampling Distribution for Any Statistic Every statistic has a sampling distribution, but the appropriate distribution may not always be normal, or even approximately bell-shaped. Construct an approximate sampling distribution for a statistic by actually taking repeated samples of the same size from a population and constructing a relative frequency histogram for the values of the statistic over the many samples.

Sampling Distribution for Sample Proportions Let p = population proportion of interest or binomial probability of success. Let = sample proportion or proportion of successes. is a normal distribution with

Example Sample Proportion Favoring a Candidate Suppose 20% all voters favor Candidate A. Pollsters take a sample of n = 600 voters. Then the sample proportion who favor A will have approximately a normal distribution with

Using the Sampling distribution: Suppose 20% all voters favor Candidate A. Pollsters take a sample of n = 600 voters. Determine the probability that the sample proportion will be between 0.18 and 0.22

Solution:

Sample proportion within 5 percentage points Determine the probability that the sample proportion will be between 0.15 and 0.25

Solution: