A statistic from a random sample or randomized experiment is a random variable. The probability distribution of this random variable is called its sampling distribution.

1 A statistic from a random sample or randomized experiment is a random variable. The probability distribution of this random variable is called its sampling distribution. Let X = the number of occurrences of a particular outcome in a random sample of size n. Then X is a count and phat = X/n is the sample proportion. Often, X is produced in a very standard way, called the binomial setting… (see the box on page 311 (5.1, 2/9 in eBook))

2 The binomial setting:
there is a fixed number n of observations
the n observations are independent
each observation is either an S ("success") or an F ("failure"); i.e., there are only two possible outcomes for each observation
the probability of S, call it p, is the same for each observation
X is the number of S's in the n observations
Examples:
the number of Hs in 4 tosses of a fair coin
the number in a sample of 100 students who think Pres. Obama is doing a great job
the # of color blind men in a sample of 30
etc., etc., etc.

3 So if we let X = the number of S's in the n observations, X is said to have a binomial distribution with parameters n and p. n is the number of observations and p is the probability of S on any one observation.
Values of X: 0, 1, 2, …, n
P(X): ? for each value
The probabilities depend on n and p and are computed in various ways. A formula is given at the end of the chapter (see starting at the bottom of page 327 (5.1, 9/9 in eBook)).
We write X is B(n,p) when we mean X is Binomial with parameters n and p. Let’s go over some examples...

4 Examples are given below:
X = # Hs in 10 tosses of a fair coin. X is B(10,.5)
X = # of children in a family of 5 children who have type O blood. X is B(5,.25)
And from Ex. 5.8 on p. 317 (5.1, 4/9 eBook): X = # of sales records out of 15 that are misclassified. X is B(15,.08). Recall: 800 of the 10,000 in the population are misclassified (see Ex. 5.6)
FACT: When the population is much larger than the sample size, the count X of successes in an SRS of size n is approximately B(n,p) if the population proportion of Ss is p. Use this rule when the population is at least 10 times as large as the sample…
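The FACT above can be checked numerically for the sales-records example. The sketch below is only an illustration, not the textbook's method: it compares the exact probability of drawing without replacement from 10,000 records (800 misclassified) with the B(15,.08) approximation. The function names are made up for this example.

```python
from math import comb

def hypergeom_pmf(k, pop_size, pop_successes, n):
    """Exact P(X = k) when sampling n items without replacement."""
    return (comb(pop_successes, k)
            * comb(pop_size - pop_successes, n - k)
            / comb(pop_size, n))

def binom_pmf(k, n, p):
    """P(X = k) when X is B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sales-records example: 800 of 10,000 misclassified, SRS of size 15.
exact = sum(hypergeom_pmf(k, 10_000, 800, 15) for k in (0, 1))
approx = sum(binom_pmf(k, 15, 0.08) for k in (0, 1))

print(f"exact  P(X <= 1) = {exact:.4f}")
print(f"B(15,.08) approx = {approx:.4f}")
```

Because the population is far more than 10 times the sample size, the two numbers should agree closely.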

5 Go over this example: X= # of the 15 sales records that are misclassified. X is B(15,.08)
Use Table C or TI-83 or JMP to compute binomial probabilities; e.g., what is the probability that no more than 1 of the records is misclassified? Use JMP… Create X (0 to 15) and a binomial probability column… Graph -> Chart then displays the binomial probabilities as a bar chart.
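For readers without Table C, a TI-83, or JMP, the same probability column can be built in a few lines of Python. This is a minimal sketch using the standard binomial formula; the variable names are chosen for this example only.

```python
from math import comb

n, p = 15, 0.08  # X is B(15, .08): misclassified records out of 15

# Probability column for X = 0, 1, ..., 15 (like the JMP column).
probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# "No more than 1 misclassified" means X = 0 or X = 1.
p_at_most_1 = probs[0] + probs[1]

for k, pk in enumerate(probs[:5]):
    print(f"P(X = {k}) = {pk:.4f}")
print(f"P(X <= 1) = {p_at_most_1:.4f}")
```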

6 X is a random variable and as such has a mean and standard deviation. We saw earlier that the mean of a r.v. is computed as a weighted average of the values, the weights being the probabilities… It turns out that when X is B(n,p), we have the following:
μX = np and σX = sqrt(np(1-p))
This is shown in the box on page 320 (5.1, 5/9), but for our purposes, just learn these…
If we consider the sample proportion phat = X/n, then its mean = p and its s.d. = sqrt(p(1-p)/n) (see box on page (5.1,6/9))
If X is binomial, then these are correct. If we're doing SRS from a large population, make sure the population is at least 10 times as large as the sample…
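As a quick check of these formulas, the sketch below computes the mean and s.d. of X both ways for the B(15,.08) example: as a probability-weighted average over the values, and from np and sqrt(np(1-p)). It is only an illustration; the variable names are not from the text.

```python
from math import comb, sqrt, isclose

n, p = 15, 0.08
probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Weighted-average definitions of mean and variance.
mean_x = sum(k * pk for k, pk in enumerate(probs))
var_x = sum((k - mean_x) ** 2 * pk for k, pk in enumerate(probs))
sd_x = sqrt(var_x)

# Shortcut formulas for a binomial count X.
mean_formula = n * p
sd_formula = sqrt(n * p * (1 - p))

# Sample proportion phat = X/n: divide the mean and s.d. of X by n.
mean_phat = mean_x / n
sd_phat = sd_x / n

print(mean_x, mean_formula)
print(sd_x, sd_formula)
print(mean_phat, sd_phat, sqrt(p * (1 - p) / n))
```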

7 If our sample sizes are large enough, then the sampling distributions of X and phat are approximately normal:
X is approx. N(np, sqrt(np(1-p)))
phat is approx. N(p, sqrt(p(1-p)/n))
As a rule of thumb, we will use these approximations whenever np >= 10 and n(1-p) >= 10.
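To see the rule of thumb in action, here is a small sketch that checks np >= 10 and n(1-p) >= 10 and then compares an exact binomial probability with its normal approximation. The values n = 100 and p = 0.5 are hypothetical, chosen only for illustration, and no continuity correction is applied.

```python
from math import comb, erf, sqrt

def rule_of_thumb_ok(n, p):
    """True when np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

def normal_cdf(x, mean, sd):
    """P(Y <= x) for Y ~ N(mean, sd), via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

def binom_cdf(x, n, p):
    """Exact P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n, p = 100, 0.5  # hypothetical values, not from the textbook
exact = binom_cdf(45, n, p)
approx = normal_cdf(45, n * p, sqrt(n * p * (1 - p)))

print("rule of thumb satisfied:", rule_of_thumb_ok(n, p))
print(f"exact P(X <= 45)  = {exact:.4f}")
print(f"normal approx     = {approx:.4f}")
```

By contrast, `rule_of_thumb_ok(15, 0.08)` is false, so the sales-records example should be handled with exact binomial probabilities.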

8 Let's work a problem using the normal approximation to the binomial X
Try Example 5.14 on page 325 (5.1,7/9):
first, define the random variable X in words. Make sure you know what an S is, what n is, and what p is.
next, compute the mean and standard deviation of X using the formulas mean = np, s.d. = sqrt(np(1-p))
now set up the normal approximation. Sketch the correct normal curve, shade it properly, do the standardization and use Table A and whatever arithmetic is required to get the probabilities…

9 HW: Carefully read sections 5.1 & 5.2, omitting the final starred section on the Binomial Formulas. Do look at the section on the Continuity Correction, however... Go over all the examples in this section and ask me questions on these if you need to! Work on # , , 5.21, 5.25, 5.27, 5.33

10 Reminder: What is a sampling distribution?
The sampling distribution of a statistic is the distribution of all possible values of the statistic when all possible samples of a fixed size n are taken from the population. It is a theoretical idea — we do not actually build it with data... The sampling distribution of a statistic is the probability distribution of that statistic.

11 Sampling distribution of x bar
We take many random samples of a given size n from a population with mean μ and standard deviation σ. Some sample means will be above the population mean μ and some will be below, making up the sampling distribution.
[Figure: sampling distribution of “x bar”; histogram of some sample averages]

12 Sampling distribution of x bar
For any population with mean μ and standard deviation σ:
The mean, or center, of the sampling distribution of x bar is equal to the population mean μ: mean of x bar = μ
The standard deviation of the sampling distribution of x bar is σ/√n, where n is the sample size.
[Figure: sampling distribution of x bar, centered at μ with standard deviation σ/√n]

13 Application
Hypokalemia is diagnosed when blood potassium levels are low, below 3.5mEq/dl. Let’s assume that we know a patient whose measured potassium levels vary daily according to a normal distribution N(μ = 3.8, σ = 0.2).
If only one measurement is made, what is the probability that this patient will be misdiagnosed hypokalemic? z = (3.5 − 3.8)/0.2 = −1.5, P(z < −1.5) = 0.0668 ≈ 7%
If instead measurements are taken on 4 separate days and they are averaged, what is the probability of such a misdiagnosis? The s.d. of X-bar is 0.2/√4 = 0.1, so z = (3.5 − 3.8)/0.1 = −3, P(z < −3) = 0.0013 ≈ 0.1%
Note: Be sure to standardize (z) using the standard deviation of the variable being standardized (X in first case, X-bar in second case)!!
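The two misdiagnosis probabilities can be reproduced without Table A. The sketch below standardizes with the s.d. of X in the first case and the s.d. of X-bar in the second; the helper name `std_normal_cdf` is made up for this example.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """P(Z < z) for a standard normal Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, cutoff = 3.8, 0.2, 3.5

# One measurement: standardize with the s.d. of X.
z_single = (cutoff - mu) / sigma
p_single = std_normal_cdf(z_single)

# Average of 4 measurements: standardize with the s.d. of X-bar.
n = 4
z_mean = (cutoff - mu) / (sigma / sqrt(n))
p_mean = std_normal_cdf(z_mean)

print(f"one measurement: z = {z_single:.2f}, P = {p_single:.4f}")
print(f"mean of 4:       z = {z_mean:.2f}, P = {p_mean:.4f}")
```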

14 The shape of Xbar tends to be normal. Even if the population is not normal, if the size (n) of the SRS is large enough and it is taken from any population with mean = μ and standard deviation = σ, then Xbar is approximately N(μ, σ/sqrt(n)). This fact is called the Central Limit Theorem.

15 How large a sample size is required to achieve normality of X-bar?
… depends on the population distribution. More observations are required if the population distribution is far from being normal.
A sample size of 25 is generally enough to obtain an approximately normal sampling distribution for X-bar when the population has strong skewness or even mild outliers.
A sample size of 40 will typically be good enough to overcome extreme skewness and outliers and make Xbar look normal.
In many cases, n = 25 isn’t a huge sample. Thus, even for strange population distributions we can assume a normal sampling distribution of the sample mean and work with it to solve problems.

16 [Figure panels: population distribution; dist. of X-bar for n=2; dist. of X-bar for n=10; dist. of X-bar for n=25]
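The idea behind these panels can be explored with a short simulation. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and s.d. 1, an assumption made only for this example) and draws many samples of size n = 2, 10, and 25, then compares the mean and s.d. of the sample means with μ and σ/√n.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 1.0, 1.0  # exponential(rate=1): assumed population
REPS = 5000                  # number of samples drawn for each n

results = {}
for n in (2, 10, 25):
    # Each entry is the mean of one random sample of size n.
    means = [
        statistics.mean(random.expovariate(1.0) for _ in range(n))
        for _ in range(REPS)
    ]
    results[n] = (statistics.mean(means), statistics.stdev(means))

for n, (m, s) in results.items():
    print(f"n={n:2d}: mean of xbar = {m:.3f} (mu = {POP_MEAN}), "
          f"sd of xbar = {s:.3f} (sigma/sqrt(n) = {POP_SD / sqrt(n):.3f})")
```

Plotting a histogram of `means` for each n would show the skewness fading as n grows.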

17 HW: Read section 5.2 thru p. 242; don’t worry too much about how the book derives the formulas... instead make sure you know the Central Limit Theorem and what’s found in the boxes on p. 337, 338, and 339 and in the Summary on page 346. Do problems # , 5.44, 5.47, 5.48, 5.51, 5.53, 5.55, 5.66, 5.70, 5.73

