 At the beginning of the term, we talked about populations and samples  What are they?  Why do we take samples?


1

2  At the beginning of the term, we talked about populations and samples  What are they?  Why do we take samples?

3  Generally, we want to know about the population  But, studying/surveying the entire population is problematic! ▪ Too costly ▪ May be impossible!

4  So, we typically study samples rather than entire populations  But, we are not usually interested in the sample itself  We hope that the sample will give us insight into the population

5  Starting here, we will look at the relationship between samples and populations  What we can learn  How precise/reliable the information is

6  Suppose we were interested in knowing the average travel time for students coming to Seneca  We don’t want to ask every Seneca student  So, we take a sample  We hope that the sample mean will give us insight into the population mean

7  Will the sample mean be exactly equal to the population mean?

8  No, because it depends on exactly who winds up in our sample

9  Will the sample mean be the same for every sample?

10  No, because it depends on exactly who winds up in our sample

11  Get into groups (samples) of two, and calculate your average travel time

12 1. The sample mean is RANDOM  Depends on exactly who winds up in the sample

13  Do these samples give us reliable estimates of the population mean?

14  VERY SMALL -> Subject to a great deal of randomness

15  Groups of 3

16  Groups of 5

17  Groups of 10

18 1. The sample mean is RANDOM  Depends on exactly who winds up in the sample 2. The larger the sample, the more likely that the sample mean will be close to the population mean  In larger samples, the randomness tends to ‘average out’, meaning less random fluctuation from sample to sample  Larger samples give more reliable results
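Both points can be checked with a small simulation. This is a minimal sketch under stated assumptions: the population of travel times is made up (uniform between 5 and 90 minutes), and `spread_of_sample_means` is a hypothetical helper, not something from the course materials. It draws many samples of a given size and reports how much the sample means vary from sample to sample.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: travel times (minutes) for 5,000 students.
# These numbers are invented purely for illustration.
population = [random.uniform(5, 90) for _ in range(5000)]


def spread_of_sample_means(sample_size: int, repetitions: int = 2000) -> float:
    """Draw many random samples and return the standard deviation of their means."""
    means = [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(repetitions)
    ]
    return statistics.stdev(means)


print("population mean:", round(statistics.fmean(population), 1))
for n in (2, 3, 5, 10):
    # The spread of the sample means should shrink as n grows.
    print("n =", n, "spread of sample means:", round(spread_of_sample_means(n), 2))
```

The group sizes 2, 3, 5, and 10 mirror the in-class exercise; the exact figures depend on the invented population and the random seed.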

19  Because the sample mean is random, we can describe it using a probability distribution  I.e., for any given range of sample means, there is some probability of landing in it  And, we can talk about, ‘what is the probability that we get a sample mean in the range ______?’  Called the ‘sampling distribution’

20  Depending on the actual raw data distribution, the distribution of the sample mean can have many different shapes  In the next slide, we look at three different data distributions, and what the distribution of the sample means looks like ▪ When sample size, n, =2

21 [Figure: three raw data distributions, each shown with the distribution of its sample mean for n=2. Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition]

22  Those distributions look strange!  But, as sample size increases, wonderful things happen:  First, the sample mean gets more accurate ▪ The distribution gets narrower ▪ I.e., the probability of getting a sample mean far from the real population mean is low  Second, the distribution changes shape

23 [Figure: raw data distributions, each shown with the distribution of its sample mean when n=2, n=10, and n=30. Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition]

24  As we take larger samples, the distribution of the sample mean approaches the normal distribution!  (Almost) regardless of the shape of the actual data!  Because of this, we can use what we have learned about the normal distribution to, e.g., judge how reliable/accurate our sample results are!
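One way to see this is to start from clearly non-normal data and watch the sample means become more symmetric. The sketch below assumes an exponential population (strongly right-skewed) chosen only for illustration; `skewness` and `sample_mean_skewness` are hypothetical helpers. Skewness near 0 is one sign of a symmetric, normal-like shape.

```python
import random
import statistics

random.seed(0)


def skewness(values: list[float]) -> float:
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_mean_skewness(sample_size: int, repetitions: int = 5000) -> float:
    """Skewness of the sample mean, estimated from repeated exponential samples."""
    means = [
        statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]
    return skewness(means)


for n in (2, 10, 30):
    # Skewness should move toward 0 as the sample size grows.
    print("n =", n, "skewness of sample means:", round(sample_mean_skewness(n), 2))
```

The sample sizes 2, 10, and 30 match the ones in the figure; a different starting distribution would give different numbers but the same trend.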

25  As discussed, if the sample size is large, the sampling distribution approaches the normal distribution  But, it's not exactly equal to the normal distribution ▪ Especially if n is small!  For this reason, we have another distribution that we use, which is closely related

26  T distribution takes sample size into account  T is wider and flatter than normal  The smaller the sample, the wider and flatter! ▪ Reflecting that the information is less reliable ▪ I.e., that we are more likely to get a result far from the real population mean

27  To use the t-distribution we need to provide degrees of freedom  This is just n – 1 ▪ (Sample size – 1)

28  We can use the t-distribution to determine the probability of getting a mean in a given range, in the same way we used the normal distribution to find the probability of getting a value in a certain range

29  When using t, no built-in ‘one-step’ like norm.dist  2-step process 1. Convert the x-value(s) into t-scores ▪ Like z-scores! 2. Use the t-score(s) to look up the probability ▪ Using t.dist ▪ And the same structure: ‘Less than’ -> t.dist; ‘Greater than’ -> 1-t.dist; ‘Between’ -> t.dist(big) – t.dist(small)

30  Recall:  z = (value – mean)/SD  T-score:  t = (value – mean)/(SD/sqrt(n)) ▪ Divide the standard deviation by the square root of the sample size ▪ The bigger the sample size, the bigger the number you divide SD by -> smaller spread for the sample mean -> less spread out/more accurate!
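Written in standard notation, the two formulas look like this. The symbols are assumptions introduced here, not taken from the slides: x is an individual value, x-bar is a sample mean, mu is the mean, sigma is the standard deviation (SD), and n is the sample size.

```latex
z = \frac{x - \mu}{\sigma},
\qquad
t = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
```

The only difference is the denominator: the t-score divides by the SD scaled down by the square root of n, which is the spread of the sample mean rather than of a single value.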

31  =t.dist(t-score, degrees of freedom, True)
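For readers working outside a spreadsheet, the two-step process can be sketched in Python. This is a minimal sketch, not the spreadsheet's own implementation: `t_pdf`, `t_cdf`, `t_score`, and the three `prob_*` helpers are hypothetical names, and the cumulative probability is approximated by integrating the t density numerically (Simpson's rule) using only the standard library. The example numbers at the bottom are made up.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """Approximate P(T <= t), playing the role of t.dist(t, df, TRUE)."""
    if t == 0:
        return 0.5
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area


def t_score(value: float, mean: float, sd: float, n: int) -> float:
    """Step 1: convert a value for the sample mean into a t-score."""
    return (value - mean) / (sd / math.sqrt(n))


def prob_less(t: float, df: float) -> float:
    return t_cdf(t, df)  # 'Less than'


def prob_greater(t: float, df: float) -> float:
    return 1 - t_cdf(t, df)  # 'Greater than'


def prob_between(t_small: float, t_big: float, df: float) -> float:
    return t_cdf(t_big, df) - t_cdf(t_small, df)  # 'Between'


# Made-up numbers: mean 50, SD 6, sample of 9, so df = 9 - 1 = 8.
t = t_score(52, 50, 6, 9)
print("t-score:", t)
print("P(sample mean > 52):", round(prob_greater(t, 8), 3))
```

Numerical integration is used here only to keep the sketch dependency-free; a statistics library would normally provide the cumulative t probability directly.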

32  I will walk you through an example, but first, we note that we cover this primarily so you will understand what comes later  Direct business applications (or at least, marketing applications) aren’t as common as for other techniques

33  Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm.  If you select an individual at random, what is the probability that he has a height greater than 180 cm?

34  Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm.  If you select an individual at random, what is the probability that he has a height greater than 180 cm?  =1 – norm.dist(180, 176, 7.1, true) ≈ 0.287
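The same calculation can be reproduced in Python as a quick check. This sketch uses `statistics.NormalDist` from the standard library in place of norm.dist; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

# Heights: normally distributed, mean 176 cm, standard deviation 7.1 cm.
heights = NormalDist(mu=176, sigma=7.1)

# 'Greater than' -> 1 minus the cumulative probability up to 180 cm.
p_taller_than_180 = 1 - heights.cdf(180)
print(round(p_taller_than_180, 3))  # 0.287
```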

35  Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm.  If you select a random sample of size 5, what is the probability that the mean height is greater than 180 cm?

36  Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm.  If you select a random sample of size 5, what is the probability that the mean height is greater than 180 cm?  t = (180-176)/(7.1/sqrt(5)) = 1.259756607  Degrees of freedom = 5 – 1 = 4  prob = 1 – t.dist(1.259756607, 4, true) ≈ 0.138
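As a cross-check, this example can be worked through in Python following the approach of the slides: compute the t-score, then look up the probability with n – 1 degrees of freedom. This is a sketch under stated assumptions: `t_cdf` and `prob_mean_greater` are hypothetical helpers, and the cumulative t probability is approximated numerically with Simpson's rule rather than taken from a statistics library.

```python
import math


def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """Approximate P(T <= t) for Student's t by integrating its density."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x: float) -> float:
        return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

    if t == 0:
        return 0.5
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def prob_mean_greater(threshold: float, mean: float, sd: float, n: int) -> float:
    """P(sample mean > threshold), using a t-score with n - 1 degrees of freedom."""
    t = (threshold - mean) / (sd / math.sqrt(n))
    return 1 - t_cdf(t, n - 1)


p = prob_mean_greater(180, 176, 7.1, 5)
print(round(p, 3))  # 0.138
```

Because the sample size is a parameter, the same function can be called with other values of n to explore how the answer responds.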

37  Repeat, with:  Sample size of 15  Sample size of 30  What happens to the probability?  Why?

