Outline.

Count data
Sometimes, the data we have to analyze are produced by counting things. How many people choose each of Brands A, B, and C of coffee? Usually, we count things in a sample in order to make an inference to a population. E.g., are the proportions of people choosing each brand different from one another or from some hypothetical proportions in the population?

Count data
To answer such questions, we need to know approximately how much difference between the various counts could be produced by sampling error. We determine that quantity using the ‘multinomial probability distribution,’ an extension of the binomial probability distribution.

Properties of the Multinomial Experiment
There are n identical trials.
There are k possible outcomes on each trial.
The probabilities of the outcomes are the same across trials.
Trials are all independent of each other.
The multinomial random variables are the k counts $n_1, n_2, \ldots, n_k$.
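The properties above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_multinomial`, the seed, and the choice of 90 trials with three equally likely brands are assumptions made for illustration, not part of the original slides.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, labels, seed=None):
    """Run one multinomial experiment: n independent trials, each landing
    in one of k categories with the same probabilities on every trial."""
    rng = random.Random(seed)
    draws = rng.choices(labels, weights=probs, k=n)
    tally = Counter(draws)
    # Return the k counts n_1, ..., n_k in label order (0 if never drawn).
    return [tally[label] for label in labels]

# Hypothetical run: 90 people, three equally likely brands.
counts = simulate_multinomial(90, [1/3, 1/3, 1/3], ["A", "B", "C"], seed=1)
print(counts)
```

Re-running with different seeds shows how far the counts can drift from 30 each through sampling error alone.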

Testing the null hypothesis
We often want to test the null hypothesis that all the categories are equal in frequency. If we asked 60 people which of Brands A, B, and C they prefer, equal frequency would look like this:
  A    B    C
  20   20   20

At other times, we might want to test a specific null hypothesis, such as that B and C are equally popular, but A is twice as popular as either:
  A    B    C
  30   15   15
In both cases, we call the values shown the “expected values.”
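Expected values are just the sample size multiplied by each null-hypothesis proportion. A minimal sketch, assuming a helper named `expected_counts` (a hypothetical name) and exact fractions to avoid rounding:

```python
from fractions import Fraction

def expected_counts(n, null_probs):
    """Expected count per category under the null: n times each proportion."""
    if sum(null_probs) != 1:
        raise ValueError("null proportions must sum to 1")
    return [n * p for p in null_probs]

# Equal popularity among three brands, 60 people asked.
equal = expected_counts(60, [Fraction(1, 3)] * 3)

# A twice as popular as B or C: weights 2:1:1, i.e. 1/2, 1/4, 1/4.
weighted = expected_counts(60, [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)])

print([int(e) for e in equal])     # [20, 20, 20]
print([int(e) for e in weighted])  # [30, 15, 15]
```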

The null hypothesis can be tested using the statistic $\chi^2$:

$$\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$

$\chi^2$ increases as the observed values, $n_i$, get further from the expected values, $E(n_i)$.

Chi-square – example
Suppose we want to know whether there is any population preference among coffee brands A, B, and C. If there is no preference, each brand should be chosen by ⅓ of the people asked:
  A    B    C
  ⅓    ⅓    ⅓
Suppose we ask 90 people for their preference.

Expected vs. Observed Values
Expected values (each value = ⅓ × 90):
  A    B    C
  30   30   30
Expected values are the values expected under the null hypothesis (i.e., if the null hypothesis is true).

Observed values:
  A    B    C
  15   42   33
Observed values are the values actually recorded when you ask people for their preference.

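Chi-square – example

$$\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$

$$\chi^2 = \frac{(15-30)^2}{30} + \frac{(42-30)^2}{30} + \frac{(33-30)^2}{30} = \frac{225 + 144 + 9}{30} = 12.6$$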
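The same arithmetic can be checked in a few lines of code. This is a sketch only; the function name `chi_square_statistic` is an assumption for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [15, 42, 33]      # brands A, B, C
expected = [90 / 3] * 3      # 30 each under the null hypothesis
chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))        # 12.6
```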

Chi-square – the formal hypothesis test
$H_0$: $P_A = P_B = P_C = \tfrac{1}{3}$
$H_A$: Something different: at least one $P > \tfrac{1}{3}$
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$, with d.f. = k − 1, where k = number of categories

Chi-square – the formal hypothesis test
Rejection region: $\chi^2_{obt} > \chi^2_{crit} = \chi^2(.05, 2) = 5.99$ (note: the rejection region is always $> \chi^2_{crit}$)
Decision: since $\chi^2_{obt} = 12.6 > \chi^2_{crit}$, reject $H_0$. The brands are not equally popular.
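For the special case of 2 d.f., the critical value does not need a table: the upper-tail probability of a chi-square variable with 2 d.f. is exp(−x/2), so the critical value is −2 ln α. The sketch below uses that shortcut, which applies to 2 d.f. only; the variable names are illustrative.

```python
import math

alpha = 0.05
# Chi-square with 2 d.f.: P(X > x) = exp(-x / 2), so solve exp(-x / 2) = alpha.
chi2_crit = -2 * math.log(alpha)

chi2_obt = 12.6                      # obtained value from the coffee example
reject = chi2_obt > chi2_crit
print(round(chi2_crit, 2), reject)   # 5.99 True
```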

Chi-square – Example 1
At a recent meeting of the Coin Flippers Society, each member flipped three coins simultaneously and the number of tails occurring was recorded. Shown below are the numbers of members who had certain numbers of tails. Is there evidence that the coin flipping outcomes were different from what would be expected if all the coins used were fair? (α = .01)
  Number of Tails    Number of Members
  0                  65
  1                  182
  2                  194
  3                  59

Chi-square – Example 1
Shown below are the numbers of members who had certain numbers of tails.
Number of tails = the categories people fall into.
Number of members = number of people in each category.
Number of members is the dependent variable. Do you see why?

Chi-square – Example 1
To begin, we need to compute the expected values for each of the categories. That is, we need to figure out how many of our 500 members would fall into each category if all the coins used were fair. To do that, we use our knowledge of probability…

Chi-square – Example 1
How many possible outcomes are there for one trial?
HHH HHT HTH THH HTT THT TTH TTT

Chi-square – Example 1
Of these eight possible outcomes, how many involve getting 0 tails? Just one: HHH.
How many involve getting 1 tail? 3: HHT, HTH, THH.
How many involve getting 2 tails? 3: HTT, THT, TTH.
How many involve getting 3 tails? 1: TTT.
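The counting above can be reproduced by enumerating every outcome of three coin flips. A small sketch, with variable names chosen for illustration:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping three fair coins.
outcomes = list(product("HT", repeat=3))

# Tally how many outcomes contain 0, 1, 2, or 3 tails.
tails_tally = Counter(outcome.count("T") for outcome in outcomes)

# Probability of each number of tails under the fair-coin assumption.
probs = {k: Fraction(tails_tally[k], len(outcomes)) for k in range(4)}

print(len(outcomes))  # 8
for k, p in probs.items():
    print(k, p, float(p))
```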

Chi-square – Example 1
Thus, we have the following probabilities for the null hypothesis:
$H_0$: $P_0 = .125$, $P_1 = .375$, $P_2 = .375$, $P_3 = .125$
$H_A$: At least one P is different from the value specified in $H_0$.
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$

Chi-square – Example 1
Rejection region: $\chi^2_{obt} > \chi^2_{crit} = \chi^2(.01, 3) = 11.34$
Now we compute the expected values using (a) the probabilities in $H_0$ and (b) our sample n:
$P_0 \times 500 = P_3 \times 500 = .125 \times 500 = 62.5$
$P_1 \times 500 = P_2 \times 500 = .375 \times 500 = 187.5$

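Chi-square – Example 1

$$\chi^2 = \frac{[65-62.5]^2}{62.5} + \frac{[182-187.5]^2}{187.5} + \frac{[194-187.5]^2}{187.5} + \frac{[59-62.5]^2}{62.5} = 0.68$$

Decision: Do not reject $H_0$, since $\chi^2_{obt}$ is well below $\chi^2_{crit}$. There is no evidence that the coin flipping outcomes were different from what would be expected if all the coins used were fair.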
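The whole of Example 1 can be run end to end as a check. This is a sketch under stated assumptions: the critical value 11.34 is the usual tabled value for α = .01 and 3 d.f., and `chi2_sf_df3` is a hypothetical helper that uses the closed-form upper-tail probability, which holds for 3 d.f. only.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_df3(x):
    """P(X > x) for a chi-square variable with exactly 3 d.f. (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

n = 500
null_probs = [0.125, 0.375, 0.375, 0.125]   # 0, 1, 2, 3 tails
observed = [65, 182, 194, 59]

expected = [n * p for p in null_probs]      # 62.5, 187.5, 187.5, 62.5
chi2_obt = chi_square_statistic(observed, expected)
chi2_crit = 11.34                           # tabled value, alpha = .01, 3 d.f.
p_value = chi2_sf_df3(chi2_obt)
reject = chi2_obt > chi2_crit

print(round(chi2_obt, 2), reject)           # 0.68 False
```

The p-value is far above .01, which agrees with the decision not to reject.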

Chi-square – Example 2
There is an “old wives’ tale” that babies don’t tend to be born randomly during the day but tend more to be born in the middle of the night, specifically between the hours of 1 AM and 5 AM. To investigate this, a researcher collects birth-time data from a large maternity hospital. The day was broken into 4 parts: Morning (5 AM to 1 PM), Mid-day (1 PM to 5 PM), Evening (5 PM to 1 AM), and Mid-night (1 AM to 5 AM). The numbers of births at these times for the last three months (January to March) are shown on the next slide.

Chi-square – Example 2
  Morning      110
  Mid-day      50
  Evening      100
  Mid-night    100
Does it appear that births are not randomly distributed throughout the day? (α = .01)

Chi-square – Example 2
The critical thing about a chi-square question is usually the expected values. In the previous example, we computed the expected values on the basis of probabilities of various outcomes for a fair coin. In this question, expected values for the number of births in each segment of the day will be based on one variable: how long each segment is, in hours.

Chi-square – Example 2
  Morning:   5 AM to 1 PM = 8 hours
  Mid-day:   1 PM to 5 PM = 4 hours
  Evening:   5 PM to 1 AM = 8 hours
  Mid-night: 1 AM to 5 AM = 4 hours
These periods are not all equal in length!

Chi-square – Example 2
If time of day were irrelevant to when babies are born, we would expect every period of, say, 4 hours to produce the same number of babies. Since the Morning and Evening segments each contain two 4-hour periods and the Mid-day and Mid-night segments each contain one 4-hour period, our expected proportions will be:
  Morning    Mid-day    Evening    Mid-night
  1/3        1/6        1/3        1/6

Chi-square – Example 2
Our sample totals 360 babies. In 1/6 of a day (4 hours) we would expect 360/6 = 60 babies to be born, under the null hypothesis, giving these expected values for the four segments of the day:
  Morning    Mid-day    Evening    Mid-night
  120        60         120        60
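When categories differ in size, the expected values follow from each category's share of the whole. A minimal sketch for the four segments of the day, with dictionary and variable names chosen for illustration:

```python
# Length of each segment of the day, in hours (from the problem statement).
segment_hours = {"Morning": 8, "Mid-day": 4, "Evening": 8, "Mid-night": 4}
total_births = 360

total_hours = sum(segment_hours.values())   # 24

# Under the null hypothesis, births are spread evenly over the hours,
# so each segment's expected count is proportional to its length.
expected = {
    name: total_births * hours / total_hours
    for name, hours in segment_hours.items()
}
print(expected)
```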

Chi-square – Example 2
$H_0$: $P_{morn} = 1/3$, $P_{midday} = 1/6$, $P_{even} = 1/3$, $P_{midnight} = 1/6$
$H_A$: At least one P is different from the value specified in $H_0$.
Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$

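Chi-square – Example 2
Rejection region: $\chi^2_{obt} > \chi^2_{crit} = \chi^2(.01, 3) = 11.34$

$$\chi^2_{obt} = \frac{[110-120]^2}{120} + \frac{[50-60]^2}{60} + \frac{[100-120]^2}{120} + \frac{[100-60]^2}{60}$$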

Chi-square – Example 2
$\chi^2_{obt} = 32.50$
Decision: Reject $H_0$. Births are not randomly scattered throughout the day.
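Breaking the statistic into its per-category contributions shows where the departure from the null hypothesis comes from. In this sketch the expected values are the ones implied by 60 births per 4-hour period; the variable names are illustrative.

```python
observed = {"Morning": 110, "Mid-day": 50, "Evening": 100, "Mid-night": 100}
expected = {"Morning": 120, "Mid-day": 60, "Evening": 120, "Mid-night": 60}

# Each category's share of the chi-square statistic.
contributions = {
    name: (observed[name] - expected[name]) ** 2 / expected[name]
    for name in observed
}
chi2_obt = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name:10s} {value:6.2f}")
print(round(chi2_obt, 2))   # 32.5
```

Nearly all of the statistic comes from the Mid-night segment, where 100 births were observed against 60 expected. The total exceeds the tabled critical values for 3 d.f. at both α = .05 (7.81) and α = .01 (11.34).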
