Lecture #27 Tuesday, November 29, 2016 Textbook: 15.1

1 Lecture #27 Tuesday, November 29, 2016 Textbook: 15.1
Statistics 200
Objectives:
• Review hypothesis testing for a relationship between two categorical variables in a population.
• Extend techniques beyond the 2×2 table case.
• Predict how larger samples will influence the chi-square statistic.

2 Motivating Example: Have you ever missed class because of alcohol?
We asked a random sample of PSU students the following questions:
1. Have you ever missed class because you were drinking the night before?
2. What year are you in school? a. Freshman b. Sophomore c. Junior d. Senior
What kind of variables are these? Quantitative or categorical?

3 2 Way Contingency Tables
Summarizing responses:
Generic notation: (explanatory) × (response) table
Example: a (4 × 2) table implies:
explanatory variable: 4 levels
response variable: 2 levels

Yr      Yes   No    Total
Fr      15    85    100
So      25    75    100
Jr      30    70    100
Se      50    50    100

4 Have you ever missed class because of alcohol?
PSU students: n1 = 100 Fr, n2 = 100 So, n3 = 100 Jr, n4 = 100 Se
Which is the explanatory variable? Year in school, or missed class or not?

Yr      Yes   No    Total
Fr      15    85    100
So      25    75    100
Jr      30    70    100
Se      50    50    100

Response variable: Y / N answer
k = 4 samples
Samples are… dependent or independent?
___ x ___ table

5 Null Hypothesis: H0
Starting position: nothing is happening.
Null, with 2-way tables: in the population there is no relationship between the two variables, and no difference in the groups.
Statistical hypotheses never include statements about the sample.
We are never trying to prove the null is true.

6 Alternative Hypothesis: Ha
Challenging position: something is happening.
Goal: hope the data support the alternative.
Alternative, with 2-way tables: in the population, there is some relationship between the two variables, and some difference in the groups.

7 We can state hypotheses two ways
Null Hypothesis:
H0: In the population, there is no relationship between school year and the likelihood of missing class because of alcohol.
H0: p1 = p2 = p3 = p4
Alternative Hypothesis:
Ha: In the population, a relationship exists between school year and the likelihood of missing class because of alcohol.
Ha: at least one pi differs

8 Do descriptive results show support for Ha?
Row percents:

Yr      Yes         No    Total
Fr      15 (15%)    85    100
So      25 (25%)    75    100
Jr      30 (30%)    70    100
Se      50 (50%)    50    100
Total   120         280   400

Yes: those percents look different for the 4 groups.
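
The row percents on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the dictionary layout and the function name row_percents are illustrative choices, not something taken from the lecture.

```python
# Observed counts from the slide: school year -> (yes, no).
observed = {
    "Fr": (15, 85),
    "So": (25, 75),
    "Jr": (30, 70),
    "Se": (50, 50),
}

def row_percents(table):
    """For each row, return the percent of that row's total in each column."""
    result = {}
    for label, counts in table.items():
        total = sum(counts)
        result[label] = tuple(100 * c / total for c in counts)
    return result

for year, (pct_yes, pct_no) in row_percents(observed).items():
    print(f"{year}: {pct_yes:.0f}% yes, {pct_no:.0f}% no")
```

Comparing the "yes" percents across rows is the descriptive check; the chi-square test then asks whether differences this large are plausible under H0.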

9 We quantify evidence for the alternative using…
…the chi-square statistic.
Next step: find the chi-square statistic using sample data.
This statistic measures the difference between the observed and expected counts in the contingency table.

10 Chi-square Contribution
Expected count = (row total × column total) / (total n)
Chi-square contribution = (observed – expected)² / expected

Yr      Yes                          No    Total
Fr      15 (15%)  E: ___  Chi: ___   85    100
So      25 (25%)                     75    100
Jr      30 (30%)                     70    100
Se      50 (50%)                     50    100
Total   120                          280   400

Get the chi-square statistic by summing the contributions from all the cells.
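
As a hedged illustration of the two calculations on this slide, the sketch below uses the standard definitions: expected count = row total × column total / n, and contribution = (observed – expected)² / expected. The function names are made up for this example.

```python
# Observed counts from the slide, columns are (Yes, No).
observed = [
    [15, 85],  # Fr
    [25, 75],  # So
    [30, 70],  # Jr
    [50, 50],  # Se
]

def expected_counts(table):
    """Expected count per cell under H0: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_contributions(table):
    """Per-cell contribution: (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]

expected = expected_counts(observed)
contributions = chi_square_contributions(observed)
chi_square = sum(sum(row) for row in contributions)

print(expected[0])          # expected counts for the Fr row
print(contributions[0])     # contributions for the Fr row
print(round(chi_square, 3)) # sum over all eight cells
```

Because every year has 100 students, each row has the same expected counts (30 yes, 70 no), so the Fr "yes" cell contributes (15 – 30)² / 30 = 7.5.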

11 Check the necessary conditions for using a chi-square test (page 603)
Yr      Yes                          No    Total
Fr      15 (15%)  E: ___  Chi: ___   85    100
So      25 (25%)                     75    100
Jr      30 (30%)                     70    100
Se      50 (50%)                     50    100
Total   120                          280   400

Are all expected counts at least 1?
Are 80% or more of the expected counts at least 5?
If the answer is yes to both, we’re safe; the sample is large enough to make the test work well.
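
The two conditions can be checked mechanically once the expected counts are known. The following is a small sketch under the assumption that expected counts are computed as row total × column total / n; conditions_met is a hypothetical helper name.

```python
observed = [
    [15, 85],  # Fr
    [25, 75],  # So
    [30, 70],  # Jr
    [50, 50],  # Se
]

def expected_counts(table):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def conditions_met(table):
    """True if all expected counts are >= 1 and at least 80% are >= 5."""
    flat = [e for row in expected_counts(table) for e in row]
    all_at_least_1 = all(e >= 1 for e in flat)
    share_at_least_5 = sum(e >= 5 for e in flat) / len(flat)
    return all_at_least_1 and share_at_least_5 >= 0.80

print(conditions_met(observed))
```

For the missed-class table every expected count is 30 or 70, so both conditions hold comfortably.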

12 How many df for the chi-Square statistic?
Yr      Yes         No    Total
Fr      15 (15%)    85    100
So      25 (25%)    75    100
Jr      30 (30%)    70    100
Se      50 (50%)    50    100
Total   120         280   400

The chi-square statistic follows a chi-square distribution with a specific number of degrees of freedom (df).
df = (r – 1)(c – 1)
r = # row variable categories
c = # column variable categories
For our example, df = A. 1 B. 2 C. 3 D. 4
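
A short sketch of the df rule, assuming the table is stored as a list of rows of observed counts (totals excluded); degrees_of_freedom is an illustrative name.

```python
def degrees_of_freedom(table):
    """df = (r - 1)(c - 1), where r and c count categories, not total rows/columns."""
    r = len(table)     # number of row-variable categories
    c = len(table[0])  # number of column-variable categories
    return (r - 1) * (c - 1)

missed_class = [
    [15, 85],  # Fr
    [25, 75],  # So
    [30, 70],  # Jr
    [50, 50],  # Se
]
print(degrees_of_freedom(missed_class))  # (4 - 1) * (2 - 1)
```

Note that the "Total" row and column are not counted as categories.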

13 Chi-Square Results
Rows: SchoolYr (Fr, So, Jr, Se, All)   Columns: Missed_Class (Yes, No, All)
Cell Contents: Count, Expected count, Contribution to Chi-square
Pearson Chi-Square = _____, DF = 3, P-Value = 0.000
df = (r – 1) × (c – 1)
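
The transcript does not preserve the numbers in the Minitab output, so the sketch below recomputes the statistic from the counts given on the earlier slides (it comes out near 30.95) and then finds the p-value. To stay within the Python standard library it uses a closed-form upper-tail formula that is valid only for 3 degrees of freedom; p_value_df3 is a hypothetical helper, not a Minitab or library function.

```python
import math

observed = [
    [15, 85],  # Fr
    [25, 75],  # So
    [30, 70],  # Jr
    [50, 50],  # Se
]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat

def p_value_df3(x):
    """P(X > x) for a chi-square variable with exactly 3 df (closed form)."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

stat = chi_square_statistic(observed)
p = p_value_df3(stat)
print(f"chi-square = {stat:.3f}, df = 3, p-value = {p:.3f}")
```

The p-value is far below 0.0005, which is why software that rounds to three decimals reports it as 0.000.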

14 Analysis & Conclusion
H0: p1 = p2 = p3 = p4
Ha: at least one pi differs
Chi-Sq = _____ & p-value = 0.000
Conclusions: We can claim that a relationship exists between school year and the likelihood of having missed class because of alcohol in the population.
At least one population proportion of yeses differs when comparing the four school years.

15 New Example:
Was PSU your first choice? * no * yes
How many college applications did you submit? * (1-2) * (3-5) * (6-8) * (≥ 9)

16 Example 2: Two Different Categorical Variables
Identify variables:
explanatory: PSU first choice
response: number of applications submitted (a variable with 4 levels, so not a binomial)
This is an example of a 2 × 4 table.

PSU First Choice   (1 – 2)   (3 – 5)   (6 – 8)   (≥ 9)   Total
No                 12        60        29        19      120
Yes                100       128       41        11      280
Total              112       188       70        30      400

17 Row percents for each level of explanatory variable
PSU First Choice   (1 – 2)   (3 – 5)   (6 – 8)   (≥ 9)   Total
No                 (10%)     (50%)     (24%)     (16%)   (100%)
Yes                (36%)     (46%)     (15%)     (4%)    (100%)

Descriptively: what do the row percents suggest? They suggest a difference.

18 State Hypotheses
Null Hypothesis:
H0: no relationship between PSU choice and number of college applications submitted in the population
H0: distributions are the same for both PSU choices
Alternative Hypothesis:
Ha: relationship exists between PSU choice and number of college applications submitted in the population
Ha: distributions are not the same for both PSU choices

19 Minitab Output: Stat > Tables > Cross tabulation & Chi-square
Rows: PSU_Choice (No, Yes, All)   Columns: College_Applied ((1-2), (3-5), (6-8), (=>9), All)
Cell Contents: Count, Expected count
Pearson Chi-Square = _____, DF = 3, P-Value = 0.000
Is there a statistically significant relationship found? Yes!
Different distributions are found.

20 Example 2: Two Different Categorical Variables
PSU First Choice   (1 – 2)   (3 – 5)   (6 – 8)   (≥ 9)   Total
No                 12        60        29        19      120
Yes                100       128       41        11      280
Total              112       188       70        30      400

Pearson chi-square = _____

Double all counts to get:

PSU First Choice   (1 – 2)   (3 – 5)   (6 – 8)   (≥ 9)   Total
No                 24        120       58        38      240
Yes                200       256       82        22      560
Total              224       376       140       60      800

Pearson chi-square = _____, which is exactly doubled!
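
The doubling claim follows from the formula: if every observed count is doubled, every expected count doubles too, so each contribution becomes (2O – 2E)² / (2E) = 2 (O – E)² / E. The sketch below checks this numerically with the cell counts from the slide; the function name is illustrative and the statistic is recomputed here rather than copied from Minitab.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat

# Rows: PSU first choice (No, Yes); columns: applications (1-2, 3-5, 6-8, >=9).
original = [
    [12, 60, 29, 19],
    [100, 128, 41, 11],
]
doubled = [[2 * count for count in row] for row in original]

chi2_original = chi_square_statistic(original)
chi2_doubled = chi_square_statistic(doubled)
print(f"original: {chi2_original:.3f}")
print(f"doubled:  {chi2_doubled:.3f}")
print(f"ratio:    {chi2_doubled / chi2_original:.3f}")
```

The degrees of freedom stay at 3 because the table shape is unchanged, so the same row percents give a larger statistic and a smaller p-value when the sample is larger.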

21 Yet another pair of categorical variables:
How satisfied are you with your overall appearance? • (very) • (somewhat) • (not much)
What do you best like to wear when going to and from classes? • shorts • athletic wear • jeans • other
Explanatory Variable: Satisfaction
Response Variable: Which clothes
Table: __ x __

22 What does the pattern of row percents show us?
Very similar? Very different? Look for significance using Chi-square statistic to decide!

23 Inferentially no support for a relationship in the population
H0: No relationship, or no difference among the three distributions.
Ha: Relationship exists, or some difference exists among the three distributions.
Pearson Chi-Square = 6.34, DF = 6, P-Value = 0.386
We cannot claim that there is a statistically significant relationship or that different distributions are found.
We cannot rule out randomness as a plausible explanation for what has happened.
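
The reported p-value can be checked from the statistic and df alone. The sketch below uses a closed-form upper-tail formula that holds only when df is even, which keeps it within the Python standard library. The function name p_value_even_df and the 0.05 significance level are assumptions for illustration; the slide does not state a level.

```python
import math

def p_value_even_df(x, df):
    """P(X > x) for a chi-square variable; this closed form requires even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    # Accumulates sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

alpha = 0.05  # assumed significance level, not given on the slide
p = p_value_even_df(6.34, 6)
print(f"p-value = {p:.3f}")
print("reject H0" if p < alpha else "cannot reject H0")
```

A p-value this large means a chi-square statistic of 6.34 or more is quite plausible when H0 is true, which matches the slide's conclusion.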

24 If you understand today’s lecture…
15.1, 15.3, 15.11, 15.17, 15.19, 15.47, 15.49, 15.59
Objectives:
• Review hypothesis testing for a relationship between two categorical variables in a population.
• Extend techniques beyond the 2×2 table case.
• Predict how larger samples will influence the chi-square statistic.

