Chi-Square Tests


1 Chi-Square Tests
Categorical data
- 1 sample, compared to theoretical distribution
  - Goodness-of-Fit Test
- 2+ samples, 2+ levels of response variable
  - Chi-Square Test

2 Chi-Square: Examples
- Do the dominant plants in plots differ between two locations?
- Does the frequency of females in majors differ among majors in the natural sciences, social sciences, and humanities?
- Does the occurrence of a food item in the stomachs of lake trout and chinook salmon differ?

3 What do those examples have in common?
- A categorical response variable
  - dominant plant in a plot
  - sex of student (male or female)
  - occurrence of a food item (Y/N)
- Compare response frequencies among 2 or more groups
  - between two locations
  - among three divisions
  - between lake trout and chinook salmon

4 An Illustrative Example
When Chinook Salmon were first introduced to Lake Superior there was concern that they would compete with native Lake Trout for Lake Herring. Preliminarily, fisheries biologists classified the diets of 50 Lake Trout and 40 Chinook Salmon as containing Lake Herring or not. They found 36 Lake Trout and 24 Chinook Salmon contained Lake Herring. Test (at the 10% level) if there is a difference in the proportion of Lake Trout and Chinook Salmon that had Lake Herring.

5 Observed Table
Recall: “… the diets of 50 Lake Trout and 40 Chinook Salmon … found 36 Lake Trout and 24 Chinook Salmon contained Lake Herring”

                 LH   No LH   Total
Lake Trout       36    14      50
Chinook Salmon   24    16      40
Total            60    30      90

6 Observed Table
If there is no difference between rows (i.e., Ho is true), then the total row could represent either row. Thus, the proportion of predators (regardless of type) that consumed Lake Herring is estimated to be 60/90, or 0.67.

7 Expectations if Ho is true
If there is no difference and the common proportion is estimated by 0.67, then how many ...
- LT do we expect to have LH? 50*0.67
- LT do we expect to not have LH? 50*0.33
- CS do we expect to have LH? 40*0.67
- CS do we expect to not have LH? 40*0.33

8 Create Expected Table
LT to have LH = (50 * 60) / 90 = 33.3

9 Create Expected Table
LT to NOT have LH = (50 * 30) / 90 = 16.7

                 LH     No LH   Total
Lake Trout       33.3   16.7    50
Chinook Salmon   26.7   13.3    40
Total            60     30      90

Expected counts are the product of the marginal totals divided by the table total.
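The rule "product of the marginal totals divided by the table total" can be sketched in a few lines of Python. This is only an illustration using the counts from the slides; the dictionary layout and the variable names are assumptions made for the example, not part of the original handout.

```python
# Observed counts from the slides: rows are predators, columns are diet status.
observed = {
    "Lake Trout": {"herring": 36, "no herring": 14},
    "Chinook Salmon": {"herring": 24, "no herring": 16},
}

# Marginal totals for rows, columns, and the whole table.
row_totals = {pred: sum(cells.values()) for pred, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for status, count in cells.items():
        col_totals[status] = col_totals.get(status, 0) + count
table_total = sum(row_totals.values())

# Expected count = (row total * column total) / table total.
expected = {
    pred: {
        status: row_totals[pred] * col_totals[status] / table_total
        for status in col_totals
    }
    for pred in observed
}

for pred, cells in expected.items():
    print(pred, {status: round(value, 1) for status, value in cells.items()})
```

Rounded to one decimal, the printed values should match the expected table (33.3, 16.7, 26.7, 13.3).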

10 A New Test Statistic
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$, summed over all cells of the table
df = (rows-1)*(cols-1)

11 Chi-Square Distribution
- Right-skewed (all values are positive)
- Less sharply skewed with increasing df
  - df are related to the size of the table, not n
- All p-values are “right-ofs”; there are no “one-tailed” tests with chi-square
- Examine HO, page 1
[Figure: chi-square density curves Chi(3), Chi(10), and Chi(20); x-axis "Chi-square" from 0 to 50]

12 Chi-Square Test
- Ho: “distribution of individuals into the levels is the same for each population”
- HA: “distribution of individuals into the levels is different for at least one pair of populations”
- Assume: at least 5 in each cell of expected table
- Statistic: Observed frequency table
- Test Statistic: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
- df: (rows-1)*(columns-1)
- When: categorical variable, 2+ populations/groups

13 A Full Example
When Chinook Salmon were first introduced to Lake Superior there was concern that they would compete with native Lake Trout for Lake Herring. Preliminarily, fisheries biologists classified the diets of 50 Lake Trout and 40 Chinook Salmon as containing Lake Herring or not. They found 36 Lake Trout and 24 Chinook Salmon contained Lake Herring. Test (at the 10% level) if there is a difference in the proportion of Lake Trout and Chinook Salmon that had Lake Herring.
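One way to work this example by machine is sketched below, using only the Python standard library. It is a sketch under stated assumptions, not the handout's R solution: the variable names are invented for the example, no continuity correction is applied (software that applies one by default to 2x2 tables may report a different statistic), and the p-value uses a closed form that holds only when df = 1.

```python
import math

# Observed table from the slides.
observed = [
    [36, 14],  # Lake Trout: herring, no herring
    [24, 16],  # Chinook Salmon: herring, no herring
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Test statistic: sum over cells of (observed - expected)^2 / expected.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected_count = row_totals[i] * col_totals[j] / n
        chi_sq += (obs - expected_count) ** 2 / expected_count

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows-1)*(cols-1) = 1

# Assumption: for df = 1 only, the right-tail area equals erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

alpha = 0.10
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "do not reject Ho")
```

With these counts the statistic is about 1.44 and the p-value about 0.23, which is larger than 0.10, so this sketch does not reject Ho.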

14 Another Full Example
Modification: the researchers recorded what the dominant food item was. Do the dominant food items in Lake Trout and Chinook Salmon differ at the 5% level? See R HO, page 2.

15 Examine HO, page 3

16 Example Data (R - Chi-Square)
On the GSS, respondents were asked to state their opinion on how true the following statement was: “All radioactivity is made by humans.” Respondents were also categorized by their highest educational degree. Use the results in the SciTest1.txt data file to determine, at the 5% level, if the response to the question differs among levels of education.

17 Goodness-of-Fit Test
Compare observed to theoretical frequencies of individuals in categories.
Examples:
- Test whether responses are “random” (e.g., preference).
- Test Mendelian genetics (e.g., 3:1 and 9:3:3:1 theories).
- Test use of available resources (e.g., compare habitat usage to availability).

18 An Illustrative Example
Determine, at the 10% level, if Northland students prefer the Chris Duarte Group (CDG), Ronnie Baker Brooks (RBB), or Bernard Allison (BA).
Hypotheses?
- Ho: “same # of students prefer each artist”
- HA: “different # of students prefer each artist”

19 An Illustrative Example
Under Ho, what proportion prefer each artist? If n=78, how many students prefer each artist if Ho is true?

Expected Table
Artist       CDG   RBB   BA
Proportion   1/3   1/3   1/3
Freq         26    26    26

20 An Illustrative Example
Suppose these results were obtained:

Observed Table
Artist   CDG   RBB   BA
Freq     24    38    16

Is there a preference, i.e., are these observations significantly different from what was expected when assuming no preference?

21 A New Test Statistic
$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$, summed over all cells
df = cells - 1

22 An Illustrative Example

Observed Table
Artist   CDG   RBB   BA
#        24    38    16

Expected Table
Artist   CDG   RBB   BA
#        26    26    26

$\chi^2 = \frac{(24-26)^2}{26} + \frac{(38-26)^2}{26} + \frac{(16-26)^2}{26} = 0.15 + 5.54 + 3.85 = 9.54$
df = (3-1) = 2
p-value = 0.00848
Conclusion?
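The arithmetic on this slide can be checked with a short Python sketch. The variable names are assumptions made for illustration, and the p-value line relies on a closed form that is valid only for df = 2 (the right-tail area is exp(-x/2)).

```python
import math

# Observed preferences from the slide.
observed = {"CDG": 24, "RBB": 38, "BA": 16}
n = sum(observed.values())  # 78 students

# Under Ho every artist is equally preferred, so each expected count is n/3.
expected = {artist: n / len(observed) for artist in observed}

# Per-cell contributions to the test statistic.
contributions = {
    artist: (observed[artist] - expected[artist]) ** 2 / expected[artist]
    for artist in observed
}
chi_sq = sum(contributions.values())
df = len(observed) - 1  # cells - 1 = 2

# Assumption: closed form for df = 2 only.
p_value = math.exp(-chi_sq / 2)

print({artist: round(c, 2) for artist, c in contributions.items()})
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.5f}")
```

Because the statistic is not rounded to 9.54 before the tail area is computed, the last digit of the p-value can differ slightly from the slide's 0.00848. Either way it is below 0.10.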

23 Goodness-of-Fit Test
- Ho: distribution of individuals into levels follows the theoretical distribution
- HA: distribution of individuals into levels does NOT follow the theoretical distribution
- Sample: randomized, single variable of size n
- Assume: at least 5 in each cell of expected table
- Statistic: Observed frequency table

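24 Goodness-of-Fit Test
- Test Statistic: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$
- df: cells-1
- Confidence Region: [formula not recoverable from the transcript], where $\hat{p}$ is the sample proportion in the level of interest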

25 Examine HO, page 5

26 Example Data: Corn Genetics (R - Chi-Square)
A particular type of corn is known to have one of four types of kernels: purple-smooth, purple-wrinkled, yellow-smooth, and yellow-wrinkled. The cross between heterozygous individuals (i.e., PpSs, where the purple (P) and smooth (S) alleles are dominant) should produce a 9:3:3:1 ratio (in the same order of types). Of the kernels on a random cob:
- 32 were purple-smooth
- 14 were purple-wrinkled
- 8 were yellow-smooth
- 4 were yellow-wrinkled
Use the results to determine, at the 5% level, if the theoretical 9:3:3:1 ratio is upheld with these data.
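A possible Python sketch of this goodness-of-fit calculation follows; it is not the R handout solution. The helper `chi_square_right_tail` is a hypothetical name introduced here. It evaluates the right-tail area with the standard closed-form series for integer df, so that nothing beyond the standard library is needed.

```python
import math

def chi_square_right_tail(x, df):
    """Right-tail area of the chi-square distribution for integer df >= 1."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Kernel counts in the order purple-smooth, purple-wrinkled, yellow-smooth, yellow-wrinkled.
observed = [32, 14, 8, 4]
ratio = [9, 3, 3, 1]

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi_square_right_tail(chi_sq, df)

print("expected counts:", expected)
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
print("smallest expected count:", min(expected))
```

The smallest expected count is worth printing because the test assumes at least 5 in each cell of the expected table. With only 58 kernels the yellow-wrinkled cell falls below that, so the result should be read with caution.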

27 A Full Example
The leader of a local lakes association conducted a survey of all members of the association. One question on the survey was “What is your preferred method of receiving notices from the lakes association: by regular mail, by e-mail, by phone, by poster (at the local boat landing), or other?” Of the surveys returned, 47 respondents preferred regular mail, 63 e-mail, 17 phone, 73 poster, and 8 some other method. OF THE RESPONDENTS THAT DID NOT PREFER SOME OTHER METHOD, is there evidence, at the 5% level, of a difference in the preferred method of contact?
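A rough sketch of how this example might be set up in Python is given below. It compares the statistic to a critical value instead of computing a p-value. The names are illustrative assumptions, the null hypothesis is taken to be "no preference" (equal expected counts), and 7.815 is the usual table value for the 0.95 quantile of a chi-square distribution with 3 df.

```python
# Survey counts from the slide.
responses = {"regular mail": 47, "e-mail": 63, "phone": 17, "poster": 73, "other": 8}

# The question restricts attention to respondents who did not choose "other".
observed = {method: count for method, count in responses.items() if method != "other"}
n = sum(observed.values())

# Assumed Ho: no preference, so each remaining method has the same expected count.
expected_each = n / len(observed)

chi_sq = sum((count - expected_each) ** 2 / expected_each for count in observed.values())
df = len(observed) - 1  # cells - 1 = 3

# Table value: 0.95 quantile of the chi-square distribution with 3 df.
critical_value = 7.815

print(f"n = {n}, expected per method = {expected_each}")
print(f"chi-square = {chi_sq:.2f}, df = {df}, critical value = {critical_value}")
print("reject Ho" if chi_sq > critical_value else "do not reject Ho")
```

The statistic (about 35.92) is far above the critical value, so under these assumptions the sketch rejects Ho at the 5% level.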

28 A Full Example
In a randomly selected national sample of 1,007 adults, aged 18 and older, conducted Aug. 22-25, 2005, Gallup polls found that 403 respondents approved of the way that George W. Bush was handling his presidency. In a previous sample (Aug. 8-11, 2005), 45% of the respondents approved of George W. Bush’s handling of the presidency. Assuming that this earlier value was true for the entire population, determine, at the 5% level, if the approval rating has changed by the Aug. 22-25, 2005 sample.
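This example can be treated as a goodness-of-fit test with two categories (approve, do not approve) and theoretical proportions 0.45 and 0.55. The sketch below takes that reading. The variable names are assumptions for illustration, and the p-value again uses the closed form that holds only for df = 1.

```python
import math

n = 1007        # adults in the Aug. 22-25 sample
approve = 403   # respondents who approved
p0 = 0.45       # approval proportion assumed true for the population

observed = [approve, n - approve]
expected = [n * p0, n * (1 - p0)]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # cells - 1 = 1

# Assumption: for df = 1 only, the right-tail area equals erfc(sqrt(x/2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

alpha = 0.05
print("expected counts:", [round(e, 2) for e in expected])
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "do not reject Ho")
```

Both expected counts are well above 5, so the cell-size assumption from the goodness-of-fit summary is met here.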

