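4 normal probability plots at once

```r
par(mfrow = c(2, 2))  # 2 x 2 grid: four plots on one page
for (i in 1:4) {
  # `dataframe`: column 1 = data values, column 2 = group code 1-4
  qqnorm(dataframe[, 1][dataframe[, 2] == i], ylab = "Data quantiles",
         main = paste("yourchoice", i, sep = ""))  # replaces default title
}
```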

To produce these plots, choose File > New > Script File. Paste the commands into the script file window and press F10. The four plots are produced automatically.

4 histograms all at once
Same as above, but use hist instead of qqnorm. Each histogram needs only one column of the data frame rather than columns 1 and 2. Also, don't forget to change the axis label.
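Below is a minimal sketch of the histogram version, based on one reading of the note above. It assumes the four groups sit in four separate columns of a data frame, so each histogram uses a single column, `dataframe[, i]`. The random data and the column names `g1` to `g4` are hypothetical placeholders for your own data.

```r
# Hypothetical data: one group per column (an assumed layout).
set.seed(1)
dataframe <- data.frame(g1 = rnorm(30), g2 = rnorm(30),
                        g3 = rnorm(30), g4 = rnorm(30))

par(mfrow = c(2, 2))           # put the next four plots in a 2 x 2 grid
for (i in 1:4) {
  hist(dataframe[, i],         # a single column per histogram
       xlab = "Data values",   # change this label to suit your data
       main = paste("yourchoice", i, sep = ""))
}
```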

Lab: Chi-Squared Test (X²), Lack of Fit
November 10, 2000

History
- Invented in 1900 by the English statistician Karl Pearson
- The oldest inference procedure still used in its original form

The X² Test
- Used when you have data values for two categorical variables
- The counts are arranged in a two-way table
- For example: men/women and NSOE track; regenerated seaweed (yes/no) and access level (limpet only/limpet and fish/etc.)

Example: Why Do Men and Women Participate in Sports?
- Desire to win or to do better than others, called social comparison
- Desire to improve one's skills or to do one's best, called mastery

Data
- Collected from 67 male and 67 female undergraduate students at a large university
- A survey asked students about their sports goals.
- Every student was categorized as high or low on each of the two questions:
  - high or low social comparison
  - high or low mastery

Source: Duda, Joan L., Leisure Sciences, 10 (1988), pp. 95-106

Groups
- This leads to four groups:
  - High social comparison, high mastery (HSC-HM)
  - High social comparison, low mastery (HSC-LM)
  - Low social comparison, high mastery (LSC-HM)
  - Low social comparison, low mastery (LSC-LM)
- We want to compare how men and women are distributed across these groups.

1. Add Totals
- Column: in this case, the population the observation comes from (female or male)
- Row: the categorical response variable (the sports-goal group)
- Add a total for each row and each column, plus the grand total
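The same step can be sketched in R. The counts are an assumption, because the table image did not survive extraction. They are reconstructed from figures quoted later in these slides:
- 14 HSC-HM females out of a row total of 45
- 67 students of each sex
- 46% and 27% of males in the HSC groups
- 10% of females in HSC-LM

The two LSC rows are not stated anywhere in the text. The values used here reproduce the reported chi-squared of 24.898, but check them against the original table.

```r
# Assumed counts (rows = sports-goal group, columns = sex); the LSC rows
# are not given in the slides and are a reconstruction.
goals <- as.table(matrix(c(14, 7, 21, 25,    # Female
                           31, 18, 5, 13),   # Male
                         nrow = 4,
                         dimnames = list(goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                         sex  = c("Female", "Male"))))

# addmargins() appends a "Sum" column (row totals), a "Sum" row
# (column totals) and, in the corner, the grand total.
with_totals <- addmargins(goals)
print(with_totals)
```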

A Cell
A table with r rows and c columns contains r x c cells.

X² is really an analysis of five things in this table:
- Frequency (actual count)
- Percent of overall total
- Percent of row
- Percent of column
- Expected count

Frequency: Just the cell count

Overall Percent: Cell count divided by grand total: 14/134 = 0.104. That is, 10.4% of all those studied were HSC-HM and female.

Row Percent: Cell count divided by row total: 14/45 = 0.311. That is, of all those students reporting HSC-HM, 31% were female.

Column Percent: Cell count divided by column total: 14/67 = 0.209. That is, of all female student participants, 21% were HSC-HM.
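`prop.table()` computes all three percents for every cell at once. The sketch below uses a reconstruction of the sports-goal counts. Only the female HSC-HM count of 14 and the totals of 45, 67 and 134 come directly from the slides; the other cells are assumed.

```r
# Assumed counts; only some cells are stated in the slides.
goals <- as.table(matrix(c(14, 7, 21, 25, 31, 18, 5, 13), nrow = 4,
  dimnames = list(goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                  sex  = c("Female", "Male"))))

overall_pct <- 100 * prop.table(goals)              # cell / grand total
row_pct     <- 100 * prop.table(goals, margin = 1)  # cell / row total
col_pct     <- 100 * prop.table(goals, margin = 2)  # cell / column total

round(overall_pct["HSC-HM", "Female"], 1)  # 14/134 -> 10.4
round(row_pct["HSC-HM", "Female"], 1)      # 14/45  -> 31.1
round(col_pct["HSC-HM", "Female"], 1)      # 14/67  -> 20.9
```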

Expected Count
- Coming later to a slide near you...

These percents are useful in graphical analysis.
- Overall, row, and column percents can be calculated for each cell.
- Then questions of interest can be asked.
- We are interested in the effect of sex on sports goals.
- In this case, we would examine the column percents, which give the distribution of sports goals within each sex.

Column percents for sports goals
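The graph from this slide did not survive extraction. A grouped bar chart is one common way to draw column percents. The sketch below is an assumption about the kind of graph, not a copy of the original. It uses assumed counts; the LSC rows are not given in the slides.

```r
goals <- as.table(matrix(c(14, 7, 21, 25, 31, 18, 5, 13), nrow = 4,
  dimnames = list(goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                  sex  = c("Female", "Male"))))   # assumed counts

col_pct <- 100 * prop.table(goals, margin = 2)   # each sex sums to 100%

# beside = TRUE draws one group of four bars (one per goal class) per sex
barplot(col_pct, beside = TRUE, legend.text = rownames(col_pct),
        ylab = "Percent of students of that sex",
        main = "Column percents for sports goals")
```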

Surprise, surprise: we want to ask whether these apparently obvious differences are significant.
- Can these differences be attributed to chance?
- Calculate the chi-squared statistic and compare it to a chi-square distribution.
- Determine the p-value.
- A low p-value means we reject our null hypothesis (sound familiar?).

The hypotheses: Null
- No association exists between our row and column variables:
  - No association exists between sex and sports goals.
  - The distributions of sports goals in the male and female populations are the same.

The hypotheses: Alternative
- Alternative: an association exists between the row and column variables.
  - It has no particular direction (it is neither one- nor two-sided).
  - The distributions of sports goals in the male and female populations are not all the same.
  - It includes many kinds of possible associations, for example: "Men rate social comparison higher as a goal than do women."

OK: Now back to the Expected Count
- If the null hypothesis were true, what would the count in each cell be?
- For women in the HSC-HM cell, it works like this:
  - 33.6% of all respondents are HSC-HM.
  - We have 67 women.
  - So, if no sex difference exists (our null), we would expect 33.6% of our 67 women to be HSC-HM, or 22.5 women.

Expected Count
1. 45/134 = 33.6% of all respondents are HSC-HM.
2. 33.6% of 67 women is 22.5.
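The same calculation applies to every cell: expected count = (row total x column total) / grand total. For the cell worked by hand, that is 45 x 67 / 134 = 22.5. The R sketch below computes all cells at once. It uses assumed counts; the LSC rows are not given in the slides.

```r
goals <- as.table(matrix(c(14, 7, 21, 25, 31, 18, 5, 13), nrow = 4,
  dimnames = list(goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                  sex  = c("Female", "Male"))))   # assumed counts

# Expected count under the null hypothesis of no association:
# row total * column total / grand total, for every cell at once
expected <- outer(rowSums(goals), colSums(goals)) / sum(goals)
round(expected, 1)
expected["HSC-HM", "Female"]   # 45 * 67 / 134 = 22.5
```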

Finally: The Chi-Squared Statistic Itself
- Compare the entire set of observed counts with the set of expected counts.
- Take the difference in each cell between observed and expected.
- Square each difference.
- Normalize these (divide by the expected count).
- Sum over all cells.
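Each of these steps maps directly onto vectorized R arithmetic. The sketch below uses reconstructed counts for the table; the LSC rows are not given in the slides. With these counts, it reproduces the 24.898 reported for this example.

```r
goals <- as.table(matrix(c(14, 7, 21, 25, 31, 18, 5, 13), nrow = 4,
  dimnames = list(goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                  sex  = c("Female", "Male"))))   # assumed counts
expected <- outer(rowSums(goals), colSums(goals)) / sum(goals)

diffs      <- goals - expected      # observed minus expected, cell by cell
squared    <- diffs^2               # square each difference
normalized <- squared / expected    # divide by the expected count
x2         <- sum(normalized)       # sum over all cells
round(x2, 3)                        # 24.898
```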

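The Formula:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

- The sum is taken over all r x c cells of the table.
- Large values of X² provide evidence against the null hypothesis.
- A chi-square distribution is used to obtain the p-value.
- The degrees of freedom are (r - 1)(c - 1).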
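In R, `pchisq(..., lower.tail = FALSE)` gives the upper-tail area of a chi-square distribution, which is the p-value. The sketch below uses the 4 x 2 sports-goal table and the statistic reported for this example (24.898).

```r
n_rows <- 4                            # sports-goal groups
n_cols <- 2                            # female, male
dof    <- (n_rows - 1) * (n_cols - 1)  # (r - 1)(c - 1) = 3

x2      <- 24.898                      # statistic reported in these slides
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)  # P(X^2 >= x2) under the null
p_value < 0.0005                       # TRUE
```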

In this case...
- Chi-squared = 24.898 on (4 - 1)(2 - 1) = 3 df.
- The p-value is less than 0.0005.
- If the null hypothesis were true, the chance of obtaining a chi-squared value greater than or equal to this one would be very small.
- Clear evidence against the null hypothesis
- Strong evidence that female and male students have different distributions of sports goals
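R's built-in `chisq.test()` carries out the whole procedure in one call. The sketch below uses assumed counts; the LSC rows are not given in the slides and should be checked against the original table. If the counts are right, the output should match the values on this slide. For tables larger than 2 x 2, `chisq.test()` applies no continuity correction, so the statistic is the plain Pearson X².

```r
goals <- as.table(matrix(c(14, 7, 21, 25, 31, 18, 5, 13), nrow = 4,
  dimnames = list(goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                  sex  = c("Female", "Male"))))   # assumed counts

result <- chisq.test(goals)
result            # prints X-squared, df and p-value
result$expected   # expected counts under the null hypothesis
```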

Is that all you can say?
- No: you can and should combine the test with a description that shows the relationship.
  - Percents in our earlier table and our graph
  - Summary comments: the percent of males in each of the HSC goal classes is more than twice the percent of females.
  - The HSC-HM group contains 46% of the males but only 21% of the females.
  - The HSC-LM group contains 27% of the males and only 10% of the females.
  - We conclude that males are more likely to be motivated by social comparison goals and females are more likely to be motivated by mastery goals.
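The "more than twice" comparison can be checked directly from the column percents. Because both columns total 67 students, the ratio of percents equals the ratio of counts. The sketch uses assumed counts: the LSC rows are a reconstruction, and the HSC rows match the percents quoted on this slide.

```r
goals <- as.table(matrix(c(14, 7, 21, 25, 31, 18, 5, 13), nrow = 4,
  dimnames = list(goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                  sex  = c("Female", "Male"))))   # assumed counts

col_pct <- 100 * prop.table(goals, margin = 2)
round(col_pct)                           # whole-number column percents

hsc   <- c("HSC-HM", "HSC-LM")
ratio <- col_pct[hsc, "Male"] / col_pct[hsc, "Female"]
round(ratio, 2)                          # 2.21 and 2.57: both above 2
```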

Important to remember:
- The chi-square distribution only approximates the sampling distribution of X². The approximation becomes more accurate as the cell counts increase.
- For 2 x 2 tables, the expected count in each of the 4 cells must be 5 or higher.
- For tables larger than 2 x 2, the average of the expected counts must be 5 or higher, and the smallest expected count must be 1 or more.
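This rule of thumb can be written as a small check on the expected counts. `expected_counts_ok()` is a hypothetical helper, not a standard R function. The example table uses assumed counts; the LSC rows are not given in the slides.

```r
# Hypothetical helper implementing the rule of thumb above.
expected_counts_ok <- function(expected) {
  if (all(dim(expected) == c(2, 2))) {
    all(expected >= 5)                          # 2 x 2: every cell >= 5
  } else {
    mean(expected) >= 5 && min(expected) >= 1   # larger: mean >= 5, min >= 1
  }
}

goals <- as.table(matrix(c(14, 7, 21, 25, 31, 18, 5, 13), nrow = 4,
  dimnames = list(goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                  sex  = c("Female", "Male"))))   # assumed counts
expected <- outer(rowSums(goals), colSums(goals)) / sum(goals)
expected_counts_ok(expected)   # TRUE: mean 16.75, smallest 12.5
```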

Important to remember:
- This is sometimes called the chi-squared test for homogeneity or the chi-squared test of independence.
- Although this is one of the most widely used statistical tools, it is also one of the least informative:
  - The only thing you produce is a p-value; there is no associated parameter to describe the degree of dependence.
  - The alternative hypothesis is very general (that rows and columns are not independent).
