Proportion Data Harry R. Erwin, PhD School of Computing and Technology University of Sunderland.


1 Proportion Data Harry R. Erwin, PhD School of Computing and Technology University of Sunderland

2 Resources
–Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley.
–Freund, RJ, and WJ Wilson (1998) Regression Analysis. Academic Press.
–Gentle, JE (2002) Elements of Computational Statistics. Springer.
–Gonick, L, and W Smith (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

3 Introduction
These four demonstration sessions of this class address special types of data:
–Counts
–Proportions (this lecture)
–Survival analysis
–Binary responses

4 Frequencies and Proportions
With frequency data, we know how often something happened, but not how often it didn’t happen. With proportion data, we know both.
Applied to:
–Mortality and infection rates
–Response to clinical treatment
–Voting
–Sex ratios
–Proportional response to experimental treatments

5 Working With Proportions
Traditionally, proportion data were modelled by using the percentage as the response variable. This is bad for four reasons:
1. The errors are not normally distributed.
2. The variance is not constant.
3. The response is bounded by 0.0 and 1.0.
4. The size of the sample, n, is lost.
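Reasons 2 and 4 can be illustrated with a short R sketch. It assumes plain binomial sampling, under which the variance of an observed proportion is p(1 - p)/n; the helper name prop_var and the chosen values of p and n are made up for illustration.

```r
# Variance of an observed proportion under binomial sampling: p * (1 - p) / n.
# 'prop_var' is a hypothetical helper, not part of base R.
prop_var <- function(p, n) p * (1 - p) / n

p    <- c(0.05, 0.25, 0.50, 0.75, 0.95)
v20  <- prop_var(p, n = 20)    # small samples
v200 <- prop_var(p, n = 200)   # samples ten times larger

# The variance changes with p (largest at p = 0.5) and with n,
# so a bare percentage hides how reliable each observation is.
print(data.frame(p = p, var_n20 = v20, var_n200 = v200))
```

The same percentage is therefore far more precise when it comes from 200 individuals than from 20, which is the information a percentage-only response throws away.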

6 General Approach
Use a generalised linear model (glm) with family = binomial (i.e., an unfair coin flip).
The response is built from two vectors, one of the success counts and the other of the failure counts, where number of successes + number of failures = binomial denominator, n:
y <- cbind(successes, failures)
model <- glm(y ~ whatever, family = binomial)
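A minimal runnable sketch of this approach follows. The explanatory variable dose and all counts are invented for illustration; they are not data from the lecture or the book.

```r
# Hypothetical dose-response data (made up for illustration)
dose      <- c(1, 2, 3, 4, 5, 6)
n         <- c(40, 40, 40, 40, 40, 40)   # binomial denominators
successes <- c(3, 8, 15, 24, 31, 37)
failures  <- n - successes

# Two-column response: successes first, failures second
y <- cbind(successes, failures)

# Binomial errors; the logit link is the default for this family
model <- glm(y ~ dose, family = binomial)
print(summary(model))

# Fitted proportions on the original 0-1 scale
fitted_p <- predict(model, type = "response")
print(round(fitted_p, 3))
```

The coefficients are on the logit scale, so predict(..., type = "response") is used to get back to proportions.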

7 How R Handles Proportions
–Weighted regression (weighted by the individual sample sizes).
–Logit link to ensure linearity.
Percentages that are not counts out of n are handled differently.
If the data are percentage cover:
–Do an arc-sine transformation, followed by conventional modelling (normal errors, constant variance).
If the data are percentage change in a continuous measurement:
–Use ANCOVA with final weight as the response and initial weight as a covariate, or
–Use the relative growth rate, log(final/initial), as the response.
–Both produce normal errors.
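The two non-binomial cases can be sketched in R as below. This assumes the usual form of the arc-sine transformation, asin(sqrt(p)) with p on the 0 to 1 scale; the cover and weight values are made up.

```r
# Hypothetical percentage cover values (made up)
cover_pct  <- c(2, 10, 35, 50, 80, 97)

# Arc-sine (angular) transformation, assuming the usual asin(sqrt(p)) form.
# The result is in radians and lies between 0 and pi/2.
cover_asin <- asin(sqrt(cover_pct / 100))
print(round(cover_asin, 3))

# Hypothetical initial and final weights (made up)
initial <- c(10.2, 11.5, 9.8, 12.1)
final   <- c(14.9, 15.2, 13.0, 18.3)

# Relative growth rate as the response, instead of percentage change
rgr <- log(final / initial)
print(round(rgr, 3))
```

Either transformed variable can then go into lm as an ordinary response.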

8 Tests
To compare a single binomial proportion to a constant, use binom.test.
To compare two samples, use prop.test.
The glm-based methods in this lecture are needed only for more complex models:
–Regression tables
–Contingency tables
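Both simple tests are in base R's stats package. The counts in this sketch are hypothetical.

```r
# One sample: 12 successes out of 40 trials (made-up counts),
# compared with a hypothesised proportion of 0.5
one_sample <- binom.test(x = 12, n = 40, p = 0.5)
print(one_sample)

# Two samples: 12 out of 40 versus 25 out of 45 (made-up counts)
two_sample <- prop.test(x = c(12, 25), n = c(40, 45))
print(two_sample)

# Both return "htest" objects; the pieces can be extracted by name
one_sample$p.value
two_sample$estimate
```

Note that prop.test applies a continuity correction by default (correct = TRUE).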

9 Count Data on Proportions
R supports the traditional arcsine and probit transformations:
–arcsine makes the error distribution approximately normal
–probit linearises the relationship between percentage mortality and log(dose)
However, it is usually better to use the logit transformation and assume you have binomial data.

10 Odds
The logistic model for p as a function of x is:
p = exp(a + bx)/(1 + exp(a + bx))
The book notes that this is obviously non-linear. To linearise it, consider instead the odds p/q (as in gambling), where q = 1 - p:
p/q = exp(a + bx)
Or:
ln(p/q) = a + bx
ln(p/q) is called the logit transformation of p.
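The algebra on this slide can be checked numerically. In the sketch below, the values of a and b are arbitrary, and logit and inv_logit are helper names defined here for illustration; base R provides the same functions as qlogis and plogis.

```r
# Helper functions written out to match the slide's formulas
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

a <- -2      # arbitrary intercept
b <- 0.8     # arbitrary slope
x <- seq(0, 5, by = 1)

# Logistic curve: non-linear in x
p <- inv_logit(a + b * x)

# Odds p/q equal exp(a + b*x) ...
odds <- p / (1 - p)
print(all.equal(odds, exp(a + b * x)))

# ... and the logit of p is exactly linear in x
print(all.equal(logit(p), a + b * x))

# The built-in equivalents agree with the hand-written helpers
print(all.equal(logit(p), qlogis(p)))
print(all.equal(p, plogis(a + b * x)))
```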

11 R and logit
R does not simply do a linear regression of ln(p/q) against x. It also handles:
–non-constant binomial variance
–logit(p) going to -∞ and +∞
–differences between sample sizes, using weighted regression.
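The role of the sample sizes can be seen by fitting the same model two ways: with a two-column count response, and with the observed proportion as the response plus weights = n. The data in this sketch are made up, with deliberately unequal sample sizes.

```r
# Hypothetical data with unequal sample sizes (made up)
dose      <- c(1, 2, 3, 4, 5)
n         <- c(10, 25, 40, 60, 15)
successes <- c(1, 6, 18, 41, 13)

# Form 1: two-column response of successes and failures
m_counts <- glm(cbind(successes, n - successes) ~ dose, family = binomial)

# Form 2: proportion as the response, sample sizes supplied as weights
m_weighted <- glm(successes / n ~ dose, family = binomial, weights = n)

# Both forms give the same estimates and the same deviance
print(all.equal(coef(m_counts), coef(m_weighted)))
print(all.equal(deviance(m_counts), deviance(m_weighted)))
```

Dropping weights = n from the second form would treat a proportion from 10 individuals as if it were as informative as one from 60.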

12 Over-dispersion and Hypothesis Testing
Everything addressed earlier is still available for proportion data. This includes ANOVA, ANCOVA, and regression analysis.
Significance is assessed using χ² tests.
Hypothesis testing with binomial errors is less clear-cut than with normal errors:
–Large samples (>30) are necessary.
–The degree to which the approximation is satisfactory is unknown.
–p will not be exactly known.
–Over-dispersion must usually be addressed.
The residual scaled deviance should be about equal to the residual df. If it is much larger, use family = quasibinomial to allow for over-dispersion.
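One way to carry out this check in R is sketched below. The counts are invented so that they scatter around the trend more than binomial sampling alone would suggest; nothing here comes from the book's examples.

```r
# Hypothetical data (made up to show extra scatter around the trend)
x         <- 1:8
n         <- rep(30, 8)
successes <- c(2, 9, 4, 15, 10, 24, 17, 28)
y         <- cbind(successes, n - successes)

# Binomial fit, with a chi-squared test of the x term
m_bin <- glm(y ~ x, family = binomial)
print(anova(m_bin, test = "Chisq"))

# Rule of thumb from the slide: residual deviance should be about the residual df
ratio <- deviance(m_bin) / df.residual(m_bin)
print(ratio)

# If the ratio is well above 1, refit with quasibinomial errors
m_quasi <- glm(y ~ x, family = quasibinomial)
phi <- summary(m_quasi)$dispersion   # estimated dispersion parameter
print(phi)

# With an estimated dispersion, an F test is used in place of chi-squared
print(anova(m_quasi, test = "F"))
```

The quasibinomial fit leaves the coefficients unchanged and multiplies their standard errors by sqrt(phi), so only the significance tests become more conservative.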

13 Book Examples See discussion of how to model with binomial errors. Logistic regression example. Categorical explanatory variables example. ANCOVA example.

