Quantitative methods and R – (2) LING115 December 2, 2009.

Two sample t-test
Check if the means of two samples are different
  – Calculate the difference between the two means
  – Normalize it by the standard error
Which standard error to use becomes an issue since there are two samples

Two sample t-test – (2)
If the two sample variances are roughly the same, pool the two sample variances and then estimate SE
  – Weighted average of the two variances, each of which is weighted by its degrees of freedom (n-1)
df = n_a + n_b - 2
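The pooled-variance calculation can be sketched in R as follows. The two samples `a` and `b` are made-up values for illustration, not data from the course, and the result is cross-checked against R's built-in `t.test()` with `var.equal = TRUE`.

```r
# Hypothetical example data (not from the slides)
a <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
b <- c(4.2, 4.8, 5.0, 4.4, 4.9, 4.1, 4.6)

n_a <- length(a)
n_b <- length(b)

# Pooled variance: each sample variance weighted by its degrees of freedom (n - 1)
var_pooled <- ((n_a - 1) * var(a) + (n_b - 1) * var(b)) / (n_a + n_b - 2)

# Standard error of the difference between the two means
se <- sqrt(var_pooled * (1 / n_a + 1 / n_b))

# t-score: difference between the means, normalized by the standard error
t_score <- (mean(a) - mean(b)) / se
df <- n_a + n_b - 2

# Two-sided p-value from the t-distribution
p_value <- 2 * pt(-abs(t_score), df)

# Built-in equivalent, for comparison
builtin <- t.test(a, b, var.equal = TRUE)
print(c(manual = t_score, builtin = unname(builtin$statistic)))
```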

Two sample t-test – (3)
If the two sample variances are not the same, SE is estimated by the following formula:
  SE = sqrt(s_a^2 / n_a + s_b^2 / n_b)
The degrees of freedom are calculated differently depending on the sample size
  – If both samples consist of more than 30 data points, we can use the normal distribution as the distribution of t-scores
  – If not, estimate the degrees of freedom from the sample variances and sample sizes (e.g. with the Welch-Satterthwaite approximation)
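A minimal sketch of the unequal-variance case, assuming the Welch-Satterthwaite approximation for the degrees of freedom. That is the approximation R's `t.test()` uses by default, but it is an assumption here that it matches the formula on the original slide. The data are made up.

```r
# Hypothetical example data (not from the slides)
a <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)
b <- c(3.0, 6.5, 4.1, 7.2, 2.8, 5.9, 4.4)

n_a <- length(a)
n_b <- length(b)

# Per-sample contribution to the squared standard error
v_a <- var(a) / n_a
v_b <- var(b) / n_b

# Standard error without pooling the variances
se <- sqrt(v_a + v_b)
t_score <- (mean(a) - mean(b)) / se

# Welch-Satterthwaite approximation of the degrees of freedom (assumed)
df <- (v_a + v_b)^2 / (v_a^2 / (n_a - 1) + v_b^2 / (n_b - 1))

p_value <- 2 * pt(-abs(t_score), df)

# t.test() defaults to var.equal = FALSE, i.e. the Welch test
builtin <- t.test(a, b)
print(c(manual_df = df, builtin_df = unname(builtin$parameter)))
```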

Two sample t-test in R
$ cd /home/ling115/r
$ R

Two sample t-test in R – (2)
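As an illustration of what a session might look like, here is a sketch using the formula interface of `t.test()`. The data frame `d`, its column names, and the values are hypothetical and not the course data.

```r
# Hypothetical data frame: one measurement per row, with a two-level grouping factor
d <- data.frame(
  f1  = c(310, 560, 850, 590, 370, 270, 460, 730, 570, 300),
  sex = factor(rep(c("female", "male"), each = 5))
)

# Compare the mean of f1 between the two levels of sex
# (Welch test by default; add var.equal = TRUE to pool the variances)
res <- t.test(f1 ~ sex, data = d)

res$statistic  # t-score
res$parameter  # degrees of freedom
res$p.value    # two-sided p-value
```

The same test can be run on two plain vectors with `t.test(x, y)`.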

Paired t-test
Some data make more sense when paired
  – F1 of a set of vowels from males and females
  – Difference in frequency of the same set of words between two corpora
  – Can control for the variation due to the factor by which observations are paired (e.g. what the vowel is)
Calculate the difference in score for each pair
Run a one sample t-test to see if the mean difference is different from zero

Paired t-test in R
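A short sketch of the paired test, showing that `t.test(..., paired = TRUE)` gives the same result as a one sample t-test on the pairwise differences. The F1 values below are invented for illustration; each position in the two vectors is assumed to correspond to the same vowel.

```r
# Hypothetical F1 values (Hz) for the same five vowels, paired by vowel
male   <- c(270, 460, 730, 570, 300)
female <- c(310, 560, 850, 590, 370)

# Step 1: difference in score for each pair
diffs <- female - male

# Step 2: one sample t-test on the differences against zero
one_sample <- t.test(diffs, mu = 0)

# Built-in paired test, which should agree with the two steps above
paired <- t.test(female, male, paired = TRUE)

print(c(one_sample = unname(one_sample$statistic),
        paired     = unname(paired$statistic)))
```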

Parametric vs. Nonparametric
Parametric test (of statistical significance)
  – Assumes normal distribution
  – Data are measured in interval scales
  – Makes use of parameters such as mean, variance
Nonparametric test
  – Does not assume normal distribution
  – Knowledge of parameters is not necessary
  – e.g. Wilcoxon test instead of t-test
Shapiro-Wilk test for normality
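One possible workflow, sketched with made-up data: check normality with `shapiro.test()`, and use `wilcox.test()` in place of `t.test()` if normality looks doubtful. The 0.05 cutoff is a conventional choice assumed for this example.

```r
# Hypothetical samples; x contains two extreme values, so it is unlikely to look normal
x <- c(1.2, 1.5, 1.9, 2.3, 2.8, 3.4, 9.7, 15.2)
y <- c(0.8, 1.1, 1.4, 1.7, 2.0, 2.2, 2.6, 3.1)

# Shapiro-Wilk test: a small p-value suggests the sample is not normally distributed
normal_x <- shapiro.test(x)
normal_y <- shapiro.test(y)

# Choose the test based on the normality check (0.05 is an assumed cutoff)
if (normal_x$p.value < 0.05 || normal_y$p.value < 0.05) {
  res <- wilcox.test(x, y)  # nonparametric alternative to the two sample t-test
} else {
  res <- t.test(x, y)
}

print(res$method)
print(res$p.value)
```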

ANOVA
Data points are grouped by a factor with more than two levels
  – F1 of a set of vowels produced by speakers from five different dialect groups
  – Difference in frequency of the same set of words among ten corpora
The goal of Analysis of Variance is to check if the differences among the means of different groups are greater than the differences among the observations in the data set generally

Variance among groups
Calculate the mean for each group
Calculate the overall mean of the data pooled from all groups
Calculate the squared deviation of each group mean from the overall mean
Multiply the squared deviation by the number of data points in each group so that the amount reflects the size of each group
Add up the values (SS_group)
Normalize SS_group by the degrees of freedom
  – df = number of groups minus one
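The steps above can be sketched directly in R. The scores and the three group labels are hypothetical.

```r
# Hypothetical data: 13 observations in three groups of unequal size
score <- c(4.1, 5.0, 4.6, 5.3,
           6.2, 5.8, 6.9, 6.4, 6.0,
           7.5, 8.1, 7.2, 7.9)
group <- factor(rep(c("A", "B", "C"), times = c(4, 5, 4)))

# Mean and size of each group
group_means <- tapply(score, group, mean)
group_sizes <- tapply(score, group, length)

# Overall mean of the pooled data
overall_mean <- mean(score)

# Squared deviation of each group mean, weighted by group size, then summed
ss_group <- sum(group_sizes * (group_means - overall_mean)^2)

# Normalize by the degrees of freedom: number of groups minus one
df_group <- nlevels(group) - 1
var_group <- ss_group / df_group

print(c(ss_group = ss_group, df_group = df_group, var_group = var_group))
```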

Variance within the entire data
On first thought, this may be the sample variance
But this variance includes variance due to group difference
So we want to get the variance of the whole with the variance due to group difference removed

Variance within the entire data – (2)
Sum of squares of error (SS_error)
  – Method 1
    Calculate the sum of squared deviations over all data points (SS_total)
    Subtract the sum of squared deviations due to group difference (SS_group), i.e. SS_total – SS_group
  – Method 2
    Calculate the sum of squared deviations within each group
    Add up the SS
Normalize SS_error by the degrees of freedom
  – df = number of data points minus the number of groups
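Both methods can be written out in R to confirm that they give the same SS_error. The data are hypothetical.

```r
# Hypothetical data: 13 observations in three groups of unequal size
score <- c(4.1, 5.0, 4.6, 5.3,
           6.2, 5.8, 6.9, 6.4, 6.0,
           7.5, 8.1, 7.2, 7.9)
group <- factor(rep(c("A", "B", "C"), times = c(4, 5, 4)))

# Method 1: SS_total minus SS_group
ss_total <- sum((score - mean(score))^2)
group_means <- tapply(score, group, mean)
group_sizes <- tapply(score, group, length)
ss_group <- sum(group_sizes * (group_means - mean(score))^2)
ss_error_1 <- ss_total - ss_group

# Method 2: sum of squared deviations within each group, added up
ss_within <- tapply(score, group, function(v) sum((v - mean(v))^2))
ss_error_2 <- sum(ss_within)

# Normalize by the degrees of freedom: data points minus number of groups
df_error <- length(score) - nlevels(group)
var_error <- ss_error_1 / df_error

print(c(method_1 = ss_error_1, method_2 = ss_error_2, var_error = var_error))
```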

ANOVA and F-ratio
If the variance among groups is larger than the variance within the entire data, we conclude that the group means are different
F = VAR_group / VAR_error
  – F = 1 if the two variances are exactly the same
  – The farther F is above 1, the less likely it is that the two variances are the same
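A sketch of the F-ratio computed by hand and compared with R's built-in `aov()`. The data are hypothetical, and the p-value is taken from the upper tail of the F-distribution, which is how `aov()` reports it.

```r
# Hypothetical data: 13 observations in three groups of unequal size
score <- c(4.1, 5.0, 4.6, 5.3,
           6.2, 5.8, 6.9, 6.4, 6.0,
           7.5, 8.1, 7.2, 7.9)
group <- factor(rep(c("A", "B", "C"), times = c(4, 5, 4)))

n <- length(score)
k <- nlevels(group)

# Variance among groups: weighted squared deviations of the group means
means <- tapply(score, group, mean)
sizes <- tapply(score, group, length)
var_group <- sum(sizes * (means - mean(score))^2) / (k - 1)

# Variance within groups: ave() returns each observation's own group mean
var_error <- sum((score - ave(score, group))^2) / (n - k)

f_ratio <- var_group / var_error

# Probability of an F-ratio at least this large if the group means were equal
p_value <- pf(f_ratio, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Built-in one-way ANOVA, for comparison
tab <- summary(aov(score ~ group))[[1]]
print(c(manual = f_ratio, builtin = tab[["F value"]][1]))
```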

F-distribution
Probability distribution of ratio of variances
  – Note that each variance will have its own degrees of freedom
F = 1 if two variances are the same
The farther away F is from 1, the less likely it is that the two variances are the same
F-distribution is sensitive to whether the population distribution is normal
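R exposes the F-distribution through `pf()` (cumulative probability) and `qf()` (quantiles), each taking the two degrees of freedom as `df1` and `df2`. The degrees of freedom and the observed F value below are arbitrary choices for illustration.

```r
# Critical F values at the 5% level for two different pairs of degrees of freedom
crit_small <- qf(0.95, df1 = 2, df2 = 10)  # few groups, few data points
crit_large <- qf(0.95, df1 = 4, df2 = 45)  # more groups, more data points

# Probability of seeing an F-ratio at least as large as a hypothetical observed one
f_obs <- 3.2
p_upper <- pf(f_obs, df1 = 2, df2 = 10, lower.tail = FALSE)

print(c(crit_small = crit_small, crit_large = crit_large, p_upper = p_upper))
```

With more degrees of freedom, a smaller F-ratio is already enough to reach the same significance level.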

F-distribution graph (from wikimedia)

Comparison of variance in R
var.test(x,y)
We can also calculate F-ratio by var(x)/var(y)
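A short sketch showing that the F-ratio reported by `var.test()` is the same as `var(x)/var(y)`. The vectors `x` and `y` are made-up values.

```r
# Hypothetical samples
x <- c(12.1, 14.3, 9.8, 15.6, 11.2, 13.9, 10.4)
y <- c(12.5, 12.9, 12.2, 13.1, 12.7, 12.4)

# F-ratio by hand
f_manual <- var(x) / var(y)

# Built-in F test for equality of two variances
res <- var.test(x, y)

res$statistic  # F-ratio
res$parameter  # degrees of freedom of numerator and denominator
res$p.value    # two-sided p-value
```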

Two factors
The ANOVA we discussed so far assumed there is a single factor which divides the data points into multiple groups
There may be more than one factor
  – Number of adjectives in each sentence
  – Length of each sentence

Interaction
With two factors, assuming they are meaningful, there are three ways the factors can affect the value of an observation
  – Factor 1
  – Factor 2
  – Interaction of factor 1 and factor 2
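In R, a two-factor ANOVA with an interaction term can be sketched with `aov()` and the `*` formula operator. The response variable (`reading_time`), the factor levels, and the simulated values are all assumptions made up for this example.

```r
set.seed(115)  # make the simulated data reproducible

# Hypothetical balanced design: 2 x 2 factor levels, 5 sentences per cell
d <- expand.grid(adjectives  = c("few", "many"),
                 sent_length = c("short", "long"),
                 rep         = 1:5)

# Simulated response with an effect of each factor (values are invented)
d$reading_time <- rnorm(nrow(d), mean = 500, sd = 40) +
  ifelse(d$adjectives == "many", 30, 0) +
  ifelse(d$sent_length == "long", 60, 0)

# adjectives * sent_length expands to both main effects plus their interaction
fit <- aov(reading_time ~ adjectives * sent_length, data = d)
tab <- summary(fit)[[1]]
print(tab)
```

The table has one row per source of variation: factor 1, factor 2, their interaction, and the residuals.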

Repeated measures
The ANOVA discussed so far assumes the data points are independent of each other
As in the case of paired t-test, some data make more sense when observations are matched
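One way to run a repeated measures ANOVA in R is to add an `Error()` term to the `aov()` formula so that observations from the same subject are treated as matched. This is a sketch under the assumption of a balanced design in which every subject is measured once per condition; the data are invented.

```r
# Hypothetical data: 6 subjects, each measured under 3 conditions
d <- data.frame(
  subject   = factor(rep(1:6, each = 3)),
  condition = factor(rep(c("a", "b", "c"), times = 6)),
  score     = c(5.1, 6.0, 7.2,
                4.8, 5.5, 6.9,
                6.2, 6.8, 8.1,
                5.5, 6.1, 7.0,
                4.9, 5.9, 7.4,
                5.8, 6.5, 7.7)
)

# Error(subject / condition) separates variation between subjects from
# variation within subjects across conditions
fit <- aov(score ~ condition + Error(subject / condition), data = d)
summary(fit)
```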
