Analyzing Reading time data LabSyntax, 03/01/06 T. Florian Jaeger.


[2] Self-paced RT studies: a measure of processing complexity
Say we have a hypothesis that some supposedly ungrammatical wh-orders are actually just hard to process (cf. superiority violations). As part of this hypothesis, we predict that the accessibility of the wh-fillers and the accessibility of the interveners affect the amount of processing required at the integration site (the verb).

[3] Copyright
The following slides refer to a data set (downloadable along with these slides) collected by the WH-Research Group, Linguistics Department, Stanford University. Please do not use, cite, or distribute any results based on that data set (data-accessibility.rtm) without our permission. Email tiflo@stanford.edu or sag@stanford.edu for more details.

[4] Follow along
You can follow this tutorial presentation in R by downloading the dataset and the R script from: http://www.stanford.edu/~tiflo/?teaching/LabSyntax2006/materials/
The .cnd file is used to extract the results from Linger and to define the regions of interest.
The .rtm file contains all the reading time data, including the practice items (see the Lingeralyzer documentation).
The .r file contains the R script used here. I haven't documented things carefully, but with some R experience you should be able to figure things out.

[5] Input file
# prin2n3 1 BARE_BARE
Mary wondered what who read but later the teacher told her.
? Did Mary want to know what was painted? N
# prin2n3 1 BARE_WHICH
Mary wondered what which student read but later the teacher told her.
? Did Mary want to know what was painted? N
# prin2n3 1 WHICH_BARE
Mary wondered which book who read but later the teacher told her.
? Did Mary want to know what was painted? N
# prin2n3 1 WHICH_WHICH
Mary wondered which book which student read but later the teacher told her.
? Did Mary want to know what was painted? N
Stimulus identifier: # experimentID itemID conditionID
Stimulus (regions separated by "|"; default: word-by-word)
Content question and answer (Y/N)
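To make the block structure concrete, here is a minimal R sketch that reads such stimulus blocks into a data frame. The function name parse_linger_items is hypothetical (it is not part of Linger or Lingeralyzer), and it assumes each block consists of the identifier line, the stimulus line and a question line starting with "?", optionally separated by blank lines; real Linger files may contain other line types that it does not handle.

```r
# Hypothetical helper (not part of Linger): parse stimulus blocks of the form
#   "# experimentID itemID conditionID" / stimulus line / "? question answer"
parse_linger_items <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]              # drop blank lines
  starts <- which(startsWith(lines, "#"))    # identifier line of each block
  rows <- lapply(starts, function(i) {
    id <- strsplit(sub("^#\\s*", "", lines[i]), "\\s+")[[1]]
    has_q <- i + 2 <= length(lines) && startsWith(lines[i + 2], "?")
    q <- if (has_q) sub("^\\?\\s*", "", lines[i + 2]) else NA_character_
    data.frame(expt = id[1], item = id[2], cond = id[3],
               stimulus = lines[i + 1],
               question = sub("\\s+[YN]$", "", q),  # question text without the answer
               answer = sub("^.*\\s", "", q),       # last token: Y or N
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

stim <- c(
  "# prin2n3 1 BARE_BARE",
  "Mary wondered what who read but later the teacher told her.",
  "? Did Mary want to know what was painted? N",
  "",
  "# prin2n3 1 WHICH_WHICH",
  "Mary wondered which book which student read but later the teacher told her.",
  "? Did Mary want to know what was painted? N"
)
parse_linger_items(stim)[, c("item", "cond", "answer")]
```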

[6] Extracting results (.cnd)
set COND_NAME "prin2n3 BARE_BARE"
set ANOVA_FACTORS "WH1 WH2"
set REGIONS {1:1-2 2:3-4 3:5-8 4:9-99}
addCondition
set COND_NAME "prin2n3 BARE_WHICH"
set ANOVA_FACTORS "WH1 WH2"
set REGIONS {1:1-2 2:3-5 3:6-9 4:10-99}
addCondition
set COND_NAME "prin2n3 WHICH_BARE"
set ANOVA_FACTORS "WH1 WH2"
set REGIONS {1:1-2 2:3-5 3:6-9 4:10-99}
addCondition
…
Word positions (BARE_WHICH): Mary(1) wondered(2) what(3) which(4) student(5) read(6) but(7) later(8) the(9) teacher(10) told(11) her.(12)
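The REGIONS setting maps word positions onto regions of interest. A minimal R sketch of that mapping, using a hypothetical helper regions_for_words(); treating the upper bound 99 as "until the end of the sentence" is an assumption based on how it is used here, not documented Linger behavior.

```r
# Hypothetical helper: turn a REGIONS string such as "1:1-2 2:3-5 3:6-9 4:10-99"
# into a vector giving the region number of every word position.
regions_for_words <- function(spec, n_words) {
  parts <- strsplit(trimws(spec), "\\s+")[[1]]      # "1:1-2", "2:3-5", ...
  region <- rep(NA_integer_, n_words)
  for (p in parts) {
    m <- as.integer(strsplit(p, "[:-]")[[1]])       # c(region, from, to)
    to <- min(m[3], n_words)                        # assumed: 99 = end of sentence
    if (m[2] <= to) region[m[2]:to] <- m[1]
  }
  region
}

words <- strsplit("Mary wondered what which student read but later the teacher told her.", " ")[[1]]
data.frame(position = seq_along(words), word = words,
           region = regions_for_words("1:1-2 2:3-5 3:6-9 4:10-99", length(words)))
```

For BARE_WHICH this puts "read" (word 6) at the start of region 3; for BARE_BARE, where the verb is word 5, the spec 3:5-8 achieves the same alignment.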

[7] Output file

[8] Import into R
# adjust the path to wherever you saved data-accessibility.rtm
data <- read.table("C:/Documents and Settings/tiflo/Desktop/CLASS/RT-example/data-accessibility.rtm")
colnames(data) <- c("expt", "extraction", "attachment", "item", "subj", "order",
                    "position", "word", "region", "rt", "rtz", "resrt", "resrtz", "qa")
Let's do some data exploration and cleaning.
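A minimal sketch of a first exploration step, run on a small made-up data frame rather than on data-accessibility.rtm. It assumes that the variables used later are derived from these columns: logRT as log(rt), and filler from the extraction column with WHICH as the reference level (which is what the fillerBARE coefficient names later on suggest). Both mappings are assumptions, not taken from the original script.

```r
# Made-up stand-in for the imported table (only a few of the columns above)
set.seed(1)
toy <- data.frame(
  subj       = rep(1:4, each = 6),
  region     = rep(1:3, times = 8),
  extraction = rep(c("BARE", "WHICH"), times = 12),
  rt         = round(rlnorm(24, meanlog = 5.8, sdlog = 0.3))
)
table(toy$subj)                         # trials per subject
tapply(toy$rt, toy$region, median)      # median raw RT per region
toy$logRT  <- log(toy$rt)               # assumed dependent variable for later analyses
# Assumed factor coding: WHICH as reference, so coefficients are labeled "...BARE"
toy$filler <- relevel(factor(toy$extraction), ref = "WHICH")
levels(toy$filler)
```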

[9] Testing the assumptions of ANOVA
Homogeneity of variances: The variances of all conditions (and the variance of the error) are assumed to be identical.
→ Violations of this assumption are tolerable as long as the variances are correlated (cf. Howell, 1995:340-1).
Normality: The dependent variable is assumed to be normally distributed within each condition.
→ ANOVA is relatively robust against violations of normality.
Independence of observations: This assumption forces us to include subjects and items as factors.
→ repeated-measures ANOVA; mixed-effects models
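Homogeneity of variances can be inspected and tested with base R. A sketch on simulated logRTs (made-up numbers; cond stands in for the four filler x intervener conditions). bartlett.test() assumes normality within groups; fligner.test() is a rank-based alternative that is less sensitive to non-normal data.

```r
# Simulated logRTs for the four conditions (illustration only, not real data)
set.seed(42)
cond  <- factor(rep(c("BARE_BARE", "BARE_WHICH", "WHICH_BARE", "WHICH_WHICH"), each = 50))
logRT <- rnorm(200, mean = 5.8, sd = 0.3)
# Descriptive check: variance of logRT per condition
round(tapply(logRT, cond, var), 3)
# Tests of H0: all condition variances are equal
bartlett.test(logRT ~ cond)   # assumes normality within conditions
fligner.test(logRT ~ cond)    # rank-based, more robust to non-normality
```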

Normality

[11] Outlier exclusion

[12] Transformations
Reading times should be log-transformed (this also works for magnitude estimation judgments).
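Why the log transform helps: reading times are typically right-skewed, and taking logs makes them much more symmetric. A sketch with simulated log-normal reading times (the distribution and its parameters are assumptions for illustration), using the sample skewness, i.e. the third standardized moment:

```r
# Simulated right-skewed reading times in ms (log-normal, made-up parameters)
set.seed(7)
rt <- rlnorm(1000, meanlog = 5.8, sdlog = 0.4)
# Sample skewness: mean cubed deviation divided by the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
round(c(raw = skewness(rt), log = skewness(log(rt))), 2)
```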

[13] Normality check
Within each condition, the dependent variable (logRT) is approximately normally distributed.
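One way to check this per condition is a Shapiro-Wilk test within each condition. The sketch uses simulated data with the four condition labels from the input file; with real data you would split your own data frame by condition instead. With large samples even negligible deviations become significant, so a Q-Q plot (qqnorm(), qqline()) is often more informative than the p-value.

```r
# Simulated reading times for the four conditions (made-up log-normal data)
set.seed(3)
d <- data.frame(
  cond = rep(c("BARE_BARE", "BARE_WHICH", "WHICH_BARE", "WHICH_WHICH"), each = 100),
  rt   = rlnorm(400, meanlog = 5.8, sdlog = 0.35)
)
d$logRT <- log(d$rt)
# Shapiro-Wilk p-values per condition (H0: values are normally distributed)
p_raw <- sapply(split(d$rt, d$cond), function(x) shapiro.test(x)$p.value)
p_log <- sapply(split(d$logRT, d$cond), function(x) shapiro.test(x)$p.value)
round(rbind(raw = p_raw, log = p_log), 4)
```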

Independence

[15] A simple regression
# region 3 starts at the verb; keep only the prin2n3 items
data.verb <- subset(data.oe.clean, region == "3" & expt == "prin2n3")
m.verb <- lm(logRT ~ filler * intervener, data = data.verb)
summary(m.verb)
Output:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        5.75258    0.01590 361.910  < 2e-16 ***
fillerBARE         0.06698    0.02248   2.980  0.00291 **
intervenerBARE     0.05232    0.02244   2.331  0.01980 *
filBARE:intBARE   -0.03980    0.03176  -1.253  0.21019
---
Multiple R-squared: 0.004493, Adjusted R-squared: 0.003564
NB: Coefficients are given for logRT.
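Because the model uses treatment coding with WHICH as the reference level (hence the fillerBARE and intervenerBARE labels), the predicted logRT for each condition is a sum of coefficients. A sketch of that arithmetic with the estimates from the output above, back-transformed to milliseconds; since the model is fit on logRT, exp() of a prediction is a geometric mean, not an arithmetic mean RT.

```r
# Estimates copied from the regression output (logRT scale)
b <- c(intercept = 5.75258, fillerBARE = 0.06698,
       intervenerBARE = 0.05232, interaction = -0.03980)
# Condition names are FILLER_INTERVENER; WHICH is the reference for both factors
pred <- c(
  WHICH_WHICH = b[["intercept"]],
  BARE_WHICH  = b[["intercept"]] + b[["fillerBARE"]],
  WHICH_BARE  = b[["intercept"]] + b[["intervenerBARE"]],
  BARE_BARE   = sum(b)
)
round(exp(pred))   # predicted verb-region RT in ms (geometric-mean scale)
```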

[16] Overview

Overview

[18] Clusters in your data
The assumption of independence is violated if the observations within clusters of your data are correlated, e.g. several trials by the same subject, or several trials of the same item. Do subjects really differ?
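A sketch of what such clustering looks like, on simulated data in which every subject has an individual baseline speed (all parameters are made up). A pooled regression that ignores subjects leaves those baselines in the residuals, so the residuals are not independent: a one-way ANOVA of the residuals on subject picks up the clustering.

```r
# Simulated data: 20 subjects x 16 trials, each subject with its own baseline
set.seed(11)
n_subj <- 20; n_trial <- 16
subj   <- factor(rep(seq_len(n_subj), each = n_trial))
filler <- factor(rep(c("BARE", "WHICH"), length.out = n_subj * n_trial))
subj_offset <- rnorm(n_subj, sd = 0.25)            # between-subject differences
logRT <- 5.8 + 0.05 * (filler == "BARE") + subj_offset[as.integer(subj)] +
  rnorm(n_subj * n_trial, sd = 0.2)
# Pooled regression that ignores who produced each trial
res <- residuals(lm(logRT ~ filler))
# Residuals still differ systematically by subject
summary(aov(res ~ subj))
```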

[19] Some example subjects
# separate regressions for three individual subjects
lms1 <- lm(logRT ~ filler * intervener, data = data.verb, subset = subj == "1")
lms2 <- lm(logRT ~ filler * intervener, data = data.verb, subset = subj == "2")
lms3 <- lm(logRT ~ filler * intervener, data = data.verb, subset = subj == "3")
coefficients(lms1)
coefficients(lms2)
coefficients(lms3)

[20] Three random subjects
> coefficients(lms1)
(Intercept)  fillerBARE     intBARE  fillerBARE:intBARE
 5.69316570  0.26279856  0.09832092         -0.12438628
> coefficients(lms2)
(Intercept)  fillerBARE     intBARE  fillerBARE:intBARE
 5.76799982 -0.07026181 -0.03666451          0.21004255
> coefficients(lms3)
(Intercept)  fillerBARE     intBARE  fillerBARE:intBARE
 6.23218256  0.15147899  0.01664294         -0.03748124
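Instead of copying the lm() call for each subject, the per-subject fits can be produced in one step with split() and lapply() in base R. A sketch on a small made-up data set with the same factor structure (WHICH listed first, so it is the reference level):

```r
# Made-up data: 5 subjects, all four filler x intervener conditions, 8 trials each
set.seed(5)
d <- expand.grid(rep = 1:8, filler = c("WHICH", "BARE"),
                 intervener = c("WHICH", "BARE"), subj = factor(1:5))
d$logRT <- rnorm(nrow(d), mean = 5.8, sd = 0.3)
# One lm() per subject
fits <- lapply(split(d, d$subj), function(s) lm(logRT ~ filler * intervener, data = s))
# One row of coefficients per subject
co <- t(sapply(fits, coef))
round(co, 3)
```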

[21] Plotting data for all subjects (from Fox, 2002)
library(lattice)
trellis.device(color = FALSE)
xyplot(logRT ~ filler | subj, data = data.verb,
       main = "Verb logRTs", ylim = c(5, 7),
       panel = function(x, y) {
         panel.xyplot(x, y)             # raw data points
         #panel.loess(x, y, span = 1)   # optional smoother
         panel.lmline(x, y, lty = 2)    # per-subject regression line
       })

[23] A more convenient way: lmList (in package lme4)
library(lme4)
lmList(formula = logRT ~ filler * intervener | subj, data = data.verb)
Coefficients:
  (Intercept)   fillerBARE      intBARE fillerBARE:intBARE
1    5.693166  0.262798559  0.098320919       -0.124386276
2    5.768000 -0.070261811 -0.036664515        0.210042552
3    6.232183  0.151478990  0.016642943       -0.037481241
4    5.835349  0.178951680  0.080414896       -0.318668839
5    5.717801 -0.006879702  0.035657065        0.264302142
6    5.569169  0.017304250  0.192764048       -0.048537682
7    5.299747  0.054687350  0.021888357       -0.184764732
8    5.667252 -0.013897366 -0.031297567        0.106345849
9 …
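To quantify how much the subjects differ, a sketch using only the eight complete rows of the lmList output above (the elided rows are not included). Assuming WHICH is the reference level, exp(intercept) is a subject's geometric-mean verb reading time in the WHICH_WHICH condition.

```r
# Intercepts and fillerBARE estimates for subjects 1-8, copied from the lmList output
intercept  <- c(5.693166, 5.768000, 6.232183, 5.835349,
                5.717801, 5.569169, 5.299747, 5.667252)
fillerBARE <- c(0.262798559, -0.070261811, 0.151478990, 0.178951680,
                -0.006879702, 0.017304250, 0.054687350, -0.013897366)
# Range of baseline reading times across these subjects, in ms
round(range(exp(intercept)))
# Subjects even disagree on the direction of the filler effect
table(sign(fillerBARE))
```

For these eight subjects, baseline reading times range from roughly 200 ms to roughly 509 ms, and three of the eight fillerBARE estimates are negative.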

[24] Conclusion
That's why we use repeated-measures or mixed-effects analyses: they capture the differences between subjects as well as what all trials by the same participant have in common.

[25] Repeated Measures ANOVA in R
# by-subjects (F1) and by-items (F2) means of logRT for each condition
data.verb.F1 <- aggregate(logRT ~ subj + filler + intervener, data = data.verb, FUN = mean)
data.verb.F2 <- aggregate(logRT ~ item + filler + intervener, data = data.verb, FUN = mean)
# Error() strata require subj and item to be factors
data.verb.F1$subj <- factor(data.verb.F1$subj)
data.verb.F2$item <- factor(data.verb.F2$item)
F1 <- aov(logRT ~ filler * intervener + Error(subj / (filler * intervener)), data = data.verb.F1)
F2 <- aov(logRT ~ filler * intervener + Error(item / (filler * intervener)), data = data.verb.F2)
summary(F1)
summary(F2)
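A self-contained sketch of the by-subjects (F1) analysis on simulated trial-level data (12 subjects, 5 trials per condition; every number is made up). Averaging first gives one observation per subject and condition, and Error(subj/(filler*intervener)) then tests each effect against its own subject-by-effect error term. The by-items (F2) analysis is the same with item in place of subj.

```r
# Made-up trial-level data: 12 subjects x 4 conditions x 5 trials
set.seed(9)
d <- expand.grid(trial = 1:5, filler = c("WHICH", "BARE"),
                 intervener = c("WHICH", "BARE"), subj = factor(1:12))
d$logRT <- 5.8 + rnorm(12, sd = 0.2)[as.integer(d$subj)] +  # subject baselines
  0.06 * (d$filler == "BARE") + rnorm(nrow(d), sd = 0.3)     # filler effect + noise
# Step 1: one mean per subject and condition
d.F1 <- aggregate(logRT ~ subj + filler + intervener, data = d, FUN = mean)
# Step 2: repeated-measures ANOVA with subject error strata
F1 <- aov(logRT ~ filler * intervener + Error(subj / (filler * intervener)), data = d.F1)
summary(F1)
```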
