Some statistical methods on syntactic variables in L1 writing
Report from an ongoing study
Bård Uri Jensen
PhD student, UiB / Hedmark University College (Hamar)
Solstrand
Contents
– Introducing the project
– The ELEV corpus vs the ASK corpus
– Extracting data
– Analysing data
My doctoral project
Research question
– Do people tend to make different grammatical choices when they type on a keyboard rather than write by hand?
Hypotheses
– Higher production speed shifts the choices in a "spontaneous" direction
– Skilled writers may utilise the enhanced functionality and shift features in the opposite direction
– Other psychological factors may affect the choices: motivational factors, social media norms
The ELEV corpus
A "parallel" corpus of hand-written and keyboarded texts
– Two texts by each pupil
The ASK corpus system
Manual syntactic segmentation
– t-units
– clauses
– fragments
No error tags
Alle mennesker er forskjellige, Kvinnfolk driver på data og gutter leser bøker
All humans are different, Women use computers and boys read books
Jeg liker å få på ski. Fordi det gir meg bedre kondisjon.
I like cross-country skiing. Because it gives me better stamina.
drikk deg full. Er dette en sunn utvikling?
get (yourself) drunk. Is this a healthy development?
Politiet vet det er folk under 18 som drikker der,
The police know there are people under 18 who drink there,
Men hva med andre bøker? men veit da om flere jenter som ikke gjør det også!
But what about other books? but [I] know about several girls who don't do it also!
Er dette en sunn utvikling?
Is this a healthy development?
Corpus searches
[features='.* subst.*'];
[]* ;
[]{5,10} ;
([lemma='\$.']*[!lemma='\$.']){5,10} [lemma='\$.']* ;
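The last query above appears to pick out stretches of 5 to 10 tokens whose lemma does not match `\$.`, letting punctuation tokens (lemmas such as `$.` or `$,`) occur freely in between. A minimal Python sketch of the same length filter, assuming a segment is available as a plain list of lemmas; the function names and the example segments are hypothetical and invented for illustration:

```python
import re

# Assumption: punctuation tokens carry lemmas such as "$." or "$," (as the
# pattern '\$.' in the queries suggests). CQP regexes must match the whole
# attribute value, so re.fullmatch is used to mirror that behaviour.
PUNCT_LEMMA = re.compile(r"\$.")


def is_punctuation(lemma: str) -> bool:
    return PUNCT_LEMMA.fullmatch(lemma) is not None


def word_count(lemmas: list[str]) -> int:
    """Count the non-punctuation tokens in one segment (e.g. a t-unit)."""
    return sum(1 for lemma in lemmas if not is_punctuation(lemma))


def in_length_band(lemmas: list[str], low: int = 5, high: int = 10) -> bool:
    """True if the segment has low..high words, punctuation not counted."""
    return low <= word_count(lemmas) <= high


# Hypothetical lemmatised segments, invented for illustration
segments = [
    ["politi", "vite", "det", "være", "folk", "under", "18", "som", "drikke", "der", "$,"],
    ["være", "denne", "en", "sunn", "utvikling", "$?"],
    ["drikke", "du", "full", "$."],
]
print([in_length_band(s) for s in segments])  # [True, True, False]
```

Counting words outside the query engine in this way makes it easy to vary the length band without rewriting the query.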
Corpus searches: fronted subclauses
[features='.* konj.*']? ( | | ) [];
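The query above seems to look for an optional conjunction followed by a subordinate clause at the start of a unit. A minimal sketch of that test outside the corpus system, assuming each t-unit is stored as a sequence of loose tokens and segmented clauses; the `Token`/`Clause` classes, the `"main"`/`"sub"` labels and the feature strings are hypothetical, and the word-boundary regex only approximates `features='.* konj.*'`:

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    word: str
    features: str  # hypothetical feature string, e.g. "konj clb"


@dataclass
class Clause:
    kind: str  # hypothetical labels: "main" or "sub"
    tokens: list[Token] = field(default_factory=list)


# Approximates the query's features='.* konj.*' test
KONJ = re.compile(r"\bkonj\b")


def has_fronted_subclause(t_unit: list) -> bool:
    """True if the t-unit opens with a subordinate clause, optionally after
    one conjunction that stands outside any clause."""
    items = list(t_unit)
    if items and isinstance(items[0], Token) and KONJ.search(items[0].features):
        items = items[1:]
    return bool(items) and isinstance(items[0], Clause) and items[0].kind == "sub"


def fronted_share(t_units: list[list]) -> float:
    """Proportion of t-units that open with a subordinate clause."""
    if not t_units:
        return 0.0
    return sum(has_fronted_subclause(t) for t in t_units) / len(t_units)


# Hypothetical segmented t-units (tokens inside clauses omitted for brevity)
t_units = [
    [Token("men", "konj clb"), Clause("sub"), Clause("main")],
    [Clause("main"), Clause("sub")],
    [Clause("sub"), Clause("main")],
]
print(fronted_share(t_units))
```

Reporting a share per text rather than a raw count keeps texts of different lengths comparable.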
Corpus searches: embedding
[!clause]+ []* [!clause]+ ;
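Embedding means clauses occurring inside other clauses. A minimal sketch of two embedding measures, maximum nesting depth and total clause count, assuming the manual segmentation can be read as a nested tree. The `Clause` class is hypothetical, and the bracketing of the example sentence from the earlier slide is one plausible segmentation assumed for illustration, not the corpus annotation:

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Clause:
    # Words (str) and nested clauses, in text order
    children: list["Node"] = field(default_factory=list)


Node = Union[str, Clause]


def embedding_depth(node: Node) -> int:
    """Depth of clause nesting at the deepest point: a word has depth 0,
    a clause containing no other clause has depth 1, and so on."""
    if isinstance(node, str):
        return 0
    return 1 + max((embedding_depth(child) for child in node.children), default=0)


def clause_count(node: Node) -> int:
    """Total number of clauses, the outermost one included."""
    if isinstance(node, str):
        return 0
    return 1 + sum(clause_count(child) for child in node.children)


# One plausible segmentation, assumed for illustration:
# [Politiet vet [det er folk under 18 [som drikker der]]]
sentence = Clause([
    "Politiet", "vet",
    Clause(["det", "er", "folk", "under", "18",
            Clause(["som", "drikker", "der"])]),
])
print(embedding_depth(sentence), clause_count(sentence))  # 3 3
```

Depth and count can diverge: two coordinated subclauses raise the count but not the depth, so the two measures capture different aspects of embedding.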
Corpus searches: lexical distribution
[lemma!='\$.'];
[features=".* verb.*"];
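The first query above counts all non-punctuation tokens and the second counts verbs, so one natural way to combine them is as a proportion per text. A minimal sketch, assuming each token is available as a hypothetical `(lemma, features)` pair; the feature strings are invented for illustration, and the regex `\bverb\b` only approximates `features=".* verb.*"`:

```python
import re

PUNCT_LEMMA = re.compile(r"\$.")  # assumed punctuation lemmas: "$.", "$?", ...
VERB = re.compile(r"\bverb\b")  # approximates features=".* verb.*"


def verb_share(tokens: list[tuple[str, str]]) -> float:
    """Share of verbs among the non-punctuation tokens of one text.
    Each token is a hypothetical (lemma, features) pair."""
    words = [feats for lemma, feats in tokens if not PUNCT_LEMMA.fullmatch(lemma)]
    if not words:
        return 0.0
    return sum(1 for feats in words if VERB.search(feats)) / len(words)


# Hypothetical tagged text: "Er dette en sunn utvikling?"
text = [
    ("være", "verb pres"),
    ("denne", "pron"),
    ("en", "det"),
    ("sunn", "adj"),
    ("utvikling", "subst"),
    ("$?", "clb"),
]
print(verb_share(text))  # 0.2
```

The same function works for any word class by swapping the feature pattern (for example `subst` for nouns).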
Statistics: three examples
Some simple analyses
– differences of means
– correlations
Classification analysis
Clustering
Mean & correlation
Classification analysis
Independent variables (parameters)
– writing mode: hand ~ keyboard
– writing skills: medium ~ high
– gender
– essay question
Dependent variable
– frequency of attributive adjectives
– subclause frequency
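The slide does not say which classification method was used. As one hedged illustration, a tree-based analysis starts by asking which independent variable best splits the texts with respect to a dependent variable. The sketch below scores each predictor by how much it reduces the within-group sum of squares (a regression-tree style criterion, with one branch per predictor level) and picks the best one. The row layout, the column names and the values are hypothetical and invented for illustration; essay question is left out only to keep the toy data short:

```python
from collections import defaultdict
from statistics import mean


def sse(values: list[float]) -> float:
    """Sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows: list[dict], predictors: list[str], target: str):
    """Return (predictor, reduction in SSE) for the predictor whose levels
    best separate the target values, or None if no predictor varies."""
    total = sse([r[target] for r in rows])
    best = None
    for p in predictors:
        groups = defaultdict(list)
        for r in rows:
            groups[r[p]].append(r[target])
        if len(groups) < 2:
            continue  # a predictor with a single level cannot split
        gain = total - sum(sse(g) for g in groups.values())
        if best is None or gain > best[1]:
            best = (p, gain)
    return best


# Hypothetical texts: attributive adjectives per 100 words (invented values)
rows = [
    {"mode": "hand", "skill": "high", "gender": "f", "adj_freq": 8.0},
    {"mode": "keyboard", "skill": "high", "gender": "m", "adj_freq": 7.0},
    {"mode": "hand", "skill": "medium", "gender": "m", "adj_freq": 4.0},
    {"mode": "keyboard", "skill": "medium", "gender": "f", "adj_freq": 3.0},
]
print(best_split(rows, ["mode", "skill", "gender"], "adj_freq"))
```

A full tree repeats this step inside each branch; whether writing mode shows up at all, and at which level, is then the interesting result for the research question.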
YES
Cluster analysis
About 50 dependent variables
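With about 50 variables on different scales, a common approach is to standardise each variable and then group texts by distance. A minimal sketch of agglomerative clustering with average linkage on z-scored features, written in plain Python; the function names and the 4 × 3 toy matrix are hypothetical, and the deck does not state which clustering method or linkage was used:

```python
import math
from statistics import mean, pstdev


def zscore_columns(matrix: list[list[float]]) -> list[list[float]]:
    """Standardise each column (variable) to mean 0 and SD 1."""
    scaled = []
    for col in zip(*matrix):
        m, s = mean(col), pstdev(col)
        scaled.append([(v - m) / s if s > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]


def euclid(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(points: list[list[float]], k: int) -> list[list[int]]:
    """Merge the two closest clusters (mean pairwise distance) until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(euclid(points[a], points[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical texts (rows) x variables (columns), invented values
data = [[1.0, 10.0, 5.0], [1.2, 11.0, 5.5], [5.0, 2.0, 1.0], [5.3, 2.5, 1.2]]
print(average_linkage(zscore_columns(data), k=2))  # [[0, 1], [2, 3]]
```

Standardising first stops high-frequency variables from dominating the distances. In R the equivalent would be `hclust(dist(scale(x)), method = "average")`.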