Some statistical methods on syntactic variables in L1 writing Report from an ongoing study Bård Uri Jensen PhD student UiB / Hedmark University College.


Some statistical methods on syntactic variables in L1 writing Report from an ongoing study Bård Uri Jensen PhD student UiB / Hedmark University College (Hamar) Solstrand

Contents
– Introducing the project
– The ELEV corpus vs the ASK corpus
– Extracting data
– Analysing data

My doctoral project
Research question
– Do people tend to make different grammatical choices when they type on a keyboard rather than write by hand?
Hypotheses
– Higher production speed affects the choices in a "spontaneous" direction
– Skilled writers may utilise the enhanced functionality and shift features in the opposite direction
– Other psychological factors may affect the choices, e.g. motivational factors and social media norms

The ELEV corpus
A "parallel" corpus of hand-written and keyboarded texts
– Two texts by each pupil
The ASK corpus system
Manual syntactic segmentation
– t-units
– clauses
– fragments
No error tags

Alle mennesker er forskjellige, Kvinnfolk driver på data og gutter leser bøker Jeg liker å få på ski. Fordi det gir meg bedre kondisjon.
All humans are different, Women use computers and boys read books I like cross-country skiing. Because it gives me better stamina.

drikk deg full. Er dette en sunn utvikling?
get (yourself) drunk. Is this a healthy development?

Politiet vet det er folk under 18 som drikker der,
The police know there are people under 18 who drink there,

Men hva med andre bøker? men veit da om flere jenter som ikke gjør det også!
But what about other books? but [I] know about several girls who don't do it also!

Er dette en sunn utvikling?
Is this a healthy development?

Corpus searches [features='.* subst.*']; []* ; []{5,10} ; ([lemma='\$.']*[!lemma='\$.']){5,10} [lemma='\$.']* ;

Corpus searches : frontal subclauses [features='.* konj.*']? ( | | ) [];

Corpus searches : embedding [!clause]+ []* [!clause]+ ;

Corpus searches : lexical distribution [lemma!='\$.']; [features=".* verb.*"];
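
The two queries above appear to match non-punctuation tokens and verb tokens respectively. A minimal Python sketch of the same counts over an already tagged text could look like the following. The `(lemma, features)` token layout, the sample tags and the function name `verb_share` are assumptions made for illustration; they are not the ASK system's actual export format.

```python
import re

# Hypothetical tagged tokens as (lemma, features) pairs.
# Punctuation lemmas are assumed to look like "$," or "$.".
tokens = [
    ("politi", "subst appell"),
    ("vite", "verb pres"),
    ("det", "pron"),
    ("være", "verb pres"),
    ("folk", "subst appell"),
    ("$,", "punct"),
]

# Same pattern as lemma='\$.' in the query: "$" followed by one character.
PUNCT = re.compile(r"\$.")

def verb_share(tagged):
    """Return (verb count, word count, verbs / words), ignoring punctuation."""
    words = [t for t in tagged if not PUNCT.fullmatch(t[0])]
    # Mirrors the intent of features=".* verb.*": "verb" as a separate tag.
    verbs = [t for t in words if "verb" in t[1].split()]
    share = len(verbs) / len(words) if words else 0.0
    return len(verbs), len(words), share

print(verb_share(tokens))
```

Dividing by the number of non-punctuation tokens rather than all tokens keeps the measure comparable between texts that differ in how heavily they are punctuated.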

Statistics: Three examples
Some simple analyses
– differences of mean
– correlations
Classification analysis
Clustering

Mean & correlation
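
Because each pupil contributes one hand-written and one keyboarded text, a difference of means can be treated as a paired comparison. The sketch below computes a paired t statistic and a Pearson correlation with the standard library only. The numbers are invented for illustration and are not results from the study; the variable and function names are likewise assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Mean of the pairwise differences x - y and its t statistic (df = n - 1)."""
    diffs = [a - b for a, b in zip(x, y)]
    d_mean = mean(diffs)
    t_stat = d_mean / (stdev(diffs) / sqrt(len(diffs)))
    return d_mean, t_stat

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented subclauses per 100 words for the same five pupils in both modes.
hand = [4.0, 5.0, 3.0, 6.0, 5.0]
keyboard = [5.0, 7.0, 4.0, 6.0, 7.0]

d_mean, t = paired_t(keyboard, hand)
r = pearson_r(hand, keyboard)
print(f"mean difference = {d_mean:.2f}, t = {t:.2f}, r = {r:.2f}")
```

The t statistic would still have to be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value; that step is left out here.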

Classification analysis
Independent variables (parameters)
– writing mode: hand ~ keyboard
– writing skills: medium ~ high
– gender
– essay question
Dependent variable
– frequency of attributive adjectives
– subclause frequency
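
The slides do not say which classification procedure was used. As a stand-in, the sketch below shows the first step of a simple tree-style analysis: it asks which single categorical predictor divides the texts into groups with the smallest within-group sum of squares on the dependent variable. All data, field names and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def within_ss(rows, predictor, response):
    """Sum of squared deviations from the group mean, grouped by predictor level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[predictor]].append(row[response])
    total = 0.0
    for values in groups.values():
        centre = mean(values)
        total += sum((v - centre) ** 2 for v in values)
    return total

def best_split(rows, predictors, response):
    """Predictor whose grouping leaves the least unexplained variation."""
    return min(predictors, key=lambda p: within_ss(rows, p, response))

# Invented texts: adjective frequency per 100 words.
rows = [
    {"mode": "hand", "skill": "medium", "gender": "f", "adj_freq": 3.0},
    {"mode": "hand", "skill": "high", "gender": "m", "adj_freq": 3.5},
    {"mode": "hand", "skill": "medium", "gender": "m", "adj_freq": 3.0},
    {"mode": "hand", "skill": "high", "gender": "f", "adj_freq": 3.5},
    {"mode": "keyboard", "skill": "medium", "gender": "f", "adj_freq": 5.0},
    {"mode": "keyboard", "skill": "high", "gender": "m", "adj_freq": 5.5},
    {"mode": "keyboard", "skill": "medium", "gender": "m", "adj_freq": 5.0},
    {"mode": "keyboard", "skill": "high", "gender": "f", "adj_freq": 5.5},
]

print(best_split(rows, ["mode", "skill", "gender"], "adj_freq"))
```

A full tree would repeat the same choice inside each resulting group and would need a stopping rule; this sketch stops after one split.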

YES

Cluster analysis
About 50 dependent variables
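
The slides do not name the clustering method. One common choice for grouping texts described by many numeric variables is agglomerative clustering, sketched here with average linkage on z-scored variables. The three invented variables stand in for the roughly 50 in the study, and the function names are assumptions.

```python
from math import dist
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardise each variable so that no single scale dominates the distances."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        tuple((v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats))
        for row in rows
    ]

def average_linkage(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Average Euclidean distance between all cross-cluster pairs.
        return mean(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

# Invented texts, each described by three syntactic variables.
texts = [
    (1.0, 2.0, 0.5), (1.1, 2.1, 0.6), (0.9, 1.9, 0.4),
    (3.0, 4.0, 2.0), (3.1, 4.2, 2.1), (2.9, 3.9, 1.9),
]

result = average_linkage(zscore_columns(texts), k=2)
print(result)
```

The returned lists hold indices into `texts`, so the clusters can afterwards be cross-tabulated against writing mode or any other text-level property.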

References
Baayen 2008: Analyzing linguistic data: A practical introduction to statistics using R
Dodge 2010: The concise encyclopedia of statistics
Gries 2009: Statistics for linguistics with R: A practical introduction
Zuur et al. 2009: A beginner's guide to R

Bård Uri Jensen Hedmark University College (Hamar)