Exploring word order in learner corpora: The WOSLAC Project Corpus Research Seminar Department of Linguistics.

Exploring word order in learner corpora: The WOSLAC Project
Corpus Research Seminar, Department of Linguistics & English Language, Lancaster University, 20/11/2006
Amaya Mendikoetxea, Universidad Autónoma de Madrid/Lancaster University

AIMS OF THE PRESENTATION
- To present the WOSLAC project: (i) its motivation and objectives, (ii) data collection, (iii) annotation and query software and (iv) data analysis.
- To report the results of a preliminary study on the production of inverted subjects in non-native English (Spanish learners).

PART I
The WOSLAC Project: objectives
To determine the properties that constrain word order in non-native grammars (L2): Spanish L1 – English L2 & English L1 – Spanish L2.
a) Lexicon-syntax interface: how the lexical properties of verbs are represented in the syntax (syntactic realization of arguments and adjuncts).
b) Syntax-discourse interface: the relevance of information structure notions such as topic (given/old/retrievable information) and focus (new/non-retrievable information) for word order in L2 grammars.
English and Spanish differ in the devices they employ for constituent ordering: the fixed order of English is determined by lexico-syntactic properties, whereas the free order of Spanish is determined by information structure (syntax-discourse properties).

DATA COLLECTION (1): WriCLE
WriCLE: Written Corpus of Learner English
L1 Spanish - L2 English
Target: 1 million words
So far: 250 essays = words
Learners: 1st and 3rd yr. students of English at the UAM.
Essays: around words written for the EAP course.
Data gathered: a) Essay, b) Learner profile, c) Essay profile and d) Oxford Quick Placement Test

DATA COLLECTION (2): CEDEL2
CEDEL2: Corpus Escrito del Español L2
L1 English - L2 Spanish
Target: 1 million words
So far: words
Learners: University students of Spanish in USA, UK, Australia & Spain.
Essays: descriptive and argumentative essays from about 500 words.
Data gathered: online collection of a) Essay, c) Learning background and d) Spanish Placement Test (Wisconsin)

CEDEL2 (online)

SOFTWARE: UAM CorpusTool
UAM CorpusTool (Mick O'Donnell) can be used as a coder and a searcher.
The tool allows an analyst to select a text from the corpus and annotate it in various ways. For instance, the analyst can highlight a segment (e.g., an it-cleft) and then assign features to that segment. The tool produces an XML-encoded version of the text file, including the features assigned to the segments.
Because hand-annotation is slow, the tool will allow the analyst to associate lexico-syntactic patterns with each feature, so that the tool can automatically detect instances of the pattern. For instance, a pattern like:
it be# NP that
would match sentences in the corpus like "It was John that we saw", and tentatively mark them with the feature it-cleft. The tool would then ask the user to eliminate false matches. This approach eliminates much of the corpus annotation effort.
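The pattern-based pre-annotation described on this slide can be approximated with a regular expression. The sketch below is not UAM CorpusTool's own pattern language or API: it assumes that "be#" stands for any form of be, and it uses a crude stand-in for NP (one to four word tokens). The names BE_FORMS, NP, IT_CLEFT and find_it_cleft_candidates are hypothetical.

```python
import re

# Assumption: "be#" in the slide's pattern means "any inflected form of be".
BE_FORMS = r"(?:is|was|are|were|be|been|being)"
# Assumption: a very rough stand-in for an NP, one to four word tokens.
NP = r"(?:[A-Za-z'-]+\s+){0,3}[A-Za-z'-]+"
IT_CLEFT = re.compile(rf"\bit\s+{BE_FORMS}\s+({NP})\s+that\b", re.IGNORECASE)


def find_it_cleft_candidates(sentences):
    """Return (sentence, matched NP) pairs that still need manual checking."""
    hits = []
    for sentence in sentences:
        match = IT_CLEFT.search(sentence)
        if match:
            hits.append((sentence, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = [
        "It was John that we saw.",
        "We saw John yesterday.",
        "It is obvious that money is important.",  # extraposition, a false match
    ]
    for sentence, np in find_it_cleft_candidates(sample):
        print(f"candidate it-cleft: {sentence!r} (NP = {np!r})")
```

The third sample sentence is an extraposition rather than an it-cleft, yet it matches the surface pattern. This is the kind of false match that the analyst is asked to eliminate by hand.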

SOFTWARE: UAM CorpusTool

DATA ANALYSIS: STRUCTURES
Word-order phenomena:
Left periphery: Preposing, Left dislocation
Right periphery: Postposing, Right dislocation
Other: Passive, Inversion, There-construction, Dative alternation, Phrasal verb, Cleft, Extraposition

DATA ANALYSIS: FRAMEWORK
Comparative framework: to determine the role of the L1 in L2 acquisition (transfer) in the areas under study (L1 properties, L2 properties, Universal Grammar).
We adopt some methodological aspects of CIA, Contrastive Interlanguage Analysis (see, e.g., Granger 1996 and Gilquin 2001):
(a) NNS vs. NS: non-native vs. native data. This involves a detailed analysis of linguistic features in native and non-native corpora to uncover and study non-native features in the speech and writing of (advanced) non-native speakers. This includes errors, but it is conceptually wider, as it seeks to identify overuse and underuse of certain linguistic features and patterns.
(b) NNS vs. NNS: different non-native data. By comparing learner data from different L1 backgrounds, we can gain a better understanding of interlanguage processes and features, such as those which are the result of transfer or those which are developmental, common to learners with different L1s.
- Descriptive and inferential statistics

DATA ANALYSIS: FRAMEWORK
Formal and functional features interact in the structures under consideration. Formal and functional approaches are therefore essential for the understanding of SLA data.
At the same time, data from non-native grammars is potentially significant for the understanding of linguistic phenomena in native grammars.

CONTRIBUTIONS TO THE FIELD
- Linguistic Theory: better understanding of interfaces (lexicon-syntax, syntax-discourse and syntax-phonology).
- L2 acquisition: better understanding of transfer and non-transfer phenomena.
- Corpus studies: use of corpora for the study of formal features. Creation of the first Spanish learner corpus.
- Pedagogy: better understanding of word order errors.

PART II
Postverbal subjects in learner English
Lozano & Mendikoetxea (in press). Postverbal subjects at the interfaces in Spanish and Italian learners of L2 English: a corpus analysis. In G. Gilquin, M.B. Díez-Bedmar and S. Papp, Linking Contrastive and Learner Corpus Research. Amsterdam: Rodopi.
Postverbal subjects: L1 Spanish/L1 Italian – L2 English
ICLE (International Corpus of Learner English)
Interfaces: lexicon-syntax, syntax-discourse, syntax-phonology
What are the conditions under which learners produce inverted subjects, regardless of problems to do with grammaticalisation?

Word Order in L1 English (1)
Fixed SV(O) order. Restricted use of postverbal subjects:
a) XP V S
(i) XP is an adverbial element, typically expressing time or place and linking the sentence to the prior discourse.
(ii) V is an intransitive verb, typically expressing existence or appearance on the scene (= unaccusative).
(iii) S is often syntactically/phonologically heavy, consisting of a noun and a variety of pre- and/or postmodifiers, which introduce new information in the discourse.
(1) Michael puts loose papers like class outlines in the large file-size pocket. He keeps his checkbook handy in one of the three compact pockets. The six pen and pencil pockets are always full and go. [Lands End March 1989 catalog. p. 95] (Birner 1994: 254)

Word Order in L1 English (2)
b) There-constructions
(2) a. Somewhere deep inside [there] arose a desperate hope that he would embrace her [FICT]
    b. In all such relations [there] exists a set of mutual obligations in the instrumental and economic fields [ACAD]
    c. [There] came a roar of pure delight as…. [FICT]
[Biber et al. 1999: 945]

Word order in L1 English (sum)
Lexicon-syntax interface (Levin & Rappaport-Hovav, etc):
- Unaccusative Hypothesis (Burzio 1986, etc)
  *There sang four girls at the opera. [unergative verb]
  There arrived four girls at the station. [unaccusative verb]
Syntax-discourse interface (Biber et al, Birner 1994, etc):
- Postverbal material tends to be focus (new info)
  We have complimentary soft drinks and coffee. Also complimentary is red and white wine.
Syntax-Phonological Form (PF) interface (Arnold et al, etc):
- Heavy material is sentence-final (Principle of End-Weight, Quirk et al. 1972):
  That money is important is obvious.
  It is obvious that money is important.
Subjects which are focus, long and complex tend to occur postverbally in those structures which allow them.

Word Order in L1 Spanish (1)
Postverbal subjects are produced freely with all verb classes (as part of the cluster of properties associated with the Null Subject Parameter):
(3) a. Ha telefoneado María al presidente. (transitive)
       has phoned Mary the president
    b. Ha hablado Juan. (unergative)
       has spoken Juan
    c. Ha llegado Juan. (unaccusative)
       has arrived Juan

Word Order in Spanish (2)
Inversion as focalisation: preverbal subjects are topics (given information) and postverbal subjects are focus (new information).
(4) ¿Quién ha llegado/hablado?
    Who has arrived/spoken?
    i. Ha llegado/hablado Juan
    ii. #Juan ha llegado/hablado
The occurrence of postverbal subjects in Spanish is determined by syntax-discourse properties (they are focus) and syntax-phonology properties (heavy subjects show a tendency to be postposed, a universal language processing mechanism: placing complex elements at the end reduces the processing burden).

Previous L2 findings
Production of postverbal subjects in L2 English (Rutherford 1989, Oshita 2004)
L1 Spanish – L2 English:
(6) …it arrived the day of his departure …
(7) And then at last comes the great day.
(8) In every country exist criminals
(9) …after a few minutes arrive the girlfriend with his family too.
Only with unaccusative verbs (never with unergatives).
Unaccusatives: arrive, happen, exist, come, appear, live…
Explanation: syntax-lexicon interface (Unaccusative Hypothesis)
Previous studies focused on ERRORS, thus emphasising the differences between native and non-native structures.
Our study emphasises the similarities between native and non-native structures: the licensing conditions are the same.

Hypotheses
GENERAL HYPOTHESIS:
- Conditions licensing VS in L2 Eng are the same as those in Native Eng, DESPITE differences in grammaticalisation.
SPECIFIC HYPOTHESES:
- H1: Lexicon-syntax interface: Postverbal subjects with unaccs (never with unergs)
- H2: Syntax-PF interface: Postverbal subjects: heavy (NOT light)
- H3: Syntax-Discourse interface: Postverbal subjects: focus (NOT topic)

Method
Learner corpus: L1 Spa – L2 Eng
- ICLE Spanish subcorpus (Granger et al. 2002)
- UAM-ICLE corpus [ICLE]
Problem: proficiency level??
Tools: WordSmith v. 4.0 (Scott 2004), Excel, SPSS v
Concordance queries can be performed automatically with WordSmith by targeting specific verbs, BUT there is a lot of manual work (filtering out unusable data, coding data in Excel, analysing data in SPSS, etc).
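A concordance query that targets specific verb forms can be illustrated with a few lines of Python. This is only a sketch of the general keyword-in-context (KWIC) idea, not WordSmith's implementation. The function name kwic and the width parameter are assumptions, and the sample text reuses learner sentences quoted elsewhere in these slides.

```python
import re


def kwic(text, word_forms, width=30):
    """Return (left context, keyword, right context) for every hit.

    word_forms is the list of verb forms to target; width is the number
    of characters of context kept on each side (both are assumptions).
    """
    alternatives = "|".join(re.escape(form) for form in word_forms)
    pattern = re.compile(rf"\b({alternatives})\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(1), right))
    return lines


if __name__ == "__main__":
    sample = "In every country exist criminals. So arised the Saint Inquisition."
    for left, keyword, right in kwic(sample, ["exist", "exists", "arised"]):
        print(f"{left:>30} [{keyword}] {right}")
```

Every hit returned this way would still have to pass the manual filtering and coding steps mentioned on the slide.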

Data analysis
Based on Levin (1993) and Levin & Rappaport-Hovav (1995):
- Unergatives: cough, cry, shout, speak, walk, dance… [TOTAL: 41]
- Unaccusatives: exist, live, appear, emerge, happen, arrive… [TOTAL: 34]
WordSmith query searches. For every lemma (e.g., APPEAR, ARISE), we searched for:
All possible native forms:
- appear, appears, appearing, appeared
- arise, arises, arising, arose, arisen
All possible overregularised and overgeneralised learner forms:
- arised, arosed, arisened, arosened (So arised the Saint Inquisition)
All possible forms with probable L1 transfer of spelling:
- apear, apears, apearing, apeared
All other possible misspelled forms:
- appeard, apeard
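The search lists for each lemma could be assembled programmatically along the following lines. This is a hedged sketch, not the procedure used in the study: the function candidate_forms and its parameters are hypothetical, and the two rules it applies (adding -(e)d to irregular stems, and substring respellings such as "pp" to "p") are simplifying assumptions chosen to reproduce the forms listed on the slide.

```python
def candidate_forms(native_forms, irregular_stems=(), extra=(), respellings=()):
    """Build the set of surface forms to search for one lemma.

    native_forms:    the target-language forms of the lemma
    irregular_stems: stems to overregularise by adding -(e)d (assumed rule)
    extra:           further misspellings supplied by hand
    respellings:     (old, new) substring substitutions, e.g. ("pp", "p")
    """
    forms = set(native_forms) | set(extra)
    # Overregularisation: treat irregular stems as if they were regular verbs.
    for stem in irregular_stems:
        forms.add(stem + ("d" if stem.endswith("e") else "ed"))
    # Spelling variants: apply each substitution to every form collected so far.
    for old, new in respellings:
        forms |= {form.replace(old, new) for form in set(forms) if old in form}
    return sorted(forms)


if __name__ == "__main__":
    print(candidate_forms(
        ["arise", "arises", "arising", "arose", "arisen"],
        irregular_stems=["arise", "arose", "arisen", "arosen"],
    ))
    print(candidate_forms(
        ["appear", "appears", "appearing", "appeared"],
        extra=["appeard"],
        respellings=[("pp", "p")],
    ))
```

Generating the variants mechanically keeps the search lists consistent across lemmas, but any such rule set will both miss real learner forms and propose forms that never occur, so the lists would still need manual review.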

Data analysis (contd)
CONCORDANCES: RAW OUTPUT
- Thousands of concordances, BUT approx. ¾ were unusable.
- Filtering criteria had to be applied manually.

Data analysis (contd)
CONCORDANCES: 6 BASIC FILTERING CRITERIA
The verb must be intransitive (unergative or unaccusative).
- In the screen of the television one or two rombos should appear. [unac]
- Leontes cries and the statue talks. [unerg]
- This governments movement has created several opinions. [trans]
The verb must be finite, with(out) aux.
- …also it exists the psychological agresssions… [finite no aux]
- … the cases of men mistreated do not appear in the media. [finite aux]
- This contradiction could disappear [finite modal]
- Theres no reason for it to exist. [for clause + to inf]
- Poor people cross borders to escape from poverty. [to-inf clause]
- …let time pass… [let constructions]
- …make everyones life go ahead [causative + infinitive]
- Returning to the title of this paper,… [gerundive clauses]
- …they go away in order to escape to France. [in order to clauses]
- …women have to live with the agressor [have to/ought to/able to]
- …prudence was beginning to disappear. [verbal/aspectual periphrases]
- Before entering the argumentation,… [small clauses]
- …instead of following… [complement of P]
- …likely to happen… [complement of A]
- The tests to enter the army are quite difficult now. [complement of N]

Data analysis (contd)
The verb must be in the active voice.
- This contradiction could disappear. [active unaccusative]
- This situation has already been happened. [passivised unaccusative]
The subject must be an NP.
- …it arose [diverse social ranks, the rich and the poor that depended on the property they had]. [inverted NP subject]
- …it only remains [to add that nowadays we live in a world…] [extraposition]
- It happened [that the countries which make the weapons are…] [extraposition]
The sentence can be either grammatical or ungrammatical in native English.
- This contradiction could disappear. [gram]
- …it wont exist nothing of what people dont get bored or tired. [ungram]
The subject can appear either postverbally (VS) or preverbally (SV).
- …the real problem appears when they have to look for their first job. [SV]
- So arised the Saint Inquisition. [VS]

Data analysis (contd)
OTHER FILTERING CRITERIA
Target V + V (verbal coordination)
- Families without father exist and work well.
Coordinator + target V
- …we can manage to obtain it and live in a better world.
Interrogatives (only if V is the target)
- How could they live?
- Does exist then a manipulation of television?
Formulaic & set expressions in English
- As sometimes happens…
- …fall victim to…
- …the world we live in.
Set expressions transferred from the L1
- …it happens the same.
- …they fall into account that they have treated very badly Mr Hardcastle.
Phrasal verbs
- …a scientist come up with an intention…
Quotes (literary or other)
- To what purpose, April, do you return again?
- Feminism has to evolved or die, Friedan said in 1982…

Data analysis (contd)
OTHER FILTERING CRITERIA (CONTD)
Transitive alternants (unacs)
- Rosamond lived a very comfortable life.
- …once you have passed this stage.
- …the University of Pennsylvania developed the electronic calculator.
Causativizations (unacs)
- …how parents grew their children.
- But this idea could rise the question of…
Verbs that do not belong to the semantic criteria proposed by Levin & Rappaport-Hovav
- …social classes appear to be broken. [appearance]
- …we come to know about his personality… [inherently directed motion]
Subject relative clauses
- …those fantastic relatives that still survive.
- …events of this kind which occurred in Spain.
Free relative clauses
- …trying to imagine what will remain…
- Hastings realizes what is happening…
Predicative complements
- Theatres remained closed.
- …men appear completely subordinated to the womens desires.

Data coding/analysis: EXCEL

Data analysis: preliminary descriptive stats - EXCEL

Result: VS and specific unaccusative verbs

Results: types of VS structures produced
GRAMMATICAL
Locative inversion:
- In the main plot appear the main characters: Volpone and Mosca.
There-insertion:
- There exist positive means of earning money.
AdvP-insertion:
- … and here emerges the problem.
UNGRAMMATICAL
*it-insertion:
- *In the name of religion it had occurred many important events…
*XP-insertion:
- *In 1760 occurs the restoration of Charles II in England.
*Ø-insertion:
- …*because exist the science technology and the industrialisation.

Result: Type of VS structures

Data analysis – inferential stats: SPSS
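The slides do not say which inferential test was run in SPSS. As one possibility, an association between verb class (unaccusative vs. unergative) and subject position (VS vs. SV) could be checked with a Pearson chi-square test on a 2x2 table. The sketch below is an assumption about the kind of test, not the study's procedure, and the counts in the example call are invented purely to show the arithmetic.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table

            VS   SV
    row 1    a    b     (e.g. unaccusative verbs)
    row 2    c    d     (e.g. unergative verbs)
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


if __name__ == "__main__":
    # Made-up counts, NOT the study's results.
    statistic = chi_square_2x2(10, 20, 30, 40)
    print(f"chi-square = {statistic:.3f} (df = 1)")
```

With one degree of freedom, a statistic above 3.841 corresponds to p < .05. When a cell is empty or expected counts are small, which is what H1 predicts for VS with unergatives, an exact test such as Fisher's would be the more cautious choice.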

H1: Results: VS and unaccusativity

H2: Result: VS and weight
HEAVY
- Against this society drama emerged an opposition headed by Oscar Wilde and Bernard Shaw.
- …so came the decline of the theatre.
- Then come the necessity to earn more.
LIGHT
- So arised the Saint Inquisition …
- …and from there began a fire.
- Still today … exists the bloody fights.
Syntactic weight has to be measured manually according to some theoretical criteria.

H2: Result: SV and weight
HEAVY
- … the cases of men mistreated do not appear in the media…
- … a disintegration of culture, tradition and society would begin…
- … the utopian societies created by the early socialists appeared.
LIGHT
- …but they may appear everywhere.
- …since the day eventually came…
- … these people should exist, …

H3: Result: VS and discourse
FOCUS
- …there also exists a wide variety of optional channels which have to be paid.
- So arised the Saint Inquisition.
- In 1880 it begun the experiments whose result was the appearance of the television some years later.
TOPIC
- …our modern world, dominated by science and technology and industrialisation …because exist the science technology and the industrialisation.
Discourse status (topic/focus) has to be measured manually by establishing theoretical criteria and then by checking the context (or even the essay) manually.

H3: Result: SV and discourse
TOPIC
- I use the Internet … I find windows … if they press on any of these windows … these windows cannot appear because a child could enter easily…
- …the world of drugs: mafias … problems with mafias finished … dangerous people making money … no reason why these people should exist.

Summary/Conclusion
Interface          VS order           SV order
Lexicon-syntax     V unacc NP subj    NP subj V unacc
Syntax-discourse   FOCUS              TOPIC
Syntax-PF          HEAVY              LIGHT

TO DO LIST
- Extend our search to the V be (the most commonly found V in inversion structures).
- Compare our results with those obtained from an equivalent native English corpus: LOCNESS, LANCAWE.
- Compare our results with those obtained from an equivalent native Spanish corpus (non-existent).

Thank you!