Capturing patterns of linguistic interaction in a parsed corpus A methodological case study Sean Wallis Survey of English Usage University College London.


Capturing linguistic interaction...
Parsed corpus linguistics
Intra-structural priming
Experiments
– Attributive AJPs before a noun
– Embedded postmodifying clauses
– Sequential postmodifying clauses
– Speech vs. writing
Conclusions
The handout explains the analytical method in more detail (so read it later!)

Parsed corpus linguistics An example tree from ICE-GB (spoken) S1A-006 #23

Parsed corpus linguistics
Three kinds of evidence may be obtained from a parsed corpus:
• Frequency evidence of a particular known rule, structure or linguistic event
• Coverage evidence of new rules, etc.
• Interaction evidence of the relationship between rules, structures and events
This evidence is necessarily framed within a particular grammatical scheme.
– How might we evaluate this grammar?

Intra-structural priming
Priming effects within a structure
– Study repeating an additive step in structures
Consider
– a phrase or clause that may (in principle) be extended ad infinitum, e.g. an NP with a noun head
– a single additive step applied to this structure, e.g. add an attributive AJP before the head
– Q. What is the effect of repeatedly applying this operation to the structure?
[Figure: tree diagram of an NP with head noun "ship", to which the attributive AJPs "tall", "very green" and "old" are added one at a time]

Experiment 1: analysis of results
Sequential probability analysis
– calculate probability of adding each AJP
– error bars: Wilson intervals
– probability falls: second < first, third < second
– decisions interact
– every AJP added makes it harder to add another
[Figure: probability of adding each successive AJP, with Wilson interval error bars]
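The sequential probability analysis above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names are hypothetical, the counts are invented (they are not ICE-GB figures), and it assumes that the probability of adding the x-th AJP is the number of NPs with at least x AJPs divided by the number with at least x-1. The Wilson score interval is the standard formula.

```python
from math import sqrt

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval (lower, upper) for a proportion p observed in n cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def sequential_probabilities(at_least):
    """at_least[x] = number of NPs with at least x attributive AJPs.

    Returns one row (x, p, lower, upper) per additive step, where
    p = at_least[x] / at_least[x - 1] (an assumed definition).
    """
    rows = []
    for x in range(1, len(at_least)):
        n = at_least[x - 1]
        p = at_least[x] / n
        lower, upper = wilson_interval(p, n)
        rows.append((x, p, lower, upper))
    return rows

# Hypothetical counts for illustration only (NOT taken from ICE-GB).
counts = [100000, 20000, 2000, 100]
for x, p, lower, upper in sequential_probabilities(counts):
    print(f"AJP {x}: p = {p:.4f}  Wilson 95% interval = ({lower:.4f}, {upper:.4f})")
```

With real corpus counts, a fall in probability from one step to the next can be read off by comparing each row with the interval of the previous one.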

Experiment 1: explanations?
Feedback loop: for each successive AJP, it is more difficult to add a further AJP
• logical-semantic constraints
  – we tend to say the tall green ship
  – we do not tend to say tall short ship or green tall ship
• communicative economy
  – once a speaker has said tall green ship, they tend to say only ship
• memory/processing constraints
  – unlikely: this is a small structure, as are AJPs

Experiment 1: speech vs. writing
Spoken vs. written subcorpora
– Same overall pattern
– Spoken data tends to have fewer attributive AJPs
  Support for communicative economy or memory/processing hypotheses?
– Significance tests
  Paired 2x1 Wilson tests (Wallis 2011): the first and second observed spoken probabilities are significantly smaller than the written probabilities
[Figure: probability of adding each successive AJP, written vs. spoken]
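The slide does not spell out the paired 2x1 Wilson test, so the sketch below swaps in a related, standard technique: Newcombe's Wilson-based interval for the difference between two independent proportions. It is offered only as one plausible way to compare a spoken probability with a written one. The function names and the figures are hypothetical and are not results from the study.

```python
from math import sqrt

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval (lower, upper) for a proportion p observed in n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def newcombe_wilson_difference(p1, n1, p2, n2, z=1.959964):
    """Difference d = p1 - p2 with Newcombe's Wilson-based interval (lower, upper)."""
    l1, u1 = wilson_interval(p1, n1, z)
    l2, u2 = wilson_interval(p2, n2, z)
    d = p1 - p2
    lower = d - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return d, lower, upper

def significantly_different(p1, n1, p2, n2, z=1.959964):
    """True if the interval for p1 - p2 excludes zero."""
    _, lower, upper = newcombe_wilson_difference(p1, n1, p2, n2, z)
    return lower > 0 or upper < 0

# Hypothetical probabilities of adding a first AJP (NOT figures from the study).
spoken_p, spoken_n = 0.15, 50000
written_p, written_n = 0.22, 40000
d, lower, upper = newcombe_wilson_difference(spoken_p, spoken_n, written_p, written_n)
print(f"spoken - written = {d:.4f}, interval = ({lower:.4f}, {upper:.4f})")
print("significant:", significantly_different(spoken_p, spoken_n, written_p, written_n))
```

The comparison would be repeated at each additive step (first AJP, second AJP, and so on), pairing the spoken and written probabilities for the same step.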

Experiment 2: preverbal AVPs
Consider adverb phrases before a verb
– Results very different
  Probability does not fall significantly between first and second AVP
  Probability does fall between second and third AVP
– Possible constraints
  (weak) communicative
  (weak) semantic
– Further investigation needed
[Figure: probability of adding each successive preverbal AVP]

Experiment 3: postmodifying clauses
Another way to specify nouns in English
– add a clause after the noun to explicate it
  the ship [that was in the port]
  the ship [called Ariadne]
– may be embedded
  the ship [that was in the port [we visited last week]]
– or successively postmodified
  the ship [called Ariadne][that was in the port]
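The difference between embedded and sequential postmodification can be made concrete with a toy data structure. The classes below are a deliberately simplified, hypothetical representation (they are not the ICE-GB annotation scheme): a noun phrase holds a list of postmodifying clauses, and each clause may contain further noun phrases.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Clause:
    text: str
    noun_phrases: List["NounPhrase"] = field(default_factory=list)

@dataclass
class NounPhrase:
    head: str
    postmodifiers: List[Clause] = field(default_factory=list)

def sequence_length(np: NounPhrase) -> int:
    """Number of postmodifying clauses attached to the same head."""
    return len(np.postmodifiers)

def embedding_depth(np: NounPhrase) -> int:
    """Deepest chain of postmodifying clauses nested inside one another."""
    depths = [
        1 + max((embedding_depth(inner) for inner in clause.noun_phrases), default=0)
        for clause in np.postmodifiers
    ]
    return max(depths, default=0)

# the ship [that was in the port [we visited last week]]
embedded = NounPhrase("ship", [
    Clause("that was in the port", [
        NounPhrase("port", [Clause("we visited last week")]),
    ]),
])

# the ship [called Ariadne][that was in the port]
sequential = NounPhrase("ship", [
    Clause("called Ariadne"),
    Clause("that was in the port", [NounPhrase("port")]),
])

print(sequence_length(embedded), embedding_depth(embedded))
print(sequence_length(sequential), embedding_depth(sequential))
```

Counting how many NPs in a corpus reach each sequence length or embedding depth gives the frequencies from which the probability of a further postmodifying clause can be estimated.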

Experiment 3: (i) embedding
Probability of adding a further embedded postmodifying clause falls with size
– All data: second < first, third < first
– Spoken: second < first
– Written: third < second
Compare with effect of sequential postmodification of same head

Experiment 3: (ii) sequential
Probability of sequential postmodifying falls and, for spoken data, falls then rises
– All data: second < first
– Spoken: third > second
– Option: count conjoins separately or treat as single item
  Either way, results show similar pattern
– Negative feedback: the ‘in for a penny’ effect

Experiment 3: (iii) embed vs. seq
Embedded vs. sequential postmodification
embedding > sequence (second level)
– It is slightly easier to modify the latest head than a more remote one: semantic constraints? backtracking cost?
– Third level: embedding < sequence (if counting conjoins)
  Long sequences seem to be easier to construct than comparable layers of embedding

Conclusions
A method for evaluating interactions along grammatical axes
– General purpose, robust, structural
– More abstract than ‘linguistic choice’ experiments
– Depends on a concept of grammatical distance along an axis, based on the chosen grammar
Method has philosophical implications
– Grammar viewed as outcome of linguistic choices
– Linguistics as an evaluable observational science
  Signature (trace) of language production decisions
– A unification of theoretical and corpus linguistics?

Potential applications
Corpus linguistics
– Optimising existing grammatical framework, e.g. coordination, compound nouns
– Comparing genres/languages/periods
Theoretical linguistics
– Comparing different grammars, same language
Psycholinguistics
– Search for evidence of language production constraints in spontaneous speech corpora
  speech and language therapy
  language acquisition and development

References
Nelson, G., Wallis, S. & Aarts, B. (2002) Exploring natural language. Benjamins.
Pickering, M. & Ferreira, V. (2008) Structural priming. Psychological Bulletin 134, 427–459.
Wallis, S.A. (2011) Comparing χ² tests for separability. Survey of English Usage.
For explanation of the analysis method see the handout!
For more detail and a draft of the full paper see