Corpora in grammatical studies

Corpus Linguistics
Richard Xiao
lancsxiaoz@googlemail.com

Aims of this session
Lecture:
- Corpus-based grammar: scope and principles
- The state of the art of using corpora in grammatical studies
- Using corpora to improve grammatical descriptions: infinitival complementation of HELP
Lab session:
- Position of if-clauses in ICE-GB

Corpus revolution
- Like lexicographic and lexical studies, grammar is an area that has frequently exploited corpus data.
- A balanced, representative corpus provides a reliable basis for quantifying grammatical categories and syntactic features.
- Corpora are also useful for testing hypotheses derived from grammatical theory.
- There is growing consensus that non-corpus-based grammars can contain biases, and that corpora can help to improve grammatical descriptions (McEnery & Xiao 2005).
- Corpora have had a strong influence on recently published reference grammars (at least for English): ‘even people who have never heard of a corpus are using the product of corpus-based investigation’ (Hunston 2002: 96).

Principles of corpus grammar (Leech 2000)
- Data-oriented grammar: allows a quantitative and a qualitative description of the data to be combined; a grammar accountable to observed data of attested language use.
- Functional grammar: establishes a relation between phenomena external to the language system and system-internal phenomena (form vs. meaning), and explains grammar in terms of the wider context of human psychology and behaviour.
- Variety grammar: allows the full range of varieties to be described (e.g. conversation, fiction writing, news writing, academic writing).
- Integrative grammar: allows an integrated description of syntactic, lexical and discourse features; close to communicative grammar, as opposed to the ‘autonomous syntax’ view of grammar.

A new milestone in English grammar
- Longman Grammar of Spoken and Written English (LGSWE; Biber et al 1999)
- A new milestone following the Comprehensive Grammar (Quirk et al 1985)
- Based entirely on the 40-million-word Longman Spoken and Written English Corpus
- Gives “a thorough description of English grammar, which is illustrated throughout with real corpus examples, and which gives equal attention to the ways speakers and writers actually use these linguistic resources” (Biber et al 1999: 45)

Features of corpus-based grammars
- Paying attention to the differences between speech and writing
- Taking account of register/genre variation
- Providing frequency information
- Treating lexis as an integral part of grammatical description
- Giving authentic examples

Some examples of corpus grammars
Corpus-based English grammars focusing on speech:
- Carter, R. and McCarthy, M. (1997) Exploring Spoken English. Cambridge: Cambridge University Press.
- McCarthy, M. (1998) Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.

Some examples of corpus grammars
Corpus-based grammars with a focus on lexis:
- Francis, G., Hunston, S. and Manning, E. (1996) Collins COBUILD Grammar Patterns 1: Verbs. London: HarperCollins.
- Francis, G., Hunston, S. and Manning, E. (1998) Collins COBUILD Grammar Patterns 2: Nouns and Adjectives. London: HarperCollins.
- Hunston, S. and Francis, G. (2000) Pattern Grammar. Amsterdam: John Benjamins.

Some examples of corpus grammars
Corpus-based grammars taking account of register variation:
- Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999) Longman Grammar of Spoken and Written English. London: Longman.

A case study
Using corpora to improve grammatical descriptions: infinitival complementation of HELP

A commonly used word
In the 100-million-word BNC, HELP is:
- the 245th most frequent word, with 529 instances per million words
- the 72nd most frequent verb (as a lemma)

A verb with a distinctive syntax
- English has two main-clause verbs that can control either a full or a bare infinitive: dare and help (Biber et al 1999: 735).
- For dare, the choice between a full and a bare infinitive is available only when it is used as a lexical verb; as a modal verb it is always followed by a bare infinitive.
- HELP is the only English verb that can both control either a full or a bare infinitive AND occur either with or without an intervening NP:
  HELP to V: Perhaps the book helped to prevent things from getting even worse.
  HELP NP to V: I thought I could help him to forget.
  HELP V: Savings can help finance other Community projects.
  HELP NP V: We helped him get to his feet and into the chair.
- Dare can occur with or without an intervening NP, but it cannot control a bare infinitive when such an NP is present:
  Ernest <…> dared Archie to punch him in the stomach.
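Once a sentence has been POS-tagged, the four HELP patterns can be told apart mechanically. The following is a minimal illustration, not the procedure used in this study. It rests on several assumptions:
- The tags are simplified Penn-Treebank-style labels: TO for infinitival to and VB for a base-form verb.
- Any run of determiner, pronoun, adjective or noun tags counts as the intervening NP.
- The function name help_pattern is hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: simplified Penn-Treebank-style tags; the BNC and other
# corpora use their own tagsets, so a real query would need adapting.
Token = Tuple[str, str]  # (word, POS tag)

HELP_FORMS = {"help", "helps", "helped", "helping"}
# Crude, hypothetical approximation of "noun phrase material".
NP_TAGS = {"DT", "PRP", "PRP$", "JJ", "NN", "NNS", "NNP"}


def help_pattern(tokens: List[Token]) -> Optional[str]:
    """Return 'HELP to V', 'HELP NP to V', 'HELP V' or 'HELP NP V'.

    Returns None if no form of HELP is followed by one of these patterns.
    """
    for i, (word, _tag) in enumerate(tokens):
        if word.lower() not in HELP_FORMS:
            continue
        j = i + 1
        # Skip over an intervening noun phrase, if there is one.
        while j < len(tokens) and tokens[j][1] in NP_TAGS:
            j += 1
        prefix = "HELP NP" if j > i + 1 else "HELP"
        nxt = tokens[j:j + 2]
        if len(nxt) == 2 and nxt[0][1] == "TO" and nxt[1][1] == "VB":
            return prefix + " to V"  # full infinitive
        if nxt and nxt[0][1] == "VB":
            return prefix + " V"  # bare infinitive
    return None


# Usage with a hand-tagged version of one of the examples above.
sentence = [("We", "PRP"), ("helped", "VBD"), ("him", "PRP"),
            ("get", "VB"), ("to", "TO"), ("his", "PRP$"), ("feet", "NNS")]
print(help_pattern(sentence))  # HELP NP V
```

The NP heuristic is deliberately crude. It misses NPs with postmodifiers, and it returns None when an adverbial such as not or perhaps intervenes. Hits extracted this way would still need manual checking.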

A unique verb of great interest
- A verb that has often been given prominence in textbooks, grammars and dictionaries, e.g. Chalker (1984); Murphy (1985); Quirk et al (1972, 1985); Eastwood (1992); Biber et al (1999)
- A verb that has aroused much interest and debate concerning:
  - language variety
  - language change
  - register variation
  - semantic distinction
  - syntactic conditions

The corpora

Language variety: AmE vs. BrE
- Bare infinitives are much more common in AmE (cf. Biber et al 1999): 80% (AmE) vs. 52% (BrE); LL = 23 (1 df), p < 0.001
- British preference for full infinitives:
  You’re going to help me make to make a birthday cake for Jim remember. (BNC)
- The bare infinitive is a construction of American provenance that has penetrated rapidly into BrE.
- Zandvoort (1966): ‘except in American English, however, to help usually takes an infinitive with to’. This is no longer valid.
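The LL values with 1 df reported in this case study are log-likelihood (G2) tests on 2x2 tables of counts, for example variety × infinitive type. The slides do not state the exact formula or the raw counts. The sketch below therefore assumes the standard G2 statistic for a 2x2 contingency table, and it uses made-up counts. The helper names log_likelihood_2x2 and p_value_1df are hypothetical. The p-value uses the identity that the upper tail of a chi-square distribution with 1 df at x equals erfc(sqrt(x/2)).

```python
import math


def log_likelihood_2x2(a: int, b: int, c: int, d: int) -> float:
    """G2 (log-likelihood) statistic for the 2x2 table [[a, b], [c, d]].

    Rows are the two samples (e.g. AmE, BrE); columns are the two
    outcomes (e.g. bare infinitive, full infinitive).
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    g2 = 0.0
    for r in range(2):
        for k in range(2):
            o = observed[r][k]
            if o == 0:
                continue  # 0 * ln(0) is taken as 0
            expected = row_totals[r] * col_totals[k] / n
            g2 += o * math.log(o / expected)
    return 2.0 * g2


def p_value_1df(ll: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(ll / 2.0))


# Hypothetical counts: identical proportions in both rows give LL = 0.
print(log_likelihood_2x2(20, 30, 40, 60))  # 0.0
```

As a sanity check, p_value_1df(2.71) is about 0.10 and p_value_1df(2.16) is about 0.14. These are in line with the p-values this presentation reports alongside those LL values for the spoken vs. written comparison.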

Language change: 1961–1991
- Changing labels for bare infinitives:
  “vulgar” (OED 1933) → “not seriously questioned now…” (Vallins 1951) → “lost the informal ring” (Mair 1995)
- The proportion of bare infinitives increased over the three decades in both AmE and BrE:
  AmE: 68% → 82% (+14 percentage points); LL = 10.6 (1 df), p = 0.001
  BrE: 22% → 60% (+38 percentage points); LL = 47.5 (1 df), p < 0.001
- The shift towards bare infinitives was greater in BrE, because AmE was already more “tolerant” of bare infinitives in the 1960s.

Spoken vs. written
- Bare infinitives are slightly more frequent in speech than in writing, in both AmE and BrE.
- The differences are not statistically significant:
  AmE: LL = 2.71 (1 df), p = 0.10
  BrE: LL = 2.16 (1 df), p = 0.142
- There is no predictable distribution pattern for bare infinitives across 15 written genres. They are common in some formal genres (e.g. official documents) but infrequent in others (e.g. academic writing).

Semantic distinction
The debate has a long history. Some “pre-corpus” arguments:
- Wood (1962: 107–8): to ‘can be omitted only when the helper does some of the work, or shares in the activity jointly with the person that is helped’. Wood’s “unacceptable” examples:
  These tablets will help you sleep. (But tablets do not sleep.)
  Writing out a poem will help you learn it. (But writing does no learning.)
- According to Quirk et al (1972: 841), the choice ‘is conditioned by the subject’s involvement’:
  with a bare infinitive, ‘external help is called in’;
  with a full infinitive, ‘assistance is outside the action proper’.

Semantic distinction
- Dixon (1991):
  John helped Mary eat the pudding. (John ate part of the pudding, as Mary did.)
  John helped Mary to eat the pudding. (John fed the pudding to Mary.)
- Duffley (1992): a bare infinitive evokes helping as ‘direct or active involvement’ … help to V evokes help as a condition which enables the person being helped to realize the event.
- Lu (1996: 813): when the subject of ‘help’ does not take part in the helping activity, the infinitive must take to:
  The book helped me to see the truth.
What do your intuitions tell you?

Semantic distinction
- The distinction is not reported in more recent corpus-based works (e.g. Longman 1993/1996; Collins 1995; Biber et al 1999), and Quirk et al (1985) dropped the argument for a semantic distinction.
- Collins COBUILD Dictionary: “If you help someone, you make it easier for them to do something, for example by doing part of the work for them or by giving them advice or money.”
- It is not always easy, or even possible, to decide whether the helper actually takes part in the helping activity.
- Counterexamples are abundant in corpora:
  I help people stop smoking. (FLOB)
  oh it says if you have a dose last thing at night it helps you sleep. (BNC)

Syntactic condition: intervening NP
- The earlier claim (Lind 1983; Kjellmer 1985; Biber et al 1999) that an intervening NP increases the proportion of bare infinitives is only partly supported by our corpora:
  - it holds only in AmE, both written and spoken;
  - in BrE the results are unpredictable and not statistically significant.

Syntactic condition: intervening adverbial
- Lind (1983) claims that ‘an intervening adverbial will preclude omission of to’:
  The whisky helped me not to stagger under this blow.
- This claim is ungrounded, especially in AmE (CPSA). Some counterexamples:
  So, to help people not jump all over it as soon as they see it <…> (CPSA)
  <…> that would even help perhaps focus some of those responses. (CPSA)
  Mr. Clinton <…> also helped, to a much lesser degree, organize a huge march in Washington <…> (Frown)
  ...helping dramatically reduce poverty. (Time Magazine 2005/12/05)
  Now my daughter...is helping digitally restore the Disney films her grandfather worked on. (Time Magazine 2006/04/10)

Syntactic condition: to preceding help
- To preceding help is a decisive syntactic condition that encourages the omission of to (cf. Lind 1983; Kjellmer 1985; Biber et al 1999). Proportion of bare infinitives:
  HELP (lemma): 60%
  help (finite verb): 65%
  to help (infinitive): 88% (+23 percentage points)
- Consecutive repetition of to tends to be avoided on the grounds of euphony (cf. Lind 1983):
  They took on an estate manager and wine-maker to help run the business. (FLOB)
- This is a statistical norm, not a categorical distinction: in the BNC, to help V (2,161) is 17 times as frequent as to help to V (127).

Syntactic condition: passive voice
- Palmer (1965: 169) observes that ‘passive occurs <…> only with to: They were helped to do it.’
- All 9 instances of passivized HELP in our corpora take a full infinitive, without exception.
- No instance of BE helped V is found in the whole BNC or in the 100-million-word Time corpus of AmE.
- A possible explanation (?): an analogy can be drawn between HELP and verbs such as MAKE, LET, SEE and HEAR, whose object complement (oC) is a bare infinitive. In passive transformation the infinitive shifts from oC to subject complement (sC) and takes to:
  So they should be made to bring their prices down. (BNC)
  So the authorities should make them (*to) bring their prices down.
  Pupils should be helped to investigate topics on their own. (BNC)
  Teachers should help pupils (to) investigate topics on their own.

Case study: a summary
- The choice of a full or bare infinitive after HELP is conditioned by a wide range of factors, including language variety, language change and various syntactic conditions.
- Non-corpus-based grammars are likely to contain biased descriptions that do not accord with attested language use.

Adverbial clauses: position vs. semantic types (Greenbaum and Nelson 1995)

Exploring if-clauses in ICE-GB
- One million words; 500 samples (300 spoken + 200 written); parsed corpus
- Position of if-clauses:
  Clause-initial position: If it’s a really nice day we could walk.
  Clause-final position: We could walk if it’s a really nice day.
- Reference: Nelson, G., Wallis, S. and Aarts, B. (2002) Exploring Natural Language: Working with the British Component of ICE. Amsterdam: John Benjamins.
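A naive string-based approach shows why the lab session searches a parsed corpus rather than plain text. The sketch below uses a hypothetical helper that is not part of ICECUP. It labels a sentence "initial" when it opens with if and "final" when if occurs later. It assumes one sentence per input string.

```python
import re
from collections import Counter
from typing import Iterable


def if_clause_position(sentence: str) -> str:
    """Naive guess at if-clause position: 'initial', 'final' or 'none'."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", sentence)]
    if "if" not in words:
        return "none"
    return "initial" if words[0] == "if" else "final"


def count_positions(sentences: Iterable[str]) -> Counter:
    """Tally the naive position labels over a collection of sentences."""
    return Counter(if_clause_position(s) for s in sentences)


print(if_clause_position("If it’s a really nice day we could walk."))  # initial
print(if_clause_position("We could walk if it’s a really nice day."))  # final
```

The heuristic cannot tell an adverbial if-clause from other uses of if. For example, if_clause_position("I wonder if it will rain.") returns "final", although that if-clause is a complement of wonder. A clause-medial if-clause is also reported as "final". A Fuzzy Tree Fragment over the ICE-GB parse avoids both problems, because it matches the adverbial-clause node itself.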

ICECUP: click “+” (Expand) to see the text categories

Fuzzy Tree Fragment (FTF): press "Insert after" twice

“Edit Node” menu

Editing 1st node

Editing 2nd node

Editing 3rd node

Specifying word

Complete nodes with specified word: clause (main) containing an adverbial clause introduced by the subordinator “if”

Specifying position (initial)
- Click on "First: Yes" for initial position; the white linking line disappears.
- Finally, press "Start".

Results for initial position

Example of parse tree Parsing unit

Specifying position (final)

Results for final position

Example of parse tree

Frequencies of initial/final positions
Initial position appears to be the “unmarked” position for if-clauses (N = 1,442):
- Initial position: 886 (61.4%)
- Final position: 556 (38.6%)

Written registers
Greenbaum and Nelson's (1995) observation for conditional clauses (64.8% initial vs. 35.32% final) applies only to written registers.

Spoken registers
- In the spoken data as a whole, the final position is preferred, though there is considerable internal variation.
- The more "formal" spoken registers (parliamentary debates, legal presentations and non-broadcast scripted speeches) show a marked preference for the initial position.

ICE-GB: Ditransitive verbs