CORPUS LINGUISTICS
1) A review of corpus linguistics
2) Language corpora in the ESL/EFL classroom
WHAT IS A CORPUS?
A corpus can be defined as a collection of texts assumed to be representative of a given language put together so that it can be used for linguistic analysis. Usually the assumption is that the language stored in a corpus is naturally-occurring, that it is gathered according to explicit design criteria, with a specific purpose in mind, and with a claim to represent natural chunks of language selected according to a specific typology (Tognini-Bonelli 2001: 2).
“nowadays the term 'corpus' nearly always implies the additional feature of 'machine-readable'” (McEnery & Wilson, Corpus Linguistics, online manual).
English language corpora: General vs. Specific
ENGLISH CORPORA: GENERAL LANGUAGE CORPORA
First generation corpora:
- Brown Corpus of written American English
- Lancaster-Oslo/Bergen (LOB) Corpus of written British English
- 500 texts of around 2,000 words each
- no spoken data
- wide variety of written texts
ENGLISH CORPORA: GENERAL LANGUAGE CORPORA
Second generation corpora:
- Bank of English
  - monitor corpus
  - both spoken and written text
  - different regional varieties of English
- British National Corpus (BNC)
  - 90 million written words
  - 10 million spoken words
  - freely accessible: Mark Davies' interface
OTHER TYPES OF ENGLISH LANGUAGE CORPORA
- speech corpora:
  - sound recordings
  - SPOKEN ENGLISH CORPUS
  - detailed description of spoken phenomena: phonology, prosody (stress, tone units…), etc.
- multimedia corpora:
  - transcripts synchronised with audio/video recordings
  - TALKBANK website: SANTA BARBARA CORPUS OF SPOKEN AMERICAN ENGLISH (SBCSAE)
[Screenshot annotations: space for our own annotation; some mark-up for context; audiovisual element]
OTHER TYPES OF ENGLISH LANGUAGE CORPORA
- parsed corpora:
  - syntactically analysed
  - SURFACE AND UNDERLYING STRUCTURAL ANALYSES OF NATURALISTIC ENGLISH (SUSANNE) CORPUS
- historical corpora:
  - English of earlier periods
  - may cover specific historical periods or genres
  - track and describe how language has evolved
  - A REPRESENTATIVE CORPUS OF HISTORICAL ENGLISH REGISTERS (ARCHER)
OTHER TYPES OF ENGLISH LANGUAGE CORPORA
- specialised corpora:
  - focus on concrete genres/domains
  - BUSINESS LETTERS CORPUS (BLC)
- lingua franca corpora:
  - ENGLISH AS A LINGUA FRANCA IN ACADEMIC SETTINGS (ELFA) CORPUS
  - intercultural exchanges among speakers who use English as a lingua franca
OTHER TYPES OF ENGLISH LANGUAGE CORPORA
- developmental language corpora:
  - output of non-adult native speakers of English
  - language less proficient than that found in adult native-speaker corpora
  - POLYTECHNIC OF WALES (POW) CORPUS
- ESL/EFL learner corpora:
  - output of learners of English
  - one and the same L1 background or different mother tongues
  - JAPANESE EFL LEARNER CORPUS (JEFLL)
WORDSMITH: FLEXIBLE CORPUS
- Computer program that lets users compile their own corpus
- Texts must be in .txt format
- Any text can undergo the same analysis as the official corpora: concordance lines, word lists, etc.
- No need to pre-process the texts in advance
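To make "word lists" and "concordance lines" concrete, here is a minimal Python sketch of both operations on a plain-text sample. It is not WordSmith's own implementation: the function names (`tokenize`, `word_list`, `concordance`) and the sample sentence are invented for illustration, and the tokenisation rule is a simplifying assumption.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (internal apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def word_list(tokens):
    """Frequency list: (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()

def concordance(tokens, node, width=4):
    """KWIC lines: every hit of `node` with `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical sample text, standing in for a .txt file loaded by the user.
sample = "I made a mistake. She made a big mistake, and we all make mistakes."
tokens = tokenize(sample)

print(word_list(tokens)[:3])
for line in concordance(tokens, "mistake", width=3):
    print(line)
```

In classroom use the `sample` string would be replaced by the contents of a student's own .txt file; the node word appears in square brackets so that its left and right co-text is easy to scan.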
Corpus linguistics
- Insights into the internal workings of real language
- This knowledge is in turn used in other fields of enquiry
- Planning, designing, compiling and tagging
- Frequency lists and concordance lines (+ further analysis)
- Sinclair’s (2003) “degeneralisation”:
  - sceptical about 'received' descriptions
  - patterns found in the data: more precise or alternative descriptions
- Corpus-based dictionaries and grammars
  - how lexis and grammar are “really” used
  - COLLINS COBUILD LEARNER'S DICTIONARY
  - THE LONGMAN GRAMMAR OF SPOKEN AND WRITTEN ENGLISH
CORPORA IN THE ESL/EFL CLASSROOM: PEDAGOGICAL FOUNDATIONS
- Mixture of instructed and naturalistic language learning
- Fulfilment of both the input and output hypotheses
- “Scaffolding” (though loosely speaking)
- Insights concerning English culture(s)
- Student-centred and related to constructivism: mastering corpora = learner autonomy
CORPUS-BASED ESL/EFL ACTIVITIES
- Focus on lexis, grammar and register
  - introductory notions concerning collocation, colligation, and formal vs. informal
- For already motivated students: BNC
Activity one: contractions. Formal or informal? Spoken or written? The key: * ?’??
[Screenshot labels: steps 1 to 4; step 1 shows the query * ?’??]
Quotation marks!
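The idea behind this activity, comparing how often contractions occur in informal versus formal language, can also be tried offline. The sketch below is only a rough stand-in for the BNC search: the regular expression is an assumption about what counts as a contraction (it also catches possessive 's), it is not the interface's query syntax, and both sample sentences are invented.

```python
import re

# Assumption: a contraction is a word ending in n't, 's, 're, 've, 'll, 'd or 'm.
# Possessive 's is matched too, so the rate is only a rough indicator.
CONTRACTION = re.compile(r"\b[A-Za-z]+(?:n't|'(?:s|re|ve|ll|d|m))\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def contractions_per_thousand(text):
    """Return the number of contractions per 1,000 words of `text`."""
    text = text.replace("\u2019", "'")  # normalise curly apostrophes
    words = WORD.findall(text)
    hits = CONTRACTION.findall(text)
    return 1000 * len(hits) / len(words) if words else 0.0

# Invented samples standing in for spoken-like and written-like text.
spoken_like = "I don't know, it's late and we're tired, so I'll go."
written_like = "The committee does not know whether the proposal will be accepted."

print(round(contractions_per_thousand(spoken_like)))
print(round(contractions_per_thousand(written_like)))
```

Normalising to a rate per 1,000 words matters because the two samples, like the spoken and written parts of a corpus, need not be the same size.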
Activities two and three: Corpora as a source of knowledge concerning collocation and colligation
[v*] mistakes
powerful, not strong!!! [aj*]
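A search such as [v*] mistakes asks which verbs occur immediately before "mistakes". Without part-of-speech tags, a rough approximation is to count whatever word stands directly to the left of the node word. The sketch below does that; the function name `left_collocates` and the sample text are invented for illustration, and because no tagging is involved, non-verbs would be counted as well.

```python
import re
from collections import Counter

def left_collocates(text, node, span=1):
    """Count the words occurring within `span` tokens to the left of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])
    return counts

# Invented sample; in class this would be a larger text or corpus extract.
sample = ("Learners make mistakes. Teachers correct mistakes. "
          "Everyone will make mistakes, and some people repeat mistakes.")

print(left_collocates(sample, "mistakes").most_common())
```

Widening `span` turns the same function into a simple collocation window, which suits adjective searches where the collocate is not always adjacent to the node.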
Activity four: meaning via collocations and co-text
For non-motivated students: WordSmith
- Contact with the English language: input (at least lexis-wise)
- Popular culture: MUSIC IN ENGLISH!!!
Activity one: music corpora, lexis, and the BNC for grammar accuracy
[Screenshot labels: author corpus; reference corpus]
Select the text you want a list of
Save both lists to compare them with KeyWords
[Screenshot labels: author corpus list; reference corpus list]
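Comparing an author word list with a reference word list amounts to asking which words are unusually frequent in the author corpus. The sketch below ranks such words by log-likelihood, one commonly used keyness statistic; whether it matches the settings used in the WordSmith session is an assumption, and the function names and both miniature "corpora" are invented.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Build a word frequency list from raw text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def log_likelihood(a, b, c, d):
    """a, b: frequency of a word in the author/reference corpus; c, d: corpus sizes."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the author corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(author, reference, top=5):
    """Words relatively more frequent in `author`, ranked by log-likelihood."""
    c, d = sum(author.values()), sum(reference.values())
    scored = []
    for word, a in author.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Invented miniature corpora; real lists would come from song lyrics and a reference corpus.
author = word_counts("love love love me do you know i love you")
reference = word_counts("the cat sat on the mat and you know the dog")

for word, score in keywords(author, reference):
    print(word, round(score, 2))
```

With corpora this small the scores are not statistically meaningful; the point is only to show why a word that dominates the lyrics but is absent from the reference list rises to the top.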
That was all! The nightmare is over! Thank you for listening! ^.^