LELA English Corpus Linguistics


1 LELA 30922 English Corpus Linguistics
Harold Somers
Professor of Language Engineering
Office: Lamb 1.15

2 Syllabus

3 Assessment
A practical project in which students will use the BNC (or other approved corpus material) to investigate some question of English language usage.
Suggestion: base your project (more or less closely) on some existing study.
The project write-up will include relevant background material, plus the results and discussion of a corpus-based analysis.
In other words: summarize (and criticize) the chosen study, then do your own version, and compare the results.

4 Reading matter
Main recommendations:
Kennedy, G.D. (1998) An introduction to corpus linguistics. London: Longman.
McEnery, T. & A. Wilson (2001, 2nd ed) Corpus linguistics. Edinburgh: Edinburgh University Press.
Meyer, C. (2002) English corpus linguistics: An introduction. Cambridge: Cambridge University Press.
Lots of other books, focussing on particular aspects
Do not ignore journals (Int J Corp Ling) and specialist conferences, especially when considering the practical assignment.
for list of resources available at UoM

5 What is a corpus?
Corpus (pl. corpora) = ‘body’
Collection of written text or transcribed speech
Usually but not necessarily purposefully collected
Usually but not necessarily structured
Usually but not necessarily annotated
(Usually stored on and accessible via computer)
Corpus ~ text archive

6 Computers and corpus linguistics
Historically, manual analysis of large bodies of text (esp. in literary and biblical studies)
Error-prone, time-consuming, not verifiable
Computers have introduced:
Reliability, accuracy and replicability
Increased speed and capacity, which mean you can do more on a grander scale
New tools, which mean you can do things you might not have thought of doing

7 What is corpus linguistics?
Not a branch of linguistics, like socio~, psycho~, …
Not a theory of linguistics
A set of tools and methods (and a philosophy) to support linguistic investigation across all branches of the subject

8 Evidence in linguistics
Real attested usage as linguistic evidence
Contrasts with the introspective approach previously typical
Relates to the competence~performance (langue~parole) distinction
Corpus linguists are often more interested in trends than rules (probabilities rather than certainties)
Famous stories of corpus evidence contradicting widely-held assumptions about language use
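The idea of "probabilities rather than certainties" can be made concrete with a relative-frequency count. The following is only a minimal sketch: the tokenizer is deliberately crude, the function names `tokenize` and `relative_frequency` are invented for illustration, and the sample sentence is a toy, not corpus data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract runs of letters/apostrophes as tokens.

    This is a crude stand-in for a real tokenizer (an assumption of this sketch).
    """
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(tokens, word, per=1000):
    """Return how often `word` occurs per `per` tokens (0.0 for an empty list)."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return per * counts[word.lower()] / len(tokens)


# Toy input, invented for this example (not taken from any corpus).
toy_tokens = tokenize("The cat sat. The dog sat on the mat.")
print(len(toy_tokens), "tokens")
print(round(relative_frequency(toy_tokens, "the"), 2), "occurrences of 'the' per 1,000 tokens")
```

Normalizing to a rate per 1,000 (or per million) tokens is what lets counts from corpora of different sizes be compared as trends rather than as raw totals.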

9 Activities in corpus linguistics
Design and compilation of corpora
Development of tools for corpus analysis
Descriptive linguists using corpora to analyze lexical and grammatical behaviour of language, eg for lexicography
Exploiting corpora in applied linguistics – language teaching, translation

10 History of Corpus Linguistics www. essex. ac
Textual study has always included an element of counting and cataloguing, despite impracticalities – notably concordances of Shakespeare, the Bible, etc.
Arrival of computers in the 1950s of course changed everything
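A concordance of the kind mentioned above lists every occurrence of a word together with its immediate context, often called keyword-in-context (KWIC). Once text is machine-readable this takes only a few lines. The sketch below is illustrative only: the function name `kwic`, the `width` parameter and the simple `\w+` tokenization are assumptions of the example, not features of any particular concordancing tool.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) for each match of `keyword`.

    Context is `width` tokens on either side; matching ignores case.
    Tokenization with \\w+ is a simplification assumed for this sketch.
    """
    tokens = re.findall(r"\w+", text)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Usage on a single well-known line, chosen only as a convenient example.
for left, hit, right in kwic("To be or not to be that is the question", "be", width=2):
    print(f"{left:>10} [{hit}] {right}")
```

Aligning the hits in a central column, as the print statement does, is the conventional layout because it makes recurring patterns to the left and right of the keyword easy to scan.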

11 Brown corpus
First modern computer-readable corpus
W.N. Francis and H. Kucera, Brown University, Providence, RI
One million words of American English texts printed in 1961
Sampled from 15 different text categories
Used as model for other corpora, including …

12 LOB corpus
Compiled by researchers in Lancaster, Oslo and Bergen
One million words of British English texts printed in 1961
Sampled from same 15 text categories as Brown corpus
All texts ≤ 2,000 words long
Kolhapur corpus of Indian English compiled in 1978 to same specification

13 Chomsky’s criticisms
Chomsky’s ideas drove linguists away from empiricism (data) towards rationalism (introspection)
Chomsky switched focus onto abstract models of language competence
He was especially scathing about corpus-based approaches
Based on mistaken view that corpus linguists confused finiteness of data with finiteness of language
See McEnery & Wilson, chapter 1

14 The London-Lund Corpus of Spoken English (LLC)
First corpus of transcribed spoken language
Part of Survey of Spoken English at Lund University under the direction of J. Svartvik
500,000 words of spoken British English recorded from 1953 to 1987
Different categories, such as spontaneous conversation, spontaneous commentary, spontaneous and prepared oration

15 COBUILD
1m-word corpus too small for many applications
1980: Collins instigated collection of a 20m-word corpus to support lexicographers writing a new learners’ dictionary (COBUILD = Collins Birmingham University International Language Database; John Sinclair)
Now expanded to Bank of English corpus, 320m words and growing

16 BNC (1995) http://www.natcorp.ox.ac.uk/
100m word collection of written and spoken text from (already dated in some respects!)
Carefully designed and balanced
Corpus is closed (finite, synchronic)
All text tagged to high quality
Lots of tools available for exploration

17 etc.
Many other corpus projects now underway, sometimes modelled on BNC or other well-known corpora
Various national projects
Specialized corpora
Historical texts
Learner English
International English
Translated English
Spoken dialogues for certain domains
When widely used, they become a kind of benchmark, eg Wall Street Journal corpus (treebank)
This can have pros and cons

