1
Harnessing Corpora for Real and Virtual ELT Purposes
IFELT
Belinda Maia, FLUP
10/11.2003
2
What is a corpus?
CORPUS (13th century, from Latin corpus 'body'; plural corpora): a body of texts, utterances or other specimens considered more or less representative of a language, stored as an electronic database.
- A corpus may store many millions of running words.
- A corpus can be tagged to identify and classify words and other formations.
- A corpus can be searched using concordancing programmes.
3
An example of concordancing (from the BNC)
A0R 2231 Maybe with twists of bacon.
A35 256 This substantial, 15-minute orchestral movement was inspired by three paintings of Innocent X by Francis Bacon, themselves based on Velasquez.
A6N 1311 They could cook vegetables and meat simply, deal with eggs and bacon and porridge, and they were able to bake and housekeep, learning as they went along.
AAX 286 Sir Richard Body, MP Hirohito, shy god who liked bacon & eggs.
ABB 67 Remembering bacon and ham, the versatility of the pig can be stretched to pies, sandwiches and ham, egg and chips.
ABB 236 The Smoked Trout & Parma Ham Mousse (see p18) is merely decorated with slices of the ham and the Carbonnade of Beef is enriched by using diced ham instead of bacon.
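Concordancing programmes usually present each hit in key-word-in-context (KWIC) format, with the search word lined up in a central column and its left and right context on either side. The Python sketch below is a minimal illustration of that idea, not a description of how the BNC interface or any commercial concordancer works; the function name `kwic` and its `width` parameter are our own illustrative choices, and the sample sentences are taken from the BNC lines above.

```python
import re


def kwic(text: str, node: str, width: int = 30) -> list[str]:
    """Return a key-word-in-context line for each whole-word match of `node` (case-insensitive)."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every hit lines up in one column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


sample = (
    "Maybe with twists of bacon. They could cook vegetables and meat simply, "
    "deal with eggs and bacon and porridge."
)
for line in kwic(sample, "bacon", width=20):
    print(line)
```

Because the left context is padded to a fixed width, the search word always starts in the same column, which is what makes patterns in the surrounding words easy to scan.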
4
An example of concordancing (with WordSmith Tools)
5
Tagging example, courtesy of Catherine Ball, at: http://www.georgetown.edu/faculty/ballc/corpora/tutorial2.html#RTFToC16
A01 2 ^ *'_*' stop_VB electing_VBG life_NN peers_NNS **'_**' ._.
A01 3 ^ by_IN Trevor_NP Williams_NP ._.
A01 4 ^ a_AT move_NN to_TO stop_VB \0Mr_NPT Gaitskell_NP from_IN
A01 4 nominating_VBG any_DTI more_AP labour_NN
A01 5 life_NN peers_NNS is_BEZ to_TO be_BE made_VBN at_IN a_AT meeting_NN
A01 5 of_IN labour_NN \0MPs_NPTS tomorrow_NR ._.
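The Python sketch below shows how a program might turn tagged lines like these into (word, tag) pairs and then count the tags. It is a minimal illustration only: the assumptions that each line starts with a text ID and a line number, and that a lone `^` is a marker to skip, are inferred from the sample above rather than from any format specification, and `parse_tagged` is a hypothetical helper name. Raw strings (`r"..."`) are used so that the `\0` prefixes stay as literal text.

```python
from collections import Counter


def parse_tagged(line):
    """Split one line of word_TAG text into (word, tag) pairs.

    Assumes, from the sample only, that each line begins with a text ID and a
    line number (e.g. "A01 4") and that a lone "^" is a marker, not a word.
    """
    pairs = []
    for token in line.split()[2:]:  # skip the text ID and line number
        if token == "^":
            continue
        # Split on the LAST underscore, so "*'_*'" gives ("*'", "*'").
        word, sep, tag = token.rpartition("_")
        if sep:  # ignore any token without a tag
            pairs.append((word, tag))
    return pairs


sample = [
    r"A01 4 ^ a_AT move_NN to_TO stop_VB \0Mr_NPT Gaitskell_NP from_IN",
    r"A01 4 nominating_VBG any_DTI more_AP labour_NN",
]
tag_counts = Counter(tag for line in sample for _, tag in parse_tagged(line))
print(tag_counts.most_common(3))
```

Once the tags are separated from the words, a learner or teacher can search by grammatical category (for example, every `VBG` form) rather than by spelling alone.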
6
Types of Corpora
- Monolingual corpora: the texts are all in the same language.
- Parallel and/or aligned corpora: originals and their translations are aligned so that both texts appear on the screen together and you can see how the translator has translated the original.
- Comparable corpora: a selection of original texts in two or more languages dealing with the same subject or genre.
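A sentence-aligned parallel corpus can be pictured as a list of (original, translation) pairs, so searching the original side immediately shows how each hit was translated. The Python sketch below assumes exactly that simple representation; the three English/Portuguese pairs are invented for illustration and are not taken from COMPARA or any other real corpus, and `search_aligned` is a hypothetical helper name.

```python
import re

# Hypothetical sentence-aligned English/Portuguese pairs, invented for
# illustration; they are not taken from COMPARA or any real corpus.
aligned = [
    ("The corpus is very large.", "O corpus é muito grande."),
    ("The students collect texts from the Internet.", "Os alunos recolhem textos da Internet."),
    ("Each text is stored in the corpus.", "Cada texto é guardado no corpus."),
]


def search_aligned(pairs, word):
    """Return every (original, translation) pair whose original contains `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if pattern.search(src)]


for original, translation in search_aligned(aligned, "corpus"):
    print(f"{original}\n    -> {translation}")
```

Real aligned corpora also have to solve the alignment itself (deciding which sentence corresponds to which), which this sketch takes as already done.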
7
Types of Corpora
- Specialized corpora: texts on specialized subjects, used to extract terminology and complementary explanatory material (definitions, explanations, etc.).
- Concurrent corpora: texts taken from newspapers on the same subject on approximately the same dates.
- ‘Do-it-yourself’ or ‘disposable’ corpora: small specialized corpora built for teaching translation or language.
8
Corpora and Lexicography
COBUILD = Collins Publishers + University of Birmingham (1980s)
- Corpus work that revolutionised lexicography
Today, all serious lexicography uses corpora, e.g.:
- Oxford English Dictionary: http://www.oed.com/
- Academia das Ciências de Lisboa
9
Corpora & Grammar
The Longman Grammars of English (Quirk, Greenbaum, Svartvik, Leech and others)
- Based on corpora: the classical corpora are now available on CD-ROM through ICAME at http://www.hd.uib.no/icame.html
BIBER, D., S. JOHANSSON, G. LEECH, S. CONRAD & E. FINEGAN. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education Ltd.
10
The corpora debate
- "The bigger the corpus, the better" versus carefully chosen ‘representative’ corpora
- Chomsky: the average educated speaker was a better source than a corpus
- Big corpora are not necessarily representative, e.g. the Hansard corpus (records of parliamentary debates)
- Any selection of texts is, after all, a selection
11
Yet very large corpora exist and are very useful.
Much research nowadays is done with small, selected corpora for studying:
- different registers
- special subjects
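When small corpora of different sizes are compared (for example, texts from two registers), raw counts are misleading, so frequencies are commonly normalised, for instance to occurrences per million words: raw count × 1,000,000 ÷ total running words. The Python sketch below illustrates that calculation under loudly stated assumptions: `tokenize` is a deliberately crude, hypothetical tokenizer (lower-cased runs of ASCII letters and apostrophes, adequate only for simple English text), and the two mini "registers" are invented examples, not real corpus data.

```python
import re


def tokenize(text):
    """Crude tokenizer: lower-cased runs of ASCII letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def per_million(word, tokens):
    """Occurrences of `word` per million running words in `tokens`."""
    if not tokens:
        return 0.0
    # Multiply before dividing to keep whole-number results exact.
    return tokens.count(word) * 1_000_000 / len(tokens)


# Two tiny illustrative "registers" of different sizes (invented sentences).
recipe = tokenize("Fry the bacon. Add the eggs to the bacon and serve.")
news = tokenize("The minister said the price of bacon would rise next year, and the market agreed.")
print(per_million("bacon", recipe), per_million("bacon", news))
```

With figures on a common scale, a word that is twice as frequent in one register as in another shows up as such regardless of how many texts each small corpus contains.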
12
Using official corpora: English
British National Corpus at: http://sara.natcorp.ox.ac.uk/lookup.html
- 50 examples of any word or expression free online
- CD-ROM of 100 million words available
The COBUILD project at: http://titania.cobuild.collins.co.uk/form.html
- 40 examples online
13
Using official corpora: Portuguese
- AC/DC, CetemPúblico: Portuguese monolingual corpora
- COMPARA: aligned English/Portuguese corpus
All at http://www.linguateca.pt
14
Language Learning/Teaching and corpora
- How can a language teacher use corpora?
- Why does a language learner need to know about corpora?
- What can be learnt?
15
How can a language teacher use corpora?
The teacher can:
- find an enormous amount of material for use in class and for exercises
- check on real usage and compare it with the textbooks used
BUT: the teacher must be aware that corpora sometimes prove the textbook wrong!
16
What can be learnt?
Corpora as reference material for:
- Lexical work
- Syntactic study
- Textual analysis
- Observing language ‘in action’
- Learning about a wide variety of areas
17
The student
The student can be trained to search autonomously for information of all kinds:
- Finding texts that supply real knowledge
- Finding texts that serve as models for style and register
- Finding correct collocations of individual words
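Finding collocations can be sketched as counting which words occur within a few positions of the search word (the "node"). This minimal Python version uses raw co-occurrence counts only; dedicated tools usually rank collocates with statistical association measures such as mutual information or the t-score, which this sketch does not implement. The function name `collocates`, the default window of two words on each side, and the sample sentence are illustrative assumptions.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count the words found within `window` tokens either side of each occurrence of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


tokens = "they ate eggs and bacon and then more bacon and eggs".split()
print(collocates(tokens, "bacon", window=2).most_common(3))
```

On a real corpus, raw counts are dominated by very common words such as "and" and "the", which is exactly why association measures are normally applied before a student treats a word as a genuine collocate.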
18
Do-it-yourself corpora
Suggestion: train students to make and use their own corpora by:
- Collecting texts off the Internet
- Using the ‘Find’ function in Word
… and thereby broadening their vocabulary
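Word's ‘Find’ searches one document at a time. As a sketch of a possible next step, assuming students save the texts they collect as plain-text UTF-8 `.txt` files in a single folder, a few lines of Python can search the whole home-made corpus at once and show each matching sentence. The folder layout, the function names `load_corpus` and `find_in_corpus`, and the naive sentence splitter (which breaks after `.`, `!` or `?` followed by whitespace) are all assumptions made for this illustration.

```python
import re
from pathlib import Path


def load_corpus(folder):
    """Read every .txt file in `folder` into a dict of {file name: text}."""
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))}


def find_in_corpus(corpus, phrase):
    """Return (file name, sentence) pairs for each sentence containing `phrase`, ignoring case."""
    hits = []
    for name, text in corpus.items():
        # Naive sentence split: after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if phrase.lower() in sentence.lower():
                hits.append((name, sentence.strip()))
    return hits


# In-memory demo corpus (invented texts) standing in for a folder of files.
demo = {"recipe.txt": "Fry the bacon. Serve with eggs.", "news.txt": "Prices rose again."}
print(find_in_corpus(demo, "BACON"))
```

For a real folder, `find_in_corpus(load_corpus("my_corpus"), "carry out")` would list every sentence containing the phrase, where `my_corpus` is a hypothetical folder name.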
19
Useful sites
- Catherine N. Ball: Tutorial: Concordances and Corpora at: http://www.georgetown.edu/faculty/ballc/corpora/tutorial.html
- Tim Johns's Data-driven learning at: http://web.bham.ac.uk/johnstf/
20
Useful sites
- Concordance the whole Web at: http://www.webcorp.org.uk/
- And, of course, Google at: http://www.google.com