1
Daniel Nkemleke, Humboldt Kolleg Kamerun, 30/07/2008
Corpus Linguistics and Language Education: Development and Utility of the Corpus of Cameroon English
Daniel A. Nkemleke
Department of English, Ecole Normale Supérieure, University of Yaoundé I
Outline
  Introduction: Corpus Linguistics, history
  Some (main) existing corpora
  Development of the Corpus of Cameroon English (CCE)
  Corpus utility with reference to the CCE
  Prospect
3
Introduction: what is Corpus Linguistics?
  The study of language based on examples of “real life” language use, collected, stored and processed via computer
  Facilitated by the advent of computer technology (1960s)
  Latin corpus (“body”): a body of text
  Any collection of more than one text, written or spoken
4
Introduction (cont'd): brief history
  Before the 1940s/1950s: “early corpus linguistics”, a corpus-based methodology (“primitive corpora”?)
  Between the 1960s and 1980s: a minority of linguists continued corpus-based work (Quirk: SEU; Francis & Kucera: Brown Corpus; Svartvik: London-Lund Corpus)
  Computer technology: major support for CL
  First African corpus: 1989, ICE-East Africa (Schmied 1989)
  Second African corpus: 1992, CCE (Tiomajou 1993) / Nigeria??
5
Introduction (cont'd): brief history
  “Thirty years ago when this research started it was considered impossible to process texts of several million words in length. Twenty years ago it was considered marginally possible but lunatic. Ten years ago it was considered quite possible but still lunatic. Today it is very popular” (Thomas/Short 1996: 4)
6
Some (main) existing corpora
  L1 corpora
    Brown Corpus of American English
    Lancaster-Oslo/Bergen Corpus (LOB)
    London-Lund Corpus
    British National Corpus (BNC)
    Birmingham Corpus of British English
  L2 corpora
    ICE-East Africa (Kenya & Tanzania)
    Corpus of Cameroon English
    Corpus of Nigerian English ??
    Kolhapur Corpus of Indian English
  Multinational corpus project
    International Corpus of English (ICE)
7
Four main characteristics of a corpus
1. Sampling & representativeness
  Interest in the whole variety of English
  Attempt to construct a “representative” sample corpus that maximally represents the variety
  Aim: as accurate and reasonable a picture as possible of a language population
8
Four main characteristics of a corpus (cont'd)
2. Finite size
  A body of a finite number of words, e.g. 1,000,000
  Figure determined at the beginning of the project
  Exception: a monitor corpus, to which texts are constantly added
9
Four main characteristics of a corpus (cont'd)
3. Machine-readable form
  Past: “corpus” referred to printed text
  Nowadays: the term implies machine-readable form
  A few in book form (e.g. the original London-Lund)
  Occasionally other media (microfiche, recordings)
10
Four main characteristics of a corpus (cont'd)
4. Standard reference
  Tacitly, a corpus constitutes a standard reference
  Presupposition: wide availability to other researchers
  Allows direct comparison of results with other varieties
11
Development of the Corpus of Cameroon English (CCE)
  Began in 1992 in collaboration with two British universities (Birmingham/Liverpool)
  Assistance of the British Council in Yaoundé
  Target of a million words reached in 1994
  Data used for classroom activities and research since then
  2005: project benefited from a grant from the AvH → Goal: further development (tagging) of the database (TU Chemnitz)
12
Objective
  Provide authentic data for the description of the main features and problems inherent in the variety of English written in Cameroon
  Provide a source of authentic material for English language teaching/learning in Cameroon
  Serve as a database for comparative studies on CamE in relation to other varieties of English
13
Text categories: written component
Text category               No. of texts  No. of words
A: Official Press                    257       126,539
B: Private Press                      42        49,098
C: Novels & Short Stories             21        77,096
D: Religion                           19        96,380
E: Tourism                             5        26,881
F: Official letters                   77        12,285
G: Private letters                   250        79,386
H: Students’ Essays                   83       137,399
I: Government Memos                   16        71,368
J: Advertisement                      10         4,875
K: Miscellaneous                      22       139,247
TOTAL                                802       820,554
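The figures for the written component can be checked and turned into per-category shares with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the slide, while the variable names (`written`, `shares`) are assumptions made for the example and not part of the CCE project.

```python
# Written component of the CCE as listed on the slide:
# category -> (number of texts, number of words)
written = {
    "A: Official Press": (257, 126_539),
    "B: Private Press": (42, 49_098),
    "C: Novels & Short Stories": (21, 77_096),
    "D: Religion": (19, 96_380),
    "E: Tourism": (5, 26_881),
    "F: Official letters": (77, 12_285),
    "G: Private letters": (250, 79_386),
    "H: Students' Essays": (83, 137_399),
    "I: Government Memos": (16, 71_368),
    "J: Advertisement": (10, 4_875),
    "K: Miscellaneous": (22, 139_247),
}

# Column totals, to compare against the TOTAL row of the slide.
total_texts = sum(texts for texts, _ in written.values())
total_words = sum(words for _, words in written.values())

# Share of the written component contributed by each category.
shares = {name: words / total_words for name, (_, words) in written.items()}

print(f"texts: {total_texts}, words: {total_words}")
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<28}{share:6.1%}")
```

Sorting by share makes it easy to see which text categories dominate the written component, which matters when judging how representative the sample is.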
14
Text categories: spoken component
Dialogues
  1. Conversations
  2. Phone calls
  3. Broadcast discussions
  4. Classroom lessons
  5. Interviews
  6. Parliamentary debates
  7. Legal cross-examination
  8. Business transactions
Monologues
  1. Commentaries
  2. Demonstrations
  3. Legal presentations
  4. Broadcast news
  5. Broadcast talks
  6. Non-broadcast talks
15
Corpus utility with reference to the CCE
13 possible ways in which a corpus may be useful
  1. Corpora as a source of empirical data
  2. Corpora in language teaching and learning
  3. Corpora in lexical studies
  4. Corpora in grammar studies
  5. Corpora in speech research
  6. Corpora and semantic studies
  7. Corpora in pragmatic and discourse studies
  8. Corpora in sociolinguistic studies
  9. Corpora and stylistic studies
  10. Corpora in historical linguistics
  11. Corpora in dialectology and variational studies
  12. Corpora in psycholinguistics
  13. Corpora in cultural studies
16
1. Corpus as a source of empirical data
  Linguists can make more objective statements about language use in the variety and compare it with other varieties
  Nkemleke/Mbangwana (2001)
  Nkemleke (2003)
  Nkemleke (2004a, 2004b)
  Nkemleke (2005)
  Nkemleke (2006)
  Nkemleke (2007a, 2007b)
  Nkemleke (fc: 2008a, 2008b, 2008c)
  Schmied/Nkemleke (fc: 2008a, 2008b)
  A number of postgraduate projects at the ENS/Faculty
17
2. Corpora in language teaching/learning
  CCE data used for classroom activities over the years
18
Concordances: arrive _ NP (simplification)
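A concordance of this kind can be approximated with a short keyword-in-context (KWIC) routine. The sketch below is a minimal illustration, not the tool used for the CCE: the function name `kwic`, the regular expression and the three sample sentences are all invented for the example, and the pattern only finds forms of "arrive", leaving it to the reader to spot cases where a noun phrase follows directly.

```python
import re

def kwic(text, pattern, width=30):
    """Return one keyword-in-context line per regex match in text."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample text (NOT taken from the CCE).
sample = (
    "We arrived Yaounde late in the evening. "
    "She will arrive at the station tomorrow. "
    "They arrive the village before noon."
)

# Hypothetical pattern covering the inflected forms of "arrive".
ARRIVE = r"\barriv(?:e|es|ed|ing)\b"

hits = kwic(sample, ARRIVE)
for line in hits:
    print(line)
```

Reading down the aligned keyword column, a learner can compare uses with and without a preposition after the verb, which is the kind of observation data-driven learning relies on.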
19
Value of concordances
  Support teachers’ classroom explanations
  Learners as researchers
  Data-driven learning
  Critical look at existing language teaching material
20
Natural data for textbooks
  CCE data used for studies on aspects of Cameroon English usage
  E.g. Hans-Georg Wolf used data from the corpus in his book English in Cameroon, published in 2001 by Mouton de Gruyter (Berlin/New York)
21
3. Corpora in lexical studies
  Keep informed about new words and changing meanings
  Call up word combinations and co-occurring words
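Calling up co-occurring words amounts to counting which tokens appear within a few positions of a node word. The following is a rough sketch under simplifying assumptions: tokenisation is a plain regular expression, the window size of two words is arbitrary, and both the function name `collocates` and the sample text are invented for illustration (they are not CCE data or tooling).

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count tokens occurring within `window` words of each `node` token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            # Neighbours to the left and right, excluding the node itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample text (NOT taken from the CCE).
sample = (
    "The students wrote a long essay. The teacher marked the essay. "
    "A short essay was read aloud."
)

counts = collocates(sample, "essay")
print(counts.most_common(3))
```

On a real corpus one would normally filter out function words or use an association measure instead of raw counts, since frequent words such as articles dominate simple frequency lists.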
22
Prospect
  ICE-Cameroon is ongoing
  Future possibility of more specialized corpora, e.g. academic texts, fiction
23
END
Thank you!