Corpus Linguistic Development with reference to Cameroon
Prof. Dr Daniel A. Nkemleke
Department of English, Ecole Normale Supérieure, University of Yaounde I

Humboldt "Kolleg", Nov 2009, Prof. Dr Daniel A. Nkemleke

Outline
1. Definition/Nature: Corpus Linguistics, Corpora
2. Development of Corpus Linguistics: History, Criticism
3. Some uses of Corpora
4. Some existing Corpora
5. The Cameroon Experience in corpus research
6. Achievements: publications, students’ projects
7. Conclusion: International collaborative efforts

1. Definition: What is Corpus Linguistics?
- The study of language based on examples of “real life” language use
- Basic analysis: Listing, Sorting, Counting of Concordances (KWIC)
- Complex analysis: Processing using complex programs (e.g. Complex Ana, WordSmith Tools)

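The basic analysis named on this slide (listing, sorting and counting of keyword-in-context lines) can be illustrated with a minimal Python sketch. The function names `tokenize` and `kwic`, the context width, and the toy sentence are assumptions made for illustration; they do not come from the talk or from any of the tools it mentions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Toy text invented for illustration; it is not taken from any corpus.
sample = "You must go now. She must have left early. We should go, but they must stay."
tokens = tokenize(sample)

# Listing: every occurrence of the keyword with its context.
concordance = kwic(tokens, "must")

# Sorting: order the lines by the words immediately to the right of the keyword.
concordance_sorted = sorted(concordance, key=lambda line: line[2])

# Counting: frequency of every word form in the text.
frequencies = Counter(tokens)

for left, key, right in concordance_sorted:
    print(f"{left:>25}  {key}  {right}")
print(frequencies["must"])
```

Sorting on the right-hand context groups similar continuations of the keyword together, which is one common way of reading a concordance; sorting on the left context would work the same way with `line[0]`.
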
Definition, cont’d
4 main characteristics:
- sampling and representativeness
- finite size
- machine-readable form
- standard reference

Definition, cont’d: Sampling & representativeness
- Interest in the whole variety of English
- Attempt to construct a “representative” database
- Aim: a picture of a language population that is as accurate and reasonable as possible
- Sampled to maximally represent the variety

Definition, cont’d: Finite size
- Body of a finite number of words, e.g. 1,000,000
- Figure determined at the beginning of the project
- Once the grand total is reached, collection stops; no increase in size
- Exceptions:
  - monitor corpus: constant addition of texts
  - e.g. London-Lund corpus, Birmingham corpus

Definition, cont’d: Standard reference
- Tacit understanding: a corpus constitutes a standard reference
- Presupposition: wide availability to other researchers
- Direct comparison of new results

2. Development: History of Corpus Linguistics
- Before the 1940s/1950s: “early corpus linguistics”, corpus-based methodology
- Several studies predating the 1950s
- Language acquisition: child language (1870s-1920s)
- Diaries recording the child’s locutions
- Primitive corpora

History of Corpus Linguistics: Criticisms
- Chomsky’s theoretical criticism:
  - Corpus = a collection of external utterances, thus a poor guide to modelling linguistic competence
  - Corpus as a source of evidence invalidated

History of Corpus Linguistics: Criticisms, cont’d
- Further practical criticism (Abercrombie 1963):
  - Time-consuming, prone to error
  - Data processing capabilities unavailable
- Result: Corpus Linguistics largely abandoned

History of Corpus Linguistics, cont’d: Revival
- Between the 1950s and 1980s: a minority of linguists continued corpus-based work (Quirk: SEU; Francis & Kučera: Brown corpus; Svartvik: London-Lund corpus)
- Computer: major support for CL
- Computerized corpus: the increase in computing facilities led to the revival of CL

3. Uses of corpora
- Corpora in speech research
- Corpora in lexical studies
- Corpora in the teaching of languages
- Importance due to empirical data: “objective” statements on language

Uses I: Corpora in Speech Research
Importance of a spoken corpus:
- Broad samples of speech allow generalizations about spoken language
- Samples of naturalistic speech reflect language in real life

Uses II: Corpora in Lexical Studies
- Quicker production and revision of dictionaries
- More complete and precise definitions
- Keep informed about new words, changing meanings
- Call up word combinations, co-occurring words

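As a rough illustration of what "calling up co-occurring words" can mean in practice, the sketch below counts the words found within a fixed window around a node word. The function name `collocates`, the window size of two tokens, and the toy sentence are assumptions for illustration only; lexicographic tools normally add statistical association measures on top of raw counts like these.

```python
import re
from collections import Counter


def collocates(text, node, window=2):
    """Count words occurring within `window` tokens on either side of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Collect the neighbours to the left and right, excluding the node itself.
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Toy text invented for illustration; it is not taken from any corpus.
sample = "Strong tea and strong coffee. He made strong tea again."
result = collocates(sample, "strong")
print(result.most_common(3))
```
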
Uses III: Corpora in Teaching
- Real life language data for textbook examples
- Critical look at existing language teaching material

4. Some Existing Corpora
- Brown Corpus
- Lancaster-Oslo/Bergen corpus (LOB)
- London-Lund corpus
- International Corpus of English (ICE)
- British National Corpus (BNC)
- ICE-East Africa (ICE-EA)
- Corpus of Cameroon English (CCE)
- ICE-CAM, ongoing

Brown Corpus
- Published in 1964: Brown University Standard Corpus of Present-Day American English, by Francis/Kučera
- First computer corpus compiled for linguistic research
- About 1,000,000 words “representative” of American English texts
- 500 samples of 2,000 words each

Lancaster-Oslo/Bergen corpus (LOB)
- Corpus of written British English
- University of Lancaster/University of Oslo/Norwegian Computing Centre for Humanities at Bergen
- 500 texts of 2,000 words each

British National Corpus
- 100,000,000 words of British English from the later part of the 20th century, both spoken (10%) and written (90%)
- Carried out and managed by the BNC Consortium
- Samples of about 45,000 words

5. Cameroon Experience: Corpus of Cameroon English
- About 1,000,000 words (all written)
- Tagged, but not all verified
- Developer(s):
  - Prof. Dr Daniel Nkemleke & Colleagues
  - Group of research students
- Hosted in Yaounde and TU Chemnitz
- Funding agencies: British Council + AvH

Cameroon Experience: ICE-CAM, ongoing
- Started in Feb
- Funded by AvH
- Developer(s):
  - Prof. Dr Daniel Nkemleke
  - Group of research students
- Written (400,000 words)
- Ongoing collection of texts for a sample of Cameroon Spoken Corpus (estimate: 600,000 words)

Composition of ICE-CAM (written, 400,000 words)
- Students’ Essays
- Skills and Hobbies
- Examination scripts
- Editorials
- Social Letters
- Novels
- Business Letters
- Humanities (Academic)
- Humanities (Popular)
- Social Sciences (Academic)
- Social Sciences (Popular)
- Natural Science (Academic)
- Natural Science (Popular)
- Technology (Academic)
- Technology (Popular)
- Press Reports
- Administrative writing

Composition of ICE-CAM (spoken, 600,000 words)
- Conversations (Private)
- Broadcast Interviews (Public)
- Legal Cross-examination (Public)
- Class Lessons (Public)
- Broadcast Discussion (Public)
- Commentaries (Unscripted)
- Unscripted Speech (Unscripted)
- Demonstrations (Unscripted)
- Legal presentations (Unscripted)
- Broadcast News (Scripted)
- Broadcast Talks (Scripted)
- Non-broadcast Talks (Scripted)

Students’ contribution: Smaller specialized corpora
- Dissertations (BIS 2009)
- Students’ Essays (Tayuh 2009)
- Students’ Essays (Mafor 2009)
- Editorials (Fuh Che, 2009)
- Students’ Essays (Pone 2010)

6. Achievements: Some publications on the CCE
- (2008) “Milestones in the corpus of Cameroon English: research possibilities in an ESL Context”. Annals of the Faculty of Arts, Letters & Social Sciences. (Special edition: Festschrift in honour of Professor Paul Mbangwana). University of Yaounde I Press,
- (2008) “Modality in novice academic writing: the case of African and German university students”. Research in English & Applied Linguistics REAL 4: English Projects in Teaching and Research in Central Europe. Göttingen: Cuvillier,

Some publications, cont’d
- (2008) “Frequency and variety of if-constructions in Cameroon English”. English Studies and Language Teaching. Plzen: University of West Bohemia,
- (2008) “Please-request in Cameroonian and Kenyan private (social) letters”. Discourse Interaction 1(2). Brno: Masaryk University,
- (2007) “Frequency and use of modals in Cameroon English and application to language education”. Indian Journal of Applied Linguistics, vol. 33, no. 1,

Some publications, cont’d
- (2005) “Must and Should in Cameroon English”. Nordic Journal for African Studies, vol. 14 no. 1,
- (2004) “Context and function of Need and Be able to in Cameroon English”. Indian Journal of Applied Linguistics, vol. 12 no. 2,
- (2004) “A corpus-based study of the modal verbs in Cameroonian and British English”. CASTALIA: Ibadan Journal of Multicultural & Multidisciplinary Studies, vol. no. 19, 1-23.

Some publications, cont’d
- (2001, with Paul Mbangwana) “The modals of obligation and necessity in Cameroon English”. CASTALIA: Ibadan Journal of Multicultural & Multidisciplinary Studies, Vol. 6,
- (2008) Manual of information to accompany the corpus of Cameroon English. Department of English, Chemnitz University of Technology, Germany: 47 pages

Some publications, cont’d
- (2003) A corpus-based study of the modal verbs in Cameroon written English. Unpublished PhD thesis. University of Yaounde I.

Forthcoming publications
- (2010 in press) “Frequent collocates and major senses of two prepositions in ESL and ENL corpora”. Language Forum. New Delhi.
- (2010 in press) “Variation in written discourse: comparing Cameroonian, East-African and British English on the basis of text corpora”. Discourse and Text Studies. University of West Bohemia.

Forthcoming, cont’d
- (2010 in press) “The expression of modality in Cameroon English”. In: Anchimbe, E. (ed.) Approaches to Cameroon English: Features, Structure, and New Perspectives (for the series Varieties of English around the World, John Benjamins, edited by Edgar W. Schneider).
- (2010 in press, with Josef Schmied) “Prepositions in Cameroon and Kenyan English: corpus-linguistic comparisons of simplification and expressivity”. World Englishes: Problems-Properties-Prospects, to be published by John Benjamins

Forthcoming, cont’d
- (2010, with Josef Schmied) “Reference, coherence and complexity in students’ academic writing: examples from Cameroon and East-Africa corpus”. Pragmatics, Language and Literature: A Festschrift for Efurosibina Adegbija.

Corpus-based research supervised
- “The Semantics and Syntax of Syndetic Coordination in the Corpus of Cameroon English” (2005)
- “Modality in Students’ Dissertations: A Corpus-based Approach” (2009)
- “The Expression of Stance in Newspaper Editorials” (2009)
- “Conjunction Resources in Students’ Dissertation: A Corpus-based approach” (2009)

Corpus-based research supervised, cont’d
- A Linguistic Study of Dissertation Abstracts (2009)
- The Expression of Stance in Postgraduate Students’ Dissertations in Cameroon (for 2010)
- A Corpus-based Study of Directives in Students’ Essays (for 2010)

7. Conclusion: International collaborative efforts
- CCE affiliated to the ICE (International Corpus of English) project, founded in 1988
- Membership: 17 countries
- Australia, Cameroon, Canada, East Africa (Kenya, Tanzania), Fiji, Ghana, Great Britain, Ireland, Hong Kong, India, New Zealand, Nigeria, Philippines, Singapore, South Africa and the United States of America

Conclusion: International collaborative efforts, cont’d
- Long-standing links with Germany (Technische Universität Chemnitz, TUC)
  - 1999 visit of Prof. Dr Josef Schmied
  - 1992 visit of 2 Cameroonian academics to TUC
  - AvH Grant (Prof. Dr Daniel Nkemleke)
  - Nov-Dec 2008 return visit of a German scientist/colleague to Cameroon

END
THANK YOU FOR LISTENING!