LELA English Corpus Linguistics

LELA 30922 English Corpus Linguistics
Harold Somers, Professor of Language Engineering
Office: Lamb 1.15

Syllabus

Assessment
A practical project in which students will use the BNC (or other approved corpus material) to investigate some question of English language usage.
Suggestion: base your project (more or less closely) on some existing study.
The project write-up will include relevant background material, plus the results and discussion of a corpus-based analysis.
In other words: summarize (and criticize) the chosen study, then do your own version, and compare the results.

Reading matter
Main recommendations:
  Kennedy, G.D. (1998) An introduction to corpus linguistics. London: Longman.
  McEnery, T. & A. Wilson (2001, 2nd ed) Corpus linguistics. Edinburgh: Edinburgh University Press.
  Meyer, C. (2002) English corpus linguistics: An introduction. Cambridge: Cambridge University Press.
Lots of other books, focussing on particular aspects
Do not ignore journals (Int J Corp Ling) and specialist conferences, especially when considering the practical assignment.
See http://tinyurl.com/32abhb for a list of resources available at UoM

What is a corpus?
Corpus (pl. corpora) = ‘body’
Collection of written text or transcribed speech
Usually but not necessarily purposefully collected
Usually but not necessarily structured
Usually but not necessarily annotated
(Usually stored on and accessible via computer)
Corpus ~ text archive

Computers and corpus linguistics
Historically, manual analysis of large bodies of text (esp. in literary and biblical studies)
  Error-prone, time-consuming, not verifiable
Computers have introduced
  reliability, accuracy and replicability
  increased speed and capacity, which mean you can do more on a grander scale
  new tools, which mean you can do things you might not have thought of doing
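To make the point about speed and replicability concrete, here is a minimal sketch of a keyword-in-context (KWIC) concordance, the kind of listing that once had to be compiled by hand. The function name `kwic`, the `width` parameter and the sample sentences are all hypothetical, chosen only for illustration; real corpus tools do considerably more (tokenization, annotation, sorting of contexts).

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Made-up sample text, not drawn from any real corpus.
sample = (
    "A corpus is a collection of texts. "
    "The corpus can be searched by computer, "
    "and a larger corpus gives more evidence."
)

for line in kwic(sample, "corpus", width=20):
    print(line)
```

Running the same search again on the same text always gives the same lines, which is the replicability that manual analysis could not guarantee.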

What is corpus linguistics?
Not a branch of linguistics, like socio~, psycho~, …
Not a theory of linguistics
A set of tools and methods (and a philosophy) to support linguistic investigation across all branches of the subject

Evidence in linguistics
Real attested usage as linguistic evidence
Contrasts with the introspective approach previously typical
Relates to the competence~performance (langue~parole) distinction
Corpus linguists are often more interested in trends than rules (probabilities rather than certainties)
Famous stories of corpus evidence contradicting widely-held assumptions about language use
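The idea of trends rather than rules can be illustrated with relative frequencies. The sketch below counts two word forms in two tiny made-up texts and normalizes the counts per million words, a common way of comparing corpora of different sizes. The helper names `tokenize` and `per_million`, the crude tokenizer and both sample texts are assumptions for illustration only; samples this small prove nothing about real usage.

```python
import re
from collections import Counter

def tokenize(text):
    """Very crude tokenizer: lower-case runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size

# Two made-up "corpora"; real corpora run to millions of words.
corpus_a = "We shall see. I shall go, and you will follow."
corpus_b = "I will go and you will follow, and they will see."

for name, text in (("A", corpus_a), ("B", corpus_b)):
    tokens = tokenize(text)
    counts = Counter(tokens)
    for word in ("shall", "will"):
        rate = per_million(counts[word], len(tokens))
        print(f"corpus {name}: {word!r} raw={counts[word]} per million={rate:.0f}")
```

Neither form is "ruled out" in either text; what differs is how often each is chosen, which is the kind of statement a corpus can support.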

Activities in corpus linguistics
Design and compilation of corpora
Development of tools for corpus analysis
Descriptive linguists using corpora to analyze the lexical and grammatical behaviour of language, e.g. for lexicography
Exploiting corpora in applied linguistics: language teaching, translation

History of Corpus Linguistics
www.essex.ac.uk/linguistics/clmt/w3c/corpus_ling/content/history.html
Textual study has always included an element of counting and cataloguing, despite impracticalities: notably concordances of Shakespeare, the Bible, etc.
Arrival of computers in the 1950s of course changed everything

Brown corpus
First modern computer-readable corpus
W.N. Francis and H. Kucera, Brown University, Providence, RI
One million words of American English texts printed in 1961
Sampled from 15 different text categories
Used as model for other corpora, including …

LOB corpus
Compiled by researchers in Lancaster, Oslo and Bergen
One million words of British English texts printed in 1961
Sampled from same 15 text categories as Brown corpus
All texts ≤ 2,000 words long
Kolhapur corpus of Indian English compiled in 1978 to same specification

Chomsky’s criticisms
Chomsky’s ideas drove linguists away from empiricism (data) towards rationalism (introspection)
Chomsky switched focus onto abstract models of language competence
He was especially scathing about corpus-based approaches
  Based on the mistaken view that corpus linguists confused finiteness of data with finiteness of language
See McEnery & Wilson, chapter 1

The London-Lund Corpus of Spoken English (LLC)
First corpus of transcribed spoken language
Part of the Survey of Spoken English at Lund University under the direction of J. Svartvik
500,000 words of spoken British English recorded from 1953 to 1987
Different categories, such as spontaneous conversation, spontaneous commentary, spontaneous and prepared oration

COBUILD
1m-word corpora too small for many applications
1980: Collins instigated collection of a 20m-word corpus to support lexicographers writing the new COBUILD (Collins Birmingham University International Language Database) learners’ dictionary (John Sinclair)
Now expanded to the Bank of English corpus, 320m words and growing
www.collins.co.uk/Corpus/CorpusSearch.aspx
www.collins.co.uk/books.aspx?group=153

BNC (1995)
http://www.natcorp.ox.ac.uk/
100m-word collection of written and spoken text from 1975-93 (already dated in some respects!)
Carefully designed and balanced
Corpus is closed (finite, synchronic)
All text tagged to high quality
Lots of tools available for exploration

etc.
Many other corpus projects now underway, sometimes modelled on BNC or other well-known corpora
  Various national projects
  Specialized corpora
    Historical texts
    Learner English
    International English
    Translated English
    Spoken dialogues for certain domains
When widely used, they become a kind of benchmark, e.g. Wall Street Journal corpus (treebank)
  This can have pros and cons