Using Corpora in Linguistics

Using Corpora in Linguistics
Introduction to WordSmith Tools for Beginners
Íde O’Sullivan
Regional Writing Centre
www.ul.ie/rwc

Corpus Linguistics
McEnery and Wilson (2001:1) describe corpus linguistics as “the study of language based on examples of ‘real life’ language use”.
McEnery, T. and Wilson, A. (2001) Corpus Linguistics, 2nd edition. Edinburgh: Edinburgh University Press.

Corpus: Definition
“A corpus is [the name given to] a set of texts which has been put together for some purpose, usually (though not necessarily), in computer-readable form” (Wray, Trott & Bloomer, 1998:213).
Wray, A., Trott, K. & Bloomer, A. (1998) Projects in Linguistics: A Practical Guide to Researching Language. London, New York: Arnold.

Corpus: Definition
“a corpus typically implies a finite body of text, sampled to be maximally representative of a particular variety of a language, and which can be stored and manipulated using a computer” (McEnery and Wilson, 2001:73).
Corpus ≠ Archive

Concordancing: Definition
“A concordance, in its simplest form, is an alphabetical listing of the words in a text, given together with the contexts in which they appear”.
Catherine Ball, Concordances & Corpora: Tutorial: http://www.georgetown.edu/faculty/ballc/corpora/tutorial.html

Concordancing: Definition
“A concordance is a list of examples of a particular word, part of a word or combination of words, in its contexts drawn from a text corpus. The search word is sometimes also referred to as a keyword. The most common way of displaying a concordance is by a series of lines with the keyword in context (KWIC)”.
Kettemann, B. (1995) “Concordancing in stylistics teaching”, in Grosser, W., Hogg, J. and Hubmeyer, K. (eds), Style: Literary and Non-Literary. Contemporary Trends in Cultural Stylistics. New York: The Edwin Mellen Press: 307-318.
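The KWIC display described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how WordSmith Tools or any other concordancer is implemented; the function name `kwic`, the `width` parameter and the sample text are all made up for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    # Collapse line breaks and repeated spaces so context windows stay on one line
    text = " ".join(text.split())
    # Whole-word, case-insensitive match of the search word
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Hypothetical sample text, used only to show the layout
sample = ("A corpus is a set of texts. The corpus can be stored on a computer. "
          "Each corpus is built for some purpose.")

for line in kwic(sample, "corpus", width=20):
    print(line)
```

Because every left context is padded to the same width, the search word always starts in the same column, which is what makes patterns on either side of it easy to see.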

Software to Analyse Corpora
“Concordancing software enables you to discover patterns that exist in natural language by grouping text in such a way that they are clearly visible […] The real value of the concordancer lies in this question of visibility” (Tribble & Jones, 1997:3).
Tribble, C. and Jones, G. (1997) Concordances in the Classroom: Using Corpora in Language Education. Houston TX: Athelstan.

Using Corpora in Language Learning and Teaching: Organisation of the CD
This CD contains a collection of small genre-specific academic and journalistic corpora in English, French, Gaeilge, German and Spanish. For each language there are two small genre-specific corpora: a journalistic corpus (100,000 words) and an academic corpus (50,000 words).
The journalistic corpora are divided into four subcorpora: current affairs, editorials, reviews and sport.
The academic corpora are divided into two subcorpora: theses and articles.

Sources of Journalistic Corpora
English: Irish Examiner, Irish Independent, Irish Times
French: Le Monde, L’Humanité
Gaeilge: Beo, Foinse, Lá
German: Die Süddeutsche Zeitung, Die Frankfurter Allgemeine Zeitung
Spanish: La Vanguardia, El Periódico

Sources of Academic Corpora
Articles and theses written by native speakers
Subject areas: Literature, Cultural Studies, Translation Studies, Education, Applied Linguistics, Sociolinguistics, Corpus Linguistics, Media Studies, Language Pedagogy, Teacher Training, Discourse Analysis, Politics, Research Methodology, Second Language Acquisition, History of Language

WordSmith Tools
Wordlists
    Frequency
    Alphabetical order
    Statistical information
Keywords
Concord
    Collocations
    Clusters
    Patterns
    Plot
    Source text
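To make the three wordlist views concrete, here is a minimal Python sketch that builds a frequency-ordered list, an alphabetical list and a few summary statistics from a text. It is an assumption-laden illustration rather than a description of WordSmith's own WordList tool: the tokenisation rule and the names `tokenize` and `wordlist` are invented for the example, and the statistics are limited to token count, type count and type/token ratio.

```python
import re
from collections import Counter

def tokenize(text):
    # Lower-case the text and keep runs of letters (accented letters included),
    # allowing one internal apostrophe as in "don't"
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())

def wordlist(text):
    """Return (frequency-ordered list, alphabetical list, basic statistics)."""
    tokens = tokenize(text)
    freq = Counter(tokens)
    # Most frequent first; ties broken alphabetically
    by_frequency = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    alphabetical = sorted(freq.items())
    stats = {
        "tokens": len(tokens),   # running words
        "types": len(freq),      # distinct word forms
        "type_token_ratio": len(freq) / len(tokens) if tokens else 0.0,
    }
    return by_frequency, alphabetical, stats

by_freq, alpha, stats = wordlist("The cat sat. The cat ran.")
print(by_freq)
print(alpha)
print(stats)
```

Running the same function over two corpora gives two lists that can be compared side by side, which is the starting point for any contrast between genres.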

WordSmith Tools: Concord
Sorting data
Concord expansion option
Concordance with multiple views
Settings
Wildcards
Advanced searching
Close texts
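Two of the options listed above, wildcard searching and sorting, can be illustrated with a short Python sketch. The wildcard convention used here (`*` for any run of word characters, `?` for exactly one) is an assumption made for the example and may not match WordSmith's own syntax; `wildcard_to_regex` and `search_sorted` are hypothetical names.

```python
import re

def wildcard_to_regex(query):
    # Assumed convention: * = any run of word characters, ? = exactly one
    parts = []
    for ch in query:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def search_sorted(text, query):
    """Return (left word, hit, right word) triples, sorted by the word to the right."""
    tokens = re.findall(r"\w+", text)
    pattern = wildcard_to_regex(query)
    hits = []
    for i, tok in enumerate(tokens):
        # fullmatch ensures the whole token fits the wildcard pattern
        if pattern.fullmatch(tok):
            left = tokens[i - 1] if i > 0 else ""
            right = tokens[i + 1] if i + 1 < len(tokens) else ""
            hits.append((left, tok, right))
    # Sorting on the first word to the right groups similar right-hand contexts
    return sorted(hits, key=lambda h: h[2].lower())

text = "She writes essays. He wrote letters. They write reports."
print(search_sorted(text, "wr*"))
print(search_sorted(text, "wr?te"))
```

Sorting on the left-hand word instead only requires changing the sort key from `h[2]` to `h[0]`.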

Worksheet
Run individual wordlists for the Academic Corpus and the Journalistic Corpus. Compare and contrast your findings to draw conclusions about each genre.
Run a concordance list for a chosen aspect of the language:
    Do any collocational patterns emerge from this evidence?
    What are the most common clusters including the search word(s)?
    Identify the most common uses of the word. Are there exceptions to these uses?
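The collocation and cluster questions in the worksheet can also be explored outside a concordancer. The sketch below counts the words that occur near a search word and the fixed word sequences that contain it. It is a simplified illustration under stated assumptions: raw counts only (no statistical association measure), sentence boundaries ignored, and the function names, the `span` and `n` parameters and the sample sentence are all invented for the example.

```python
import re
from collections import Counter

def tokens_of(text):
    # Lower-cased runs of letters; punctuation and digits are dropped
    return re.findall(r"[^\W\d_]+", text.lower())

def collocates(tokens, node, span=2):
    """Count words occurring within `span` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

def clusters(tokens, node, n=3):
    """Count n-word sequences that contain `node`."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if node in gram:
            counts[" ".join(gram)] += 1
    return counts

toks = tokens_of("In the light of the results and in the light of recent work")
print(collocates(toks, "light", span=1).most_common())
print(clusters(toks, "light", n=3).most_common())
```

Frequent clusters point to the typical uses of the word; the low-frequency entries at the bottom of each list are where exceptions are most likely to turn up.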

Resources
WordSmith Tools: http://www.lexically.net/wordsmith/
MonoConc and ParaConc: http://www.athel.com/mono.html

Online Resources
Tim Johns Data-driven Learning Page: http://www.eisu.bham.ac.uk/johnstf/timconc.htm
Mike Barlow: http://www.athel.com/corpus.html
Other resources: http://www.ul.ie/~appliedlanguages/LI4113_C&C_websites.doc

Online Concordancing
Hong Kong Virtual Language Centre: http://www.edict.com.hk/concordance/default.htm
The Compleat Lexical Tutor (Lextutor): http://www.lextutor.ca/
French Learner Language Oral Corpus (FLLOC): http://www.flloc.soton.ac.uk/

Resources
Freeware concordancers
    ConcApp: http://www.edict.com.hk/pub/concapp/
Create your own corpus (disposable corpus)
    Issues of copyright
    Issues of reliability

Resources
British National Corpus (corpus demo): http://info.ox.ac.uk/bnc/
Cobuild Bank of English (Wordbanks Online): http://www.cobuild.collins.co.uk/
Corpus Concordance Sampler: http://www.collins.co.uk/Corpus/CorpusSearch.aspx
Limerick Corpus of Irish-English (L-CIE): http://www.ul.ie/~lcie/