Using Corpora in Linguistics: Introduction to WordSmith Tools for Beginners
Íde O’Sullivan, Regional Writing Centre, www.ul.ie/rwc

Corpus Linguistics
• McEnery and Wilson (2001:1) describe corpus linguistics as “the study of language based on examples of ‘real life’ language use”.
McEnery, T. and Wilson, A. (2001) Corpus Linguistics (2nd edition). Edinburgh: Edinburgh University Press.

Corpus: Definition
• “A corpus is [the name given to] a set of texts which has been put together for some purpose, usually (though not necessarily), in computer-readable form” (Wray, Trott & Bloomer, 1998:213).
Wray, A., Trott, K. & Bloomer, A. (1998) Projects in Linguistics: A Practical Guide to Researching Language. London, New York: Arnold.

Corpus: Definition
• “a corpus typically implies a finite body of text, sampled to be maximally representative of a particular variety of a language, and which can be stored and manipulated using a computer” (McEnery and Wilson, 2001:73).
• Corpus ≠ Archive

Concordancing: Definition
• “A concordance, in its simplest form, is an alphabetical listing of the words in a text, given together with the contexts in which they appear”.
Catherine Ball, Concordances & Corpora: Tutorial: pora/tutorial.html

Concordancing: Definition
• “A concordance is a list of examples of a particular word, part of a word or combination of words, in its contexts drawn from a text corpus. The search word is sometimes also referred to as a keyword. The most common way of displaying a concordance is by a series of lines with the keyword in context (KWIC)”.
Kettemann, B. (1995) “Concordancing in stylistics teaching”, in Grosser, W., Hogg, J. and Hubmeyer, K. (eds), Style: Literary and Non-Literary. Contemporary Trends in Cultural Stylistics. New York: The Edwin Mellen Press:
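The KWIC display described above can be illustrated with a few lines of Python. This is only a minimal sketch, not how WordSmith Tools or any other concordancer is implemented; the function name `kwic`, the `width` parameter and the sample sentence are all invented for the illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match of keyword."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword lines up in one column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Hypothetical sample text, used only to show the layout.
sample = ("A corpus is a set of texts. The corpus can be searched "
          "with a concordancer, and each corpus has a purpose.")
for line in kwic(sample, "corpus", width=20):
    print(line)
```

Because every line pads the left context to the same width, the search word appears in a single column, which is what makes recurring patterns on either side of it easy to see.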

Software to Analyse Corpora
• “Concordancing software enables you to discover patterns that exist in natural language by grouping text in such a way that they are clearly visible […] The real value of the concordancer lies in this question of visibility” (Tribble & Jones, 1997:3).
Tribble, C. and Jones, G. (1997) Concordances in the Classroom: Using Corpora in Language Education. Houston TX: Athelstan.

Using Corpora in Language Learning and Teaching: Organisation of the CD
• This CD contains a collection of small genre-specific academic and journalistic corpora in English, French, Gaeilge, German and Spanish.
• For each language there are two small genre-specific corpora: a journalistic corpus (100,000 words) and an academic corpus (50,000 words). The journalistic corpora are divided into four subcorpora: current affairs, editorials, reviews and sport. The academic corpora are divided into two subcorpora: theses and articles.

Sources of Journalistic Corpora
English: Irish Examiner, Irish Independent, Irish Times
French: Le Monde, L’Humanité
Gaeilge: Beo, Foinse, Lá
German: Die Süddeutsche Zeitung, Die Frankfurter Allgemeine Zeitung
Spanish: La Vanguardia, El Periódico

Sources of Academic Corpora
• Articles and theses written by native speakers
• Subject areas: Literature, Cultural Studies, Translation Studies, Education, Applied Linguistics, Sociolinguistics, Corpus Linguistics, Media Studies, Language Pedagogy, Teacher Training, Discourse Analysis, Politics, Research Methodology, Second Language Acquisition, History of Language

WordSmith Tools
• Wordlists
  › Frequency
  › Alphabetical order
  › Statistical information
• Keywords
• Concord
  › Collocations
  › Clusters
  › Patterns
  › Plot
  › Source text
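To make the three wordlist views concrete (frequency order, alphabetical order and basic statistics), here is a small Python sketch. It is not WordSmith's own procedure: the tokenisation rule, the function names and the choice of type/token ratio as the example statistic are assumptions made for the illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed rule: lowercase, keep runs of letters (accents included),
    # and allow an internal apostrophe as in "l'humanité".
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())

def wordlist(text):
    """Return (frequency-ordered list, alphabetical list, summary statistics)."""
    tokens = tokenize(text)
    freq = Counter(tokens)
    stats = {
        "tokens": len(tokens),   # running words
        "types": len(freq),      # distinct words
        "type_token_ratio": len(freq) / len(tokens) if tokens else 0.0,
    }
    by_frequency = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    alphabetical = sorted(freq.items())
    return by_frequency, alphabetical, stats

# Hypothetical sample text.
by_frequency, alphabetical, stats = wordlist("The cat sat on the mat. The mat was red.")
print(by_frequency[:3])
print(alphabetical[:3])
print(stats)
```

The same counts feed both views; only the sort key changes, which is why a wordlist tool can switch between frequency and alphabetical order without re-reading the texts.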

WordSmith Tools
• Concord
  › Sorting data
  › Concord expansion option
  › Concordance with multiple views
  › Settings
  › Wildcards
  › Advanced searching
  › Close texts
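Wildcard searching can be sketched in Python by translating a search pattern into a regular expression. The sketch assumes a common convention in which `*` stands for any run of word characters and `?` for exactly one; WordSmith's own wildcard syntax may differ in detail, so check the program's help before relying on this mapping. The function names are hypothetical.

```python
import re

def wildcard_to_regex(query):
    """Translate an assumed wildcard syntax (* and ?) into a compiled regex."""
    parts = []
    for ch in query:
        if ch == "*":
            parts.append(r"\w*")   # any run of word characters, possibly empty
        elif ch == "?":
            parts.append(r"\w")    # exactly one word character
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def search(text, query):
    """Return every word form in text that matches the wildcard query."""
    return [m.group() for m in wildcard_to_regex(query).finditer(text)]

# Hypothetical sample sentence.
text = "She analyses corpora; he analysed a corpus and is analysing another."
print(search(text, "analys*"))
print(search(text, "corp???"))
```

A pattern such as `analys*` retrieves all inflected forms in one concordance, which is usually what a learner wants when studying how a verb behaves.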

Worksheet
• Run individual wordlists for the Academic Corpus and the Journalistic Corpus. Compare and contrast your findings to draw conclusions about each genre.
• Run a concordance list for a chosen aspect of the language:
  › Do any collocational patterns emerge from this evidence?
  › What are the most common clusters including the search word(s)?
  › Identify the most common uses of the word. Are there exceptions to these uses?
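Two points in the worksheet lend themselves to a short Python sketch. First, corpora of different sizes are easier to compare when raw counts are converted to a rate, here per 10,000 words. Second, clusters can be found by counting every run of consecutive words that contains the search word. The function names, the per-10,000 base and the sample sentence are assumptions for illustration only; they are not taken from WordSmith Tools.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed rule: lowercase, letters only, internal apostrophes allowed.
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())

def per_10k(word, tokens):
    """Frequency of word per 10,000 tokens, so corpora of unequal size compare fairly."""
    return 10_000 * tokens.count(word) / len(tokens) if tokens else 0.0

def clusters(tokens, search_word, size=3):
    """Count every run of `size` consecutive tokens that contains search_word."""
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tokens[i:i + size]
        if search_word in gram:
            counts[" ".join(gram)] += 1
    return counts

# Hypothetical sample standing in for an academic text.
academic = tokenize("In the light of the findings, the study argues that "
                    "in the light of recent work the results hold.")
print(round(per_10k("light", academic), 1))
print(clusters(academic, "light").most_common(2))
```

Running the same two functions on a journalistic sample and setting the results side by side gives a simple basis for the genre comparison the worksheet asks for.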

Resources
• WordSmith Tools:
• MonoConc and ParaConc

Online Resources
• Tim Johns Data-driven Learning Page: conc.htm
• Mike Barlow:
• Other resources: 113_C&C_websites.doc

Online Concordancing
• Hong Kong Virtual Language Centre: fault.htm
• The Compleat Lexical Tutor (Lextutor)
• French Learner Language Oral Corpus (FLLOC)

Resources: Freeware Concordancers
• ConcApp:
• Create your own corpus (disposable corpus)
• Issues of copyright
• Issues of reliability

Resources
• British National Corpus (corpus demo)
• Cobuild Bank of English (Wordbanks Online)
• Corpus Concordance Sampler: Search.aspx
• Limerick Corpus of Irish-English (L-CIE):