Corpus Linguistics MOHAMMAD ALIPOUR ISLAMIC AZAD UNIVERSITY, AHVAZ BRANCH.

Corpus Linguistics History  In the “pre-Chomskyan era”: “Corpora” where few paper slips with data. “Shoebox Corpora”: Non-representative. Corpus based in that methodology was empirical and based on observable data.

Corpus Linguistics History  The revolutionary 60s  With the advances in computer technology the exploitation of massive corpora became feasible. From the 80s onward, the number and size of corpora and corpus-based studies increased dramatically.  Corpora have revolutionized almost all branches of linguistics.  Computers:1) allow us to speed up the processing of data, 2) avoid human bias in data analysis, 3) and allow the enrichment of data with metadata

Corpus Linguistics History  Since the 1990s, the corpus methodology has revolutionized nearly all branches of linguistics  Corpus analysis can be illuminating in “virtually all branches of linguistics or language learning.” (Leech 1997)  Early studies used general corpora to carry out lexicographical research which led to the production of dictionaries. More recently, specialized corpora have been compiled in order to examine texts belonging to a particular register or genre, for example newspapers or academic discourse.

Corpus Linguistics  What is Corpus Linguistics? Corpus linguistics can be described as the study of language based on text corpora. The study of language based on examples of “real life“ language use.

What is a corpus?  “A corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language”. (Sinclair 1996)  It is an electronic collection of text both spoken and written, stored on a computer which can be easily retrieved for different application and uses.  “A corpus is a collection of (1) machine-readable (2) authentic texts (including transcripts of spoken data) which is (3) sampled to be (4) representative of a particular language or language variety.”

Characteristics of a Good Corpus  large  systematically assembled  natural texts  often available to other researchers  spoken and/or written language  usually in electronic form  can be tagged for use with text manipulation programs

Types of Corpora 1) General 2) Special  1) General: One which attempts to represent language as a whole, not any specific part of language. Different genres are included, lectures, movies, newspapers, etc.  For example: BNC, Includes both written and spoken (mostly written) Cambridge and Nottingham Corpus in Discourse in English (CANCODE) Michigan Corpus of Academic Spoken English (MICASE)

Types of Corpora  2)Special: Its more useful for use when you have a specific purpose in your mind and you want to conduct a study. You cantnot find a corpus related to your topic, so you should collect your own corpus. Specialized corpora have specific purposes and are used for single studies.  The collection might be time consuming and difficult, but not like general ones.  The frequency of individual words or phrases can be examined, and compared across sub-corpora, for example, in different genres or institutional contexts.  Concordancing Programs show all the instances of the same lexical item in concordance lines.

Nature of Corpus-Based Approach  It is empirical, analysing the actual patterns of use from natural texts  It utilizes a large and principled collection of natural texts as the basis for analysis  It makes extensive use of computers for analysis, using both automatic and interactive techniques  It integrates both quantitative and qualitative analytical techniques

Corpus Data Collection
- Spoken data must be transcribed from audio recordings.
- Written text must be rendered machine-readable by keyboarding or OCR (Optical Character Recognition) scanning.
- Language data so collected form a RAW CORPUS.
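Once transcription and OCR have produced plain-text files, assembling a raw corpus can be as simple as reading those files into memory. The sketch below assumes one UTF-8 `.txt` file per text in a single folder; the folder layout and the function name `load_raw_corpus` are assumptions for illustration, and no annotation is added, which is exactly what makes the result “raw”.

```python
import sys
from pathlib import Path

def load_raw_corpus(folder):
    """Read every .txt file in `folder` into a {filename: text} dict.

    Assumes transcription/OCR is already done and each text is saved as a
    UTF-8 plain-text file; undecodable bytes are replaced rather than fatal.
    """
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Collapse runs of whitespace (including line breaks) into single spaces.
        corpus[path.name] = " ".join(text.split())
    return corpus

if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    raw = load_raw_corpus(folder)
    n_tokens = sum(len(text.split()) for text in raw.values())
    print(f"{len(raw)} files, {n_tokens} whitespace-separated tokens")
```

Keeping one file per text preserves text boundaries, which later tools (for example, dispersion plots or per-file frequency counts) depend on.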

Corpus Collection Considerations  1. Size  2. Manageability  3. Representativeness  4. Generalizability  5. Content or composition  6. Data Saturation

Synchronic Corpora vs. Diachronic Corpora  Synchronic Corpora : Useful to compare varieties of English. Texts date all to the same period.  Diachronic Corpora : Texts date to different periods in time. Ideal to study language change and history.

Concordancers  WordSmith 4, 5, Mike Scott  AntConc  Tenka Text  Concapp V4  Simple Concordance Program  MonoConc

AntConc  AntConc  AntConc is a freeware, multiplatform tool for carrying out corpus linguistics research and data-driven learning.

AntConc  AntConc contains seven tools that can be accessed either by clicking on their 'tabs' in the tool window, or using the function keys F1 to F7.  Concordance Tool: This tool shows search results in a 'KWIC' (KeyWord In Context) format. This allows you to see how words and phrases are commonly used in a corpus of texts.  Concordance Plot Tool This tool shows search results plotted as a 'barcode' format. This allows you to see the position where search results appear in target texts.  File View Tool This tool shows the text of individual files. This allows you to investigate in more detail the results generated in other tools of AntConc.  Clusters/N-Grams The Clusters Tool shows clusters based on the search condition. In effect it summarizes the results generated in the Concordance Tool or Concordance Plot Tool. The N-Grams Tool, on the other hand, scans the entire corpus for 'N' (e.g. 1 word, 2 words, …) length clusters. This allows you to find common expressions in a corpus. Collocates: This tool shows the collocates of a search term. This allows you to investigate non-sequential patterns in language.  Word List: This tool counts all the words in the corpus and presents them in an ordered list. This allows you to quickly find which words are the most frequent in a corpus.  Keyword List: This tool shows the which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study