Chapter 3: An Introduction to Corpus Linguistics Compiled by: Sajjad Ghadamyari Farhad Ghiasvand Presentation Date: Dec. 8, 2014 - Monday.


What is Corpus Linguistics?

Is it a branch of linguistics, like phonology, syntax, or semantics? Or is it a methodology for language study?

At an international conference on corpus linguistics held in the US in 2005, several views were put forward:
- Some said it is an empirical method of linguistic analysis and description that uses real-life examples of language from corpora.
- Others said it is an approach or methodology for studying language use.
- Some viewed it as a theory, much more than a methodology.

Teubert and Krishnamurthy's view: it is a bottom-up approach that looks at the evidence of the corpus and analyses that evidence with the aim of finding probabilities and patterns, i.e., searching behind the curtain of language data for a system that would explain those data.

Corpus design and construction

A corpus is a collection of naturally occurring language texts, spoken or written. A corpus is designed and compiled according to the following principles:
- Corpus contents are selected according to their communicative purpose.
- The subject matter of the corpus is controlled by external criteria.
- The criteria that determine the structure of the corpus are small in number and clearly separate from one another.
- Samples of language for the corpus consist of entire texts.
- The design and composition of the corpus are fully documented, with full justification.

Common external criteria

- Date: period of time
- Location: UK, USA, etc.
- Language: British, American, etc.
- Domain: academic, etc.
- Type: book, journal, etc.
- Mode: spoken, written
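The external criteria above can be thought of as metadata fields attached to each text. The following is a minimal sketch of that idea, not a description of how any particular corpus is stored: the `TextRecord` class, the `select` helper, and the three records are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical metadata record for one text in a corpus."""
    title: str
    year: int        # Date
    location: str    # Location, e.g. "UK"
    variety: str     # Language, e.g. "British"
    domain: str      # Domain, e.g. "Academic"
    text_type: str   # Type, e.g. "Book"
    mode: str        # Mode: "Spoken" or "Written"


def select(records, **criteria):
    """Return the records whose fields match every given external criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Made-up records, for illustration only.
records = [
    TextRecord("Lecture notes", 1995, "UK", "British", "Academic", "Book", "Written"),
    TextRecord("Radio interview", 2001, "USA", "American", "News", "Broadcast", "Spoken"),
    TextRecord("Journal article", 1998, "UK", "British", "Academic", "Journal", "Written"),
]

academic_written = select(records, domain="Academic", mode="Written")
print([r.title for r in academic_written])
```

Selecting by metadata rather than by the wording of the texts themselves is what makes the criteria "external".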

Types of corpora

Criteria:
- Purpose
- Text selection procedure
- Period of time
- Medium
- Number of languages
- Type of speaker
- Annotation

Types of corpora in terms of purpose

General vs. specialized/domain-specific
- General corpora are bigger than specialized ones. They aim to examine patterns of language use for a language as a whole. Example: BNC (British National Corpus).
- Specialized/domain-specific corpora aim to describe language use in a specific variety, register, or genre. Example: JDEST (Computer Corpus of Text in English for Science and Technology).

To ensure that a specialized corpus is representative and balanced, the corpus linguist needs to seek advice from experts in the field when selecting its contents.

A. Types of corpora in terms of text selection procedure

Sample vs. full-text
- Sample corpora consist of text samples of approximately the same length. Example: SEU (Survey of English Usage corpus).
- Full-text corpora consist of full texts. Example: English Poetry Full-Text Database.

B. Types of corpora in terms of periods of time

Closed/static vs. open/dynamic
- In closed/static corpora, no more texts are added once the corpus is completed.
- In open/dynamic corpora, new materials are continually added and older materials are discarded. Example: Bank of English (University of Birmingham).
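The two distinctions above can be sketched in a few lines of code. This is only an illustration under simplifying assumptions: `take_sample` and the toy text are hypothetical, whitespace splitting stands in for real tokenization, and a fixed-capacity `deque` is just one possible way to model "add new material, discard the oldest"; it is not how the Bank of English or any other corpus is actually maintained.

```python
from collections import deque


def take_sample(text, sample_size, start=0):
    """Return a fixed-length run of tokens, as a sample corpus would store."""
    tokens = text.split()  # naive whitespace tokenization (assumption)
    return tokens[start:start + sample_size]


# Sample vs. full-text: keep only part of the text, or all of it.
full_text = "one two three four five six seven eight"
sample = take_sample(full_text, sample_size=3, start=2)
print(sample)

# Open/dynamic corpus modelled as a fixed-capacity queue:
# appending a new text silently drops the oldest one.
monitor = deque(maxlen=3)
for new_text in ["text-2001", "text-2002", "text-2003", "text-2004"]:
    monitor.append(new_text)
print(list(monitor))
```

A closed/static corpus, by contrast, would simply be a collection that is never modified after compilation.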

A. Types of corpora in terms of medium

Written vs. spoken
- Written corpora contain only written texts. Example: Brown.
- Spoken corpora contain spoken materials, concentrating on stress, intonation, etc. Example: MARSEC (Machine Readable Spoken English Corpus).
- Mixed corpora contain both written and spoken materials. Example: ICE (International Corpus of English).

B. Types of corpora in terms of number of languages

Monolingual vs. multilingual (parallel, translation)
- Monolingual corpora are made of samples of only one language.
- Multilingual corpora are made of samples of more than one language (the same samples in different languages). Example: English-Norwegian Parallel Corpus.

A. Types of corpora in terms of type of speaker

Native vs. learner
- Native corpora contain texts produced by native speakers of English.
- Learner corpora contain texts produced by learners of English.

B. Types of corpora in terms of annotation

Plain/unannotated vs. annotated
- Plain/unannotated corpora contain only the text samples. Example: Project Gutenberg.
- Annotated corpora contain text samples plus some explicit linguistic information, e.g., genre, register, etc.
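The difference between a plain and an annotated sample can be made concrete with a small sketch. Everything here is an assumption for illustration: the sentence, the genre and register labels, the `word_TAG` notation, and the hand-assigned part-of-speech tags (a kind of annotation the slide does not itself mention) are hypothetical and not taken from any real corpus.

```python
# Plain/unannotated: the text and nothing else.
plain = "The cat sat on the mat"

# Annotated: the same text plus explicit information about it.
# Labels and tags below are hand-written, hypothetical examples.
annotated = {
    "genre": "fiction",
    "register": "informal",
    "tokens": "The_DET cat_NOUN sat_VERB on_ADP the_DET mat_NOUN",
}


def parse_tagged(tagged):
    """Split 'word_TAG' items into (word, tag) pairs."""
    return [tuple(item.rsplit("_", 1)) for item in tagged.split()]


pairs = parse_tagged(annotated["tokens"])
nouns = [word for word, tag in pairs if tag == "NOUN"]
print(nouns)
```

With annotation in place, a query such as "all nouns" becomes a simple filter; on the plain text the same question would first require an analysis step.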

Corpus applications

- Tracking changes and variations
- Dictionary production
- Reference materials production
- Language research

Tracking of English language variations and changes

Leech (2007) suggests four main reasons for language change:
- Grammaticalization: a process of language change by which words representing objects and actions (i.e., nouns and verbs) become grammatical markers (affixes, prepositions, etc.). Example: You will/do let us go.
- Colloquialization: a tendency for written norms to become more informal and move closer to speech. Example: a decrease in the use of passive forms, and a shift from the ‘of’ genitive (“the defeat of Liverpool”) to the ’s genitive (“Liverpool’s defeat”).

- Americanization: new norms that develop in the language in the US lead to increased use of the American term. Examples: “person” to “guy”, “doctor” to “doc”, “wireless” to “radio”.
- Democratization: the removal of linguistic inequalities in society. Example: a decrease in the use of honorifics such as “Mr.” and “Madam” and an increase in camaraderie terms, for example “Mr.” giving way to “dude” or “guy”.
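Changes of this kind are typically tracked by counting a pattern in corpora from different periods and normalizing by corpus size. The sketch below illustrates the idea for the two genitive forms. It rests on several assumptions: the two "corpora" are made-up two-sentence samples, the regular expressions are rough approximations (the 's pattern would also match contractions such as "it's"), and `per_million` and `count_pattern` are hypothetical helper names.

```python
import re


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def count_pattern(pattern, text):
    """Count non-overlapping, case-insensitive matches of a regex."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Made-up samples standing in for an older and a newer corpus.
older = "The defeat of Liverpool was reported. The arrival of the king was announced."
newer = "Liverpool's defeat was reported. The king's arrival was announced."

OF_GENITIVE = r"\bthe \w+ of\b"   # rough approximation of "the X of"
S_GENITIVE = r"\b\w+'s \w+"       # rough approximation of "X's Y"

for label, text in (("older", older), ("newer", newer)):
    size = len(text.split())
    of_rate = per_million(count_pattern(OF_GENITIVE, text), size)
    s_rate = per_million(count_pattern(S_GENITIVE, text), size)
    print(f"{label}: of-genitive {of_rate:.0f} pmw, 's genitive {s_rate:.0f} pmw")
```

Normalizing to a common base matters because corpora from different periods are rarely the same size, so raw counts cannot be compared directly.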

Production of dictionaries and of other reference materials (practice books, grammar books, etc.)

Because the samples in corpora are simple, they can be used in producing dictionaries, pedagogical coursebooks, and reference materials that give students, learners, and other users a better understanding of the language.

Using a corpus for language research

A major application of corpora is the study of different aspects of linguistics, including:
- Lexis: corpora reveal which words have higher frequency and provide information about word formation, derivation, and compounds.
- Grammar: corpora provide information about how sentences and utterances are formed.
- Phraseology: the study of set or fixed expressions, such as idioms, collocations, phrasal verbs, and other types of multi-word lexical units, in which the meaning of the expression is created by the co-selection of its component words. Many phraseologies are sequences of adjacent words called n-grams, e.g., thank you. Another kind of phraseology involves non-adjacent words, e.g., the … of ….
- Literal and metaphorical meanings: corpora reveal the metaphorical meanings of a term alongside its literal meaning. Example: fruit, meaning something to be eaten, and also results.
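The lexis and phraseology points above map onto three basic corpus queries: word frequency, adjacent word pairs (2-grams), and a frame with an open slot. The following is a minimal sketch on a made-up two-sentence text. The tokenizer is a simplistic regular expression, pairs are counted across sentence boundaries, and the frame is reduced to a single slot ("the _ of") rather than the two-slot pattern named in the text; all of these are simplifying assumptions.

```python
from collections import Counter
import re

# Made-up text, for illustration only.
text = "Thank you for the end of the story. Thank you for the fruit of the work."

# Simplistic tokenizer: lowercase runs of letters and apostrophes.
tokens = re.findall(r"[a-z']+", text.lower())

# Lexis: how often does each word occur?
freq = Counter(tokens)

# Phraseology, adjacent words: count 2-grams (bigrams).
bigrams = Counter(zip(tokens, tokens[1:]))

# Phraseology, non-adjacent words: which words fill the slot in "the _ of"?
frame_fillers = [
    tokens[i + 1]
    for i in range(len(tokens) - 2)
    if tokens[i] == "the" and tokens[i + 2] == "of"
]

print(freq.most_common(1))
print(bigrams[("thank", "you")])
print(frame_fillers)
```

On a real corpus the same three counts would be the starting point for a frequency list, a collocation table, and a list of words attracted to a given frame.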