CORPUS LINGUISTICS
Corpus linguistics is the study of language as expressed in samples (corpora) of "real world" text. It is an approach to deriving a set of abstract rules by which a natural language is governed or relates to another language. Originally done by hand, corpora are now largely derived by an automated process.

Corpus
"Corpus", derived from the Latin word meaning "body", may be used to refer to any text in written or spoken form. In modern linguistics, the term refers to large collections of texts that represent a sample of a particular variety or use of language(s) and are presented in machine-readable form.

Scope of Studies
- The possible words, structures or uses in a language
- The probable occurrence of an aspect of a language
- The description and explanation of the nature, structure and use of language, with particular attention to matters such as language acquisition, variation and change
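The difference between what is possible and what is probable can be made concrete in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the tokenizer simply lowercases the text and keeps runs of letters (so punctuation and sentence boundaries are ignored), and the sample sentences are invented. It estimates how likely each word is to follow a given word, using only adjacent word pairs in the sample.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase, keep runs of letters/apostrophes, drop punctuation.
    return re.findall(r"[a-z']+", text.lower())


def next_word_probabilities(tokens, word):
    """Estimate P(next word | word) from adjacent token pairs in the sample."""
    followers = Counter(b for a, b in zip(tokens, tokens[1:]) if a == word)
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


# Invented sample for illustration only.
tokens = tokenize("She drank strong tea. He likes strong tea. "
                  "They ordered strong coffee. It has a powerful engine.")

print(next_word_probabilities(tokens, "strong"))    # "tea" is twice as likely as "coffee"
print(next_word_probabilities(tokens, "powerful"))  # "powerful tea" is not attested
```

Absence from a sample does not prove that a combination is impossible; it only shows that it was not attested, which is one reason large corpora matter.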

Types of Corpora
- Spoken (transcribed) language
- Written language, drawn from:
  - modern or old texts
  - texts in one language or in several languages
  - whole books, newspapers, journals, speeches, or extracts of varying length
- Online data
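The corpus types above differ mainly in the metadata attached to each text. A minimal sketch, assuming hypothetical field names (text_id, mode, language, source, year, content) and invented records, shows how such metadata lets a researcher select a subcorpus, for example only written English texts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusText:
    # Hypothetical metadata fields, loosely mirroring the corpus types listed above.
    text_id: str
    mode: str       # "spoken" (transcribed) or "written"
    language: str   # e.g. "en", "ms"
    source: str     # e.g. "book", "newspaper", "speech", "online"
    year: int
    content: str


def subcorpus(texts, **criteria):
    """Return the texts whose metadata match every field=value criterion."""
    return [t for t in texts
            if all(getattr(t, field) == value for field, value in criteria.items())]


# Invented records for illustration only.
corpus = [
    CorpusText("t1", "written", "en", "newspaper", 1961, "The council met on Monday."),
    CorpusText("t2", "spoken", "en", "speech", 1995, "well I mean it's fine"),
    CorpusText("t3", "written", "ms", "online", 2015, "Selamat pagi semua."),
]

print([t.text_id for t in subcorpus(corpus, mode="written")])                 # ['t1', 't3']
print([t.text_id for t in subcorpus(corpus, mode="written", language="en")])  # ['t1']
```

Real corpora store richer metadata (often as XML headers), but the principle of filtering texts by their descriptive features is the same.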

Corpus linguistics is now seen as the study of linguistic phenomena through large collections of machine-readable texts: corpora. Corpora are used in a number of research areas, ranging from the descriptive study of the syntax of a language to language learning.

Examples of Corpora
Brown Corpus
The Brown Corpus of Standard American English was the first of the modern, computer-readable, general corpora. It was compiled by W. N. Francis and H. Kučera at Brown University, Providence, RI. The corpus consists of one million words of American English texts printed in 1961. The texts were sampled from 15 different text categories to make the corpus a good standard reference. The LOB (Lancaster-Oslo/Bergen) Corpus of British English and the Kolhapur Corpus of Indian English are two examples of corpora built to match the Brown Corpus. The availability of corpora so similar in structure is a valuable resource for researchers interested in, for example, comparing different language varieties.

BNC (British National Corpus)
The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English, both spoken and written, from the later part of the 20th century. The latest edition is the BNC XML Edition, released in 2007.
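Because the Brown Corpus has about one million words and the BNC about 100 million, raw counts from the two cannot be compared directly. A common remedy is to normalise frequencies to occurrences per million words. The sketch below uses a hypothetical helper name and invented counts purely to show the arithmetic.

```python
def per_million(count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return count * 1_000_000 / corpus_size


# Hypothetical raw counts, invented for illustration only.
small = per_million(50, 1_000_000)       # a Brown-sized corpus (1 million words)
large = per_million(4_000, 100_000_000)  # a BNC-sized corpus (100 million words)
print(small, large)  # 50.0 40.0
```

Although 4,000 looks far larger than 50, the word is actually slightly less frequent in the larger corpus once size is taken into account.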

Malay Corpus
- http://mcp.anu.edu.au/
- http://dbp.gov.my/korpus/korpus_DBP.pdf
- http://www.ukessays.com/essays/linguistics/malay-speech-corpus-linguistics-essay.php

Quranic Corpus
- http://corpus.quran.com/

Corpora of CMC (computer-mediated communication)
- http://www.cmc-corpora.de/
- http://michael-beisswenger.de/pub/hsk-corpora.pdf

Role of the Computer in Corpus Linguistics
- To store huge amounts of text
- To retrieve large amounts of text quickly
- To retrieve words, phrases or whole texts in context
- To sort linguistic items
- To increase reliability in searching, counting and sorting linguistic items
- To provide accurate, frequency-based estimates of the probability of occurrence of specific linguistic items
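Several of these roles (counting, sorting, estimating probability of occurrence) come together in a word frequency list. The following is a minimal sketch, not a description of any particular corpus tool: the function name is hypothetical and the tokenizer is a deliberately simple assumption that lowercases text and keeps runs of letters.

```python
import re
from collections import Counter


def frequency_list(text, top=None):
    """Return (word, count, relative frequency) tuples, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())  # simplifying tokenizer
    counts = Counter(tokens)
    total = len(tokens)
    # Sort by descending count, breaking ties alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, count, count / total) for word, count in ranked[:top]]


for word, count, rel in frequency_list("The cat sat on the mat and the dog sat too.", top=3):
    print(f"{word:<5} {count:>2} {rel:.3f}")
```

The relative frequency column is the proportion of all tokens in the sample, which is the simplest estimate of how probable a word is in that variety of text.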

Corpus-Related Research
- Computational Linguistics
- Cultural Studies
- Discourse Analysis and Pragmatics
- Grammar/Syntax
- Historical Linguistics
- Language Acquisition
- Language Teaching
- Language Variation
- Lexicography
- Linguistics
- Machine Translation
- Natural Language Processing (NLP)
- Psycholinguistics
- Semantics
- Social Psychology
- Sociolinguistics
- Speech
- Stylistics

Computational Linguistics
Computational linguistics (the use of computers to process or produce human language) uses corpora as a resource for solving various problems, such as training and evaluating language-processing systems.

Cultural Studies
Comparable corpora make it possible to compare language use in different countries. The results can point to cultural differences.
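One widely used way to decide whether a word is used noticeably more often in one comparable corpus than in another is the log-likelihood (G2) statistic, often applied in keyword analysis. The slides do not prescribe a particular method, so treat this as one swapped-in choice; the function name and the counts are hypothetical. A score above about 3.84 corresponds to p < 0.05 with one degree of freedom.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) statistic for a word's frequency in corpus A vs corpus B."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the word were equally frequent (per word) in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing (0 * log 0 is taken as 0)
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Hypothetical counts of one word in two comparable 1-million-word corpora.
score = log_likelihood(30, 1_000_000, 10, 1_000_000)
print(round(score, 2))  # 10.46
```

A high score only flags a frequency difference; interpreting it as a cultural difference still requires looking at how the word is actually used in each corpus.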

Grammar/Syntax
Large corpora allow language to be studied as it is actually produced, that is, they allow the performance of speakers and writers to be studied. By confronting a grammar with unrestricted corpus data, the grammar can be tested for correctness and completeness.

Historical Linguistics
Machine-readable corpora from different periods allow historical linguists to study how a language has developed over time.

Language Acquisition
Corpora can provide data from learners of a target language from different countries, of different ages, etc.

Language Teaching
- Corpora are used for data-driven learning
- They are used more with higher-level learners
- To investigate idiolect, idiosyncrasy, or certain aspects of grammar usage

READING ASSIGNMENT
Corpus Linguistics: What It Is and How It Can Be Applied to Teaching. Daniel Krieger (dannykrieger99 [at] hotmail.com), Siebold University of Nagasaki (Nagasaki, Japan). http://iteslj.org/Articles/Krieger-Corpus.html

Language Variation
To study or compare how language varies between different text types, domains, regions, speakers, writers, etc.
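Comparing variation usually means computing a normalised rate for the same item in each group of texts. The sketch below assumes texts arrive as (group label, text) pairs; the label here is a text type, but it could equally be a domain, region or speaker. The function name, the simple tokenizer and the mini-samples are all hypothetical.

```python
import re
from collections import defaultdict


def rate_by_group(texts, target, per=1000):
    """texts: iterable of (group_label, text) pairs.
    Returns occurrences of `target` per `per` words within each group."""
    hits = defaultdict(int)
    sizes = defaultdict(int)
    for group, text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())  # simplifying tokenizer
        sizes[group] += len(tokens)
        hits[group] += tokens.count(target)
    # Groups with no tokens are skipped to avoid dividing by zero.
    return {group: hits[group] * per / sizes[group] for group in sizes if sizes[group]}


# Invented mini-samples labelled by text type.
samples = [
    ("spoken", "Well, I mean, well, you know."),
    ("written", "The report was well received."),
]
print(rate_by_group(samples, "well"))
```

Rates per thousand words make groups of different sizes comparable; a real study would also check how evenly the item is spread across individual texts in each group.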

Lexicography
Corpora are used in producing dictionaries and grammar books. Examples: Collins COBUILD, the British National Corpus (BNC) and the Longman Corpus Network.

Linguistics
To provide traditional linguistic descriptions.

Psycholinguistics
Corpora contribute to forming hypotheses about the way language is processed by the mind.

Semantics
To study the meanings of words or utterances by looking at the contexts in which they occur.
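Looking at context can be made systematic by counting the words that occur within a fixed window around a node word (its collocates). This is a minimal sketch with a hypothetical function name and a simplifying tokenizer that drops punctuation, so windows may cross sentence boundaries; published studies usually go further and rank collocates with an association measure such as mutual information or the t-score.

```python
import re
from collections import Counter


def collocates(text, node, window=2):
    """Count words occurring within `window` tokens to the left or right of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())  # simplifying tokenizer
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts


# Invented sample: two senses of "bank" pull in different neighbours.
counts = collocates("The river bank was muddy. The bank raised interest rates.", "bank")
print(counts.most_common())
```

Neighbours such as "river" and "interest" hint at the two different meanings of the node word, which is the kind of evidence semantic analysis draws on.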

Sociolinguistics
To study how language use varies with speakers' and writers' age, sex, social class, etc.

Speech
For speech science and speech technology:
- To compare spoken and written language
- To teach computers to produce and understand speech
Example: London-Lund Corpus (LLC)

Stylistics
- To find specific features of text types
- To compare different texts
- To detect changes of style in an author's writings

Computational Stylistics
The style of a text is a function of the aggregate of the ratios between the frequencies of its phonological, grammatical and lexical items and the frequencies of the corresponding items in a contextually related norm. Computers are used to study the stylistic characteristics of particular texts, authors, genres, periods, etc.
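One simple reading of this definition divides each item's relative frequency in the text by its relative frequency in the norm: a ratio above 1 means the item is over-represented, below 1 under-represented. The sketch below is an assumption-laden illustration, not an established stylometric method: it covers only lexical items (phonological or grammatical items would need annotated data), leaves the "aggregate" step out, and uses a hypothetical function name and invented text.

```python
import re
from collections import Counter


def style_ratios(text, norm_text, items):
    """For each item, divide its relative frequency in `text` by that in `norm_text`."""
    t = re.findall(r"[a-z']+", text.lower())       # simplifying tokenizer
    n = re.findall(r"[a-z']+", norm_text.lower())
    if not t or not n:
        return {}
    t_counts, n_counts = Counter(t), Counter(n)
    ratios = {}
    for item in items:
        norm_rate = n_counts[item] / len(n)
        if norm_rate == 0:
            continue  # ratio undefined: the item never occurs in the norm
        ratios[item] = (t_counts[item] / len(t)) / norm_rate
    return ratios


# Invented text and norm for illustration only.
print(style_ratios("I think I know.", "You know I think so too and then.", ["i", "think", "the"]))
# {'i': 4.0, 'think': 2.0}
```

In practice the norm would be a large reference corpus of the same genre or period, so that the ratios reflect the text's style rather than chance.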

Forensic Linguistics
Forensic linguistics is the application of linguistic knowledge, methods and insights to the forensic context of law, language, crime investigation, trial and judicial procedure. It is a branch of applied linguistics. There are basically three areas of application for linguists working in forensic contexts:
1) understanding the language of the written law,
2) understanding language use in forensic and judicial processes, and
3) providing linguistic evidence.

BASIC TOOL: Concordancer
- An example of software used in corpus linguistics
- What is a concordancer?
- Examples of concordance programs
- How does it assist corpus linguistics, and teaching and learning?
- A simple demonstration of how to use a concordancer
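A concordancer's core display is the keyword-in-context (KWIC) line: every occurrence of a search word, with its left and right context aligned in columns. The following is a minimal sketch of that idea, not the behaviour of any particular concordance program; the function name, the character-based context width and the sample sentence are assumptions.

```python
import re


def kwic(text, keyword, width=20, sort_right=False):
    """Return keyword-in-context (KWIC) lines for whole-word, case-insensitive matches."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    if sort_right:
        # Sorting on the right-hand context groups similar patterns together.
        hits.sort(key=lambda hit: hit[2].strip().lower())
    # Right-align the left context so the keywords line up in one column.
    return [f"{left.strip():>{width}} {kw} {right.strip()}" for left, kw, right in hits]


sample = ("A corpus is a body of text. Each corpus is stored on a computer, "
          "and a concordancer searches the corpus.")
for line in kwic(sample, "corpus", width=15, sort_right=True):
    print(line)
```

Sorting by the right-hand context is a common concordancer feature because it makes recurring patterns after the keyword easy to spot; sorting by the left context works the same way.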

Studies on Corpus Linguistics
- http://ieeexplore.ieee.org/xpl/abstractAuthors.jsp?reload=true&arnumber=5278382
- International Journal of Education and Development using Information and Communication Technology (IJEDICT), 2011, Vol. 7, Issue 3, pp. 96-101 (EDICT-2011-1303.pdf)

Journal of Corpus Linguistics
- International Journal of Corpus Linguistics: http://benjamins.com/catalog/ijcl

Reflection
In what ways could the availability of corpora enrich your studies as a BENL student? Include a suggestion for a possible corpus linguistics topic for your MA thesis.