LIN 3098 – Corpus Linguistics
Albert Gatt

In this lecture
- Corpora for the study of genre/register variation
  - revisit the concepts of representativeness and balance
    - external vs. internal criteria: Biber (1993)
  - introduce the multi-dimensional approach to register/genre variation (Biber 1988)

Part 1 The concept of register/genre

A preliminary example
- Compare the following:
  - It is hard to resolve this problem.
  - I find it hard to resolve this problem.
- Is one intuitively more “formal”?
- Why?

A preliminary example
- Extraposed to-clause: It is hard to resolve this problem.
  - It (expletive)
  - Verb be
  - An adjective (hard) or participle (boring)
  - Clause starting with to + infinitive verb
- Tends to be associated with a formal, “anonymous” style.
- Tends to be “static”: the adjective or participle denotes a state, not a dynamic event.

A preliminary example
- Extraposed to-clause: It is hard to resolve this problem.
- If our intuitions are correct, we would expect the distribution of this clause type to vary across genres and registers.
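One rough way to test this expectation is to count candidate extraposed to-clauses per 1,000 words in samples from different registers. The sketch below is only an illustration: the regular expression, the function names and the two toy samples are assumptions made here, not part of the lecture, and the pattern stands in for proper tagging and manual checking.

```python
import re

# Rough, hypothetical pattern for candidate extraposed to-clauses:
# "it" + a form of "be" + one word (adjective/participle slot) + "to" + one word.
# It over-matches (e.g. "it is going to rain"), so hits need manual checking.
EXTRAPOSED_TO = re.compile(
    r"\bit(?:\s+(?:is|was|will\s+be|would\s+be)|'s)\s+(\w+)\s+to\s+(\w+)",
    re.IGNORECASE,
)

def count_extraposed(text):
    """Number of candidate extraposed to-clauses in the text."""
    return len(EXTRAPOSED_TO.findall(text))

def rate_per_1000(text):
    """Candidate hits per 1,000 word tokens, so texts of different length compare."""
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    return 1000 * count_extraposed(text) / len(tokens) if tokens else 0.0

# Toy samples invented for illustration only.
formal = "It is hard to resolve this problem. It was necessary to repeat the test."
informal = "I find it hard to resolve this problem. We had to repeat the test."

for label, sample in [("formal", formal), ("informal", informal)]:
    print(label, count_extraposed(sample), round(rate_per_1000(sample), 1))
```

Note that "I find it hard to resolve" is not counted, because no form of be follows "it".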

What is a register?
- Would you consider the following to be registers?
  1. recipe English
  2. legal Maltese
  3. specialised language used by shipbuilders
- What are the crucial characteristics of a register?

Defining register
- Possible definitions (see overview in Paolillo 2000):
  - register = “a field of discourse” or “topic”
  - register = “a combination of all the parameters of the communicative situation”
  - register = “an occupationally determined variety of language”

Defining genre
- In discourse analysis and related fields, genre is given a “sociologically oriented” definition:
  - “A socially ratified way of using language in connection with a particular type of social activity”
  - this suggests “typical” settings in which language is used, e.g. interview, lecture, story…

Why is this relevant?
- Reminder (see lecture 2): general-purpose corpora aim for balance and representativeness
  - how genre and register are defined affects the structure and the uses of the corpus
  - corpus-based studies of variation across/within registers need a well-defined notion of register

Balance and representativeness
- Balance: refers to the range of types of text in the corpus
  - e.g. the BNC’s construction was based on an a priori classification of texts by domain, time and medium
- Representativeness: refers to the extent to which the corpus contains the full range of variation in the language.
- Representativeness depends on balance as a prerequisite

Biber (1993) on achieving balance
- Biber distinguishes:
  - external criteria:
    - social and communicative contexts in which a particular sample of text/speech is produced
    - external criteria define registers or genres
  - internal criteria:
    - linguistic (e.g. lexico-grammatical) features that distinguish texts
    - internal criteria define text types

External vs. internal
- Example: academic writing vs. spoken conversation
  - Some external criteria of differentiation:
    - primary channel (spoken/written/…)
    - type of addressee
    - factuality
  - Some internal criteria of differentiation:
    - more use of personal pronouns in spoken discourse
    - more use of passives in academic writing
    - …

Which should come first?
- Biber’s argument: “in defining the population for a corpus, register/genre distinctions [i.e. external criteria] take precedence over text-type distinctions. […] identification of the salient text-type distinctions in a language requires a representative corpus of texts…”

Biber’s external criteria
1. Primary channel: written/spoken/scripted
2. Format: published/unpublished
   - includes various publication formats
3. Setting: institutional/other/private-personal

Biber’s external criteria
4. Addressee/receiver
   a. Plurality: unenumerated/plural/individual/self
   b. Presence: present/absent
   c. Interactiveness: none/little/extensive
   d. Shared knowledge: general/specialised/personal

Biber’s external criteria
5. Addressor:
   a. Demographic variation: age, sex etc.
   b. Acknowledgement: acknowledged individual/institution
6. Factuality: factual-informational / intermediate / imaginative
7. Purposes: persuade, entertain, edify, inform, instruct…
8. Topics: [cf. the “Domain” definition in BNC texts]

The logic behind genre/register comparison
- A priori distinction between different genres/registers
  - each adequately sampled to be representative
- Given these externally-based distinctions, the question is: what linguistic features are characteristic of (or give rise to) different genres?

Part 2 The multifeature/multidimensional framework (Biber 1988, Biber 1995)

Biber (1988, 1995)
- Compared twenty-one genres in spoken and written British English
- Used a precompiled list of 67 linguistic features, comparing:
  - the extent to which these features “cluster together” across genres
    - e.g. high relative frequency of personal pronouns => high relative frequency of questions
  - the extent to which these clusters are more clearly present in different genres

Primary goals
1. identify the main dimensions (clusters of features) of variation underlying all registers
2. find similarities and differences between different registers

Dimensions
- Dimension: a group of features that are empirically determined to co-occur in texts
- Functional interpretation: given a set of features forming a dimension
  - e.g. pers. pronouns + questions
  - the crucial question is: how do we interpret it functionally?
  - e.g. the cluster containing pers. pronouns and questions shows a high level of interpersonal focus in the text

Factor analysis
- The MF/MD approach uses factor analysis
  - a statistical technique that groups together related features based on their co-occurrence
  - the resulting clusters of features (“factors”) are then interpreted and given a label
  - this is the process of identification and functional interpretation of dimensions
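To make the idea of grouping features by co-occurrence concrete, the sketch below builds a correlation matrix over four features and extracts its first principal axis by power iteration. This is a simpler stand-in for factor analysis proper, not the procedure used in the MF/MD studies, and the per-text frequencies are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of frequencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Feature-by-feature correlation matrix (one column per feature)."""
    return [[pearson(a, b) for b in columns] for a in columns]

def first_component(matrix, iterations=200):
    """Leading eigenvector of a correlation matrix via power iteration."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)  # start from the first feature's axis
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if v[0] < 0:  # fix the sign so the first feature loads positively
        v = [-x for x in v]
    return v

# Invented frequencies per 1,000 words in six hypothetical texts.
names = ["pronouns", "questions", "nouns", "prepositions"]
counts = {
    "pronouns": [80, 75, 70, 20, 15, 10],
    "questions": [12, 10, 11, 2, 1, 1],
    "nouns": [150, 160, 170, 280, 300, 310],
    "prepositions": [80, 85, 90, 130, 140, 150],
}

matrix = correlation_matrix([counts[name] for name in names])
loadings = dict(zip(names, first_component(matrix)))
for name in names:
    print(name, round(loadings[name], 2))
```

In this toy data, pronouns and questions load with one sign and nouns and prepositions with the other, which is the kind of "pole of opposition" that then needs a functional interpretation.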

Biber’s methodology
1. identify the grammatical features based on a review of existing literature
2. tag all relevant features in the corpus texts
3. post-edit the texts to ensure accuracy
4. count the frequency of each feature in each text
5. apply factor analysis to compute co-occurrence patterns among features
6. interpret the resulting dimensions functionally
7. compare different registers to see how much each dimension is represented in them
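Step 4 can be sketched as follows, assuming that features are approximated by small word lists and that counts are normalised to a rate per 1,000 words so that texts of different lengths are comparable. The feature names, word lists and sample sentence are hypothetical; a real study would count tagged and post-edited features (steps 2 and 3) rather than raw word forms.

```python
import re

# Hypothetical word lists standing in for tagged features.
FEATURES = {
    "first_second_person_pronouns": {"i", "me", "my", "we", "us", "our", "you", "your"},
    "hedges": {"probably", "possibly", "maybe", "perhaps"},
    "modals": {"can", "could", "may", "might", "must", "shall", "should", "will", "would"},
}

def tokenize(text):
    """Lowercased word tokens; contractions are kept as one token."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def feature_rates(text, per=1000):
    """Frequency of each feature, normalised to a rate per `per` tokens."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in FEATURES}
    return {
        name: per * sum(1 for t in tokens if t in words) / len(tokens)
        for name, words in FEATURES.items()
    }

sample = "I think you should probably leave. We might see you later."
rates = feature_rates(sample)
for name, rate in rates.items():
    print(name, round(rate, 1))
```

Word lists like these miscount ambiguous forms (e.g. "may" as a month, "will" as a noun), which is one reason the methodology includes tagging and post-editing.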

Types of features
- Lexical features
  - type-token ratio (the number of different word types divided by the number of tokens)
  - word length
- Lexical semantic features
  - e.g. word classes like hedges (probably, possibly…); speech act verbs (declare), etc.
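The two lexical features can be computed directly from a token list. The sketch below is a minimal illustration with hypothetical function names; the optional `window` argument is an assumption added here, reflecting the fact that type-token ratio falls as texts get longer, so it is usually compared over the same number of tokens.

```python
import re

def tokens_of(text):
    """Lowercased word tokens of a text."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def type_token_ratio(tokens, window=None):
    """Distinct word types divided by tokens, optionally over the first `window` tokens."""
    if window is not None:
        tokens = tokens[:window]
    return len(set(tokens)) / len(tokens)

def mean_word_length(tokens):
    """Average number of characters per token."""
    return sum(len(t) for t in tokens) / len(tokens)

toks = tokens_of("The cat sat on the mat")
print(round(type_token_ratio(toks), 3))   # 5 types / 6 tokens
print(round(mean_word_length(toks), 3))   # 17 characters / 6 tokens
```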

Types of features
- Grammatical feature classes
  - nouns, prepositional phrases, attributive and predicative adjectives, etc.
- Syntactic features
  - relative clauses, that-complements, pied-piping constructions (Which car does he like?), conditional subordination (should you ever…)

The dimensions identified
1. Involved vs. informational production
2. Narrative vs. non-narrative production
3. Elaborated vs. situation-dependent reference
4. Overt expression of persuasion
5. Abstract vs. non-abstract style
NB. Many of these dimensions define “poles of opposition”

Dimension 1: involved vs. informational
- Features of the “involved” pole: 1st and 2nd person pronouns, questions, reductions, stance verbs, hedges, emphatics, adverbial subordination
  - Typical of conversations, letters (high personal involvement)
- Features of the “informational” pole: nouns, adjectives, prepositional phrases, long words
  - Typical of informational exposition, e.g. in official documents and academic writing

Dimension 2: Narrative vs. non-narrative
- Features of the “narrative” pole: past tense, perfect aspect, 3rd person pronouns, speech act verbs
  - Typical of fiction
- Features of the “non-narrative” pole: present tense, attributive adjectives
  - Typical of broadcasts, telephone conversations, professional letters

Dimension 3: elaborated vs. situation-dependent reference
- Features of the “elaborated” pole: wh-relative clauses, pied-piping, phrasal coordination
  - Typical of “elaborated”, “situation-independent” language: official documents, professional letters, written exposition
- Features of the “situation-dependent” pole: time adverbials, place adverbials
  - Typical of “situation-dependent” language, e.g. broadcasts, fiction, personal letters

Dimension 4: Overt expression of persuasion
- Features: modals, conditional subordination
  - Define an “overt expression of persuasion” type, e.g. editorials, professional letters
- Lack of any of the above
  - Language which does not overtly seek to persuade

Dimension 5: Abstract vs. non-abstract style
- Features: agentless passives, by-passives, …
  - An “abstract style”: technical prose, academic prose, official documents
- Lack of any of the above
  - Language which is typically not abstract: conversation, public speeches, broadcasts…

Biber’s main argument
- No one dimension is enough to characterise the properties of a particular register
  - dimensions are coherent, correlated groupings of features
  - every register could be defined in terms of the relative prominence of all 5 dimensions
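One way to picture "relative prominence" is to give each text a score on a dimension and then average the scores per register. The sketch below assumes a scoring scheme in which feature frequencies are standardised (z-scores) across texts, and the features of one pole are added while those of the opposite pole are subtracted. The scheme as coded, the feature sets and the frequencies are illustrative assumptions, not figures from the lecture.

```python
import statistics

def z_scores(values):
    """Standardise a list of frequencies to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def dimension_scores(table, positive, negative):
    """Per-text score: sum of z-scores of positive-pole features minus negative-pole ones."""
    z = {feature: z_scores(column) for feature, column in table.items()}
    n_texts = len(next(iter(table.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]

# Invented frequencies per 1,000 words for four hypothetical texts.
registers = ["conversation", "conversation", "academic", "academic"]
table = {
    "pronouns": [80, 70, 15, 10],
    "questions": [12, 10, 1, 2],
    "nouns": [150, 170, 300, 310],
    "prepositions": [80, 90, 140, 150],
}

scores = dimension_scores(
    table,
    positive=["pronouns", "questions"],   # "involved" pole
    negative=["nouns", "prepositions"],   # "informational" pole
)

# Register profile on this dimension: mean score of the register's texts.
profile = {
    reg: statistics.mean(s for s, r in zip(scores, registers) if r == reg)
    for reg in set(registers)
}
for reg in sorted(profile):
    print(reg, round(profile[reg], 2))
```

Repeating this for each dimension would give every register a profile of five mean scores, which is the sense in which no single dimension characterises a register.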

Biber’s main argument
- Biber finds no evidence of an absolute difference between spoken and written language
  - e.g. conversations often display characteristics similar to those of non-spoken genres
- Better to:
  - identify different types of speech (broadcast, scripted, spontaneous)
  - view their similarities to and differences from different types of writing

Summary
- Biber’s MF/MD approach has proved highly influential in the study of register and genre
- Crucially, it relies on an a priori definition of:
  - features (“what to look for”)
  - registers (“situationally-defined uses of language”)

References
- Paolillo, J. C. (2000). Formalising formality. Journal of Linguistics, 36: 215–259.
- Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8 (4).
- Biber, D. (1995). On the role of computational, statistical and interpretive techniques in multi-dimensional analysis of register variation. Text, 15 (3): 314–370.