Text type variation: Biber’s approach Andrew Hardie LING306.

Introduction
- This lecture focuses on text-type
  - Genre, medium etc.
  - (especially Biber’s approach to text types)
- We will be concentrating on grammatical variation among different text-types

Vocabulary and text-type
- What type of text do the following vocabulary items suggest?
  - facilitate, enable, proactive, sustainability, innovation, integrate, move forward, impact, synergy, enhance, strategic, people-oriented, vision, excellence

The descriptive tradition of English grammar
- Hallmarks:
  - Descriptive rather than theoretical
  - Corpus-based
  - Emphasis on variation
- Some important works
  - Quirk et al. (1985) Comprehensive Grammar of the English Language
  - Biber et al. (1999) Longman Grammar of Spoken and Written English
- Key people include Randolph Quirk, Geoffrey Leech, Doug Biber

Part-of-speech variation
- The frequency of different parts-of-speech in a text is a very basic thing to investigate…
- … but it is very useful in discovering differences between text-types
- Part-of-speech tagging: recap

Some example tags
- NN2 – noun, common, plural
- NP1 – noun, proper, singular
- VV0 – verb, lexical, base-form
- VVZ – verb, lexical, 3rd sing. pres. tense (-s)
- VVD – verb, lexical, past-tense
- CC – conjunction, coordinating
- CS – conjunction, subordinating
- XX – negative particle (the word not)

Procedure
- Count the tags in a text / corpus
- Compare the frequencies you find to the frequencies of the tags in a reference corpus
- Use a statistical significance test to tell you which differences are most significant
- Then see what the significantly-more-frequent-than-normal or significantly-less-frequent-than-normal tags tell you about the text type
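The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not name a particular significance test, so the log-likelihood (G2) statistic is swapped in here as one common choice in corpus comparison, and the tag lists are invented toy data rather than real corpus counts. The function names `log_likelihood` and `compare_tags` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_text, size_text, freq_ref, size_ref):
    """G2 for one tag: observed counts vs. counts expected if both corpora shared one rate."""
    total = size_text + size_ref
    combined = freq_text + freq_ref
    expected_text = size_text * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_text, expected_text), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def compare_tags(text_tags, ref_tags):
    """Rank tags by how strongly their frequency differs between text and reference."""
    text_counts, ref_counts = Counter(text_tags), Counter(ref_tags)
    n_text, n_ref = len(text_tags), len(ref_tags)
    rows = []
    for tag in set(text_counts) | set(ref_counts):
        a, b = text_counts[tag], ref_counts[tag]
        rate_text, rate_ref = a / n_text, b / n_ref
        if rate_text > rate_ref:
            direction = "more"
        elif rate_text < rate_ref:
            direction = "less"
        else:
            direction = "same"
        rows.append((tag, direction, log_likelihood(a, n_text, b, n_ref)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Invented toy data: one tag per token (assumption, not real corpus output).
text_tags = ["NN2"] * 30 + ["VVD"] * 10 + ["CC"] * 10
ref_tags = ["NN2"] * 20 + ["VVD"] * 40 + ["CC"] * 40

ranking = compare_tags(text_tags, ref_tags)
for tag, direction, score in ranking:
    print(f"{tag}: {direction} frequent than in the reference (G2 = {score:.2f})")
```

With one degree of freedom, a G2 value above roughly 3.84 corresponds to p < 0.05, so the ranking gives a rough first indication of which tags are worth interpreting.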

An example in detail: Rayson, Wilson and Leech (2002)
- Data: BNC Sampler
  - 2 million word subset of the BNC
  - a mixture of different spoken and written text types
  - manually-corrected POS tagging
- Rayson, Wilson & Leech compared the frequencies of POS tags in all the different text types in this corpus

Rayson, Wilson & Leech: findings
- Nouns are more common in writing than speech
- Nouns are more common in informative writing than imaginative writing
- Nouns are more common in task-oriented speech than conversational speech
- Adjectives are more common in writing than speech
- Adjectives are more common in informative writing than imaginative writing
- Adjectives are more common in task-oriented speech than conversational speech

Rayson, Wilson & Leech: findings (continued)
- Pronouns are more common in speech than writing
- Pronouns are more common in imaginative writing than informative writing
- Pronouns are more common in conversational speech than task-oriented speech
- Verbs are more common in imaginative writing than informative writing
- Verbs are more common in conversational speech than task-oriented speech
- Auxiliary verbs are more common in speech than writing (but the same doesn’t apply to all verbs)

Rayson, Wilson & Leech: summary
- The more “speech-y” / less “writing-y” a text, the MORE verbs and pronouns it has, and the FEWER nouns and adjectives it has
  - → information density
- Informative writing is more “writing-y” than imaginative writing (fiction)
- Everyday conversational speech is more “speech-y” than task-oriented speech

Speech vs. writing as a scale
Most like speech
- Conversational speech
- Task-oriented speech
- Imaginative writing
- Informative writing
Most like writing

Other studies
- … have done other kinds of comparison based on POS frequency
  - Male speech versus female speech (Rayson, Leech and Hodges 1997)
  - Learner writing versus native-speaker writing (Granger and Rayson 1998)
  - Change over time (Mair et al. 2003)

Biber’s approach to text types
- Biber (1988) argues that the best way to describe variation is using dimensions
- Example text: “Evidence has been presented for a supposed randomness in the movement of plankton animals. If valid, this implies that migrations involve kineses rather than taxes (Chapter 10). However, the data cited in support of this idea comprise without exception observations made in the laboratory.”

Dimensions to classify this text
- We might use dimensions such as…
  - Specialised vs. unspecialised
  - Planned vs. unplanned
  - Interactive vs. non-interactive
  - Formal vs. informal
- Is there a better, non-impressionistic way to investigate the dimensions of variation?

Biber’s approach
“The raw data of this approach are frequency counts of particular linguistic features. Frequency counts give an exact, quantitative characterization of a text, so that different texts can be compared in very precise terms. By themselves, however, frequency counts cannot identify linguistic dimensions. Rather, a linguistic dimension is determined on the basis of a consistent co-occurrence pattern among features. That is, when a group of features consistently co-occur in texts, these features define a linguistic dimension.” (Biber 1988: 13)

Why “dimensions”? Because a text-type can have different positions on different dimensions

Different dimensions can vary independently
- Given, say, two dimensions
  - formal vs. informal
  - concrete vs. abstract
- … we could equally well have any of…
  - informal concrete
  - informal abstract
  - formal concrete
  - formal abstract
- Unlike space, we are not limited to 3 dimensions

Biber’s multi-feature / multi-dimensional (MF/MD) approach
- Biber’s corpus had 481 texts (just under 1 million words), consisting of:
  - Extracts from LOB Corpus (UK written English, 1960s)
  - Extracts from London-Lund Corpus (UK spoken English, 1960s/1970s)
  - Some letters (including professional and personal letters), added because LOB contains no letters

What features did Biber search for?
- Biber picked 67 features that had been used in previous research (full list on handout; numbers below are the feature numbers in that list)
- Some straightforward:
  - 42. Total adverbs
  - 44. Average word length
  - 9. Pronoun it
  - 58. Verbs seem and appear
- Some more complex:
  - 33. Pied-piping relative clauses (“the house in which he lived”)
  - 63. Split auxiliaries (“they are objectively shown to be”)

Automatic searches
- Biber’s corpus was tagged
- Searching for verb forms
  - 1. Past tense: verbs like went, walked all tagged VBD
  - 3. Present tense: verbs like go, goes, walk, walks all tagged VB or VBZ
- Searching for more complex features
  - 2. Perfect aspect
  - How can you look for this?

Finding instances of perfect aspect
- Perfect = HAVE + past participle (VBN)
  - We need BOTH for it to be perfect
- Problem: things can come between HAVE and the past participle
  - He has very easily done it
  - They have obviously not succeeded
- Biber searched for HAVE + (ADV) + (ADV) + VBN
- But what about the question form?
  - Have they succeeded? Has he done it?
  - HAVE + N / PRO + VBN
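The two search patterns on this slide can be sketched as a small matcher over (word, tag) pairs. This is a minimal illustration, not Biber’s actual program: `VBN` comes from the slide, but `ADV`, `N` and `PRO` are used here as placeholder tags borrowed from the slide’s own pattern notation, the list of HAVE forms is an assumption, and the negative particle is treated as filling an adverb slot (as the second example sentence requires).

```python
# Assumed inventory of HAVE forms (hypothetical; a real tagger may mark these differently).
HAVE_FORMS = {"have", "has", "had", "having", "'ve"}
ADV_TAGS = {"ADV"}            # placeholder adverb tag
NOMINAL_TAGS = {"N", "PRO"}   # placeholder noun / pronoun tags


def find_perfects(tagged):
    """Return (have_index, participle_index) pairs for one tagged sentence."""
    hits = []
    for i, (word, _tag) in enumerate(tagged):
        if word.lower() not in HAVE_FORMS:
            continue
        # Pattern 1: HAVE + (ADV) + (ADV) + VBN, i.e. at most two intervening adverbs.
        j = i + 1
        skipped = 0
        while j < len(tagged) and skipped < 2 and tagged[j][1] in ADV_TAGS:
            j += 1
            skipped += 1
        if j < len(tagged) and tagged[j][1] == "VBN":
            hits.append((i, j))
            continue
        # Pattern 2 (question form): HAVE + N / PRO + VBN.
        if (i + 2 < len(tagged)
                and tagged[i + 1][1] in NOMINAL_TAGS
                and tagged[i + 2][1] == "VBN"):
            hits.append((i, i + 2))
    return hits


# Hand-tagged toy sentences based on the slide's examples (tags are placeholders).
sentences = [
    [("He", "PRO"), ("has", "VBZ"), ("very", "ADV"), ("easily", "ADV"),
     ("done", "VBN"), ("it", "PRO")],
    [("They", "PRO"), ("have", "VB"), ("obviously", "ADV"), ("not", "ADV"),
     ("succeeded", "VBN")],
    [("Have", "VB"), ("they", "PRO"), ("succeeded", "VBN")],
    [("She", "PRO"), ("has", "VBZ"), ("a", "DET"), ("car", "N")],  # not a perfect
]

results = [find_perfects(sentence) for sentence in sentences]
print(results)
```

A matcher like this will still miss perfects with longer interruptions (three adverbs, or a long noun phrase in a question), which is exactly the recall problem raised later in the lecture.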

Statistical processing
- Convert raw frequencies to frequencies per thousand words
- MULTIVARIATE statistical techniques
- FACTOR ANALYSIS
  - “In a factor analysis, a large number of original variables, in this case the frequencies of linguistic features, are reduced to a small set of derived variables, the ‘factors’. Each factor represents some area in the original data that can be summarised or generalised.” (Biber 1988: 79)
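The normalisation step matters because texts differ in length, so raw counts are not comparable. A minimal sketch of the conversion follows; the function name `per_thousand` and the counts are invented for illustration.

```python
def per_thousand(count, text_length):
    """Convert a raw feature count into a frequency per thousand words."""
    if text_length <= 0:
        raise ValueError("text_length must be a positive number of words")
    return count * 1000 / text_length


# Invented example: text A has more past-tense verbs in absolute terms,
# but text B uses them more densely.
rate_a = per_thousand(50, 2000)  # 50 past-tense verbs in a 2,000-word text
rate_b = per_thousand(20, 500)   # 20 past-tense verbs in a 500-word text
print(rate_a, rate_b)
```

Only after this step does it make sense to feed the feature frequencies into a multivariate technique such as factor analysis.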

Interpretation
- The statistical factors = dimensions
- Biber INTERPRETED the dimensions functionally
  - i.e. an EXPLANATION for why that bundle of features is used that way
- A total of 6 dimensions were found

Biber’s Dimension 1: ‘Involved versus Informational Production’
- Interactional involvement of speaker/writer & addressee, vs. lots of precise info packed in
- LOTS OF: THAT deletion, 1st / 2nd person pronouns, present tense, contractions
- LITTLE OF: nouns, prepositions, attributive adjectives
- High scores (involved): e.g. telephone conversations, face-to-face conversations
- Low scores (informational): e.g. academic prose, press reportage, official documents
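To make “high” and “low” scores concrete, here is a rough sketch of one way a dimension score can be computed: standardise each feature’s per-thousand-word rate across the texts, then add the standardised values of the “LOTS OF” features and subtract those of the “LITTLE OF” features. The slides do not spell out the calculation, so treat this as an assumption-laden illustration; the rates are invented, only four of the features are used, and the names `z_scores` and `dimension_scores` are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values to mean 0 and standard deviation 1 (needs some variation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def dimension_scores(rates, positive, negative):
    """Sum standardised rates of positive features and subtract those of negative ones."""
    texts = list(rates)
    scores = dict.fromkeys(texts, 0.0)
    for sign, features in ((1, positive), (-1, negative)):
        for feature in features:
            standardised = z_scores([rates[text][feature] for text in texts])
            for text, z in zip(texts, standardised):
                scores[text] += sign * z
    return scores


# Invented per-thousand-word rates, for illustration only.
rates = {
    "conversation": {"contractions": 45.0, "pronouns_1_2": 90.0,
                     "nouns": 140.0, "prepositions": 80.0},
    "fiction": {"contractions": 15.0, "pronouns_1_2": 40.0,
                "nouns": 180.0, "prepositions": 95.0},
    "academic": {"contractions": 1.0, "pronouns_1_2": 5.0,
                 "nouns": 280.0, "prepositions": 140.0},
}

scores = dimension_scores(
    rates,
    positive=["contractions", "pronouns_1_2"],  # the "LOTS OF" side
    negative=["nouns", "prepositions"],         # the "LITTLE OF" side
)
for text, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{text}: {score:+.2f}")
```

On this toy data the conversation text lands at the involved end and the academic text at the informational end, with fiction in between.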

Dimension 2: ‘Narrative versus Non-Narrative Concerns’
- Is the text telling a story/sequence of events, or is it just describing/explaining?
- LOTS OF: past tense verbs, 3rd person pronouns, perfect aspect, present participle clauses
- High scores (narrative): e.g. romantic fiction; other types of fiction; biographies
- Low scores (non-narrative): e.g. academic prose, official documents, hobbies, broadcasts

Dimension 3: ‘Explicit versus Situation-Dependent Reference’
- Is what you’re talking about spelt out precisely, or do you have to know the context to know what it’s all about?
- LOTS OF: relative clauses (various sorts), coordinated phrases, nominalised verbs
- High scores (explicit): e.g. official documents, professional letters
- Low scores (situation-dependent): e.g. telephone conversations, broadcasts

The other dimensions
- Dimension 4 – ‘Overt Expression of Persuasion’
  - Does the text have linguistic features that mark the use of persuasion?
- Dimension 5 – ‘Abstract versus Non-Abstract Information’
  - Is the information formal, abstract & technical or not?
- Dimension 6 – ‘On-Line Informational Elaboration’
  - Is the information being produced in real time (i.e. unplanned), or has it been planned out?

A range, not a dichotomy
- Dimension 2 – narrative / non-narrative
  - Between the top text-types and the bottom text-types there exists a whole range of text-types in the middle: it’s not just a two-way distinction
- Note also: speech/writing is NOT the main distinction. Spoken and written text types are mixed together along the dimension

Biber’s conclusion on speech and writing “… the variation among texts within speech and writing is often as great as the variation across the two modes. No absolute spoken/written distinction is identified in the study. Rather, the relations among spoken and written texts are complex and associated with a variety of different situational, functional and processing considerations.” (Biber 1988: 25)

Some potential problems
- Do searches for complex grammatical features in a corpus always find all the examples that they’re looking for? (see Ball 1994: 296)
- What constitutes a “text-type”?
  - Do the texts which we’ve grouped together for the analysis really belong together?
- Can we be sure of the representativeness of our corpus?
  - Does our sample reflect the full range of what’s “out there” in the language?
  - Does it have the same relative frequencies of features as what’s “out there”?
- Criticisms of Biber: Lee (2001), Watson (1994, 1995), Biber (1995b)

Conclusion
- In this lecture we have:
  - Talked about the descriptive tradition of English grammar research & how it is intrinsically linked with corpus linguistics
  - Demonstrated a straightforward approach to text-type variation based on part-of-speech frequency
  - Investigated Biber’s more detailed “multi-dimensional” approach to text-type variation
  - Considered some problems that might be encountered in this kind of approach

Seminar
- In the seminar, you will:
  - Have a go at searching for one of Biber’s features in the BNC
  - See how it varies across different text-types as pre-defined in the BNC

This week’s reading
- Compulsory (in the reading pack):
  - Chapter 6 of Biber, Conrad and Reppen (1998)
- Optional advanced:
  - Rayson, Wilson and Leech (2002) ons/rwl_lc36_2002.pdf (ignore the statistics and concentrate on the results/findings)
  - Biber (1988) (not the whole thing, you can dip in and out! You might try chapters 1, 4 and 6)