Text-type variation: Biber’s approach
Andrew Hardie
LING306
Introduction
This lecture focuses on text type (genre, medium, etc.), and especially on Biber’s approach to text types.
We will concentrate on grammatical variation among different text types.
Vocabulary and text type
What type of text do the following vocabulary items suggest?
facilitate, enable, proactive, sustainability, innovation, integrate, move forward, impact, synergy, enhance, strategic, people-oriented, vision, excellence
The descriptive tradition of English grammar
Hallmarks: descriptive rather than theoretical; corpus-based; emphasis on variation.
Some important works:
Quirk et al. (1985) A Comprehensive Grammar of the English Language
Biber et al. (1999) Longman Grammar of Spoken and Written English
Key people include Randolph Quirk, Geoffrey Leech and Doug Biber.
Part-of-speech variation
The frequency of different parts of speech in a text is a very basic thing to investigate, but it is very useful for discovering differences between text types.
Part-of-speech tagging: recap
Some example tags
NN2: noun, common, plural
NP1: noun, proper, singular
VV0: verb, lexical, base form
VVZ: verb, lexical, 3rd person singular present tense (-s)
VVD: verb, lexical, past tense
CC: conjunction, coordinating
CS: conjunction, subordinating
XX: negative particle (the word not)
Procedure
1. Count the tags in a text or corpus.
2. Compare the frequencies you find to the frequencies of the same tags in a reference corpus.
3. Use a statistical significance test to tell you which differences are most significant.
4. See what the tags that are significantly more frequent than normal, or significantly less frequent than normal, tell you about the text type.
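The procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular study: texts are assumed to be already tagged as lists of (word, tag) pairs, and the significance test is assumed to be the log-likelihood (G2) statistic, one common choice for comparing two corpora (the slide does not say which test to use). With one degree of freedom, a G2 above about 3.84 corresponds to p < 0.05. The function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """G2 for one item seen freq_a times in corpus A (total_a tokens)
    and freq_b times in corpus B (total_b tokens)."""
    combined = freq_a + freq_b
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # by convention, 0 * log(0) contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def compare_tags(text_tokens, reference_tokens):
    """Compare tag frequencies in a text against a reference corpus.

    Both arguments are lists of (word, tag) pairs. Returns a list of
    (tag, G2, direction) tuples, most significant difference first;
    direction says whether the tag is relatively more or less frequent
    in the text than in the reference corpus."""
    text_counts = Counter(tag for _, tag in text_tokens)
    ref_counts = Counter(tag for _, tag in reference_tokens)
    n_text, n_ref = len(text_tokens), len(reference_tokens)
    results = []
    for tag in set(text_counts) | set(ref_counts):
        a, b = text_counts[tag], ref_counts[tag]  # Counter gives 0 if absent
        rel_text, rel_ref = a / n_text, b / n_ref
        if rel_text > rel_ref:
            direction = "more"
        elif rel_text < rel_ref:
            direction = "less"
        else:
            direction = "same"
        results.append((tag, log_likelihood(a, n_text, b, n_ref), direction))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Tiny made-up data using tags from the list above.
    text = [("dogs", "NN2"), ("bark", "VV0"), ("cats", "NN2"), ("sleep", "VV0")]
    reference = [("John", "NP1"), ("walks", "VVZ"), ("and", "CC"),
                 ("Mary", "NP1"), ("runs", "VVZ")]
    for tag, g2, direction in compare_tags(text, reference):
        print(f"{tag}\tG2={g2:.2f}\t{direction} frequent in the text")
```

In real use the reference corpus would be large (for example, a balanced corpus such as the BNC), and only tags whose G2 passes the chosen threshold would be interpreted in step 4.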
An example in detail: Rayson, Wilson and Leech (2002)
Data: the BNC Sampler
a 2-million-word subset of the BNC
a mixture of different spoken and written text types
manually corrected POS tagging
Rayson, Wilson & Leech compared the frequencies of POS tags across all the different text types in this corpus.
Rayson, Wilson & Leech: findings
Nouns are more common in writing than in speech.
Nouns are more common in informative writing than in imaginative writing.
Nouns are more common in task-oriented speech than in conversational speech.
Adjectives are more common in writing than in speech.
Adjectives are more common in informative writing than in imaginative writing.
Adjectives are more common in task-oriented speech than in conversational speech.
Rayson, Wilson & Leech: findings (continued)
Pronouns are more common in speech than in writing.
Pronouns are more common in imaginative writing than in informative writing.
Pronouns are more common in conversational speech than in task-oriented speech.
Verbs are more common in imaginative writing than in informative writing.
Verbs are more common in conversational speech than in task-oriented speech.
Auxiliary verbs are more common in speech than in writing (but the same does not apply to all verbs).
Rayson, Wilson & Leech: summary
The more “speech-y” (or less “writing-y”) a text is, the MORE verbs and pronouns it has and the FEWER nouns and adjectives it has; that is, the lower its information density.
Informative writing is more “writing-y” than imaginative writing (fiction).
Everyday conversational speech is more “speech-y” than task-oriented speech.
Speech vs. writing as a scale
From most like speech to most like writing:
Conversational speech → Task-oriented speech → Imaginative writing → Informative writing
Other studies
Other studies have made other kinds of comparison based on POS frequency:
Male speech versus female speech (Rayson, Leech and Hodges 1997)
Learner writing versus native-speaker writing (Granger and Rayson 1998)
Change over time (Mair et al. 2003)
Biber’s approach to text types
Biber (1988) argues that the best way to describe variation is in terms of dimensions.
Consider the following text:
Evidence has been presented for a supposed randomness in the movement of plankton animals. If valid, this implies that migrations involve kineses rather than taxes (Chapter 10). However, the data cited in support of this idea comprise without exception observations made in the laboratory.
Dimensions to classify this text
We might use dimensions such as:
specialised vs. unspecialised
planned vs. unplanned
interactive vs. non-interactive
formal vs. informal
Is there a better, non-impressionistic way to investigate the dimensions of variation?
Biber’s approach
“The raw data of this approach are frequency counts of particular linguistic features. Frequency counts give an exact, quantitative characterization of a text, so that different texts can be compared in very precise terms. By themselves, however, frequency counts cannot identify linguistic dimensions. Rather, a linguistic dimension is determined on the basis of a consistent co-occurrence pattern among features. That is, when a group of features consistently co-occur in texts, these features define a linguistic dimension.” (Biber 1988: 13)
Why “dimensions”?
Because a text type can occupy different positions on different dimensions.
Different dimensions can vary independently
Given, say, two dimensions:
formal vs. informal
concrete vs. abstract
we could equally well have any of:
informal + concrete
informal + abstract
formal + concrete
formal + abstract
Unlike physical space, we are not limited to three dimensions.
Biber’s multi-feature / multi-dimensional (MF/MD) approach
Biber’s corpus had 481 texts (just under 1 million words), consisting of:
extracts from the LOB Corpus (UK written English, 1960s)
extracts from the London-Lund Corpus (UK spoken English, 1960s/1970s)
some letters, both professional and personal (added because LOB contains no letters)
What features did Biber search for?
Biber picked 67 features that had been used in previous research (full list on handout).
Some are straightforward:
42. Total adverbs
44. Average word length
9. Pronoun it
58. Verbs seem and appear
Some are more complex:
33. Pied-piping relative clauses (“the house in which he lived”)
63. Split auxiliaries (“they are objectively shown to be”)
Automatic searches
Biber’s corpus was tagged.
Searching for verb forms:
1. Past tense: verbs like went, walked, all tagged VBD
3. Present tense: verbs like go, goes, walk, walks, all tagged VB or VBZ
Searching for more complex features:
2. Perfect aspect: how can we search for this?
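Simple features like these reduce to counting tags. The sketch below assumes each text is a list of (word, tag) pairs using the tag names given on the slide (VBD for past tense, VB or VBZ for present tense); it also computes feature 44, average word length. The function name is hypothetical, the other tags in the example are illustrative only, and treating only alphabetic tokens as words is a simplifying assumption.

```python
def simple_features(tagged_tokens):
    """Count some straightforward tag-based features in one text.

    tagged_tokens is a list of (word, tag) pairs. Tag names follow the
    slide: VBD = past tense, VB or VBZ = present tense."""
    past = sum(1 for _, tag in tagged_tokens if tag == "VBD")
    present = sum(1 for _, tag in tagged_tokens if tag in ("VB", "VBZ"))
    # Simplification: only purely alphabetic tokens count as words,
    # so punctuation (and contractions such as "don't") are skipped.
    words = [word for word, _ in tagged_tokens if word.isalpha()]
    avg_length = sum(len(w) for w in words) / len(words) if words else 0.0
    return {"past_tense": past, "present_tense": present,
            "average_word_length": avg_length}


if __name__ == "__main__":
    # Illustrative tags: PPS (pronoun) and CC (conjunction) are not used above.
    sample = [("He", "PPS"), ("walks", "VBZ"), ("and", "CC"),
              ("she", "PPS"), ("went", "VBD"), (".", ".")]
    print(simple_features(sample))
```

Counting this way depends entirely on the tagger: any tagging error for a form like walks (noun or verb?) feeds straight into the feature counts.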
Finding instances of perfect aspect
Perfect = HAVE + past participle (VBN); we need BOTH for it to be perfect.
Problem: other words can come between HAVE and the past participle:
He has very easily done it
They have obviously not succeeded
Biber searched for HAVE + (ADV) + (ADV) + VBN.
But what about the question form?
Have they succeeded? Has he done it?
This needs a second pattern: HAVE + N / PRO + VBN
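The two patterns above can be expressed as a small search over tagged tokens. This is a sketch, not Biber’s actual search program: it assumes simplified, hypothetical tags (ADV for adverbs, N for nouns, PRO for pronouns, VBN for past participles), identifies HAVE by its word form, and the function name is hypothetical.

```python
# Word forms treated as HAVE; 'd is left out because it is ambiguous with "would".
HAVE_FORMS = {"have", "has", "had", "having", "'ve"}


def find_perfects(tagged_tokens):
    """Return (start, end) index pairs for candidate perfect-aspect forms.

    Patterns searched (tags are simplified, hypothetical labels):
      HAVE + (ADV) + (ADV) + VBN     e.g. "has very easily done"
      HAVE + N/PRO + VBN             e.g. "Have they succeeded"
    """
    matches = []
    n = len(tagged_tokens)
    for i, (word, _) in enumerate(tagged_tokens):
        if word.lower() not in HAVE_FORMS:
            continue
        # Pattern 1: skip at most two adverbs, then require a VBN.
        j = i + 1
        while j < n and j - i <= 2 and tagged_tokens[j][1] == "ADV":
            j += 1
        if j < n and tagged_tokens[j][1] == "VBN":
            matches.append((i, j))
            continue
        # Pattern 2 (question form): one noun or pronoun, then a VBN.
        if (i + 2 < n and tagged_tokens[i + 1][1] in ("N", "PRO")
                and tagged_tokens[i + 2][1] == "VBN"):
            matches.append((i, i + 2))
    return matches


if __name__ == "__main__":
    # Here "not" is assumed to be tagged ADV, so it counts as one of the adverbs.
    sentence = [("They", "PRO"), ("have", "HV"), ("obviously", "ADV"),
                ("not", "ADV"), ("succeeded", "VBN")]
    question = [("Have", "HV"), ("they", "PRO"), ("succeeded", "VBN"), ("?", "?")]
    print(find_perfects(sentence), find_perfects(question))
```

The question pattern only allows a one-word subject, so Have the students succeeded? would be missed, while a sequence such as had cars repaired would be wrongly counted as perfect. Patterns like these trade completeness against accuracy, which is one of the potential problems with automatic searches discussed later in the lecture.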
Statistical processing
Convert raw frequencies to frequencies per thousand words.
Apply a MULTIVARIATE statistical technique: FACTOR ANALYSIS.
“In a factor analysis, a large number of original variables, in this case the frequencies of linguistic features, are reduced to a small set of derived variables, the ‘factors’. Each factor represents some area in the original data that can be summarised or generalised.” (Biber 1988: 79)
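Normalisation matters because texts differ in length: 20 past-tense verbs in a 2,000-word text is a rate of 10 per thousand words, the same as 5 in a 500-word text. Factor analysis itself is too involved for a short sketch, but it typically starts from the correlations among the normalised feature frequencies across all the texts, which is where consistent co-occurrence patterns show up in the numbers. The sketch below shows only these two preliminary steps; the function names and example figures are hypothetical.

```python
import math


def normalise(counts, n_words):
    """Convert raw feature counts for one text to rates per 1,000 words."""
    return {feature: count * 1000 / n_words for feature, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up raw counts and text lengths (in words) for three texts.
    texts = [({"pronouns": 60, "nouns": 150}, 1000),
             ({"pronouns": 200, "nouns": 180}, 2000),
             ({"pronouns": 15, "nouns": 130}, 500)]
    rates = [normalise(counts, n) for counts, n in texts]
    pronouns = [r["pronouns"] for r in rates]
    nouns = [r["nouns"] for r in rates]
    # A negative value means the two features tend not to occur together.
    print(pearson(pronouns, nouns))
```

With 67 features, the full input to factor analysis is a 67 × 67 matrix of such correlations computed over all 481 texts.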
Interpretation
The statistical factors are the dimensions.
Biber INTERPRETED the dimensions functionally: he looked for an EXPLANATION of why each bundle of features is used the way it is.
A total of 6 dimensions were found.
Biber’s Dimension 1: ‘Involved versus Informational Production’
Interactional involvement of speaker/writer and addressee, vs. lots of precise information packed in.
LOTS OF: THAT deletion, 1st/2nd person pronouns, present tense, contractions
LITTLE OF: nouns, prepositions, attributive adjectives
High scores (involved): e.g. telephone conversations, face-to-face conversations
Low scores (informational): e.g. academic prose, press reportage, official documents
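One way to see how a text ends up with a high or low score on a dimension is the sketch below. It assumes the scoring procedure usually described for the MD approach: each feature rate is standardised against the corpus mean and standard deviation, the standardised values of the LOTS OF features are added, and those of the LITTLE OF features are subtracted. Only the seven features listed above are used (Biber’s Dimension 1 involves more), the feature names are hypothetical labels, and the corpus means and standard deviations in the example are invented for illustration, not Biber’s figures.

```python
# Hypothetical labels for the Dimension 1 features listed above.
POSITIVE = ["that_deletion", "first_second_person_pronouns",
            "present_tense", "contractions"]                    # LOTS OF
NEGATIVE = ["nouns", "prepositions", "attributive_adjectives"]  # LITTLE OF


def z_score(value, mean, sd):
    """Standardise a rate relative to the corpus mean and standard deviation."""
    return (value - mean) / sd


def dimension1_score(text_rates, corpus_stats):
    """Score one text on Dimension 1.

    text_rates: feature -> rate per 1,000 words in this text.
    corpus_stats: feature -> (mean, standard deviation) over all texts.
    Positive scores lean towards 'involved', negative towards 'informational'."""
    score = 0.0
    for feature in POSITIVE:
        score += z_score(text_rates[feature], *corpus_stats[feature])
    for feature in NEGATIVE:
        score -= z_score(text_rates[feature], *corpus_stats[feature])
    return score


if __name__ == "__main__":
    # Invented corpus statistics: every feature has mean 10 and sd 2.
    stats = {f: (10.0, 2.0) for f in POSITIVE + NEGATIVE}
    conversation = {**{f: 12.0 for f in POSITIVE}, **{f: 8.0 for f in NEGATIVE}}
    academic = {**{f: 8.0 for f in POSITIVE}, **{f: 12.0 for f in NEGATIVE}}
    print(dimension1_score(conversation, stats))  # 7.0
    print(dimension1_score(academic, stats))      # -7.0
```

Standardising first stops very frequent features such as nouns from swamping rarer ones such as contractions: each feature contributes in units of “how unusual is this text for this feature”.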
Dimension 2: ‘Narrative versus Non-Narrative Concerns’
Is the text telling a story or a sequence of events, or is it just describing or explaining?
LOTS OF: past tense verbs, 3rd person pronouns, perfect aspect, present participle clauses
High scores (narrative): e.g. romantic fiction, other types of fiction, biographies
Low scores (non-narrative): e.g. academic prose, official documents, hobbies, broadcasts
Dimension 3: ‘Explicit versus Situation-Dependent Reference’
Is what you are talking about spelt out precisely, or do you need to know the context to understand what it is all about?
LOTS OF: relative clauses (various sorts), coordinated phrases, nominalised verbs
High scores (explicit): e.g. official documents, professional letters
Low scores (situation-dependent): e.g. telephone conversations, broadcasts
The other dimensions
Dimension 4: ‘Overt Expression of Persuasion’
Does the text have linguistic features that mark the use of persuasion?
Dimension 5: ‘Abstract versus Non-Abstract Information’
Is the information formal, abstract and technical, or not?
Dimension 6: ‘On-Line Informational Elaboration’
Is the information being produced in real time (i.e. unplanned), or has it been planned out?
A range, not a dichotomy
On Dimension 2 (narrative / non-narrative), between the text types at the top and those at the bottom there is a whole range of text types in the middle: it is not just a two-way distinction.
Note also that speech vs. writing is NOT the main distinction: spoken and written text types are mixed together along the dimension.
Biber’s conclusion on speech and writing
“… the variation among texts within speech and writing is often as great as the variation across the two modes. No absolute spoken/written distinction is identified in the study. Rather, the relations among spoken and written texts are complex and associated with a variety of different situational, functional and processing considerations.” (Biber 1988: 25)
Some potential problems
Do searches for complex grammatical features in a corpus always find all the examples they are looking for? (see Ball 1994: 296)
What constitutes a “text type”? Do the texts we have grouped together for the analysis really belong together?
Can we be sure of the representativeness of our corpus?
Does our sample reflect the full range of what is “out there” in the language?
Does it have the same relative frequencies of features as what is “out there”?
Criticisms of Biber: Lee (2001), Watson (1994, 1995), Biber (1995b)
Conclusion
In this lecture we have:
talked about the descriptive tradition of English grammar research and how it is intrinsically linked with corpus linguistics
demonstrated a straightforward approach to text-type variation based on part-of-speech frequency
investigated Biber’s more detailed “multi-dimensional” approach to text-type variation
considered some problems that might be encountered in this kind of approach
Seminar
In the seminar, you will:
have a go at searching for one of Biber’s features in the BNC
see how it varies across the different text types pre-defined in the BNC
This week’s reading
Compulsory (in the reading pack): Chapter 6 of Biber, Conrad and Reppen (1998)
Optional advanced:
Rayson, Wilson and Leech (2002): ons/rwl_lc36_2002.pdf (ignore the statistics and concentrate on the results/findings)
Biber (1988): not the whole thing, you can dip in and out (you might try chapters 1, 4 and 6)