

12.06.2016, COGS 523 - Bilge Say: Using Corpora for Language Research. COGS 523, Lecture 9: Discourse Characteristics and Register Variations.





2 Related Readings: Biber, Conrad and Reppen (1998). Corpus Linguistics. Chs. 5 and 6.

3 Discourse Studies: text-based vs. corpus-based approaches. Text-based studies lack generalizability and quantitative techniques. Discourse features are hard to identify automatically. Conventional corpus workbenches offer little help, but interactive tools used in conjunction with surface grammatical analysis tools can work.

4 Text Sample 5.1: News reportage. Throtec International Inc. said it reached agreements with an investor group and Wells Fargo Bank under which it will receive loans and equity infusion in return for stock that will reduce the number of shares in public hands by as much as 85 percent. The engineering and consulting firm, which has been plagued by losses of five years, said the restructuring is required to relieve its debt burden and “acute shortage of cash.”

5 Text Sample 5.2: Conversation A: Right, I’m ready. Have you locked the back door? [pause] I thought we were walking. B: Well do you want to walk or do you want to go in the car. A: Well I have to go to the paper shop. B: Well I’ll drop you at the paper shop while I go round. A: Oh that’s a good idea.

6 Referring Expressions in Different Text Types: exophoric (text-external) vs. text-internal reference; known vs. new information. Corpora and registers: London-Lund Corpus (spoken): conversation, public speeches; LOB Corpus (written): news reportage, academic prose.

7 Characteristics of Referring Expressions: status of information (given vs. new); for given information, the type of reference (anaphoric, exophoric, or inferrable); for anaphoric reference, the form of the expression (pronoun, synonym, or repetition); for anaphoric reference, the distance between the anaphoric expression and its antecedent.

8 Illustrative Analysis: A small sample of texts from the London-Lund and LOB corpora was coded: the first 200 words of forty texts (five texts from conversation, nine from public speeches, ten from news reportage, and sixteen from academic prose).

9 Six noun phrase characteristics recorded:
1. register of the text
2. nominal form: pronoun versus full noun
3. information status: given versus new
4. if given, type of reference: anaphoric, exophoric, or inferrable (last category not included in the references)
5. if anaphoric and a full noun, type of expression: synonym versus noun repetition (pronouns have already been identified in step 2)
6. if anaphoric, the distance between the target referring expression and its antecedent

10 Interactive Text Analysis Program: All texts are grammatically tagged. The program stops at every noun and pronoun and asks the user to choose a coding from a list. It proposes an initial analysis, e.g. anaphoric and given for pronouns, and given for repeated nouns. The user selects the antecedent where necessary; the program then counts the number of noun phrases intervening between the referring expression and the antecedent.
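A minimal Python sketch of the coding record and the distance count described above. The `NounPhrase` record and the `intervening_nps` function are hypothetical names, not part of the original program; noun phrases are assumed to be numbered in text order, so the distance is simply the number of NPs strictly between the antecedent and the referring expression.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NounPhrase:
    """One coded noun phrase (hypothetical record layout)."""
    index: int                        # position among the NPs of the text (0, 1, 2, ...)
    form: str                         # "pronoun" or "full_noun"
    status: str                       # "given" or "new"
    reference: Optional[str] = None   # if given: "anaphoric", "exophoric" or "inferrable"
    antecedent: Optional[int] = None  # if anaphoric: index of the antecedent NP

def intervening_nps(phrase: NounPhrase) -> Optional[int]:
    """Number of NPs strictly between an anaphoric NP and its antecedent; None otherwise."""
    if phrase.reference != "anaphoric" or phrase.antecedent is None:
        return None
    return phrase.index - phrase.antecedent - 1

# Toy coding of four NPs (not data from the London-Lund or LOB corpora)
nps = [
    NounPhrase(0, "full_noun", "new"),
    NounPhrase(1, "full_noun", "new"),
    NounPhrase(2, "pronoun", "given", "anaphoric", antecedent=0),
    NounPhrase(3, "full_noun", "given", "exophoric"),
]
print([intervening_nps(p) for p in nps])  # [None, None, 1, None]
```

The synonym-versus-repetition distinction (characteristic 5) could be added as one more optional field in the same way.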



13 Distance between referring expressions (REs) and antecedents: On-line comprehension and production requirements make a difference. Pronouns tend to occur much closer to their antecedents than repeated full nouns, and this holds across registers. Full noun expressions are preferred for anaphoric reference over large distances.

14 Table 5.1. Average distance measures for registers (distance = number of intervening noun phrases)
Register          Average distance
Conversation            4.5
Speeches                5.5
News                   11.0
Academic prose          9.0
Table 5.2. Average distance measures for pronominal versus full noun anaphoric expressions (Biber et al., 1998)
Register          Average pronominal distance   Average full noun distance
Conversation                3.0                          9.0
Speeches                    3.5                         10.0
News                        3.0                         13.5
Academic prose              2.5                         10.0
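To produce summaries like Tables 5.1 and 5.2 from coded data, one could group the distances by register and by nominal form. The sketch below assumes hypothetical `(register, form, distance)` tuples for anaphoric noun phrases; the function name and the sample values are illustrative toy numbers, not the Biber et al. data.

```python
from collections import defaultdict
from statistics import mean

def average_distances(records):
    """records: iterable of (register, form, distance) tuples for anaphoric NPs.
    Returns ({(register, form): mean distance}, {register: mean over both forms})."""
    by_form = defaultdict(list)
    by_register = defaultdict(list)
    for register, form, distance in records:
        by_form[(register, form)].append(distance)
        by_register[register].append(distance)
    return ({key: mean(vals) for key, vals in by_form.items()},
            {key: mean(vals) for key, vals in by_register.items()})

# Toy records (not corpus data)
records = [("conversation", "pronoun", 2), ("conversation", "pronoun", 4),
           ("conversation", "full_noun", 9), ("news", "pronoun", 3),
           ("news", "full_noun", 12), ("news", "full_noun", 15)]
per_form, per_register = average_distances(records)
print(per_form[("news", "full_noun")], per_register["conversation"])  # 13.5 5
```

A register mean pooled over both forms necessarily lies between that register's pronoun mean and full noun mean, which is consistent with every row of the two tables above.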

15 Comments: A larger number of texts and longer text samples are needed for generalizable results. Other distinctions could be investigated, e.g. the distribution of referring expressions in main clauses vs. dependent clauses.

16 Discourse maps of verb tense and voice: The marking of verb tense and voice can reflect larger rhetorical divisions within a text. Subtexts such as sections, as well as divisions that are not overtly marked, can reflect shifts in communicative purpose accompanied by shifts in linguistic features. Example: verb tense and voice shifts across the major sections (I = Introduction, M = Methods, R = Results, D = Discussion) of English medical research articles (19 medical articles taken from the ARCHER Corpus, published in 1985, each text treated as a unit of analysis).

17 Table 5.3. Mean scores (per 1,000 words) of selected linguistic features across the I-M-R-D sections of English medical research articles (N=19) (Biber et al., 1998)
Linguistic feature    Introduction   Methods   Results   Discussion   Statistics
Present tense             47.9         21.1      35.9       60.6      F = 29.25; p < 0.001; r² = .549
Past tense                20.7         48.5      40.3       13        F = 36.74; p < 0.001; r² = .605
Agentless passives        18.4         39.9      16.9       16.3      F = 33.17; p < 0.001; r² = .580
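The F and r² values reported with Table 5.3 come from a one-way ANOVA on per-text rates (per 1,000 words). The sketch below assumes the sections can be treated as independent groups; r² is computed as the between-groups sum of squares divided by the total sum of squares. The function name and the toy scores are illustrative, not taken from the source.

```python
from statistics import mean

def one_way_anova(groups):
    """groups: dict label -> list of per-text scores (e.g. rates per 1,000 words).
    Returns (F, r_squared), treating the groups as independent samples."""
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
    # Variation of the scores around their own group mean
    ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f, r_squared

# Toy scores for two sections (not the ARCHER data)
f, r2 = one_way_anova({"Introduction": [1, 2, 3], "Methods": [4, 5, 6]})
print(round(f, 2), round(r2, 3))  # 13.5 0.771
```

Under this independence assumption, 19 articles with four sections each would give 3 and 72 degrees of freedom.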

18 Reflections on Frequency Counts: Present tense verbs in Introduction and Discussion sections: emphasis on the current state of the art and the present implications of the research. Past tense in Methods and Results: focus on reporting past events and procedures. Methods: agentless passives present events impersonally. Open questions: How does a text develop? Are there systematic patterns of variation within sections? What can we do with texts that do not have overtly marked sections?

19 Drawing a “map” of the progression of verbs: two medical research articles in ecology (from the Corpus of Writing in the Disciplines). A program marks two binary distinctions over POS-tagged text: past vs. non-past (including modals) and active vs. passive (non-finite clauses were excluded from the analysis).
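One way such a verb-mapping program could work, sketched in Python. This assumes Penn Treebank tags (the tagsets of the original corpora may differ) and a deliberately simple heuristic: a verb group starts at a finite verb (VBD, VBZ, VBP, or a modal MD); it counts as past if that finite verb is VBD, otherwise non-past; and it counts as passive if a form of "be" is followed by a past participle (VBN) within the group. Verbs outside any finite group (non-finite clauses) are skipped. The tag sets, the heuristic, and the name `verb_map` are assumptions, not the original program.

```python
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
FINITE_TAGS = {"VBD", "VBZ", "VBP", "MD"}             # finite verbs and modals
GROUP_TAGS = FINITE_TAGS | {"VB", "VBG", "VBN", "RB"}  # tokens that can continue a verb group

def verb_map(tagged):
    """tagged: list of (word, tag) pairs with Penn Treebank tags.
    Returns one (tense, voice) pair per finite verb group:
    tense "P" (past) or "NP" (non-past, incl. modals); voice "A" (active) or "PS" (passive)."""
    codes = []
    i = 0
    while i < len(tagged):
        tag = tagged[i][1]
        if tag not in FINITE_TAGS:
            i += 1  # non-verbs and verbs outside a finite group are skipped
            continue
        # Collect the verb group that starts at this finite verb
        j = i
        while j < len(tagged) and tagged[j][1] in GROUP_TAGS:
            j += 1
        group = tagged[i:j]
        tense = "P" if tag == "VBD" else "NP"
        # Passive: a form of "be" followed later in the group by a past participle
        passive = any(
            word.lower() in BE_FORMS and any(t == "VBN" for _, t in group[k + 1:])
            for k, (word, _) in enumerate(group)
        )
        codes.append((tense, "PS" if passive else "A"))
        i = j
    return codes

sentence = [("The", "DT"), ("samples", "NNS"), ("were", "VBD"), ("collected", "VBN"),
            ("and", "CC"), ("we", "PRP"), ("measure", "VBP"), ("growth", "NN")]
print(verb_map(sentence))  # [('P', 'PS'), ('NP', 'A')]
```

Listing the codes in text order, section by section, yields a sequence of past/non-past and active/passive values that can be plotted as a map.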

20 Key to the verb map: NP = non-past, P = past, A = active, PS = passive (Biber et al., 1998)

21 Comments: Transition zones between sections: writers start a transition at the end of one section and continue it into the beginning of the following section. Possible extensions: patterns of modal verbs, as well as perfect and progressive aspect.

22 Studying Register Variation: register is a cover term for varieties defined by their situational characteristics, such as purpose, topic, setting and interactiveness. We control a range of registers and switch from one to another, which is important for language acquisition and learning. Describing the linguistic characteristics of different registers may be a prerequisite to understanding and using this knowledge.

23 Corpus-based register analysis requires: the inclusion of a large number of texts; consideration of a wide range of linguistic features; comparison across registers. These requirements make a corpus-based approach especially applicable.

24 Research Questions: How do spoken and written registers differ in their use of dependent clauses? What patterns in the use of linguistic features are important in distinguishing among the major spoken and written registers? How do texts from different academic disciplines vary with respect to patterns of linguistic variation? How do the internal sections of texts within a single academic register vary linguistically?

25 Dependent Clause Use: Are all kinds of dependent clauses functionally similar, that is, do they all represent structural elaboration and complexity? Previous studies: written registers are generally more structurally elaborated than spoken ones. Distribution of three kinds of dependent clauses: relative clauses, adverbial clauses, complement clauses.

26 Illustrative Analysis: two written registers from the LOB corpus (80 academic prose texts, 14 official documents) and two spoken registers from the London-Lund corpus (44 conversations, 14 prepared speeches); 478,000 words in total. Semi-automatic counting based on POS-tagged text. Only causative adverbial clauses are counted.

27 Table 6.1. Average frequencies of three dependent clause types (per 1,000 words) in four registers (Biber et al., 1998)
Register             No. of texts   Relative clauses   Causative adverbial clauses   That-complement clauses
Academic prose            80              6.8                    0.3                          3.2
Official documents        14              8.6                    0.1                          1.6
Conversations             44              2.9                    3.5                          4.1
Prepared speeches         14              7.9                    1.6                          7.6
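Frequencies in Table 6.1 are normalized per 1,000 words. A small sketch, assuming each text is represented by a hypothetical `(raw_count, n_words)` pair; here the register figure is taken as the mean of the per-text rates, which is one common choice and weights every text equally regardless of its length. The function names and toy values are illustrative.

```python
from statistics import mean

def per_thousand(count, n_words):
    """Normalize a raw feature count to a rate per 1,000 words."""
    return count * 1000 / n_words

def register_average(texts):
    """texts: list of (raw_count, n_words) pairs for the texts of one register.
    Returns the mean of the per-text normalized rates (each text weighted equally)."""
    return mean(per_thousand(count, n_words) for count, n_words in texts)

# Toy register with two texts of different length (not LOB/London-Lund data)
texts = [(30, 2000), (5, 1000)]
print(per_thousand(30, 2000))   # 15.0
print(register_average(texts))  # 10.0
```

Pooling the counts instead (35 × 1000 / 3000) would give about 11.7 for the same two texts, so the choice matters when text lengths differ.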

28 Comments: Academic prose, official documents and prepared speeches are often focused on conveying information about particular referents in the text. Conversation is more concerned with the interaction among participants, and with causes and reasons. That-complement clauses mark the stance of the speaker or writer (e.g. with verbs such as think, wish, hope). Treating all dependent clauses as one big category, or generalizing from one type only, is dangerous.

29 Text sample 6.2: Conversation I wouldn’t want it before the end of June anyhow Reynard because I’m going to Madrid on the tenth... I rushed into the kitchen because I smelt something was burning... Text sample 6.3: Prepared speeches There are many people who think that to be a Christian is to lead a soft option in life... We would hope that our students would have a full understanding of the cultural differences...

30 Importance of sufficient type and token frequencies: Having too few text samples can lead to dramatically inaccurate conclusions. The following sample, J30 from LOB, has 25 relative clauses per 1,000 words (nearly four times the average for academic prose) and no that-clauses at all (register average 3.2 per 1,000 words).

31 Text sample 6.4: LOB Corpus, Academic Prose, J30. Most Vale people also have kin ties with people who live in these areas and in other parts of south Wales with whom they maintain effective social relations. A larger number of Vale people who do not work in the urban areas nevertheless visit them fairly regularly to see friends and relatives who live there or who are in hospital there...

32 Co-occurrence patterns in linguistic features: In the samples below, fragmented speech co-occurs with second-person pronouns, modals and wh-complement clauses, whereas academic prose co-occurs with frequent nouns, nominalizations, passive constructions and extraposed constructions (e.g. it is possible that...). Multidimensional (MD) analysis uses factor analysis to identify sets of variables that are distributed in similar ways: linguistic features are counted and normalized in a representative corpus, and each set of co-occurring linguistic features is called a “dimension”. The dimensions are then interpreted in terms of situational, social and cognitive functions, based on the assumption that co-occurrence reflects shared function.
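The co-occurrence idea behind MD analysis can be illustrated by correlating normalized feature rates across texts: features that rise and fall together get high positive correlations, and factor analysis of such a correlation matrix groups them into factors. The sketch below stops at the correlation matrix (factor extraction and rotation would normally be done with a statistics package); the function names, feature names and rates are toy assumptions, not corpus data.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of per-text rates."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(rates):
    """rates: dict feature -> list of normalized rates, one per text, same text order.
    Returns dict (feature_a, feature_b) -> Pearson r for every pair of features."""
    names = sorted(rates)
    return {(a, b): pearson(rates[a], rates[b]) for a in names for b in names}

# Toy rates per 1,000 words for four texts (two conversational, two academic)
rates = {
    "second_person_pronouns": [20, 15, 2, 1],
    "modals": [12, 10, 3, 2],
    "nominalizations": [5, 8, 40, 45],
}
matrix = correlation_matrix(rates)
print(round(matrix[("modals", "second_person_pronouns")], 2))           # close to 1
print(round(matrix[("nominalizations", "second_person_pronouns")], 2))  # negative
```

In this toy data the pronoun and modal rates move together while nominalizations move the other way, the kind of complementary pattern that a single dimension with positive and negative features captures.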

33 Text sample 6.5: Conversation What you’d have to do, you know, you tell him what you need to know, he’d be able to tell you how to do it. Text sample 6.6: Academic Prose As has been repeatedly shown cultural evolution is not a unilinear process and it is possible that under certain conditions a simpler social formation may emerge out of a more complex one.

34 Illustrative Analysis: 481 texts (960,000 words) from LOB (written) and LLC (spoken). Sixteen major grammatical categories (tense and aspect markers, place and time adverbials, pronouns, questions, nominal forms, passives, stative forms, modals, etc.). Five major dimensions of variation were identified, each a set of features that occur in a complementary (positive vs. negative) pattern. Functional interpretation is based on analysis of the communicative function(s) of the features and of the similarities and differences among the registers with respect to that dimension. This section is based on Biber (1988).

35 Features in parentheses are not used in the calculation of dimension scores.

36 Functional Interpretations of Dimensions: Dimension 1: the negative group reflects an informational focus, with careful integration of information and precise word choice; the positive group reflects an involved, non-informational focus, related to a primarily affective mode. The dimension reflects the primary purpose of the writer/speaker and the production circumstances. Dimension 2: narrative vs. non-narrative; this dimension does not distinguish written from spoken registers.

37 Factor Scores: calculate a dimension score for each text, then the mean dimension score of each register.
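Dimension scores in MD analysis are commonly computed by standardizing each feature's rate across the corpus (z-scores) and then, for each text, adding the z-scores of the features with positive loadings and subtracting those with negative loadings; register means are averages of these per-text scores. A sketch under those assumptions, with hypothetical function names and toy data; excluded features (such as those shown in parentheses on the dimension table) would simply be left out of `positive` and `negative`.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize(rates):
    """rates: dict feature -> list of per-text rates (same text order for every feature).
    Returns dict feature -> list of z-scores computed over the whole corpus."""
    z = {}
    for feature, values in rates.items():
        m, sd = mean(values), pstdev(values)
        # A feature with no variation carries no information; score it as 0
        z[feature] = [(v - m) / sd if sd else 0.0 for v in values]
    return z

def dimension_scores(rates, positive, negative):
    """Per text: sum of z-scores of positive features minus sum of z-scores of negative ones."""
    z = standardize(rates)
    n_texts = len(next(iter(rates.values())))
    return [sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
            for i in range(n_texts)]

def register_means(scores, registers):
    """Average the per-text dimension scores within each register."""
    groups = defaultdict(list)
    for score, register in zip(scores, registers):
        groups[register].append(score)
    return {register: mean(vals) for register, vals in groups.items()}

# Toy data: 4 texts, 3 features (not corpus data)
rates = {
    "private_verbs": [30.0, 26.0, 8.0, 6.0],
    "contractions": [20.0, 18.0, 1.0, 1.0],
    "nouns": [150.0, 160.0, 260.0, 250.0],
}
registers = ["conversation", "conversation", "academic", "academic"]
scores = dimension_scores(rates, positive=["private_verbs", "contractions"], negative=["nouns"])
means = register_means(scores, registers)
print(means)  # conversation positive, academic negative
```

Because the z-scores of each feature sum to zero over the corpus, the dimension scores do too, so register means are read relative to the corpus average.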

38 Mean scores of English Dimension 1 for nine registers: “Involved versus informational production” (F = 119.9, p < .0001, r² = 84.3%)

39 Mean scores of English Dimension 2 for eleven registers: “Narrative versus non-narrative discourse” (F = 32.3, p < .0001, r² = 60.8%)

40 English for Special Purposes: the MD study of English serves as a general background. Data from the Corpus of Writing in the Disciplines. Findings: history is not as narrative as might be thought; subject matter affects linguistic realization.

41 Table 6.3. Composition of subcorpus of biology and history research articles
Category                                                                                          No. of texts   Approx. no. of words
Ecology research articles (from Ecology, Journal of Ecology, and Journal of Animal Ecology)            20              64,000
American history research articles (from The Journal of American History and The Western Historical Quarterly)   20   32,000

42 Mean scores of ecology and history research articles on Dimension 2, “Narrative versus non-narrative concerns”

43 Text sample 6.10: History research article. Entertainer Josephine Baker posed a special problem for the government. During her international concert tours in the 1950s she harshly criticized American racism. The United States government could not restrict her travel by withdrawing her passport because she carried the passport of her adopted nation, France.
Text sample 6.11: Ecology research article. The effects of herbivores are potentially large and long lasting. How herbivores affect nutrient cycles in these forests is particularly important because nutrient availability is generally low, and changes in nutrient availability are major factors driving succession. Furthermore, populations of boreal herbivores fluctuate drastically between years and decades...

44 Mean scores of ecology and history research articles on Dimension 5, “Impersonal versus non-impersonal style”

45 Mean scores of ecology research article sections on Dimension 5, “Impersonal versus non-impersonal style”

46 Conclusion: Corpus-based methods can be adapted even to areas of language study that have not been automated. Qualitative and quantitative methods can be used in combination.

47 Next Week: Meyer (2002), Pseudotitles chapter. Invited talk by Ruken Çakıcı; please come at 9:40...




