Using an Enhanced MDA Model in study of World Englishes
Richard Xiao



15/05/2015, CRG, Lancaster University

Overview of the talk
  Biber’s (1988) MF/MD analytical approach
  The enhanced multidimensional analysis (MDA) model
  Variation across 12 registers in 5 varieties of English in the ICE

Factor analysis
  The key to the multidimensional analysis approach
  A common data reduction method available in many standard statistics packages such as SPSS
  Reduces a large number of variables to a manageable set of underlying factors (“dimensions”)
  Extensively used in the social sciences to identify clusters of inter-related variables
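Factor analysis works on the correlations between variables, so a quick way to see what "clusters of inter-related variables" means is to correlate a few feature frequencies across texts. The sketch below is only an illustration: the feature names and all numbers are invented, not taken from the study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-1,000-word frequencies in six texts (invented numbers).
features = {
    "first_person_pronouns": [62.0, 55.0, 48.0, 12.0, 9.0, 5.0],
    "contractions": [40.0, 35.0, 30.0, 6.0, 4.0, 2.0],
    "nominalisations": [8.0, 10.0, 14.0, 35.0, 41.0, 46.0],
}

names = list(features)
# Full correlation matrix, stored as a dict keyed by feature-name pairs.
corr = {(a, b): pearson(features[a], features[b]) for a in names for b in names}

for a in names:
    print(a.ljust(24), " ".join(f"{corr[(a, b)]:+.2f}" for b in names))
```

In this toy data the two "involved" features rise and fall together while nominalisations move in the opposite direction, which is the kind of pattern a factor would summarise as one dimension with positive and negative poles.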

Biber’s MF/MD framework
  Established in Biber (1988): Variation across Speech and Writing (CUP)
    – Factor analysis of 67 functionally related linguistic features
    – 481 text samples, amounting to 960,000 running words
        LOB
        London-Lund
        Brown corpus
        A collection of professional and personal letters

Biber’s MF/MD approach
  Biber’s seven factors / dimensions
    – Informational vs. involved production
    – Narrative vs. non-narrative concerns
    – Explicit vs. situation-dependent reference
    – Overt expression of persuasion
    – Abstract vs. non-abstract information
    – Online informational elaboration
    – Academic hedging

Biber’s MF/MD approach
  Influential and widely used
    – Synchronic analysis of specific registers / genres and author styles
    – Diachronic studies describing the evolution of registers
    – Register studies of non-Western languages and contrastive analyses
    – Research on university English and materials development
    – Move analysis and study of discourse structure
  …but largely confined to grammatical categories

The enhanced MDA model
  Enhancing Biber’s MDA by incorporating semantic components with grammatical categories
    – Wmatrix = CLAWS + USAS
    – A total of 141 linguistic features investigated
        109 features retained in the final model
    – Five million words in 2,500 text samples, with one million words in 500 samples for each of the 5 varieties of English
        ICE: GB, HK, India, Singapore, the Philippines
        300 spoken + 200 written samples
        12 registers ranging from private conversation to academic writing

ICE registers and proportions

  Code  Share  Register
  S1A   20%    Spoken – Private
  S1B   16%    Spoken – Public
  S2A   14%    Spoken – Monologue – Unscripted
  S2B   10%    Spoken – Monologue – Scripted
  W1A    4%    Written – Non-printed – Non-professional writing
  W1B    6%    Written – Non-printed – Correspondence
  W2A    8%    Written – Printed – Academic writing
  W2B    8%    Written – Printed – Non-academic writing
  W2C    4%    Written – Printed – Reportage
  W2D    4%    Written – Printed – Instructional writing
  W2E    2%    Written – Printed – Persuasive writing
  W2F    4%    Written – Printed – Creative writing
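The register proportions can be turned into sample counts for one variety by applying them to the 500 samples per variety mentioned earlier in the talk. The following is a small worked calculation; the function and variable names are illustrative only.

```python
# Register shares in percent, as listed on the slide.
ICE_PROPORTIONS = {
    "S1A": 20, "S1B": 16, "S2A": 14, "S2B": 10,
    "W1A": 4, "W1B": 6, "W2A": 8, "W2B": 8,
    "W2C": 4, "W2D": 4, "W2E": 2, "W2F": 4,
}
SAMPLES_PER_VARIETY = 500  # from the talk: 500 samples for each variety

def samples_per_register(total, proportions):
    """Convert percentage shares into whole sample counts."""
    return {code: total * pct // 100 for code, pct in proportions.items()}

counts = samples_per_register(SAMPLES_PER_VARIETY, ICE_PROPORTIONS)
spoken = sum(n for code, n in counts.items() if code.startswith("S"))
written = sum(n for code, n in counts.items() if code.startswith("W"))

for code, n in counts.items():
    print(f"{code}: {n} samples")
print(f"spoken: {spoken}, written: {written}, total: {spoken + written}")
```

The shares add up to 100%, giving 300 spoken and 200 written samples per variety, with private conversation (S1A) the largest register at 100 samples and persuasive writing (W2E) the smallest at 10.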

141 linguistic features covered
  A) Nouns: 21 categories, e.g.
    – Nominalisation, other nouns; 19 semantic classes of nouns (e.g. evaluations, speech acts)
  B) Verbs: 28 categories, e.g.
    – Do as pro-verb, be as main verb, tense and aspect markers, modals, passives, 16 semantic categories of verbs
  C) Pronouns: 10 categories, e.g.
    – Person, case, demonstrative
  D) Adjectives: 11 categories, e.g.
    – Attributive vs. predicative use, 9 semantic categories

141 linguistic features covered (continued)
  E) Adverbs (7 categories)
  F) Prepositions (2 categories)
  G) Subordination (3 categories)
  H) Coordination (2 categories)
  I) WH-questions / clauses (2 categories)
  J) Nominal post-modifying clauses (5 categories)
  K) THAT-complement clauses (3 categories)
  L) Infinitive clauses (3 categories)
  M) Participle clauses (2 categories)
  N) Reduced forms and dispreferred structures (4 categories)
  O) Lexical and structural complexity (3 categories)

141 linguistic features covered (continued)
  P) Quantifiers (4 categories)
  Q) Time expressions (11 categories)
  R) Degree expressions (8 categories)
  S) Negation (2 categories)
  T) Power relationship (4 categories)
  U) Definiteness (2 categories)
  V) Helping/hindrance (2 categories)
  X) Linear order (1 category)
  Y) Seem / Appear (1 category)
  Z) Discourse bin (1 category)

Procedure of data analysis
  1) Data clean-up
  2) Grammatical and semantic tagging with Wmatrix
  3) Extracting the frequencies of 141 linguistic features from 2,500 corpus files
  4) Building a profile of normalised frequencies (per 1,000 words) for each linguistic feature
  5) Factor analysis
    – Factor extraction (Principal Factor Analysis)
    – Factor rotation (Promax)
    – Optimum structure: 9 factors
  6) Interpreting extracted factors
  7) Computing factor scores
  8) Using the enhanced MDA model in exploration of variation across registers and language varieties
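Steps 4 and 7 of the procedure can be sketched in a few lines. The talk does not say how factor scores were computed, so this sketch assumes the practice described in Biber (1988): standardise each normalised frequency, then sum the z-scores of features with salient positive loadings and subtract those with salient negative loadings. The texts, counts, and the assignment of features to poles are all hypothetical.

```python
from statistics import mean, stdev

def per_thousand(raw_count, text_length):
    """Normalise a raw frequency to a rate per 1,000 words (step 4)."""
    return raw_count * 1000 / text_length

def standardise(values):
    """Convert values to z-scores using the sample standard deviation."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]

# Hypothetical texts: word length plus raw counts of two features.
texts = [
    {"length": 2000, "counts": {"contractions": 80, "nominalisations": 20}},
    {"length": 2100, "counts": {"contractions": 42, "nominalisations": 63}},
    {"length": 1900, "counts": {"contractions": 19, "nominalisations": 95}},
]

# Assumed loadings for one factor: which features sit on which pole.
POSITIVE = ["contractions"]
NEGATIVE = ["nominalisations"]

normalised = {
    feature: [per_thousand(t["counts"][feature], t["length"]) for t in texts]
    for feature in POSITIVE + NEGATIVE
}
z = {feature: standardise(values) for feature, values in normalised.items()}

# Factor score per text (step 7): positive-pole z-scores minus negative-pole ones.
scores = [
    sum(z[f][i] for f in POSITIVE) - sum(z[f][i] for f in NEGATIVE)
    for i in range(len(texts))
]
print([round(s, 2) for s in scores])
```

Normalising first matters because the texts differ in length: 42 contractions in 2,100 words and 19 in 1,900 words are only comparable once both are expressed per 1,000 words.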

The enhanced MDA model
  Nine factors established in the new model
    – 1) Interactive casual discourse vs. informative elaborate discourse
    – 2) Elaborative online evaluation
    – 3) Narrative concern
    – 4) Human vs. object description
    – 5) Future projection
    – 6) Personal impression and judgement
    – 7) Lack of temporal / locative focus
    – 8) Concern with degree and quantity
    – 9) Concern with reported speech
  Robustness of the model in register analysis

1) Interactive casual discourse vs. informative elaborate discourse
  Private conversation is most interactive and casual
  Academic writing is most informative and elaborate
  Spoken registers are generally more interactive and less elaborate than written registers
  F = [value missing], p < [value missing], R² = 77.4%
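The slides report an F value, a p value and R² for each factor without naming the test. One plausible reading, and it is only an assumption here, is a one-way ANOVA of factor scores grouped by register, with R² as the share of total variance explained by the grouping. A minimal sketch on invented factor scores:

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, R squared) for a dict of score lists."""
    all_values = [v for scores in groups.values() for v in scores]
    grand_mean = sum(all_values) / len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (v - sum(scores) / len(scores)) ** 2
        for scores in groups.values()
        for v in scores
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, r_squared

# Hypothetical factor 1 scores for three registers (invented numbers).
factor1_scores = {
    "private conversation": [10.0, 12.0, 14.0],
    "news reportage": [0.0, 1.0, 2.0],
    "academic writing": [-12.0, -10.0, -8.0],
}

f_stat, df_between, df_within, r_squared = one_way_anova(factor1_scores)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}, R squared = {r_squared:.1%}")
```

A p value would additionally require the F distribution, which the standard library does not provide; a statistics package would normally supply it. Under this reading, a high R² such as the 77.4% reported for factor 1 means that register membership accounts for most of the variation in that factor's scores.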

2) Elaborative online evaluation
  Public dialogue (e.g. broadcast discussion / interview, parliamentary debate) has the most prominent focus on elaborative online evaluation
  Unscripted monologue also involves a high level of elaborative online evaluation
  Persuasive writing may relate to elaborative evaluation but is not restricted by real-time production
  Private conversation is least elaborative even if the evaluation is made online
  Evaluation is not a concern in creative writing
  F = [value missing], p < [value missing], R² = 31.1%

3) Narrative concern
  Unscripted monologue (e.g. demonstrations, presentations, commentaries) has a narrative concern
  Unsurprisingly, creative writing is also narrative
  Not a concern in academic writing, non-professional writing (student essays and exam scripts), and instructional writing
  F = [value missing], p < [value missing], R² = 37.3%

4) Human vs. object description
  Private conversation is most likely to have a focus on people
  Correspondence (social letters and business letters) also involves human description
  Instructional writing tends to give concrete descriptions of objects
  Academic and non-academic writing can also be concrete when an object or substance is described
  F = 44.03, p < [value missing], R² = 16.3%

5) Future projection
  Persuasive writing (e.g. press editorials, trying to influence people’s future attitudes and actions) has the most prominent focus on future projection
  Correspondence and public dialogue also involve future projection to varying extents
  Academic writing (timeless truth?) is least concerned with future projection
  F = 28.10, p < [value missing], R² = 11.1%

6) Personal impression / judgement
  The factor score of creative writing is by far greater than that of any other register
    – Frequent use of possessive and reflexive pronouns, as well as adjectives of judgement / appearance
  Instructional writing, private conversation, and student essays display low scores
    – They do not have a focus on personal impression and judgement
  Scripted and unscripted monologue, public dialogue and news reportage also tend to avoid expressions of personal impression and judgement
  F = [value missing], p < [value missing], R² = 35.8%

7) Lack of temporal / locative focus
  Student essays and persuasive writing do not have a temporal / locative focus (not concerned with concepts such as when, how long, and where)
  Such specific information is of vital importance in correspondence (social and business letters)
  F = 89.55, p < [value missing], R² = 28.4%

8) Concern with degree / quantity
  Non-academic popular writing has the greatest concern with degree and quantity
  Persuasive writing also displays a high propensity for expressions of degree and quantity
  Such expressions tend to be avoided in instructional writing (e.g. administrative documents) and correspondence
  F = 19.33, p < [value missing], R² = 7.9%

9) Concern with reported speech
  News reportage has the greatest concern with reported speech (both direct and indirect speech)
  Reported speech is also very common in creative writing (fictional dialogue)
  Instructional writing and academic prose do not appear to have a concern with reported speech
  F = 80.02, p < [value missing], R² = 26.1%

12 registers along 9 factors
  Factor 1 is the dimension along which the 12 registers demonstrate the sharpest contrasts
    – Interactive casual discourse vs. informative elaborate discourse: a fundamental aspect of variation across registers
  Robustness of the model

5 English varieties across 9 factors
  Both differences and similarities
  This general picture may blur many register-based subtleties
    – Language can vary across registers even more substantially than across language varieties (cf. Biber 1995)

1) Interactive casual discourse vs. informative elaborate discourse
  Indian English displays the lowest score in nearly all registers: it is less interactive but more elaborate
    – Sanyal (2007): “clumsy Victorian English [that] hangs like a dead Albatross around each educated Indian’s neck”
  Modern BrE appears to be most interactive and least elaborate (e.g. S1A, S1B, W2D)
  The 3 varieties of English used in East and Southeast Asia are very similar
  F = 9.04, 4 d.f., p < 0.001

2) Elaborative online evaluation
  BrE generally shows a higher score than non-native varieties of English (e.g. W2A, W1B, S2B)
  Non-native English varieties tend to be very close in most registers
  F = [value missing], d.f. [value missing], p < 0.001

3) Narrative concern
  BrE demonstrates a greater propensity for narrative concern
    – Most noticeably in news reportage (W2C) and instructional writing (W2D)
  Indian English is least concerned with narrative
    – Especially in registers like correspondence (W1B), instructional writing (W2D), and unscripted monologue (S2A)
  F = [value missing], d.f. [value missing], p < 0.001

4) Human vs. object description
  The varieties are very close in a number of registers (e.g. S2B, W1B, W2E)
  Indian English and BrE show similarity in a greater range of registers
  HK and Singapore Englishes display great similarity (except W1A)
  Creative writing (W2F) is very similar in non-native varieties of English
  F = [value missing], d.f. [value missing], p < 0.001

5) Future projection
  BrE has the highest score in all printed written registers (W2A–W2F)
  Indian English shows the lowest score in nearly all registers
  F = [value missing], d.f. [value missing], p < 0.001

6) Personal impression / judgement
  Very similar in many registers…
    – …with the most noticeable differences in non-printed written registers (W1A, W1B), non-academic writing (W2B), and news reportage (W2C)
  HK English displays a distribution pattern similar to Singapore English in spoken registers (S1A–S2B) and unpublished written registers (W1A, W1B), but it is very close to Philippine English in printed writing (W2A–W2F)
  F = [value missing], d.f. [value missing], p < 0.001

7) Lack of temporal / locative focus
  The overall difference is not statistically significant
    – …but there are noticeable differences in some registers (e.g. W1B, W2D)
  Interestingly, Indian English demonstrates a consistently higher score in spoken registers (S1A–S2B)
    – …but a lower score in unpublished writing (e.g. W1B)
  F = [value missing], d.f. [value missing], p = 0.058

8) Concern with degree / quantity
  BrE displays a higher score in nearly all registers
  HK English does not appear to be concerned with degree and quantity (e.g. W2D)
  Similarly, Indian English also lacks a focus on degree and quantity (e.g. W1B)
  F = [value missing], d.f. [value missing], p < 0.001

9) Concern with reported speech
  The overall difference is not significant…
  …in spite of a noticeable difference in news reportage (W2C)
    – East and Southeast Asian English varieties show a greater propensity for concern with reported speech than BrE and Indian English
  F = [value missing], d.f. [value missing], p = 0.196

Summary and future research
  Summary
    – Seeking to enhance Biber’s MDA model with semantic components
    – Introducing the new model in research on World Englishes
  Directions for future research
    – More native English varieties from the Inner Circle
    – A wider and more balanced coverage of geographical regions
    – Including socio-culturally relevant semantic categories
    – Making “sense” of corpus findings by combining corpora and more traditional resources in socio-cultural studies and historical research
  …adequately descriptive + sufficiently explanatory…

Thank you!