Researching ESP Corpora: Issues in compilation and analysis
Lynne Flowerdew

2 Outline
Compilation
  - Size
  - Representativeness
  - Balance
Analysis and interpretation
  - Units for linguistic analysis
  - Top-down vs. bottom-up analysis
  - Role of context in interpretation of corpus data

3 Compilation: Size
Commonly held view: the larger the better.
“…a corpus should be as large as possible and keep on growing” (Sinclair 1991: 18)
“…it is important to have a substantial corpus if you want to make claims based on statistical frequency” (Bowker & Pearson 2002: 48)

4 Compilation
But corpus size is highly dependent on the phenomenon being investigated (de Haan 1992).
The lower the frequency of the feature under investigation, the larger the corpus needs to be (McEnery & Wilson 2001: 154).
Smaller corpora can be used to investigate more common features of language (Biber 1990).
The picture is different for ESP corpora (see Flowerdew 2004; Hunston 2009; Koester 2010 for pointers on building small, specialised corpora).
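
The inverse relation between a feature's frequency and the corpus size it demands can be made concrete with simple arithmetic. The minimal sketch below assumes that frequencies are normalised per million words and that the researcher wants a fixed number of expected occurrences (50 here); the function names and figures are hypothetical and are not taken from the studies cited.

```python
import math


def expected_occurrences(freq_per_million: float, corpus_size: int) -> float:
    """Expected hits for a feature with the given normalised frequency."""
    return freq_per_million * corpus_size / 1_000_000


def required_corpus_size(freq_per_million: float, target_hits: int) -> int:
    """Smallest corpus (in words) expected to yield `target_hits` occurrences."""
    if freq_per_million <= 0:
        raise ValueError("frequency must be positive")
    # Round up: a corpus cannot contain a fraction of a word.
    return math.ceil(target_hits * 1_000_000 / freq_per_million)


# Hypothetical feature frequencies (per million words).
for freq in (500, 50, 5):
    print(f"{freq:>4} per million -> {required_corpus_size(freq, 50):,} words for ~50 hits")
```

On these assumptions, a feature occurring 5 times per million words would need roughly ten million words to yield 50 hits, and would be expected only about once in a 225,000-word corpus (`expected_occurrences(5, 225_000)` returns 1.125). Real counts also depend on how evenly a feature is dispersed across texts, which this arithmetic ignores.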

5 Compilation: General vs. ESP corpora (Sinclair 2005: 16)
Comparison of the LOB corpus with HK of CS on:
  - number of different word forms (types)
  - number (%) that occur once only
  - number (%) that occur twice only
  - number (%) that occur twenty times or more
  - number (%) that occur 200 times or more
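
The measures in Sinclair's comparison can be computed for any tokenised corpus. The sketch below is only an illustration: the regular-expression tokenizer is a simplifying assumption, and the percentages are taken as shares of types (word forms), which is one reading of the table rather than a documented definition. Word forms that occur once only are often called hapax legomena.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lower-cased word tokenizer (an assumption for illustration).
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequency_profile(tokens: list[str]) -> dict[str, float]:
    """Number of types plus the percentage of types in each frequency band."""
    counts = Counter(tokens)
    n_types = len(counts)

    def pct(predicate) -> float:
        if n_types == 0:
            return 0.0
        return 100 * sum(1 for c in counts.values() if predicate(c)) / n_types

    return {
        "types": n_types,
        "once_only_pct": pct(lambda c: c == 1),
        "twice_only_pct": pct(lambda c: c == 2),
        "20_or_more_pct": pct(lambda c: c >= 20),
        "200_or_more_pct": pct(lambda c: c >= 200),
    }


sample = "The export scheme will create a noise problem. The noise problem is severe."
print(frequency_profile(tokenize(sample)))
```

In this tiny sample, 10 word forms occur, 7 of them once only, so `once_only_pct` is 70.0; a general corpus and a specialised corpus of the same size would be expected to differ on exactly these proportions.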

6 Compilation: Representativeness
Specialised corpora do not exhibit as much internal variation as general corpora.
The greater the variation in the corpus texts, the more samples and the larger the corpus required to ensure representativeness (Meyer 2002).
“We should always bear in mind that the assumption of representativeness must be regarded as an act of faith, as at present we have no means of ensuring it, or even evaluating it objectively” (Tognini-Bonelli 2001: 57)

7 Compilation: Corpus of EIA (Environmental Impact Assessment) reports
60 reports, approx. 225,000 words.
Reports selected on the basis that they represent 23 different environmental consulting companies.
It was impossible to select an equal number of reports from each of the companies: “convenience sampling” (Meyer 2002).
The larger the company, the more reports catalogued in the library; the distribution is seen as reflecting the size and importance of each company.

8 Compilation: Corpus of EIA reports
Balance: a balanced corpus would consist of the same amount of text from each of the 23 companies.
If EIA reports from different companies were of different lengths, then balancing the corpus in terms of number of texts would lead to an imbalance in terms of number of tokens.
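
The trade-off between balancing by texts and balancing by tokens can be shown with a toy example. The sketch below assumes each report is represented only by its company and its length in tokens; the companies and report lengths are hypothetical.

```python
from collections import Counter, defaultdict


def text_counts(reports: list[tuple[str, int]]) -> dict[str, int]:
    """Number of reports (texts) per company."""
    return dict(Counter(company for company, _ in reports))


def token_shares(reports: list[tuple[str, int]]) -> dict[str, float]:
    """Percentage of all corpus tokens contributed by each company."""
    totals: dict[str, int] = defaultdict(int)
    for company, n_tokens in reports:
        totals[company] += n_tokens
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {company: 100 * n / grand_total for company, n in totals.items()}


# Hypothetical data: two reports per company, but company B's
# reports are three times as long as company A's.
reports = [("A", 2_000), ("A", 2_000), ("B", 6_000), ("B", 6_000)]
print(text_counts(reports))   # {'A': 2, 'B': 2}
print(token_shares(reports))  # {'A': 25.0, 'B': 75.0}
```

The corpus is balanced by texts (two reports each), yet company B supplies three quarters of the tokens, which is the imbalance the slide describes.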

9 Compilation: Pragmatic considerations
Corpus size is balanced against the level of delicacy of the investigation (Kennedy 1998).
My investigation is primarily qualitative (phraseologies of keywords for the Problem-Solution (P-S) pattern).
The investigation is of key vocabulary items, so 225,000 words was deemed sufficient.

10 Analysis: Units for linguistic analysis
  - Frequency (Kennedy 1998)
  - Keywords (Bondi 2001; Flowerdew 2008; Mudraya 2006; Nelson 2006)
  - Lexical bundles (Biber et al. 2004; Hyland 2007, 2008)
Corpus set up lexically rather than grammatically (Halliday 2004)
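
Lexical bundles are usually operationalised as recurrent sequences of n words (often four) that pass a frequency cut-off and occur in a minimum number of different texts. The sketch below implements that idea with raw counts; the thresholds (`min_freq=2`, `min_range=2`) are placeholders chosen for a tiny example, whereas published studies typically normalise frequencies per million words and set their own cut-offs.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_bundles(texts: list[list[str]], n: int = 4, min_freq: int = 2,
                    min_range: int = 2) -> dict[tuple[str, ...], int]:
    """Recurrent n-grams meeting a frequency and a dispersion (range) cut-off."""
    freq: Counter = Counter()
    text_range: Counter = Counter()
    for tokens in texts:
        grams = ngrams(tokens, n)
        freq.update(grams)
        text_range.update(set(grams))  # each bundle counted once per text
    return {g: f for g, f in freq.items()
            if f >= min_freq and text_range[g] >= min_range}


# Hypothetical mini-corpus of three tokenised texts.
texts = [
    "in order to reduce the noise".split(),
    "in order to reduce traffic".split(),
    "in order to reduce in order to reduce".split(),
]
print(lexical_bundles(texts))
```

Only "in order to reduce" survives here: it occurs four times across three texts, while every other four-word sequence occurs once.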

11 Analysis: Comparison of ESP corpora with Coxhead’s Academic Word List (AWL)
  - Disciplinary differences (Hyland & Tse 2007; Chen & Ge 2007; Martinez et al.)
  - Common core of academic vocabulary (AWL) (Paquot 2010; Simpson-Vlach & Ellis 2010)
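
A common first step when comparing a specialised corpus with the AWL is to measure coverage: the percentage of running words (tokens) that appear in the list. The sketch below uses a tiny hypothetical subset of word forms, not the actual list; the real AWL is organised into 570 word families, so a faithful comparison would first map each token to its family headword.

```python
def coverage(tokens: list[str], word_list: set[str]) -> float:
    """Percentage of tokens (running words) that appear in `word_list`."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in word_list)
    return 100 * hits / len(tokens)


# Hypothetical subset of word forms standing in for the AWL.
AWL_SAMPLE = {"assess", "environmental", "impacts", "potential"}

tokens = "to assess in detail the environmental impacts of".split()
print(coverage(tokens, AWL_SAMPLE))  # 37.5
```

Comparing such coverage figures across disciplinary corpora is one way to probe whether the AWL behaves as a common core or varies by discipline.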

12 Analysis: Corpus analysis driven by type of software
WMatrix (Rayson 2008)
  - Classifies vocabulary into semantic fields (Ali Mohamed 2007)
ConcGram (Greaves 2009)
  - Finds sets of words that co-occur (e.g. AB; A*B), allowing up to 12 slots for constituency variation
  - Searches for positional variation (e.g. AB; BA)
  - Only a few studies to date have used it (Cheng 2009; Durrant 2009; Milizia & Spinzi 2009; Warren 2011)
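
The kind of query ConcGram is described as supporting (two words co-occurring in either order, with intervening words allowed) can be illustrated with a brute-force search. This is a toy sketch, not ConcGram's algorithm: `max_gap` is a hypothetical parameter for the number of intervening words, and nothing here reflects how ConcGram itself counts its slots.

```python
def cooccurrences(tokens: list[str], a: str, b: str,
                  max_gap: int) -> list[tuple[str, int, str]]:
    """Find a and b in either order with at most `max_gap` words between them.

    Returns (order, gap, span) triples; order is "AB" or "BA".
    """
    if a == b:
        raise ValueError("a and b must differ")
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in (a, b):
            continue
        other = b if tok == a else a
        order = "AB" if tok == a else "BA"
        # Look ahead only, so each pair is reported once, from its first word.
        for j in range(i + 1, min(i + max_gap + 2, len(tokens))):
            if tokens[j] == other:
                hits.append((order, j - i - 1, " ".join(tokens[i:j + 1])))
    return hits


tokens = ("the scheme will create a noise problem in order to alleviate "
          "the problem of noise").split()
print(cooccurrences(tokens, "noise", "problem", max_gap=2))
```

The search finds both "noise problem" (AB, no intervening word) and "problem of noise" (BA, one intervening word), the kind of positional and constituency variation the slide mentions.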

13 Analysis: My corpus of EIA reports
WordSmith Tools used for keyword extraction (Scott 1999).
The lexico-grammar of the keywords was then manually classified into causal / non-causal categories:
  1. The export scheme will create a noise problem
  2. In order to alleviate the problem of noise…
  3. Severe traffic noise problems already exist in…
WMatrix for automatic identification of causal categories, and ConcGram for positional variation (e.g. problem of noise).
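
WordSmith Tools' KeyWords function identifies keywords by comparing word frequencies in a study corpus with those in a reference corpus, using a keyness statistic such as log-likelihood or chi-square. The sketch below computes the two-corpus log-likelihood (G²) score in its commonly used form; it is not WordSmith's code, and the counts in the usage lines are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-corpus log-likelihood (G2) for one word.

    a, b: frequency of the word in the study and the reference corpus
    c, d: total tokens in the study and the reference corpus
    """
    total = a + b
    if total == 0:
        return 0.0
    e1 = c * total / (c + d)  # expected frequency in the study corpus
    e2 = d * total / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # treat 0 * ln(0) as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study: Counter, reference: Counter,
             top: int = 10) -> list[tuple[str, float]]:
    """Rank words over-represented in `study` relative to `reference` by G2."""
    c, d = sum(study.values()), sum(reference.values())
    if c == 0 or d == 0:
        return []
    scored = [
        (w, log_likelihood(study[w], reference[w], c, d))
        for w in study
        if study[w] / c > reference[w] / d  # positive keywords only
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical frequencies for a study corpus and a reference corpus.
study = Counter({"noise": 30, "the": 100})
reference = Counter({"noise": 5, "the": 1000})
print(keywords(study, reference))
```

Only "noise" is returned, since "the" is relatively less frequent in the study corpus than in the reference corpus; the manual classification into causal / non-causal categories would then start from such a ranked list.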

14 Analysis: Top-down vs. bottom-up
“In the ‘top-down’ approach, the functional components of a genre are determined first and then all texts in a corpus are analysed in terms of these components. In contrast, textual components emerge from the corpus analysis in the ‘bottom-up’ approach, and the discourse organization of individual texts is then analysed in terms of linguistically-defined textual categories.” (Biber, Connor & Upton 2007a: 11)

15 Analysis: Bottom-up starting point
  - Phraseology of the preposition “in” in cancer research articles (Gledhill 2000)
  - Politeness strategies in two moves in job application letters (Upton & Connor 2001)
  - Verb-noun collocations in 4 moves in law cases (Bhatia et al. 2004)
  - Phraseology of “research” in moves in PhD literature reviews (J. Flowerdew & Forest 2010)

16 Analysis: Top-down starting point
Kanoksilapatham’s (2007) corpus study of biochemistry research articles first developed an analytical framework by identifying moves.
In reality, many studies toggle between the two approaches (Charles 2006).
Different starting points yield different results (Biber, Connor & Upton 2007b).

17 Analysis: Corpus of EIA reports
Devised a coding system to account for 3 different levels of text:
  - Macrostructure (Intro., Body, Concl.)
  - Problem-Solution elements
  - Discourse-based moves
Different phraseologies for different sections:
  - …to assess in detail the environmental impacts of…
  - …in order to reduce potential noise impacts.
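
One way to store a three-level coding scheme like this is to attach all three labels to each coded segment, so that phraseology can then be counted at any level. The sketch below is hypothetical: the `Segment` fields, the label values and the example segments are assumptions made for illustration, and the move labels in particular are placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str
    section: str     # macrostructure level: "Intro", "Body" or "Concl"
    ps_element: str  # Problem-Solution element, e.g. "Problem", "Solution"
    move: str        # discourse-based move (placeholder labels here)


LEVELS = ("section", "ps_element", "move")


def count_by(segments: list[Segment], phrase: str, level: str) -> dict[str, int]:
    """Count segments containing `phrase`, grouped by one coding level."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    counts: Counter = Counter()
    for seg in segments:
        if phrase in seg.text.lower():
            counts[getattr(seg, level)] += 1
    return dict(counts)


# Hypothetical coded segments.
segments = [
    Segment("the export scheme will create a noise problem", "Body", "Problem", "M1"),
    Segment("in order to reduce potential noise impacts", "Body", "Solution", "M2"),
    Segment("in order to reduce potential noise impacts", "Concl", "Solution", "M3"),
]
print(count_by(segments, "in order to reduce", "section"))  # {'Body': 1, 'Concl': 1}
print(count_by(segments, "noise", "ps_element"))            # {'Problem': 1, 'Solution': 2}
```

Keeping the three levels as separate fields means the same phrase can be profiled by section, by Problem-Solution element or by move without re-coding the texts.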

18 Role of Context in Interpretation
Genre perspective: a goal-driven communicative event associated with particular discourse communities and disciplines.
Handford (2010a) asks: “how can we relate the specific instance (such as text, discourse move or lexico-grammatical item) to the wider social context within which it occurs…”
Is it possible to interpret the corpus data as a reflection of the context, or conversely, is it possible to rely on contextual features for interpretation of the corpus data? (Flowerdew 2011)

19 Interpretation
Stubbs (2001a, 2001b) argues that the conventional view, that the meanings of context-sensitive pragmatic markers are usually inferred by the speaker/hearer, may be overstated; large-scale corpus studies show that pragmatic meanings can be conventionally encoded in linguistic form.
Tognini-Bonelli (2004) considers it possible to “read off” the discursive practices of a discourse community from multiple recurring concordance lines.
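
A concordancer produces the multiple concordance lines Tognini-Bonelli refers to by listing every occurrence of a node word with a fixed window of co-text, sorted so that recurring patterns line up. The key-word-in-context (KWIC) sketch below is a minimal illustration; the window size, the sort on right-hand context and the sample text are assumptions, and real concordancers offer many more options.

```python
def kwic(tokens: list[str], node: str,
         width: int = 4) -> list[tuple[str, str, str]]:
    """(left context, node, right context) for each occurrence of `node`,
    sorted by right context so that recurring patterns line up."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return sorted(hits, key=lambda hit: hit[2].lower())


def show(hits: list[tuple[str, str, str]], left_width: int = 30) -> None:
    """Print concordance lines with the node word aligned in one column."""
    for left, node, right in hits:
        print(f"{left:>{left_width}}  {node}  {right}")


# Hypothetical text built from EIA-style wording.
tokens = ("The problems associated with continued pollution and health hazards "
          "associated with proximity to high tension power lines").split()
show(kwic(tokens, "associated", width=3))
```

Aligning the node word makes a recurring pattern such as a negatively evaluated noun followed by "associated with" visible at a glance across lines.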

20 Interpretation: Corpus of EIA reports
  1. The problems associated with continued pollution…
  2. Health hazards associated with proximity to high tension power lines…
  3. It is expected that there will be no significant residual impacts…
  4. Works at the tunnel portal will create a noise problem…

21 Interpretation: Discursive practices vs. strategies (Handford 2010b)
Discursive practices: signify recurrent patterns of linguistic behaviour and “tie the communication to the wider social context”.
Strategies: “merely describe what the individual is trying to achieve within the particular speech event”.
Widdowson (2004: 60) points out that it is difficult to assign pragmatic significance to phraseologies in one particular text.

22 Interpretation
Data related to strategies should be interpreted with reference not only to other co-textual features but also to external contextual information.
An ethnographic perspective is sometimes needed to interpret context-dependent, pragmatically oriented features.
Widdowson (2000: 60) remarks that corpus-based methods focus on the text as product and ‘cannot account for complex interplay of linguistic and contextual factors whereby discourse is enacted’.

23 Conclusion
No “tailor-made” corpora for teaching (Leech 2008); no “perfect” corpora for research.
Corpus linguistic techniques are one of several approaches (alongside an ethnographic dimension).
Corpora are now being used in other applied linguistics areas: text linguistics, genre analysis, critical discourse analysis (CDA), sociolinguistics and second language acquisition (SLA) (Flowerdew in press, 2011a, b).

24 Thank You!