Lexico-Grammatical Patterns in English Scientific Abstracts: presenting the research’s purposes and results
Carmen Dayrell, Stella Tagnin (DLM); Arnaldo Candido Jr., Sandra Aluísio (ICMC / NILC)
ELC 2010
English for Academic Purposes. Academic communication poses real challenges for novice researchers (Hyland 2009: ix). Demands are heavier for non-native speakers of English (Hyland 2009: 5; Milton and Hyland 1999; Vold 2006). Difficulties relate to: lexical and syntactic features of the target genre; rhetorical motivations behind linguistic choices; disciplinary variation; cultural differences across languages. [Context]
Local Context: courses on English academic writing and writing tools for non-native speakers of English, both intended to assist graduate students in writing scientific papers in English.
Courses on English Academic Writing (2004 to 2010). USP: Department of Physics (IFSC), Department of Pharmaceutical Sciences (FCF), Department of Computer Science (ICMC). UNESP: IBILCE, Dentistry and Biology/Genetics. UFSCar: Department of Biology/Genetics.
Writing tools: Scipo-Farmácia ( Abstracts: Gap, Background, Purpose, Methodology, Results, Conclusion
Writing tools: Scipo-Farmácia ( Examples from published abstracts
Why Abstracts? They are relevant in various academic contexts. In Brazil, abstracts are part of most research papers written in Portuguese, as well as PhD and master’s dissertations. However, constructing an efficient, clear abstract is a fairly difficult task, even for experienced and widely published writers (Swales & Feak 2009: xiii).
General Objective: to investigate potential differences between English abstracts written by Brazilian graduate students and abstracts taken from published papers in the same disciplines. [Purpose]
Aim of this study: to investigate the recurring lexico-grammatical patterns used for presenting either the purposes or the results of the research.
Rhetorical ‘moves’ in abstracts (Swales and Feak 2009: 5): Background / Introduction; Purpose; Methods / Materials / Subjects / Procedures; Results / Findings; Discussion / Conclusion / Implications / Recommendations.
Lexico-grammatical patterns. Example pattern: The AIM of this STUDY. Alternatives to "this": the, the present. AIM: aim, purpose, objective, goal, aims, objectives, purposes. STUDY: study, work, investigation, article, research, project, paper.
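To make the slot notation concrete, the pattern above can be approximated with a regular expression. This is only an illustrative sketch, not the procedure used in the study: the names `AIM`, `STUDY`, `PATTERN` and `find_purpose_patterns` are hypothetical, and the word lists are copied from the slide.

```python
import re

# Slot fillers taken from the slide; the regex itself is an assumption.
AIM = r"(?:aims?|purposes?|objectives?|goal)"
STUDY = r"(?:study|work|investigation|article|research|project|paper)"

# "The AIM of this STUDY", allowing "the" or "the present" instead of "this".
PATTERN = re.compile(
    rf"\bthe\s+{AIM}\s+of\s+(?:this|the\s+present|the)\s+{STUDY}\b",
    re.IGNORECASE,
)

def find_purpose_patterns(text):
    """Return every substring of `text` that matches the pattern."""
    return [match.group(0) for match in PATTERN.finditer(text)]

if __name__ == "__main__":
    sample = "The aim of this study was to evaluate the new protocol."
    print(find_purpose_patterns(sample))
```

A plain regex like this ignores intervening modifiers (e.g. "the main aim of ..."), so in practice it would be a starting point for concordance work rather than a complete extractor.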
Student Abstracts. Corpora: ST-EXA (Physical Sciences and Engineering) and ST-BIO (Life and Health Sciences). Abstracts: 169. Tokens: 34,131. Average Number of Words (ANW): 202. [Corpora]
Student Abstracts: number of texts per discipline
Physical Sciences and Engineering (ST-EXA): Physics 85; Computing 46; Earth Sciences 20; Engineering
Life and Health Sciences (ST-BIO): Dentistry 47; Pharmaceutical Scs. 39; Biology 21; Biophysics 21; Bioengineering 5; Biomedical Scs
English Abstracts: number of texts per discipline in the student (ST) and published (PB) corpora
Physical Sciences and Engineering: Physics ST 85, PB 425; Computing ST 46, PB 230; Earth Sciences; Engineering
Life and Health Sciences: Dentistry ST 47, PB 235; Pharmaceutical Scs; Biology ST 21, PB 105; Biophysics ST 21, PB 105; Bioengineering ST 5, PB 25; Biomedical Scs
Published Abstracts. Corpora: PB-EXA (Physical Sciences and Engineering) and PB-BIO (Life and Health Sciences). Table rows: Abstracts; Tokens; Average Number of Words (ANW).
Published Abstracts: taken from papers published by various leading academic journals (CAPES - QUALIS A). Preference was given to authors affiliated with universities in English-speaking countries.
Methodology: 1. Identification of rhetorical moves. 2. Identification and comparison of lexico-grammatical patterns in ‘purposes’ and ‘results’. [Methods]
1. Identifying Rhetorical Moves. a) Automatic tagging with AZEA (Argumentative Zoning for English Abstracts) (Genovês et al. 2007), a corpus-based machine learning system. Purpose: to automatically identify components of the schematic structure of scientific abstracts in English. Components: Background, Gap, Purpose, Methodology, Result, Conclusion. AZEA achieved 80.4% accuracy (kappa 0.73) using a very small training corpus.
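For readers unfamiliar with the two figures reported for AZEA, the sketch below shows how accuracy and a kappa statistic can be computed from gold and predicted component labels. It assumes the reported kappa is Cohen's kappa, which the slide does not state; the function names and the toy labels are hypothetical and are not AZEA data.

```python
from collections import Counter

def accuracy(gold, predicted):
    """Proportion of sentences whose predicted label equals the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def cohen_kappa(gold, predicted):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(gold)
    observed = accuracy(gold, predicted)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the two marginal proportions per label.
    expected = sum(gold_counts[label] * pred_counts[label]
                   for label in gold_counts) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)

if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    gold = ["Purpose", "Result", "Result", "Methodology"]
    predicted = ["Purpose", "Result", "Methodology", "Methodology"]
    print(accuracy(gold, predicted), cohen_kappa(gold, predicted))
```

Kappa is lower than raw accuracy whenever some agreement would be expected by chance, which is why both figures are usually reported together.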
AZEA’s features. Basic features: 1. Sentence length; 2. Position within the abstract; 3-5. Verb tense, voice and modal; 6. Previous component; 7-8. Formulaic patterns. 14 additional features help to distinguish between Results and Methods and improve accuracy.
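As a rough illustration of what such sentence-level features look like, the sketch below derives four of them from a list of sentences. It is not AZEA's implementation: the function name, the feature names and the modal word list are assumptions, and verb tense and voice are left out because they would require a part-of-speech tagger.

```python
# Hypothetical modal list used only for this sketch.
MODALS = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}

def basic_features(sentences, labels):
    """Build one feature dictionary per sentence of an abstract.

    `labels` holds the known component of each sentence (e.g. during
    training) and is used only to fill in the previous-component feature.
    """
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    total = len(sentences)
    features = []
    for index, sentence in enumerate(sentences):
        tokens = sentence.lower().split()
        features.append({
            # Feature 1: sentence length in whitespace-separated tokens.
            "length": len(tokens),
            # Feature 2: relative position, 0.0 (first) to 1.0 (last).
            "position": index / (total - 1) if total > 1 else 0.0,
            # Part of features 3-5: presence of a modal verb.
            "has_modal": any(tok.strip(".,;:()") in MODALS for tok in tokens),
            # Feature 6: component assigned to the previous sentence.
            "previous_component": labels[index - 1] if index > 0 else "START",
        })
    return features

if __name__ == "__main__":
    sents = ["We propose a new approximation.", "The results may vary."]
    for row in basic_features(sents, ["Purpose", "Result"]):
        print(row)
```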
Azea-Web Methods
1a. AZEA tagging. Example: "We propose a Local-Density approximation to calculate the entanglement entropy of the inhomogeneous one-dimensional Hubbard model. Such inhomogeneity can be due to the finite size, the presence of impurities, or the periodic variation of the interaction and the external potential, as in superlattices. We show that, to inhomogeneities due to finite size, our approximation reproduces the know thermodynamic limit and also the limit of the entanglement entropy in n=1, obtained by Cardy and Calabrese."
1b. Manual Validation. Example: "We propose a Local-Density approximation to calculate the entanglement entropy of the inhomogeneous one-dimensional Hubbard model. Such inhomogeneity can be due to the finite size, the presence of impurities, or the periodic variation of the interaction and the external potential, as in superlattices. We show that, to inhomogeneities due to finite size, our approximation reproduces the know thermodynamic limit and also the limit of the entanglement entropy in n=1, obtained by Cardy and Calabrese."
Manual Tagging: correcting sentence breaks. "We find aRb/aNa=1.959(5), aK/aNa =1.786(6), and aRb/aK=1.097(5)." / "We find aRb/aNa=1.959(5), aK/aNa =1.786(6), and aRb/aK=1.097 (5)."
Manual Tagging: multi-labels. Example of a sentence that receives more than one label: "Using whole-cell rapid-agonist application techniques and the cell-attached single-channel recording configuration, we examined human 5-HT3A(QDA) receptors expressed in human embryonic kidney 293 cells."
Lexico-grammatical patterns. 1. Semi-automatic identification of patterns with WordSmith Tools 5 (Scott 2007). Starting point: the most frequent items and clusters in each corpus, followed by analysis of the surrounding context. Patterns should occur at least once per 10,000 words in either corpus. 2. Comparison of frequencies: statistical test of significance.
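The frequency threshold and the comparison step can be sketched as follows. The slide does not name the significance test, so the log-likelihood statistic commonly used in corpus comparison is an assumption here, as are the function names; any counts passed to these functions in the usage note are invented for illustration.

```python
import math

def per_10k(count, corpus_tokens):
    """Normalised frequency: occurrences per 10,000 words."""
    return count * 10_000 / corpus_tokens

def meets_threshold(count_a, tokens_a, count_b, tokens_b, minimum=1.0):
    """True if the pattern reaches `minimum` per 10,000 words in either corpus."""
    return (per_10k(count_a, tokens_a) >= minimum
            or per_10k(count_b, tokens_b) >= minimum)

def log_likelihood(count_a, tokens_a, count_b, tokens_b):
    """Log-likelihood (G2) for one pattern observed in two corpora."""
    total = count_a + count_b
    expected_a = tokens_a * total / (tokens_a + tokens_b)
    expected_b = tokens_b * total / (tokens_a + tokens_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2

if __name__ == "__main__":
    # Invented counts: 20 hits in 1,000 tokens versus 5 hits in 1,000 tokens.
    print(per_10k(20, 1000), per_10k(5, 1000))
    print(log_likelihood(20, 1000, 5, 1000))
```

With one degree of freedom, a G2 value above 3.84 corresponds to p < 0.05, so the invented counts in the usage example would count as a significant difference under this assumed test.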
Overall, significant differences were found: between student and published abstracts; across the two broad areas. [Results]
PURPOSE: Life and Health Sciences (BIO). Pattern: The AIM of this STUDY. Alternatives to "this": the present, the, our. AIM, first list: aim, objective, purpose, aims, objectives. AIM, second list: aim, purpose, objective, goal, aims, objectives, purposes, intent. STUDY, first list: study, work, review, paper. STUDY, second list: study, work, investigation, article, project, research, clinical trial, paper.
PURPOSE: Life and Health Sciences (BIO). Pattern: (In this STUDY), we VERB (the/a). VERB, first list: REPORT, DESCRIBE, INVESTIGATE, SHOW, ANALYSE, EVALUATE, DETERMINE … VERB, second list: INVESTIGATE, EXAMINE, REPORT, PROPOSE, TEST, HYPOTHESIZE, DESCRIBE, PRESENT, SEEK TO, ANALYSE, EVALUATE, DEMONSTRATE …
PURPOSE: Physical Sciences and Engineering (EXA). 1. The AIM of this STUDY. 2. This STUDY VERB. 3. (In this STUDY), we VERB (the/a).
RESULTS: 1. Results VERB (that/the), e.g. "The results show that". 2. we VERB (that/the), e.g. "we found that".
Main Contributions: 1. Pedagogic applications: a) syllabus; b) teaching material. 2. Development of writing tools. [Contributions]
Pedagogic applications: overuse and underuse. Patterns: Results VERB (that/the); BE PARTICIPLE to VERB (e.g. was found to be). Items within patterns: It BE observed that X It BE shown/found that.
Writing Tools: AZEA. Manual validation. AZEA ++: new features to be considered: lexico-grammatical patterns; multi-labels; disciplinary variations.
Writing Tools. Future work: Physical Sciences and Engineering; Life and Health Sciences.
Thank you! ELC 2010