Pedagogic uses of a corpus of student writing and their implications for sampling and annotation Alois Heuboeck University of Reading, UK.

Presentation transcript:

The British Academic Written English (BAWE) corpus of student writing
- Project in progress at the universities of Reading, Warwick and Oxford Brookes
- Funded by the Economic and Social Research Council (project nr. RES )

Outline
- Corpora in LT: uses and purposes
- Accessing corpus information: interfaces
- Building corpora: requirements and decisions (the BAWE corpus)

Using corpora in language pedagogy
[Diagram; labels: pedagogic uses; purposes; classroom materials; description; “motivational”; “linguistic”]

Interfaces (1): the concordance
Typical query options:
- word form
- lemma
- wildcards (e.g. “investigat*”)
- grammatical (e.g. POS) patterns
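The wildcard query option can be illustrated with a minimal keyword-in-context (KWIC) sketch. This is not the BAWE interface: the function name `concordance`, the `width` parameter and the sample sentence are assumptions made for illustration, and the tokenisation is deliberately naive.

```python
import fnmatch
import re

def concordance(text, pattern, width=30):
    """Return keyword-in-context lines for tokens matching a wildcard pattern.

    `pattern` uses shell-style wildcards, e.g. "investigat*".
    `width` is the number of context characters shown on each side.
    """
    lines = []
    # Naive tokenisation: runs of letters and apostrophes count as word forms.
    for match in re.finditer(r"[A-Za-z']+", text):
        if fnmatch.fnmatchcase(match.group().lower(), pattern.lower()):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# Hypothetical sample text, not taken from the corpus.
sample = ("This essay investigates how students investigate sources. "
          "The investigation was limited to one department.")
hits = concordance(sample, "investigat*")
for line in hits:
    print(line)
```

Lemma and POS queries would additionally require an annotated corpus, which this sketch does not assume.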

Information & interfaces (2): statistics
- corpus items: frequencies, ratios (e.g. word list, key words)
- macrostructural properties and choices: generic types, e.g. CARS model (Swales 1990)
- ad hoc statistics
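A word list and a rough key-word ranking can be sketched as follows. The slide does not name a keyness statistic; the ratio of relative frequencies used here (with add-one smoothing on the reference side) is an assumption chosen for brevity, and corpus tools commonly use other measures such as log-likelihood. The function names and both sample texts are hypothetical.

```python
from collections import Counter
import re

def word_list(text):
    """Frequency list of lower-cased word forms."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def key_words(target, reference, min_count=2):
    """Rank target words by relative frequency in target vs. reference.

    Assumption: a simple frequency ratio stands in for a proper keyness
    statistic; add-one smoothing avoids division by zero.
    """
    t_total = sum(target.values())
    r_total = sum(reference.values())
    scores = {}
    for word, count in target.items():
        if count < min_count:
            continue  # ignore rare items
        t_rel = count / t_total
        r_rel = (reference[word] + 1) / (r_total + 1)
        scores[word] = t_rel / r_rel
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical miniature "corpora".
student = word_list("the corpus shows that the corpus data support the claim")
general = word_list("the cat sat on the mat and the dog sat on the rug")
ranking = key_words(student, general)
print(student.most_common(3))
print(ranking)
```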

Requirements: a “good corpus” for language pedagogy
- Representative: target variety
- Relevant: information, annotation
- Usable: e.g. interface, size

Representativeness
The corpus as a representative sample should reflect:
- range of features
- distribution and quantitative relations
Conflicting principles:
- quantitative representativeness
- qualitative representativeness

Representativeness (2): the BAWE corpus
A trade-off: stratified sampling
Disciplinary groups and disciplines:
- AH: English, History, Linguistics, Classics, Archaeology, History of Art
- PS: Physics, Chemistry, Meteorology, Mathematics, Computer Science, Engineering
- SS: Sociology, Law, Business, Politics, Anthropology, Publishing
- LS: Medicine, Biological Sciences, Biochemistry, Agriculture, Food Sciences, Health & Social Care
Sampling frames:
- Frame 1: the university: corpus Σ = 3,072 assignments
- Frame 2: 4 disciplinary groups of 768 assignments each
- Frame 3: 4 x 6 disciplines of 128 assignments each
- Frame 4: 4 levels per discipline of 32 assignments each
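The arithmetic behind the four sampling frames can be checked with a short sketch. The figures come from the slide; the function and key names are assumptions, and an equal split at every stratum is assumed because that is what the slide's numbers imply.

```python
def frame_quotas(total, groups, disciplines_per_group, levels):
    """Split a target corpus size evenly across nested sampling strata."""
    per_group = total // groups
    per_discipline = per_group // disciplines_per_group
    per_cell = per_discipline // levels
    return {
        "corpus": total,
        "per_group": per_group,
        "per_discipline": per_discipline,
        "per_discipline_level": per_cell,
        # Number of (discipline, level) cells in the full design.
        "cells": groups * disciplines_per_group * levels,
    }

# Figures from the slide: 3,072 assignments, 4 groups, 6 disciplines each, 4 levels.
quotas = frame_quotas(total=3072, groups=4, disciplines_per_group=6, levels=4)
print(quotas)
```

Each of the 96 discipline-by-level cells receives the same quota regardless of how many assignments that discipline actually produces, which is the trade-off the slide refers to: coverage of the range of varieties is favoured over proportional representation.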

Relevance
- Relevant information in corpus
- Significant query
- Corpus annotation
- Features: lexicogrammatical, structural etc.

Relevance (2): features annotated in the BAWE corpus
- “grammatical”
- textual: structure of “running text”
- typographical (lay-out)
- metatextual: numbering
- other “interesting” features

Corpus size
“For the pedagogical analysis of many common grammatical phenomena a full-size research corpus is much too large.” (Osborne 2000)
- Specialised corpora
- Modularity: subcorpora
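Modularity can be pictured as selecting subcorpora by contextual metadata. The sketch below is illustrative only: the `Assignment` record, its field names and the four sample entries are hypothetical and do not reflect the actual BAWE metadata scheme.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assignment:
    # Hypothetical metadata fields, chosen for illustration.
    doc_id: str
    group: str        # disciplinary group, e.g. "AH"
    discipline: str
    level: int        # level of study
    text: str

def subcorpus(corpus, **criteria):
    """Return the assignments whose metadata match every given criterion."""
    return [a for a in corpus
            if all(getattr(a, field) == value
                   for field, value in criteria.items())]

# Invented sample records.
corpus = [
    Assignment("a1", "AH", "History", 1, "..."),
    Assignment("a2", "AH", "English", 2, "..."),
    Assignment("a3", "PS", "Physics", 1, "..."),
    Assignment("a4", "SS", "Law", 1, "..."),
]

arts = subcorpus(corpus, group="AH")
first_year = subcorpus(corpus, level=1)
print([a.doc_id for a in arts], [a.doc_id for a in first_year])
```

A smaller, specialised module selected this way could then be passed to a concordancer or word-list tool in place of the full corpus.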

Conclusion: 3 views
Issues:
- Qualitative vs. quantitative representation
- Corpus annotation and interfaces: query
- Corpus size: modularity
The corpus as:
- representation of a (set of) target variety/varieties
- instances of lexicogrammatical (etc.) features and phenomena
- balanced samples of target variety/varieties
