Guy Aston, Ylva Berglund Prytz, & Lou Burnard, Exploring BNC-XML with Xaira.

Presentation transcript:


What is the BNC?
- a snapshot of British English, taken at the end of the 20th century
- 100 million words in approx. 4000 different text samples, both spoken (10%) and written (90%)
- synchronic (1990-4), sampled, general-purpose corpus
- available under licence; latest edition is BNC-XML (13 March 2007)

Distinctive features of the BNC
- non-opportunistic design
- standardized markup system
- structural annotation
- word class annotation
- contextual information
- general availability
...in these respects, the BNC remains distinctive, twenty years on!

What's new in BNC-XML?
- No systematic proofing, re-editing, or re-parsing...
- Same as BNC World:
  - texts (minus duplicates)
  - POS tagging (but extended)
- Additions
  - simpler POS codes
  - lemmata
- Improvements
  - Duplications, categorizations, segmentations...
  - Coded descriptions

BNC-XML regroups texts using additional classification criteria

FACTSHEET WHAT IS AIDS ? AIDS ( Acquired Immune Deficiency Syndrome ) is a condition caused by a virus called HIV ( Human Immuno Deficiency Virus ). … …

What is the markup for?
- It makes it possible for you to
  - distinguish aids=SUBST from aids=VERB
  - distinguish occurrences in writing from ones in speech
  - distinguish occurrences in headings from ones in paragraphs
  - identify contextual units like sentences and paragraphs
- FACTSHEET WHAT IS AIDS? AIDS (Acquired Immune Deficiency Syndrome) is a condition caused by a virus called HIV (Human Immuno Deficiency Virus).
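The distinctions listed on this slide can be illustrated with a short Python sketch. It is not taken from the presentation: the XML fragment is hand-written to resemble BNC-XML, and the element and attribute names used (`wtext`, `head`, `p`, `s`, `w`, `pos`, `hw`) as well as the third sentence are assumptions for illustration. A real corpus file should be checked against the BNC-XML documentation.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment imitating BNC-XML markup (assumed names, invented 3rd sentence).
SAMPLE = """
<wtext>
  <head><s n="1"><w pos="PRON" hw="what">WHAT </w><w pos="VERB" hw="be">IS </w><w pos="SUBST" hw="aids">AIDS</w><c>?</c></s></head>
  <p>
    <s n="2"><w pos="SUBST" hw="aids">AIDS </w><w pos="VERB" hw="be">is </w><w pos="ART" hw="a">a </w><w pos="SUBST" hw="condition">condition</w><c>.</c></s>
    <s n="3"><w pos="SUBST" hw="research">Research </w><w pos="VERB" hw="aid">aids </w><w pos="SUBST" hw="treatment">treatment</w><c>.</c></s>
  </p>
</wtext>
"""

def find_word(root, form):
    """Return (container tag, sentence number, word class) for each match of `form`."""
    hits = []
    for container in root:                  # top-level units: <head> or <p>
        for s in container.iter("s"):       # sentence units
            for w in s.iter("w"):           # word tokens
                if (w.text or "").strip().lower() == form:
                    hits.append((container.tag, s.get("n"), w.get("pos")))
    return hits

root = ET.fromstring(SAMPLE)
hits = find_word(root, "aids")
for where, sentence, word_class in hits:
    print(f"{where:4} s{sentence} aids={word_class}")
```

Each hit reports whether the word sits in a heading or a paragraph, which sentence it belongs to, and whether it is tagged as a noun or a verb, which is the kind of filtering the markup is meant to enable.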

Has English moved on since the BNC?
- types of text
  - web pages / blogs
  - SMS
  - personal letters
- topics
  - globalization
  - internet
  - Elvis
  - Word Perfect
- how comparable is the Web?

Out of date?
- The composition (and date) of any corpus affects inferences drawn from it
- There aren't many alternatives
- Web-as-corpus: 85% of written texts aren't on the web - and spoken texts?
- Results from monitor corpora non-replicable
- Copyright permissions unrepeatable
- Quantitative and qualitative comparative evaluations of BNC coverage are needed
- but “it's surprising how much is there”

What can you do with it?
- The BNC is a problematizing resource...
  - complements (and corrects) intuition
  - increases learner autonomy
  - critiques the myth of the native speaker
  ... for teacher and learner alike
- XML makes it more accessible to non-specialist software (e.g. A0S in web browser)

You can use XAIRA to...
- find sample sentences
  - cloze tests
- check what the textbook says
  - grammar vs usage
- (dis)confirm intuitions
- find sample specialist texts
- make serendipitous discoveries

Finding sample sentences
- some phrases that take the gerund
  - there's no point....
  - how / what about...
- generatable phrases
  - [comparative] and [comparative]
- sentence structures
  - [s-initial interjection]

(Dis)confirming intuition
- about choices
  - have a problem + infinitive or gerund?
  - do you make or take decisions?
- about vocabulary
  - which nouns collocate with hard?
- about grammar
  - I would be grateful if you [modal]?
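A question such as "which nouns collocate with hard?" comes down to counting word classes near a node word. The Python sketch below shows the idea on a handful of invented, hand-tagged tokens; the tag names and the two-word window are assumptions, and this is not how Xaira itself computes collocations.

```python
from collections import Counter

# Invented (word, word class) pairs standing in for tagged corpus text.
TAGGED = [
    ("It", "PRON"), ("was", "VERB"), ("hard", "ADJ"), ("work", "SUBST"),
    ("in", "PREP"), ("hard", "ADJ"), ("times", "SUBST"), (".", "PUN"),
    ("Hard", "ADJ"), ("work", "SUBST"), ("pays", "VERB"), (".", "PUN"),
]

def noun_collocates(tagged, node, span=2):
    """Count nouns found within `span` tokens to the right of `node`.

    Simplification: the window ignores sentence boundaries.
    """
    counts = Counter()
    for i, (word, _) in enumerate(tagged):
        if word.lower() != node:
            continue
        for neighbour, word_class in tagged[i + 1 : i + 1 + span]:
            if word_class == "SUBST":
                counts[neighbour.lower()] += 1
    return counts

collocates = noun_collocates(TAGGED, "hard")
print(collocates.most_common())
```

On real data the raw counts would normally be weighed against overall word frequency before drawing conclusions about collocation strength.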

Finding specialised texts
- The BNC has an extraordinary range
  - travel agent brochures, weather reports, formal invitations, advertising, children's talk, academic discourse, doctor's consultations, marketing meetings, oral history, jokes and anecdotes, high literature, best-sellers, leaflets, personal diaries...
- The problem is finding it
  - use WLD principle

For learners...
- The same as teachers
- Pointers to follow in the quest for idiomaticity
  - collocations
  - colligations
  - semantic preferences
  - semantic prosodies/pragmatic associations
  - associations with particular genres/domains
- Can learners use the BNC “autonomously”?

The ins and outs of autonomous use
- Learners may need warning to...
  - focus on patterns which recur, without necessarily trying to explain all the data
  - avoid overgeneralisation
- ... and encouragement to
  - be curious
  - browse the context
  - investigate exceptions

What are ins and outs?
- (and are they the same as ups and downs)?
- 50 occurrences, sort left 2
- colligation: (all) the ins and outs of
- semantic preference: know/learn/understand/keep up with/get to grips with/get down to/forget; explain/teach/guide through/give/look at
- semantic prosody: difficulty(?)
- analysis - mainly spoken conversation, but numbers too small for reliable inference
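The concordance step on this slide can be sketched in Python. The sketch reads "sort left 2" as ordering the lines by the two words immediately to the left of the phrase, nearest word first; that reading, the function names and the three invented sentences are all assumptions rather than a description of Xaira.

```python
TEXT = (
    "She knows the ins and outs of tax law . "
    "He explained all the ins and outs of sailing . "
    "We learned the ins and outs of it ."
)

def kwic(tokens, node, width=3):
    """Return (left context, match, right context) for each occurrence of `node`."""
    n = len(node)
    lines = []
    for i in range(len(tokens) - n + 1):
        if [t.lower() for t in tokens[i : i + n]] == node:
            left = tokens[max(0, i - width) : i]
            right = tokens[i + n : i + n + width]
            lines.append((left, tokens[i : i + n], right))
    return lines

def sort_left(lines, depth=2):
    """Sort lines on the `depth` words left of the node, nearest word first."""
    return sorted(lines, key=lambda line: [w.lower() for w in reversed(line[0][-depth:])])

lines = sort_left(kwic(TEXT.split(), ["ins", "and", "outs"]))
for left, match, right in lines:
    print(f"{' '.join(left):>20} | {' '.join(match)} | {' '.join(right)}")
```

Sorting on the left context groups lines such as "all the ins and outs of" together, which makes the colligation and the verbs of the semantic preference easier to spot by eye.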

Exploring idioms
Example: idioms with point
- make a point
- the point is
- point out
- have a point
- high point
- point to
- in point of fact
- starting point
- no point in
- point of view
- at X point
- what's the point
- to the point
- see/get/grasp the point

Exploring features of speech
Examples: spoken discourse markers and back channels
PS6NR: [laugh] he's not a millionaire yet.
PS6NM: No so perhaps not, mm. Oh perhaps, perhaps he, perhaps he has the knowledge but has difficulty in er navigating his way to the betting shop to to do anything about it.
PS6NR: [laugh]
PS6NM: Anyway erm
PS6NR: Right I've... results see this is
PS6NM: Mm.
PS6NR: this is really what I'm [...]
PS6NM: Yeah.
PS6NR: comparison of subjects within groups and between groups I thought that's
PS6NM: Yeah, mm.
PS6NR: like a typical [...]

Exploring productivity of affixes
- How many adjectives can you think of ending in -ish?
  - babyish, bearish,.... wankish, whorish, yobbish
- How many nouns starting with anti-?
- How about verbs?
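An affix query of this kind amounts to a pattern match restricted by word class. The following Python sketch runs over a tiny invented word list with assumed tag names; in practice the list would come from a tagged corpus, and the helper name is hypothetical.

```python
import re

# Invented (word, word class) pairs; a real list would come from a tagged corpus.
TAGGED = [
    ("babyish", "ADJ"), ("finish", "VERB"), ("yobbish", "ADJ"),
    ("fish", "SUBST"), ("Bearish", "ADJ"), ("anti-hero", "SUBST"),
    ("antidote", "SUBST"), ("anticipate", "VERB"),
]

def words_with_affix(tagged, word_class, prefix="", suffix=""):
    """Return the sorted distinct words of `word_class` with the given prefix and suffix."""
    pattern = re.compile(
        rf"{re.escape(prefix)}[\w-]+{re.escape(suffix)}", re.IGNORECASE
    )
    return sorted(
        {w.lower() for w, c in tagged if c == word_class and pattern.fullmatch(w)}
    )

ish_adjectives = words_with_affix(TAGGED, "ADJ", suffix="ish")
anti_nouns = words_with_affix(TAGGED, "SUBST", prefix="anti")
print(ish_adjectives)
print(anti_nouns)
```

A string match alone cannot tell a productive affix from an accidental spelling, so a word like "antidote" still turns up among the anti- nouns; the word-class filter only removes cases such as the verb "finish".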

Creative writing
Examples: story beginnings
- Paul Auster: City of Glass
  It was the wrong number that started it, the telephone ringing three times in the dead of the night, and the voice on the other end asking for someone he was not.
- Ian McEwan: Saturday
  Everyone agrees, airliners look different these days, predatory and doomed.

Where can I get one?
- BNC XML:
  - now available on DVD
  - standalone single user licence or institutional licence
  - discounted price till end June
- XAIRA
  - Delivered free with the BNC (and also available free from
  - Usable with any XML corpus
  - Usable/ish on any platform