The BNC Design Model
  Adam Kilgarriff, Sue Atkins, Michael Rundell
  The Lexicography MasterClass
Birmingham Jul 2007, Kilgarriff Atkins Rundell
BNC
  Very widely used across
    Lexicography
    Linguistics
    Language technology
    Language teaching
  A spectacular success
The BNC design model
  Well planned
    Atkins, Clear and Ostler 1992
  Produced a successful outcome
  A model for others (working on other languages) to follow
Czech National Corpus, American National Corpus, Hungarian National Corpus, Hellenic National Corpus, Croatian National Corpus, Slovak National Corpus, National Corpus for Ireland
Great! However
BNC Design Model: past its sell-by
  Adam Kilgarriff, Sue Atkins, Michael Rundell
  The Lexicography MasterClass
BNC Design Model
  1980s
  Eighteen years old
  Pre-web
Sue Atkins’ dream, ca 1985
  The dream
    Gazillions of text
    More than we could possibly imagine
  The plan
    Let’s reach for the sky:
Amazing, implausible, ridiculous, you won’t possibly do it
100 million words: you must be kidding
Google: everyday access to eighty thousand times as much
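As a rough sense of scale, the implied figure can be worked out by taking the slides' two round numbers at face value: a BNC of 100 million words and a factor of eighty thousand. The variable names are illustrative, and the result is only as reliable as those two inputs.

```python
# Both inputs are round figures quoted on the slides, not measurements.
BNC_WORDS = 100_000_000   # "100 million"
GOOGLE_FACTOR = 80_000    # "eighty thousand times as much"

google_words = BNC_WORDS * GOOGLE_FACTOR
print(f"{google_words:,} words")  # 8,000,000,000,000 words
```

That is about eight trillion words, on these assumptions.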
Inference
  Vision behind the BNC (gazillions, reach for the sky) leads to
    1980s: the BNC
    2007: something quite different
BNC vision: other aspects
  A balance of text types
  Substantial share (10%) spoken
  No swingeing copyright constraints
  A reference corpus
Balance of text types
  Good goal: added value for BNC
  Design by linguists and publishers
    Reflects their ideas/interests
    Constrained by collection costs
    Prescribes not describes
  Costs now quite different
    Blogs etc. are free
  “What is a good taxonomy of text types?”
    Good open research question
Spoken language
  Many things are possible
    Online transcripts
      Hoffman: 300m words of Larry King show
    Web 2.0
  “cheese”
    Whole BNC: 2,954 occurrences
    Spoken BNC: 456 occurrences
    YouTube: 34,900 videos
    Everyzing.com audio search (formerly podzinger)
      cheese: 37,030 files
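The two BNC counts for "cheese" are easier to compare once normalised by corpus size. The sketch below assumes the round figures given elsewhere in the slides: 100 million words for the whole BNC, with exactly 10% of that spoken. The helper `per_million` is a hypothetical name introduced here for illustration.

```python
# Corpus sizes are the round figures from the slides, not exact token counts.
BNC_WORDS = 100_000_000                # whole BNC: "100 million"
SPOKEN_WORDS = BNC_WORDS * 10 // 100   # spoken share: 10% (assumed exact)

def per_million(count, corpus_size):
    """Occurrences per million words of corpus."""
    return count * 1_000_000 / corpus_size

whole = per_million(2_954, BNC_WORDS)    # "cheese" in the whole BNC
spoken = per_million(456, SPOKEN_WORDS)  # "cheese" in the spoken BNC

print(f"whole BNC:  {whole:.2f} per million")
print(f"spoken BNC: {spoken:.2f} per million")
```

On these assumptions "cheese" comes out at roughly 30 per million words overall and roughly 46 per million in the spoken part. The YouTube and Everyzing figures count videos and files rather than word occurrences, so they cannot be normalised the same way.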
Copyright
  BNC isn’t do-as-you-like
    Compare WordNet
  Corpus collectors are like search engines
  Copyright and Web 2:
    Website’s defense: “we did not know it was there and will promptly remove it”
    OK in US law
A reference corpus
  “a reference point for the language”
    Balanced
    Fixed
      Experiments are replicable
    Freely available
  Size might not be important
    Brown still works well for many questions
Corpus building: expensive
  Wanted for many purposes
  One-size-fits-all
    BNC met many needs
    Many text types have too few documents
      Medical, technical, children’s
  Used ‘because it is available’
    No affordable alternatives
Corpus building: cheap
  WebBootCaT
  Made-to-measure corpora
    Different research question: different corpus
  Is “general-purpose reference corpus” a useful idea?
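To give a feel for why a made-to-measure corpus is cheap, here is a minimal sketch of one step in a BootCaT-style procedure as it is generally described: a handful of seed terms are combined into random tuples, each of which would then be sent to a search engine as a query. This is not WebBootCaT's actual code or API. The function name `seed_queries`, its parameters, and the seed terms are all hypothetical, and the search and download steps are left out.

```python
import itertools
import random

def seed_queries(seeds, tuple_size=3, n_queries=5, rng_seed=0):
    """Return up to n_queries distinct queries, each a space-joined
    tuple of tuple_size different seed terms (illustrative only)."""
    # All unordered combinations of distinct seed terms.
    combos = list(itertools.combinations(sorted(set(seeds)), tuple_size))
    # A fixed seed keeps the random selection reproducible.
    rng = random.Random(rng_seed)
    picked = rng.sample(combos, min(n_queries, len(combos)))
    return [" ".join(combo) for combo in picked]

# Hypothetical seed terms for a small domain corpus.
seeds = ["cheese", "cheddar", "rennet", "curd", "dairy"]
queries = seed_queries(seeds)
for query in queries:
    print(query)
```

A different research question only needs a different seed list, which is the sense in which the corpus is made to measure.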
In sum: BNC Design Model
  1990s: innovative and inspiring
  2007: historic interest
  New thinking needed for a new situation