Today
Writing: using the comma
– Writing task
Corpus linguistics talk, Part 2
Re-organize groups
– Group news discussion
1.
2. He left the scene of the accident and tried to forget that it had happened.
3. Oil which is lighter than water rises to the surface.
4. Madame de Stael was an attractive gracious lady.
5. Nice is a word with many meanings and some of them are contradictory.
6. Taxicabs that are dirty are illegal in some cities.
7. The uninvited guest wore a dark blue tweed suit.
8. I hope that some day he will learn how to be polite.
8. Mark Twain's early novels I believe stand the test of time.
9. Write the editor of the Atlantic 8 Arlington Street Boston Massachusetts
10. He replied "I have no idea what you mean."
11. After a good washing and grooming the pup looked like a new dog.
12. Men who are bald are frequently the ones who are the most authoritative on the subject of baldness.
13. Hello Kitty cellphones which are very popular in Japan have not really caught on in Taiwan.
Introduction to corpus linguistics Simon Smith & Adam Kilgarriff
Plan for today
Short review of corpus basics
4 ages of corpus research
– From the pre-computer age to Sketch Engine (SkE)
Functions of SkE
Demonstration of SkE in use
Quiz
What’s a (linguistic) corpus?
What does the Latin word mean?
What are corpora?
What’s the BNC?
How big is the British National Corpus?
What is the advantage of having a very big corpus?
What can corpora be used for?
5 major uses for linguistic corpora
Language learning and teaching
Theoretical research on language and linguistics
Literary research and analysis
Language technology
Lexicography (= dictionary making)
– Cobuild, Longman, …
– All learner dictionaries now use corpora
How do you make a dictionary? (What sources can you use?)
Use your own knowledge of words
Ask all your friends for their knowledge
Consult other dictionaries
– and copy them
Read thousands of books
– and take lots of notes
Use a corpus
Four ages of corpus research (in lexicography)
Age 1: Pre-computer
Age 2: KWIC concordance (KWIC=?)
Age 3: Corpus query tools, e.g. Sketch Engine
Taiwan, Dec 2006; Kilgarriff, Lexical Computing; Slide: 8
Age 1: Pre-computer
First Oxford English Dictionary (1860): 20 million index cards
– each with a word (usually rare) and a citation
Age 2: KWIC Concordance
Age 2 (~ ): KWIC Concordances
Using computers
List of lines which contain a keyword
The keyword is in the middle
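A KWIC concordance is simple enough to sketch in a few lines. This minimal Python version (the function name `kwic` and the sample sentence are illustrative, not from any particular tool) finds every whole-word match of the keyword and pads the left context so the keyword lands in the same column on every line.

```python
import re


def kwic(text, keyword, width=30):
    """Return KWIC concordance lines for `keyword` in `text`.

    Each line shows up to `width` characters of left and right context,
    with the left context right-aligned so the keyword sits in the same
    column on every line. Matching is case-insensitive and whole-word.
    """
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Grab the context on either side and flatten any line breaks.
        left = text[max(0, match.start() - width):match.start()].replace("\n", " ")
        right = text[match.end():match.end() + width].replace("\n", " ")
        lines.append(f"{left:>{width}} | {match.group(0)} | {right}")
    return lines


corpus = "The party won the vote. We went to a party last night."
for line in kwic(corpus, "party", width=10):
    print(line)
```

Real concordancers work on tokenised, indexed corpora rather than raw strings, but the output has the same shape: one line per hit, keyword in the middle.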
The coloured pens method
Concordance lines are marked by sense, with one colour per sense:
1 political association
2 social event
3 group of people
4 person in an agreement/dispute
5 to be party to something...
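Once each line has its colour, the lexicographer counts how many lines fall under each sense. A small Python sketch of that counting step, using the sense labels from this slide (the function name `tally_senses` is hypothetical):

```python
from collections import Counter

# Sense labels as numbered on the slide.
SENSES = {
    1: "political association",
    2: "social event",
    3: "group of people",
    4: "person in an agreement/dispute",
    5: "to be party to something",
}


def tally_senses(tags):
    """Count how many concordance lines were marked with each sense number.

    `tags` holds one sense number per concordance line, i.e. the colour
    the lexicographer gave that line. Senses come back most frequent first.
    """
    counts = Counter(tags)
    return [(SENSES[sense], n) for sense, n in counts.most_common()]


# The six lines on the slide were marked 4, 1, 4, 2, 5, 3.
for sense, n in tally_senses([4, 1, 4, 2, 5, 3]):
    print(n, sense)
```

The sorting step is automatic; deciding which sense each line shows is still the human's job.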
Age 2: limitations
As corpora get bigger: too much data
– 50 lines for a word: read all
– 500 lines: could read all, but it takes a long time
– 5000 lines: impossible
Why do corpora keep getting bigger? (anyone?)
Improvements in technology
– Price of storage is going down
– Speed of access is going up
Representativeness
– A small corpus may have many examples of common words
– But not enough examples of unusual words
Lexical distribution
What’s the most common word in English?
What % of a whole corpus does it make up?
The 100 most common words make up __% of all the words in a corpus.
The 7500 most common words make up __%.
Answers:
– the, 5%, 45% and 90%
So:
– you need massive corpora if you want to represent rare words properly
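Figures like these come from a coverage calculation: count every word in the corpus, then ask what share of all running words the top N word types account for. A minimal Python sketch, assuming the text is already lower-cased and split into tokens (the function name `coverage` and the toy sentence are illustrative; a toy sentence will not reproduce the corpus percentages above):

```python
from collections import Counter


def coverage(tokens, n):
    """Share of all running words (tokens) covered by the n most frequent word types."""
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(n))
    return covered / len(tokens)


tokens = "the cat saw the dog and the dog saw a cat".split()
print(f"top 1 word:  {coverage(tokens, 1):.0%}")
print(f"top 3 words: {coverage(tokens, 3):.0%}")
```

The flip side of high coverage at the top is that the remaining words are each rare, which is why only a very large corpus gives enough examples of them.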
Limitation of KWIC analysis
As corpora get bigger: too much data
– 50 lines for a word: read all
– 500 lines: could read all, but it takes a long time
– 5000 lines: no
Instead, look at a Word Sketch from Sketch Engine
– a statistical summary of word usage
– shows the most common collocates
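The core idea of summarising instead of reading can be sketched by counting which words appear near the keyword. This Python version uses a simple window of neighbouring words as an assumption; real Word Sketches group collocates by grammatical relation on a tagged corpus, which this sketch does not attempt. The function name `collocates` and the sample tokens are hypothetical.

```python
from collections import Counter


def collocates(tokens, node, window=2, top=5):
    """Most frequent words within `window` tokens of each occurrence of `node`.

    Punctuation tokens are skipped. Real Word Sketches instead group
    collocates by grammatical relation (object, modifier, ...) and rank
    them by a salience score, which this sketch does not attempt.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j].isalpha():
                counts[tokens[j]] += 1
    return counts.most_common(top)


tokens = "drink strong tea . strong tea is good . strong coffee".split()
print(collocates(tokens, "strong", window=1))
```

On 5000 concordance lines, a short ranked list like this replaces reading every line.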
Maybe stop here
Functions of SkE
KWIC concordance
– sorting, filtering, etc.
Word sketch
Automatic thesaurus
Sketch difference
– discriminates between near-synonyms
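Sorting and filtering are what make a long concordance readable. In this minimal Python sketch, each concordance line is assumed to be a `(left, keyword, right)` tuple. Sorting on the right context groups lines that share the same following words, and filtering keeps only lines containing a chosen context word. The function names and the sample lines are hypothetical, not Sketch Engine's own interface.

```python
def sort_by_right(lines):
    """Sort (left, keyword, right) concordance lines on their right context.

    Lines that share the same following words end up next to each other,
    so recurring patterns are easier to see.
    """
    return sorted(lines, key=lambda line: line[2].lower().split())


def filter_lines(lines, word):
    """Keep only lines with `word` somewhere in the left or right context."""
    word = word.lower()
    return [line for line in lines
            if word in (line[0] + " " + line[2]).lower().split()]


# Hypothetical concordance lines for "party".
lines = [
    ("she gave a", "party", "for her friends"),
    ("the Labour", "party", "conference was"),
    ("a birthday", "party", "for the twins"),
]
for left, kw, right in sort_by_right(lines):
    print(f"{left:>12} | {kw} | {right}")
print(filter_lines(lines, "birthday"))
```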
Lexical approach to language learning
Lewis (1993) and Schmitt (2000) say
– vocabulary is stored in the brain in collocations
– bacon is stored near eggs
– 蛋 (egg) is stored near 炒飯 (fried rice)
– Scotch is stored with whisky
Saying "strong car", "powerful tea" or "broken house" (instead of powerful car, strong tea, broken home) sounds very “foreign”
From: a lexical approach activity based on a story text (www.teachingenglish.org)
News task
4 sentences
The news story must be from the current week
Please include the date when you print it
Make two lists of adjectives:
– (+) exciting; dramatic; unusual…
– (-) dull; complicated; bloodthirsty…
Choose the best story from your group
– I’m not very keen on that story because…
– I prefer this story because…
Collocations and sentences
5 words
Use the SkE beta
Say which corpus you used
3 collocations for each word
– State the frequency
– State the salience (顯著性)
The example sentence from SkE should use one of the collocations you chose
If you don’t understand the sentence, don’t use it!
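Frequency and salience can rank collocations differently. The slide does not say how salience is calculated; current Sketch Engine documentation describes its score as logDice, but the SkE beta used for this task may have used a different measure, so treat the formula below as an assumption. A minimal Python sketch with hypothetical counts:

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score for a word pair.

    f_xy: number of times the two words occur together
    f_x, f_y: total frequency of each word in the corpus
    The maximum is 14, reached when the words only ever occur together.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts: a frequent partner (seen 50 times with the word,
# but 6 million times overall) versus a rarer, more exclusive one.
print(round(log_dice(50, 1_000, 6_000_000), 2))
print(round(log_dice(20, 1_000, 500), 2))
```

The second pair scores much higher even though it co-occurs less often, because its partner rarely appears anywhere else. That is why a collocation's frequency and its salience are worth reporting separately.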
Before this week’s reading, ask:
How many different cuisines from around the world can you name?
Which cuisine do you think is the healthiest?