Corpora, Language Technology and Maltese


Corpora, Language Technology and Maltese Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd University of Sussex

How do you find out about a language?
Native speakers
Dictionaries and grammars
Corpus
Kilgarriff, Lexical Computing

Four ages of corpus research

Age 1: Pre-computer
James Murray, Chief Editor, Oxford English Dictionary, vol 1
1879: 20 million index cards

Age 2: KWIC Concordances
From 1980
Computerised

Age 2: KWIC Concordance

Age 2: KWIC Concordances
From 1980
Computerised
COBUILD project was the innovator
The coloured-pens method
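
To make the idea of a KWIC (keyword in context) concordance concrete, here is a minimal sketch in Python. It is not the COBUILD software or any Sketch Engine code: the function name `kwic`, the `width` parameter and the sample sentences are all invented for illustration. It simply lines up every occurrence of a word in a fixed column with some context on either side.

```python
import re


def kwic(text, keyword, width=30):
    """Return one aligned line per occurrence of `keyword` in `text`.

    Each line has `width` characters of left context, the keyword,
    and `width` characters of right context (hypothetical layout).
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword always starts
        # in the same column, which is what makes a concordance scannable.
        lines.append(f"{left:>{width}} {match.group(0)} {right:<{width}}")
    return lines


# Invented sample text, not taken from any corpus.
sample = (
    "The party won the election. We went to a party last night. "
    "She was not a party to the agreement."
)

for line in kwic(sample, "party", width=25):
    print(line)
```

With a handful of lines this is easy to read by eye; the same function run over a large corpus returns thousands of lines for a common word.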

The coloured pens method
1 political association
2 social event
3 group of people
4 person in an agreement/dispute
5 to be party to something...

Age 2: limitations
As corpora get bigger: too much data
50 lines for a word: read all
500 lines: could read all, takes a long time
5000 lines: no

Pre-computer corpus lexicography, at its most systematic, used an index card for each citation. The citation was of a word, and tended to be of a rare word; for common words it did not seem worthwhile to take down citations. For an account of the COBUILD project see Looking Up, edited by John Sinclair (Collins, 1987). COBUILD was a collaboration between Birmingham University and Collins, and in due course gave rise to the COBUILD English dictionary for learners of English.

Age 3: Collocation statistics
Problem: too much data - how to summarise?
Solution: list of words occurring in neighbourhood of headword, with frequencies
Sorted by salience

Collocation listing
For right collocates of save (>5 hits)
word freq
forests 6 life 36 $1.2 dollars 8 lives 37 costs 7 enormous thousands annually face 9 jobs 20 estimated money 64 your
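
A collocation listing of this kind can be sketched in a few lines of Python. The slides do not say which salience statistic was used, so the logDice-style score below is an assumption chosen for illustration, as are the function name `right_collocates`, the `window` and `min_freq` parameters, and the toy token list.

```python
import math
from collections import Counter


def right_collocates(tokens, headword, window=3, min_freq=1):
    """List words found up to `window` tokens to the right of `headword`.

    Returns (word, co-occurrence frequency, salience) tuples sorted by
    salience. Salience here is a logDice-style score (an assumption):
    14 + log2(2 * f_xy / (f_x + f_y)).
    """
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, token in enumerate(tokens):
        if token == headword:
            for neighbour in tokens[i + 1:i + 1 + window]:
                pair_freq[neighbour] += 1

    rows = []
    for word, f_xy in pair_freq.items():
        if f_xy < min_freq:
            continue  # drop rare pairs, like the ">5 hits" cut-off on the slide
        score = 14 + math.log2(2 * f_xy / (word_freq[headword] + word_freq[word]))
        rows.append((word, f_xy, round(score, 2)))

    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Invented toy data, not corpus counts.
toy_tokens = (
    "we must save money and save lives and save the money for the forests"
).split()

for word, freq, salience in right_collocates(toy_tokens, "save", window=2):
    print(word, freq, salience)
```

Sorting by salience rather than raw frequency is what pushes function words such as "the" down the list, even when they occur near the headword as often as a content word does.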

Age 4: The word sketch
A corpus-derived one-page summary of a word’s grammatical and collocational behaviour

Age 4: The word sketch
Large well-balanced corpus
Parse to find subjects, objects, heads, modifiers etc
One list for each grammatical relation
Statistics to sort each list
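
The last two steps of that recipe (one list per grammatical relation, statistics to sort each list) can be sketched as follows. This is an illustration under stated assumptions, not the Sketch Engine implementation: the parsing step is assumed to have already produced (relation, head, dependent) triples, the triples shown are invented, and the logDice-style score stands in for whatever statistic is actually used.

```python
import math
from collections import Counter, defaultdict


def word_sketch(triples, headword, top_n=5):
    """Group a headword's collocates by grammatical relation.

    `triples` is an iterable of (relation, head, dependent) tuples,
    assumed to come from some parser. Each relation's list is sorted
    by a logDice-style score (an assumption, not the documented method).
    """
    triple_freq = Counter(triples)
    head_rel_freq = Counter()  # how often each head occurs in each relation
    dep_rel_freq = Counter()   # how often each dependent occurs in each relation
    for (rel, head, dep), freq in triple_freq.items():
        head_rel_freq[(rel, head)] += freq
        dep_rel_freq[(rel, dep)] += freq

    by_relation = defaultdict(list)
    for (rel, head, dep), freq in triple_freq.items():
        if head != headword:
            continue
        denominator = head_rel_freq[(rel, head)] + dep_rel_freq[(rel, dep)]
        score = 14 + math.log2(2 * freq / denominator)
        by_relation[rel].append((dep, freq, round(score, 2)))

    # One list per relation, each sorted by score and cut to top_n entries.
    return {
        rel: sorted(rows, key=lambda row: row[2], reverse=True)[:top_n]
        for rel, rows in by_relation.items()
    }


# Invented triples standing in for parser output.
toy_triples = (
    [("object", "save", "money")] * 3
    + [("object", "save", "life")] * 2
    + [("object", "spend", "money")] * 2
    + [("subject", "save", "government")]
    + [("modifier", "save", "annually")]
)

for relation, rows in word_sketch(toy_triples, "save").items():
    print(relation, rows)
```

Compared with a flat collocation listing, the grouping means objects, subjects and modifiers of the headword are read separately rather than mixed in one list.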

Macmillan English Dictionary for Advanced Learners
Ed: Rundell, 2002

Developer: Pavel Rychly, Brno
Users:
OUP, Chambers, CUP
Universities, for teaching and research
ELT textbook authors
Demo: http://www.sketchengine.co.uk/
Self-registration for free account