Using Corpora in Language Research Adam Kilgarriff Lexical Computing Ltd Universities of Leeds January 2013Adam Kilgarriff.

Slides:



Advertisements
Similar presentations
Grammar is to Meaning as the Law if to Good Behaviour Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.
Advertisements

The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project Adam Kilgarriff Lexical Computing Ltd
Uses of a Corpus “[E]xplore actual patterns of language use”
Botox, themself and slugs Corpus Methods in Many Places Adam Kilgarriff Lexical Computing Ltd.
1 Corpora for all Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.
Linking Dictionary and Corpus Adam Kilgarriff Lexicography MasterClass Ltd Lexical Computing Ltd University of Sussex UK.
1 Corpora for the coming decade Adam Kilgarriff. Dublin June 2009 Kilgarriff: Corpora for the coming decade2 How should they be different?  Bigger 
1 Linguistics and translation theory Mark Shuttleworth Teaching Translation Swansea, 20 January 2006.
Between Corpus and Dictionary Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds, Sussex.
Corpus Creation for Lexicography Adam Kilgarriff, Michael Rundell Lexicography MasterClass, UK Elaine Ui Dhonnchadha ITE (Linguistics Institute of Ireland)
Using Corpora for Teaching Chinese Dr. Adam Kilgarriff Lexical Computing Ltd Leeds University UK.
Recent Developments in Technological Tools for the Purpose of Facilitating SLA.
January 12, Statistical NLP: Lecture 2 Introduction to Statistical NLP.
Making useful wordlists for ELT Topical vocabulary from the WWW Simon Smith & Scott Sommers Ming Chuan University, Taipei Adam Kilgarriff, Lexical Computing.
Constructing and Evaluating Web Corpora: ukWaC Adriano Ferraresi University of Bologna Aston University Postgraduate Conference.
Talking about your homework News story? –What made you choose…? One of your words? –What made you choose…? (Give your vocabulary books to another student.
1 Corpora for the coming decade Adam Kilgarriff Lexical Computing Ltd.
LELA English Corpus Linguistics
Input-Output Relations in Syntactic Development Reflected in Large Corpora Anat Ninio The Hebrew University, Jerusalem The 2009 Biennial Meeting of SRCD,
Resources Primary resources – Lexicons, structured vocabularies – Grammars (in widest sense) – Corpora – Treebanks Secondary resources – Designed for a.
What's on the Web? The Web as a Linguistic Corpus Adam Kilgarriff Lexical Computing Ltd University of Leeds.
Chapter 3: An Introduction to Corpus Linguistics Compiled by: Sajjad Ghadamyari Farhad Ghiasvand Presentation Date: Dec. 8, Monday.
Yuliya Morozova Institute for Informatics Problems of the Russian Academy of Sciences, Moscow.
Memory Strategy – Using Mental Images
But What Does It All Mean? Key Concepts for Getting the Most Out of Your Assessments Emily Moiduddin.
Simple Maths for Keywords Adam Kilgarriff Lexical Computing Ltd.
Labels: automation Adam Kilgarriff. Auckland 2012Kilgarriff / Labels: automation2 Which words are:  Most distinctive of business English?  Most often.
1 Evaluating word sketches Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.
Using Corpora for Teaching Chinese Dr. Adam Kilgarriff Lexical Computing Ltd Leeds University UK.
First International Sketch Grammar Workshop Ljubljana 3-4 February 2010.
1 Corpora, Language Technology and Maltese Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd University of Sussex.
1 The Long Road from Text to Meaning Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.
Terminology, translation, and PRESEMT; word frequency lists and KELLY 1 Adam Kilgarriff Lexical Computing Ltd SKEW-2, March 2011Kilgarriff: PRESEMT and.
Word senses Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds, Sussex.
GDEX: Automatically finding good dictionary examples in a corpus Adam Kilgarriff, Miloš Husák, Katy McAdam, Michael Rundell, Pavel Rychlý Lexical Computing.
1 Corpora, Dictionaries, and points in between in the age of the web Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of.
Researching language with computers Paul Thompson.
Why We Need Corpora and the Sketch Engine Adam Kilgarriff Lexical Computing Ltd, UK Universities of Leeds and Sussex.
Without data, nothing Adam Kilgarriff Lexical Computing Ltd University of Leeds.
UCREL: from LOB to REVERE Paul Rayson. November 1999CSEG awayday Paul Rayson2 A brief history of UCREL In ten minutes, I will present a brief history.
Comparable Corpora BootCaT (CCBC) (or: In Praise of BootCaT) Adam Kilgarriff, Jan Pomikalek, Avinesh PVS Lexical Computing Ltd. Work Supported by EU FP7.
1 Using Corpora in Language Research -also Introduction to the Sketch Engine (WS15) part 1 Adam Kilgarriff Lexical Computing Ltd Universities of Leeds.
1 Evaluating word sketches and corpora Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.
Corpus Evaluation Adam Kilgarriff Lexical Computing Ltd Corpus evaluationPortsmouth Nov
Malta, May 2010Kilgarriff: Corpora by Web Services1 Corpora by Web Services Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
Terminology-finding in the Sketch Engine Miloš Jakubíček, Adam Kilgarriff, Vojtěch Kovář, Pavel Rychlý, Vit Suchomel Lexical Computing Ltd., Brighton,
CL 2005, Birmingham Web as Corpus Workshop Intro: Adam Kilgarriff 1 Web as Corpus Workshop Co-chairs: Marco Baroni Adam Kilgarriff Sebastian Hoffman.
Chapter 1 What is Language? When we study human language, we are approaching what some might call the "human essence,” the distinctive qualities of mind.
Introduction to Linguistics Ms. Suha Jawabreh Lecture # 2.
The Sketch Engine as Infrastructure for Large Scale Text Collections for Humanities Research Adam Kilgarriff Lexical Computing Ltd. & Univ of Leeds, UK.
CSA2050 Introduction to Computational Linguistics Lecture 1 Overview.
What you have learned and how you can use it : Grammars and Lexicons Parts I-III.
CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?
Corpus Linguistics in Research Doctorate in Education University of Warwick 6th November 2008.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Introduction Chapter 1 Foundations of statistical natural language processing.
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
Grammar is to Meaning as the Law if to Good Behaviour Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd Universities of Leeds and Sussex.
1 CPA: Where do we go from here? Research Institute for Information and Language Processing, University of Wolverhampton; UPF Barcelona; University of.
GDEX: Automatically finding good dictionary examples in a corpus Auckland 2012Kilgarriff: GDEX1.
Exploring Variation in Lexis and Genre in the Sketch Engine Adam Kilgarriff Lexical Computing Ltd., UK Supported by EU Project PRESEMT.
Welcome to All S. Course Code: EL 120 Course Name English Phonetics and Linguistics Lecture 1 Introducing the Course (p.2-8) Unit 1: Introducing Phonetics.
GDEX: Automatically finding good dictionary examples in a corpus Kivik 2013Kilgarriff: GDEX1.
Use of Concordancers A corpus (plural corpora) – a large collection of texts, written or spoken, stored on a computer. A concordancer – a computer programme.
Measuring Monolinguality
Making useful wordlists for ELT
Evaluating word sketches and corpora
Introduction to Linguistics
Introduction to Linguistics
Presentation transcript:

Using Corpora in Language Research Adam Kilgarriff Lexical Computing Ltd Universities of Leeds January 2013Adam Kilgarriff

May 2011 Adam Kilgarriff What is language?

May 2011 Adam Kilgarriff What is language? In our heads

May 2011 Adam Kilgarriff What is language? In our heads In texts and sound signals

May 2011 Adam Kilgarriff What is language? In our heads In texts and sound signals Both

May 2011 Adam Kilgarriff Methodology Study language in our heads Competence Chomsky “rationalist” (Descartes, Leibniz)‏

May 2011 Adam Kilgarriff Methodology Study language in our heads Competence Chomsky “rationalist” (Descartes, Leibniz)‏ Odd method for objective science Practical problems: coverage, arbitrariness

May 2011 Adam Kilgarriff Methodology Study text “empiricist” (Locke, Hume)‏ Physics: forces, matter Chemistry: chemicals, bonds Language: text, speech signals

May 2011 Adam Kilgarriff It goes against the grain What is important about a sentence? its meaning Corpus methodology: Throw away individual sentence meaning Find patterns

May 2011 Adam Kilgarriff Computer power Corpora bigger and bigger data sets Language technology tools lemmatizers, POS-taggers, parsers Machine learning, pattern-finding 20 years of rapid ascent

May 2011 Adam Kilgarriff All the linguisticses Theoretical Socio Psycho Developmental Law and Computational Contrastive Applied... linguistics

May 2011 Adam Kilgarriff Developmental CHILDES, TalkBank How children learn language Parents record all interactions Since 1980s Prof. Brian MacWhinney, Carnegie-Mellon Many languages Largest chunk: English, 23m words

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May 2011 Adam Kilgarriff Language change Brown family Small but perfectly formed I m words 500 x 2000-word samples the same 15 text types Supports comparison American and British English 1931, 1961, 1991, 2006

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May 2011 Adam Kilgarriff Language and gender When you see a dentist... What is now normal? Recent study they now the norm themself now needed despite what spellcheck says BNC (most text from 1989) 0.2/million EnTenTen (mostly 2009) 0.4/million

May 2011 Adam Kilgarriff Language and law Trade marks Hoover and similar trademark or generic Cases sabatier, botox, kettle chips Key evidence Do people tend to capitalize?

May 2011 Adam Kilgarriff English nouns: % capitalized

May 2011 Adam Kilgarriff Syntax and semantics

May Adam Kilgarriff

May Adam Kilgarriff

May 2011 Adam Kilgarriff DANTE Detailed account of English lexis Corpus-driven From word sketches Lexicographers assign to senses High precision Available at

May 2011 Adam Kilgarriff What data shall I use?

May 2011 Adam Kilgarriff Think hard

May 2011 Adam Kilgarriff Sometimes... Just-in-time corpus from the web Use case: Translator, French-to-English Translation task volcanoes In French I understand it OK, but I'm no vulcanologist, I don't know the English terminology BootCaT, Baroni and Bernardini

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May Adam Kilgarriff

May 2011 Adam Kilgarriff Corpora in Sketch Engine Access-to-all 60 languages All major world languages Mostly large, web-crawled Various other CHILDES, Brown,... “My corpora” BootCat and other

May 2011 Thank you Adam Kilgarriff