The Generative Lexicon (GL) meets Corpus Pattern Analysis (CPA)
Patrick Hanks, Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic



Where I’m coming from
British lexicography (Collins, Oxford)
The Firthian tradition in linguistics (Sinclair, Cobuild)
25 years of corpus analysis (CPA)
Analysing data, asking what sort of theory accounts for observable patterns, how to map meaning onto use, …
Painful discoveries:
– Patterns of linguistic behaviour are everywhere in corpora
– Meanings can be mapped onto patterns
– All too often, speculative linguistic theory (SLT) doesn’t match the evidence, or is not accurately focused
GL provides an apparatus for CPA
CPA provides empirical support for (some) GL

Empirical Recognition of Patterns
When you first open a concordance, patterns start leaping out at you.
– Collocations make patterns: one word goes with another
– To see how words make meanings, we need to analyse collocations
The more you look, the more patterns you see.
BUT
When you try to formalize the patterns, you start to see more and more exceptions.
The boundaries are fuzzy and there are many outlying cases.
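The point about collocations can be made concrete with a minimal sketch that counts the words occurring near a node word. The concordance lines below are invented for illustration (they are not corpus data), and the function name `collocates` and the window size are assumptions, not part of CPA tooling.

```python
from collections import Counter

def collocates(lines, node, window=3):
    """Count words found within `window` tokens of `node` in each line."""
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()
        for i, tok in enumerate(tokens):
            if tok == node:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts

# Invented concordance lines, for illustration only.
toy_lines = [
    "the soldier fired the gun at the target",
    "he fired a shot from the gun",
    "the manager fired the employee",
    "she fired the gun twice",
]

counts = collocates(toy_lines, "fired")
# Content words near the node hint at different patterns:
# "gun" and "shot" on one side, "employee" on the other.
print(counts.most_common(5))
```

Even on four toy lines, the outlying cases show up quickly: "gun" in the second line falls outside the window, which is one small reason why formalizing the patterns is harder than seeing them.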

The linguistic ‘double-helix’ hypothesis
A language is a system of rule-governed behaviour.
Not one, but TWO (interlinked) sets of rules:
1. Rules governing the normal uses of words to make meanings
2. Rules governing the exploitation of norms

Exploitations
People exploit the rules of normal usage for various purposes:
For economy and speed:
– Conversation is quick
– Listeners (and readers) get bored easily
– Words that are ‘obvious’ can sometimes be omitted
To say new things (reporting discoveries, registering patents, ...)
To say old things in new ways
For rhetoric, humour, poetry, politics …

Lexicon and prototypes
Each word is typically used in one or more patterns of usage (valency + collocations)
Each pattern is associated with a meaning:
– A meaning is a set of prototypical beliefs
– In CPA, meanings are expressed as ‘anchored implicatures’
– Few patterns are associated with more than one meaning
Corpus data enables us to discover the patterns that are associated with each word.

What is a pattern?
The verb is the pivot of the clause.
A pattern is a statement of the clause structure (valency) associated with a meaning of a verb,
– together with typical semantic values of each argument, realized by salient collocates
Different semantic values of arguments activate different meanings of each verb.

Patterns are contrastive
fire, verb
1. [[Human]] fire [[Firearm]] (at [[Phys Obj = Target]])
2. [[Human]] fire [[Projectile]] (from [[Firearm]]) (at [[Phys Obj = Target]])
3. [[Human 1]] fire [[Human 2]]
4. [[Anything]] fire [[Human]] {with enthusiasm}
5. [[Human]] fire [NO OBJ] ... (= 1 or 2, not 3 or 4)
Etc.
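The contrast between the five patterns for fire can be sketched as a lookup over semantic types. This is only an illustration under strong assumptions: the mini-lexicon `SEMANTIC_TYPE`, the tuple layout of `FIRE_PATTERNS`, and the function `match_fire` are all hypothetical, optional arguments such as "(at [[Phys Obj = Target]])" are ignored, and real CPA analysis works from corpus evidence rather than a hand-built table.

```python
# Hypothetical mini-lexicon assigning one semantic type to each noun.
SEMANTIC_TYPE = {
    "soldier": "Human", "manager": "Human", "employee": "Human",
    "gun": "Firearm", "rifle": "Firearm",
    "bullet": "Projectile", "missile": "Projectile",
}

# (pattern number, subject type, object type, required complement).
# None means "no direct object" or "no required complement".
FIRE_PATTERNS = [
    (1, "Human", "Firearm", None),
    (2, "Human", "Projectile", None),
    (3, "Human", "Human", None),
    (4, "Anything", "Human", "with enthusiasm"),
    (5, "Human", None, None),
]

def type_fits(required, actual):
    """[[Anything]] accepts every subject; otherwise types must be equal."""
    return required == "Anything" or required == actual

def match_fire(subject, obj=None, complement=None):
    """Return the numbers of the patterns compatible with the clause."""
    subj_type = SEMANTIC_TYPE.get(subject)
    obj_type = SEMANTIC_TYPE.get(obj) if obj is not None else None
    if obj is not None and obj_type is None:
        return []  # object noun is not in the toy lexicon
    return [
        number
        for number, subj_req, obj_req, comp_req in FIRE_PATTERNS
        if type_fits(subj_req, subj_type)
        and obj_req == obj_type
        and comp_req == complement
    ]

print(match_fire("soldier", "gun"))       # pattern 1
print(match_fire("manager", "employee"))  # pattern 3
```

The semantic type of the direct object, plus the presence of the complement, is what separates dismissing someone (pattern 3) from inspiring them (pattern 4) in this sketch.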

Types and Qualia in CPA
The apparatus needed for analysing nouns is different from that needed for verbs
– Plug and socket
Verbs need event typing and argument structure
Nouns need qualia
– What sort of thing is it?
– What’s it for?
– What properties does it have?
AND
– Is it good or bad (and for whom?)?

Each argument of each verb is a complex lcp (lexical conceptual paradigm)
[[Event | Human]] calm [[Animate]]
– calm a hysterical patient
– calm the horses
– But can you *calm a cockroach? Not part of the lcp for “calm [[Animate]]”: not a norm
– calm [POSDET] {nerves | anxiety} [= properties of [[Animate]]]
– calm a riot [= behaviour of [[Animate]]]
– calm the market [[= Location = Activity in Location = Human Group Acting in Location]]

Semantic types and semantic roles
sentence, v.
PATTERN: [[Human 1 = Judge]] sentence [[Human 2 = Convicted Criminal]] to [[{Time Period | Event} = Punishment]]
IMPLICATURE: [[Human 1]]
SECONDARY IMPLICATURE: [[Time Period]] is a jail sentence
EXAMPLE: Mr Woods sentenced Bailey to 7 years.
Note that the implicature is “anchored” to the pattern.
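The "type = role" notation in the pattern above lends itself to a simple machine-readable form. The following sketch splits each double-bracketed slot into its semantic type and its contextual role. The function name `parse_slots` and the tuple output are assumptions made for illustration; they do not describe how PDEV stores its entries.

```python
import re

# A slot is whatever sits between [[ and ]].
SLOT = re.compile(r"\[\[(.+?)\]\]")

def parse_slots(pattern):
    """Return (semantic type, role) pairs; role is None when absent."""
    slots = []
    for content in SLOT.findall(pattern):
        sem_type, _, role = content.partition("=")
        slots.append((sem_type.strip(), role.strip() or None))
    return slots

pattern = (
    "[[Human 1 = Judge]] sentence [[Human 2 = Convicted Criminal]] "
    "to [[{Time Period | Event} = Punishment]]"
)

slots = parse_slots(pattern)
for sem_type, role in slots:
    print(f"type: {sem_type:<22} role: {role}")
```

Keeping type and role apart reflects the distinction the slide draws: [[Human]] is a property of the noun in general, while Judge is a role the noun takes on only in the context of this verb.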

Semantic Types and Ontology
Items in double square brackets are semantic types.
Semantic types are being gathered together into a shallow ontology.
– (This is work in progress in the current CPA project)
– Preliminary outline in Pustejovsky, Rumshisky, and Hanks 2004
Each type in the ontology will (eventually) be populated with a set of lexical items on the basis of what’s in the corpus under each relevant pattern.

Shimmering lexical sets
Lexical sets are not stable: not “all and only”.
Example from Hanks and Jezek (2008):
– [[Human]] attend [[Event]]
– [[Event]] = meeting, wedding, funeral, etc.
– But not all events: not thunderstorm, suicide.
– And not only events: attend school, attend a clinic
Contrast with another pattern for attend:
– [[Human 1]] attend [[Human 2 = High Status]]
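The "not all and only" observation amounts to a comparison between two sets: the nouns that belong to a semantic type and the nouns actually found in a pattern slot. The sketch below uses only the items named on the slide; the set names and the `compare` function are hypothetical.

```python
# Nouns the slide treats as events, and nouns it lists as objects of "attend".
EVENT_NOUNS = {"meeting", "wedding", "funeral", "thunderstorm", "suicide"}
ATTEND_OBJECTS = {"meeting", "wedding", "funeral", "school", "clinic"}

def compare(type_members, slot_fillers):
    """Split the two sets into the overlap and the two mismatches."""
    return {
        "shared": type_members & slot_fillers,
        "type_only": type_members - slot_fillers,  # "not all events"
        "slot_only": slot_fillers - type_members,  # "not only events"
    }

result = compare(EVENT_NOUNS, ATTEND_OBJECTS)
for label, items in result.items():
    print(label, sorted(items))
```

Both mismatch sets are non-empty, which is why a slot's lexical set has to be populated from the corpus pattern by pattern rather than read off the ontology.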

Meanings and boundaries
Boundaries of all linguistic and lexical categories are fuzzy.
– There are many borderline cases.
Instead of fussing about boundaries, we should focus on identifying prototypes
Then we can decide what goes with what
– Many decisions will be obvious.
– Some decisions, especially about boundary cases, will be arbitrary.

The Idiom Principle (Sinclair)
In word use, there is tension between the “terminological tendency” and the “phraseological tendency”:
– The terminological tendency: the tendency for words to have meaning in isolation
– The phraseological tendency: the tendency for the meaning of a word to be activated by the context in which it is used.

Current work in progress
Hanks (forthcoming): Analyzing the Lexicon: Norms and Exploitations. MIT Press
– A corpus-driven, lexically based theory of meaning in language
Linked to PDEV (A Pattern Dictionary of English Verbs) by CPA (Corpus Pattern Analysis)
– A basic infrastructure resource
– 468 verbs analyzed and released, freely available
– Experiments with automating the analytical procedure and applying the results for NLP (IR, MT, …) and language teaching (lexical syllabus design)
– Building a shallow ontology is in progress

Thanks
The late John Sinclair & colleagues (Cobuild project)
Bob Taylor, Marie-Claire van Leunen & the late Digital Equipment Corporation Systems Research Center in Palo Alto (Hector project)
James Pustejovsky, Anna Rumshisky, & Brandeis U.
Masaryk U., Brno & Karel Pala, Pavel Rychly, and Adam Rambousek
Institute of Formal and Applied Linguistics, Charles U., Prague, & Jan Hajic, Martin Holub
Various Czech agencies for funding
You, for listening