Elliptical Arguments
Patrick Hanks
Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic



Outline of the talk
A task for e-lexicographers: identifying syntagmatic patterns (or constructions) in corpora, and establishing what they mean.
– Patterns can include quite a lot of variation.
We also need a lexically driven theory of language, accounting for the rules governing the normal and abnormal uses of words.
Abnormal uses cause problems for lexical analysts.
This presentation will discuss two such problems.
Conclude with an on-line demo.

Corpus Pattern Analysis (CPA)
The lexicographical task is to establish how words are used, not just what they mean.
– Such an investigation must be based on corpus analysis, not guesswork and imagination.
– Invented examples have a tendency to distort.
BUT authenticity alone is not enough.
– Bizarre authentic examples also distort, e.g.:
– “I hazarded various Stuartesque destinations like Florida, Bali, Crete and Western Turkey.” – J. Barnes
– “Always vacuum your moose from the snout up.” – Massachusetts Journal of Taxidermy,

The need for patterns
We need to establish, through painstaking corpus analysis, the patterns of usage that are associated with each word.
And we need a reliable theoretical base: some mixture of components such as
– Herbst et al.: Valency Dictionary of English
– Fillmore et al.: FrameNet
– Miller, Fellbaum: WordNet
– Pustejovsky 1995: The Generative Lexicon
Different patterns of usage around a lexeme activate different meanings.
We need to distinguish patterns from abnormal, innovative linguistic behaviour.

Empirical recognition of patterns
When you first open a concordance, patterns leap out at you.
– Collocations make patterns: one word goes with another.
– To see how words make meanings, we need to analyse collocations.
The more you look, the more patterns you see.
BUT when you try to formalize the patterns, you start to see more and more exceptions.
The boundaries are fuzzy and there are many outlying cases.
Speakers and writers exploit the norms of language.

The linguistic ‘double-helix’ hypothesis
A language is a system of rule-governed behaviour.
Not one, but TWO (interlinked) sets of rules:
1. Rules governing the normal uses of words to make meanings
2. Rules governing the exploitation of norms

What is a pattern?
The verb is the pivot of the clause.
A pattern is a statement of the clause structure (valency) associated with a meaning of a verb,
– together with typical semantic values of each argument, realized by salient collocates.
Different semantic values of arguments activate different meanings of each verb.

Patterns are contrastive
fire, verb
1. [[Human]] fire [[Firearm]] (at [[Phys Obj = Target]])
2. [[Human]] fire [[Projectile]] (from [[Firearm]]) (at [[Phys Obj = Target]])
3. [[Human 1]] fire [[Human 2]]
4. [[Anything]] fire [[Human]] {with enthusiasm}
5. [[Human]] fire [NO OBJ]
... Etc.
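The contrast between the patterns for fire can be sketched in code. The following is a minimal illustration, not the CPA implementation: the toy lexicon `SEMANTIC_TYPE`, the `Pattern` class and the `match_fire` function are hypothetical names introduced here, the noun-to-type assignments are assumptions, and pattern 4 and the optional adverbials are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Toy lexicon: noun -> semantic type. These assignments are illustrative
# assumptions, not entries from the CPA ontology.
SEMANTIC_TYPE = {
    "police": "Human",
    "manager": "Human",
    "clerk": "Human",
    "rifle": "Firearm",
    "bullet": "Projectile",
}

@dataclass(frozen=True)
class Pattern:
    number: int          # pattern number as given on the slide
    subject: str         # semantic type of the subject
    obj: Optional[str]   # semantic type of the object; None encodes [NO OBJ]

# Patterns 1, 2, 3 and 5 for 'fire' (optional adverbials omitted).
FIRE_PATTERNS = [
    Pattern(1, "Human", "Firearm"),
    Pattern(2, "Human", "Projectile"),
    Pattern(3, "Human", "Human"),
    Pattern(5, "Human", None),
]

def match_fire(subject: str, obj: Optional[str] = None) -> Optional[int]:
    """Return the number of the first matching pattern, or None."""
    if subject not in SEMANTIC_TYPE:
        return None
    if obj is not None and obj not in SEMANTIC_TYPE:
        return None  # unknown noun: do not fall through to [NO OBJ]
    subj_type = SEMANTIC_TYPE[subject]
    obj_type = SEMANTIC_TYPE[obj] if obj is not None else None
    for pattern in FIRE_PATTERNS:
        if pattern.subject == subj_type and pattern.obj == obj_type:
            return pattern.number
    return None
```

For example, `match_fire("police", "rifle")` selects pattern 1 and `match_fire("manager", "clerk")` selects pattern 3: the semantic type of the object alone is enough to separate the senses in this toy setting.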

Semantic Types and Ontology
Items in double square brackets are semantic types.
Semantic types are being gathered together into a shallow ontology.
– (This is work in progress in the current CPA project.)
– Preliminary outline in Pustejovsky, Rumshisky, and Hanks 2004
Each type in the ontology will (eventually) be populated with a set of lexical items on the basis of what’s in the corpus under each relevant pattern.

Exploitations
People exploit the rules of normal usage for various purposes:
For economy and speed:
– Conversation is quick
– Listeners (and readers) get bored easily
– Words that are ‘obvious’ can sometimes be omitted
To say new things (reporting discoveries, registering patents, ...)
To say old things in new ways
– For rhetoric, humour, poetry, politics …

Anomalous collocates exploit norms
“… a brick arrived through my living room window.” – (BNC) M. Grist, Life at the tip.
– Normally, people (travellers) and vehicles arrive, not bricks.
“Whatever the intention, rehabilitation does punish people; in particular, it allows people to be put into institutions where they would rather not be.” – (BNC) Bob Roshier, Controlling Crime.
– Normally, people punish people, not procedures such as rehabilitation.

The null object alternation
Earlier in this talk, I said:
– “Invented examples have a tendency to distort; bizarre authentic examples also distort.”
Someone might ask, “distort what?”
But when I said this, I assumed you knew what such examples distort (common knowledge between us), so I didn’t need to say it.
Omitting (eliding) ‘unnecessary’ words is a very common pattern of linguistic behaviour.

Ellipsis
Absence of an expected collocate is a type of exploitation.
– The police fired [[]] into the crowd.
– The police fired rubber bullets [[]].
– He gave the order and they fired [[]] [[]].
The valency pattern of this sense of fire, v., requires SUBJECT, OBJECT, and ADVERBIAL:
– [[Human]] fire [[Projectile]] [Adv[Direction]]
Correct description of valency requires syntactic analysis and semantic typing of arguments.
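One way to picture ellipsis as an absent expected collocate is to compare the slots a clause actually fills against the slots the valency pattern requires. This is only a sketch under stated assumptions: the clause is supplied already parsed as a dictionary, and `FIRE_VALENCY` and `elided_slots` are hypothetical names, not part of any CPA tool.

```python
# Slots required by this sense of 'fire', as stated on the slide.
FIRE_VALENCY = ("subject", "object", "adverbial")

def elided_slots(clause: dict) -> list:
    """List the required slots that the clause leaves empty."""
    return [slot for slot in FIRE_VALENCY if not clause.get(slot)]

# The three corpus-style examples from the slide, hand-parsed (an assumption).
into_crowd = {"subject": "the police", "adverbial": "into the crowd"}
rubber_bullets = {"subject": "the police", "object": "rubber bullets"}
they_fired = {"subject": "they"}
```

Here `elided_slots(into_crowd)` reports the missing object, `elided_slots(rubber_bullets)` the missing adverbial, and `elided_slots(they_fired)` both, mirroring the empty `[[]]` positions in the examples.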

Ellipsis and ambiguity
Corpus example: Later that morning he changed.
– What is the meaning of change here?
a? At breakfast he was still wearing a black tie and crumpled dinner jacket from the night before. Later that morning he changed.
b? At breakfast he greeted us with a cheerful grin and seemed not to have a care in the world. Later that morning he changed.
c? He got on at Köln thinking that it was a through train to Berlin, but the ticket inspector told him that it would terminate at Hannover. Later that morning he changed.

Only primary norms are exploited by elision (?)
Many small farmers, unable to cultivate successfully, turned to the sale or renting of land.
– BUT NOT: *He had many friends in America but in England he was unable to cultivate successfully.
We punish too much – and … we imprison too much. – BUT NOT: He offered one to the Englishman, who declined.
– “Whatever is reported as having been declined has already been named, mentioned, or indicated with sufficient clarity; so that the reader, arriving at the word declined, need be in no doubt about what would be a suitable object or infinitive clause.” – Sinclair (1991)

Types and Qualia in CPA
The apparatus needed for analysing nouns is different from that needed for verbs.
– Plug and socket
Verbs need event typing and argument structure.
Nouns need analysis of their qualia structure [Pustejovsky’s term]:
– What sort of thing is it?
– What’s it for?
– What properties does it have?
AND their semantic prosody: is it good or bad? (and if so, for whom?)
AND their verb preferences

Each argument of each verb is a complex lcp (lexical conceptual paradigm)
[[Event | Human]] calm [[Animate]]
– calm a hysterical patient
– calm the horses
– But can you *calm a cockroach? Not part of the lcp for “calm [[Animate]]”, so not a norm
– Calm [POSDET] {nerves | anxiety} [= properties of [[Animate]] ]
– Calm a riot [= behaviour of [[Animate]] ]
– Calm the market [[= Location = Activity in Location = Human Group Acting in Location]]

Semantic types and semantic roles
sentence, v.
PATTERN: [[Human 1 = Judge]] sentence [[Human 2 = Convicted Criminal]] to [[{Time Period | Event} = Punishment]]
IMPLICATURE: [[Human 1]]
SECONDARY IMPLICATURE: [[Time Period]] is a jail sentence
EXAMPLE: Mr Woods sentenced Bailey to 7 years.
Note that the implicature is “anchored” to the pattern.
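The bracket notation pairs a semantic type with a semantic role (`Type = Role`) and allows alternative types inside braces. A small parser makes that structure explicit. This is a sketch assuming only the notation visible in the PATTERN line; `parse_slot` and `parse_pattern` are hypothetical helper names, not functions of any existing CPA software.

```python
import re

# One argument slot: double square brackets around "Type" or "Type = Role".
SLOT = re.compile(r"\[\[(.+?)\]\]")

def parse_slot(inner: str):
    """Split the text inside [[...]] into (alternative types, role or None)."""
    type_part, _, role = inner.partition("=")
    # Braces group alternatives separated by '|', e.g. {Time Period | Event}.
    types = [t.strip() for t in type_part.strip().strip("{}").split("|")]
    return types, (role.strip() or None)

def parse_pattern(pattern: str):
    """Return one (types, role) pair per slot, in order of appearance."""
    return [parse_slot(m.group(1)) for m in SLOT.finditer(pattern)]

SENTENCE_PATTERN = (
    "[[Human 1 = Judge]] sentence [[Human 2 = Convicted Criminal]] "
    "to [[{Time Period | Event} = Punishment]]"
)
slots = parse_pattern(SENTENCE_PATTERN)
```

After parsing, `slots` holds three entries: the judge, the convicted criminal, and a punishment slot whose type is either Time Period or Event. Keeping types and roles as separate fields reflects the slide's point that the role is assigned by the pattern, not by the noun itself.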

ON-LINE DEMO (?)
Choose Web Access
Log-in: guest
Password: guest

Shimmering lexical sets
Lexical sets are not stable, not “all and only”.
Example from Hanks and Jezek (2008):
– [[Human]] attend [[Event]]
– [[Event]] = meeting, wedding, funeral, etc.
– But not all events: not thunderstorm, suicide.
– And not only events: attend school, attend a clinic
Contrast with another pattern for attend:
– [[Human 1]] attend [[Human 2 = High Status]]
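The “not all and only” point can be made concrete by keeping the semantic type and the pattern's lexical set as two separate sets and comparing them. The sketch below is illustrative only: the membership of `EVENT_NOUNS` and `ATTEND_OBJECTS` is an assumption built from the examples above, and `attend_status` is a hypothetical helper.

```python
# Nouns assumed to have the semantic type [[Event]] (illustrative assignment).
EVENT_NOUNS = {"meeting", "wedding", "funeral", "thunderstorm", "suicide"}

# Objects treated as normal for 'attend', following the examples above.
ATTEND_OBJECTS = {"meeting", "wedding", "funeral", "school", "clinic"}

def attend_status(noun: str) -> str:
    """Compare type membership with membership in the pattern's lexical set."""
    is_event = noun in EVENT_NOUNS
    is_normal = noun in ATTEND_OBJECTS
    if is_event and is_normal:
        return "event, normal object"
    if is_event:
        return "event, not a normal object"
    if is_normal:
        return "normal object, not an event"
    return "unknown"
```

The two sets overlap without either containing the other: `attend_status("thunderstorm")` is an event that is not a normal object, while `attend_status("school")` is a normal object that is not an event.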

Meanings and boundaries
Boundaries of all linguistic and lexical categories are fuzzy.
– There are many borderline cases.
Instead of fussing about boundaries, we should focus on identifying prototypes.
Then we can decide what goes with what.
– Many decisions will be obvious.
– Some decisions, especially about boundary cases, will be arbitrary.

The Idiom Principle (Sinclair)
In word use, there is tension between the “terminological tendency” and the “phraseological tendency”:
– The terminological tendency: the tendency for words to have meaning in isolation
– The phraseological tendency: the tendency for the meaning of a word to be activated by the context in which it is used.

Current work in progress
Hanks (forthcoming): Lexical Analysis: Norms and Exploitations. MIT Press
– A corpus-driven, lexically based theory of meaning in language
Linked to PDEV (A Pattern Dictionary of English Verbs) by CPA (Corpus Pattern Analysis)
– A basic infrastructure resource
– 468 verbs analyzed and released, freely available
– Experiments with automating the analytical procedure and applying the results for NLP (IR, MT, …) and language teaching (lexical syllabus design)
– Building a shallow ontology is in progress

Semantic Frames: FrameNet
“Word Meanings must be described in relation to semantic frames – schematic representations of the conceptual structures and patterns of beliefs, practices, institutions, images, etc., that provide a foundation for meaningful interaction in a given speech community.”
– Fillmore et al. in International Journal of Lexicography 16 (3): p

FrameNet and Valency
“Syntactic valence information is usually specified in terms of the phrase type of the possible complements, and in terms of the grammatical functions … expressed in terms of subcategorization frames.” – ibid., p. 236
SOME PROBLEMS WITH THIS:
– Aiming at all possible complementation frames of a verb may be too ambitious
– Better to aim at all normal complementation frames
– In a slot-and-filler grammatical model (Halliday), not a generative model
– “Subcategorization” carries theoretical assumptions that may be incompatible with empirical data analysis

A methodological problem?
a. “look at examples of one particular word, [How many? How chosen?]
b. for each frame element that occurs with that word, look for other words with similar meanings that also take that kind of complement,
c. notice which complement types cluster together with groups of meaning-sharing words,
d. given two types of complement that both occur with the target word, if one complement regularly occurs with one group of related words, and the other with a different group …, this is strong evidence for a sense distinction (based on a frame distinction).”
– Atkins et al. in IJL 16 (3): p. 255
QUESTION: Does (should?) FrameNet proceed frame by frame? Or verb by verb? Or both at the same time?

Thanks
The late John Sinclair & colleagues (Cobuild project)
Bob Taylor, Marie-Claire van Leunen & the late Digital Equipment Corporation Systems Research Center in Palo Alto (Hector project)
James Pustejovsky, Anna Rumshisky, & Brandeis U.
Masaryk U., Brno & Karel Pala, Pavel Rychly, and Adam Rambousek
Institute of Formal and Applied Linguistics, Charles U., Prague, & Jan Hajic, Martin Holub
Various Czech agencies for funding
You, for listening