
The DVC project: Disambiguation of Verbs by Collocation
An introduction to the linguistic theory of norms and exploitations
Patrick Hanks
Research Institute of Information and Language Processing, University of Wolverhampton

Words are very ambiguous; dictionaries are misleading
In any dictionary, more than one sense is usually given for each word.
– Often, many senses.
– For example, in MWALED (Merriam-Webster's Advanced Learner's English Dictionary), the verb blow has 12 senses, plus 6 subsenses, plus 7 phrasal verbs (each with between 1 and 6 senses), plus 15 idiomatic phrases.
– The noun is even more complicated.
Dictionaries do not tell the user (a learner or a programmer) how to distinguish one sense of a word from another.
WSD (word sense disambiguation) projects in NLP, using dictionaries, have failed, according to leaders in the field (e.g. Ide and Wilks 2006).

Phraseological patterns of word use
Most utterances consist of words used in familiar patterns, e.g.:
– The wind was blowing from the east;
– the wind blew the napkin off the table;
– the referee blew his whistle for the end of the match;
– he blew his nose.
– They blew up the bridge;
– the bridge blew up.
These are examples of phraseological ‘norms’ associated with blow.
Unconsciously, ordinary language users repeat the same norms (patterns) over and over again, with minor variations in the various slots in the patterns.
– e.g. ‘east’ alternates with ‘west’, ‘north’, ‘south’, etc.

Patterns are unambiguous
Unlike words, patterns are unambiguous.
‘He blew up a bridge’ and ‘He blew up a balloon’ have quite distinct, unambiguous meanings
– even though the words blow, bridge, and balloon can all be ambiguous when taken in isolation, out of context.
– The verb is the pivot of the clause.
– Each verb is associated with one or more stereotypical phraseological patterns.
For NLP and language teaching alike, there is a great need for a dictionary or inventory of normal phraseological patterns.
A pattern is a statistical probability, not a cut-and-dried certainty.
The aim must be to inventorize all normal usage, not all possible usage.
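
The idea that a pattern is a statistical probability can be illustrated with a small Python sketch. It is only a toy, not the DVC project's method: the pattern labels, the counts, and the `min_count` cut-off are all invented for illustration. It computes each pattern's relative frequency in a hand-labelled sample and keeps only the recurrent ones as candidate norms.

```python
from collections import Counter

# Invented toy annotations: one pattern label per concordance line of "blow".
# The labels and counts are illustrative only, not DVC project data.
annotated_lines = (
    ["wind blows"] * 4
    + ["blow up (explode)"] * 3
    + ["blow nose"] * 2
    + ["odd one-off use"] * 1
)


def pattern_probabilities(labels):
    """Return each pattern's relative frequency in the sample."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def normal_patterns(labels, min_count=2):
    """Keep patterns that recur at least min_count times (hypothetical cut-off)."""
    counts = Counter(labels)
    return sorted(p for p, n in counts.items() if n >= min_count)


probs = pattern_probabilities(annotated_lines)
norms = normal_patterns(annotated_lines)
```

In this sample the one-off use is possible but not normal, so it is left out of the inventory, which is the distinction the slide draws between normal usage and all possible usage.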

Norms and exploitations
The DVC project at RIILP is developing a method (Corpus Pattern Analysis) for identifying and building an inventory of prototypical phraseological norms.
Each pattern consists of a syntagmatic structure plus lexical sets of collocations.
Understanding meaning depends on matching the wording of an actual utterance with a pattern.
– Best match wins!
Speakers and writers sometimes exploit norms in various ways, for example to create new metaphors.
The DVC project is also studying the rules governing exploitations of phraseological norms.
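
As a rough illustration of "best match wins", the Python sketch below models a pattern as a verb, an optional particle, and a lexical set for the object slot, then picks the pattern that agrees with an utterance in the most slots. Everything in it is an assumption made for the example: the `Pattern` class, the mini-inventory, the glosses, and the simple slot-counting score are hypothetical and are not taken from Corpus Pattern Analysis itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A syntagmatic structure plus a lexical set for its object slot (toy model)."""
    verb: str
    particle: str        # "" when the pattern has no particle
    objects: frozenset   # lexical set expected in the object slot
    meaning: str


# Hypothetical mini-inventory for "blow"; the sets and glosses are invented.
PATTERNS = [
    Pattern("blow", "up", frozenset({"bridge", "building", "car"}), "destroy by explosion"),
    Pattern("blow", "up", frozenset({"balloon", "tyre", "airbed"}), "inflate"),
    Pattern("blow", "", frozenset({"whistle", "horn", "trumpet"}), "sound by blowing"),
    Pattern("blow", "", frozenset({"nose"}), "clear the nose"),
]


def score(pattern, verb, particle, obj):
    """Count how many of the three slots agree with the pattern."""
    return (
        (pattern.verb == verb)
        + (pattern.particle == particle)
        + (obj in pattern.objects)
    )


def best_match(verb, particle, obj):
    """Return the highest-scoring pattern: best match wins."""
    return max(PATTERNS, key=lambda p: score(p, verb, particle, obj))


bridge_reading = best_match("blow", "up", "bridge").meaning
balloon_reading = best_match("blow", "up", "balloon").meaning
```

The verb and particle alone leave "blow up" ambiguous here; it is the collocate in the object slot that selects the reading. A realistic matcher would need parsed input, semantic types rather than flat word lists, and frequency-weighted scores.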