Lexical Semantics for the Semantic Web
Patrick Hanks
Masaryk University, Brno, Czech Republic
UFAL, Faculty of Mathematics and Physics, Charles University in Prague

Outline of the talk
A neglected aspect of Tim Berners-Lee's vision:
– Introducing semantics to the semantic web
– Computing meaning and inferences in free text
Patterns in text and how to use them
Building a resource that encodes patterns – linking meanings (implicatures) to patterns (not to words):
– A pattern dictionary
– What does the pattern dictionary look like?
– The role of an ontology in a pattern dictionary

Aims of the Semantic Web
To enable computers to manipulate data meaningfully
"Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully." – Berners-Lee et al., Scientific American, 2001

A neglected aspect of Berners-Lee's vision
"Web technology must not discriminate between the scribbled draft and the polished performance." – T. Berners-Lee et al., Scientific American, 2001
The vision includes being able to process the meaning and implicatures of free text, not just pre-processed tagged texts – wikis, names, addresses, appointments, and suchlike.

A paradox
"Traditional KR systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as 'parent' or 'vehicle'." – Berners-Lee et al., 2001
– Implying that the SW is more tolerant?
– Apparently not: "Human languages thrive when using the same term to mean somewhat different things, but automation does not." – Ibid.

The root of the problem
Scientists from Leibniz to the present have wanted word meaning to be precise and certain.
– But it isn't. Meaning in natural language is vague and probabilistic.
Some theoretical linguists (and CL researchers), not liking fuzziness in data, have preferred to disregard data in order to preserve theory.
Do not allow SW research to fall into this trap.
To fulfil Berners-Lee's dream, we need to be able to compute the meaning of un-pre-processed documents.

What NOT to do for the SW
The meaning of the English noun second is vague: a short unit of time, or 1/60 of a minute.
– Wait a second.
– He looked at her for a second.
It is also a very precisely defined technical term in certain scientific contexts – the basic SI unit of time:
– the duration of 9,192,631,770 cycles of radiation corresponding to the transition between two hyperfine levels of the ground state of an atom of caesium 133.
If we try to stipulate a precise meaning for all terms in advance of using them, we'll never be able to fulfil the dream – and we will invent an unusable language.

Precision and vagueness
Stipulating a precise definition for an ordinary word such as second removes it from ordinary language. When it is given a precise, stipulative definition, an ordinary word becomes a technical term.
"An adequate definition of a vague concept must aim not at precision but at vagueness; it must aim at precisely that level of vagueness which characterizes the concept itself." – Wierzbicka 1985, pp. 12-13

The paradox of natural language
Word meaning may be vague and fuzzy, but people use words to make very precise statements.
This can be done because text meaning is holistic, e.g.
– fire in isolation is very ambiguous;
– but "He fired the bullet that was recovered from the girl's body" is not at all ambiguous.
– Ithaca is ambiguous;
– but "Ithaca, NY" is much less ambiguous.
Even the tiniest bit of (relevant) context helps.

What is to be done?
Process only the (strictly defined) mark-up of documents, not their linguistic content?
– And so abandon the dream of enabling computers to manipulate linguistic content?
Force humans to conform to formal requirements when writing documents?
– Not a serious practical possibility.
Teach computers to deal with natural language in all its fearful fuzziness?
– Maybe this is what we need to do.

Hypertext and relevance
"The power of hypertext is that anything can link to anything." – Berners-Lee et al., 2001
Yes, but we need procedures for determining (automatically) what counts as a relevant link, e.g.
– Firing a person is relevant to employment law.
– Firing a gun is relevant to warfare and armed robbery.

How do we know who is doing what to whom?
Through context (a standard, uncontroversial answer).
But teasing out relevant context is tricky:
– Firing a person: [[Person]] MUST be mentioned.
– Whereas firing a gun occurs in patterns where neither [[Firearm]] nor [[Projectile]] is mentioned, e.g.
– The police fired into the crowd/over their heads/wide.
Negative evidence can be important:
– "He fired" cannot mean he dismissed someone from employment.
Relevant context is cumulative,
– so correlations among arguments are often needed.
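The cues above could, in principle, be operationalized as a small rule set. Below is a minimal Python sketch under invented resources: the SEMANTIC_TYPES lexicon, the adverbial list, and the rules themselves are illustrative assumptions, not part of CPA.

```python
# Illustrative sketch: guessing the sense of "fire" from minimal context.
# The tiny lexicon and the rules are invented for the demo, not the CPA resource.
from typing import Optional

SEMANTIC_TYPES = {
    "employee": "Person", "manager": "Person",
    "gun": "Firearm", "rifle": "Firearm",
    "bullet": "Projectile", "shot": "Projectile",
}
DIRECTION_ADVERBIALS = {"into", "over", "at", "wide"}

def fire_sense(obj: Optional[str], adverbial: Optional[str]) -> str:
    if obj and SEMANTIC_TYPES.get(obj) == "Person":
        return "dismiss from employment"        # [[Person]] must be mentioned
    if obj and SEMANTIC_TYPES.get(obj) in {"Firearm", "Projectile"}:
        return "discharge a firearm"
    if obj is None and adverbial in DIRECTION_ADVERBIALS:
        # "The police fired into the crowd": neither [[Firearm]] nor
        # [[Projectile]] is mentioned, yet the sense is clear
        return "discharge a firearm"
    return "unresolved"  # negative evidence: bare "He fired" is not "dismiss"

print(fire_sense("employee", None))   # dismiss from employment
print(fire_sense(None, "into"))       # discharge a firearm
```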

How to compute meaning for the Semantic Web
STEP 1. Identify all the normal patterns of normal utterances by data analysis.
STEP 2. Develop a resource that says precisely what the basic implicatures of each pattern are, e.g.
[[Human]] fire [Adv[Direction]] = [[Human]] causes [[Firearm]] to discharge [[Projectile]]
STEP 3. Populate the semantic types in an ontology.
STEP 4. Develop a linguistic theory that distinguishes norms from exploitations.
– Abandon the received theories of speculative linguists.
STEP 5. Develop procedures for finding best matches between a free-text statement and a pattern.
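A minimal sketch of how STEPs 2 and 5 might fit together. The Pattern class, the PATTERNS inventory, and the scoring function are assumptions for illustration, not the Pattern Dictionary's actual format or matching procedure.

```python
# Minimal sketch of STEPs 2 and 5: a pattern with its implicature, and a
# crude best-match scorer over observed argument types. All illustrative.
from dataclasses import dataclass

@dataclass
class Pattern:
    verb: str
    args: dict          # slot -> required semantic type, e.g. {"subj": "Human"}
    implicature: str

PATTERNS = [
    Pattern("fire", {"subj": "Human", "adv": "Direction"},
            "[[Human]] causes [[Firearm]] to discharge [[Projectile]]"),
    Pattern("fire", {"subj": "Human", "obj": "Human"},
            "[[Human]] dismisses [[Human]] from employment"),
]

def best_match(verb: str, observed: dict) -> Pattern:
    """STEP 5: return the pattern whose argument types best fit the clause."""
    def score(p: Pattern) -> int:
        return sum(observed.get(slot) == typ for slot, typ in p.args.items())
    candidates = [p for p in PATTERNS if p.verb == verb]
    return max(candidates, key=score)

clause = {"subj": "Human", "adv": "Direction"}   # "The police fired wide"
print(best_match("fire", clause).implicature)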

The double helix of language: norms and exploitations
A natural language consists of TWO kinds of rule-governed behaviour:
– Using words normally
– Exploiting the norms
We don't even know what the norms of any language are, still less the exploitation rules.
People have assumed that norms of usage are obvious,
– but only some of the things that are obvious are true.
– We need to identify the norms by painstaking empirical analysis of evidence.
There is not a sharp dividing line between norm and exploitation.
Today's norm is tomorrow's exploitation.

Corpus Pattern Analysis (CPA)
1. Identifies normal usage patterns for each word
– Each pattern includes a verb, its valencies, and the semantic type(s) of each argument (valency).
2. Associates a meaning (implicature) with each pattern (NOT with each word)
3. Provides a basis for matching occurrences of target words in unseen texts to their nearest pattern (norm)
4. CPA is the basis for a Pattern Dictionary (demo)
– Click on "web access" in line 1.
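To make point 2 concrete, here is one possible data shape for a single verb entry. The field names, the subject type, and the percentages are invented for illustration; the real dictionary records attested patterns with corpus-derived figures.

```python
# Sketch of what one verb's Pattern Dictionary entry might look like as data.
# Field names, the [[Event]] subject type, and the percentages are invented.
calm_entry = {
    "verb": "calm",
    "patterns": [
        {"no": 1,
         "pattern": "[[Human | Event]] calm [[Human | Animal | Emotion]]",
         "implicature": "[[Human | Event]] causes [[Human | Animal]] "
                        "to become less agitated",
         "percent_of_sample": 85.0},   # invented figure
        {"no": 2,
         "pattern": "[[Human]] calm down",
         "implicature": "[[Human]] becomes less agitated",
         "percent_of_sample": 15.0},   # invented figure
    ],
}

for p in calm_entry["patterns"]:
    print(f"{p['no']}. {p['pattern']} -> {p['implicature']}")
```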

Focusing arguments by semantic-type alternation
You can calm a person, calm a horse, calm someone's nerves, fears, or anxiety.
– These all activate the same meaning of the verb calm.
Anxiety does not have the required semantic type (anxiety is not [[Animate]]).
– However, the expected animate argument is present – but only as a possessive.
– And even if there is no possessive, being an attribute of [[Animate]] is part of the meaning of nerves, fear, anxiety, etc.
Regular alternations such as these have a focusing function. They do not activate different senses.
Other examples:
– Repair a car, repair the engine (of a car), repair the damage
– Treat a person, treat her injuries, treat her injured arm
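One way a matcher could honor this focusing alternation is to accept, in an [[Animate]] slot, nouns recorded as attributes of [[Animate]], without positing a new sense. The two tables below are illustrative assumptions.

```python
# Sketch: accepting a semantic-type alternation as the SAME sense.
# Both membership tables are illustrative assumptions.
ANIMATE = {"person", "horse", "child"}
ATTRIBUTE_OF_ANIMATE = {"nerves", "fears", "anxiety"}  # attributes of [[Animate]]

def matches_calm_object(noun: str) -> bool:
    """The object of 'calm' may be [[Animate]] or an attribute of [[Animate]];
    both activate the same sense (a focusing alternation, not a new sense)."""
    return noun in ANIMATE or noun in ATTRIBUTE_OF_ANIMATE

for noun in ("horse", "anxiety", "engine"):
    print(noun, matches_calm_object(noun))   # engine -> False
```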

Ontologies
The arguments of CPA patterns are expressed as semantic types, related to a shallow semantic ontology.
The term ontology is – has become – highly ambiguous:
– SW ontologies are, typically, interlinked networks of things like address lists, dates, events, and websites, with HTML mark-up showing attributes and values.
– They differ from philosophical ontologies, which are theories about the nature of all the things in the universe that exist.
– They also differ from lexical ontologies such as WordNet, which are networks of words with supposed conceptual relations.
The CPA shallow ontology is a device for grouping semantically similar words together to facilitate meaning processing.

The CPA Shallow Ontology
The CPA Shallow Ontology is a bag of bags of words.
Developed, bottom-up, by cluster analysis of corpora:
– The nouns that NORMALLY occur in the same syntagmatic slot in relation to a given verb are grouped into a cluster.
– A cluster of different nouns activates the same meaning of the verb.
– The cluster is named with a semantic type, e.g. [[Human]], [[Event]], [[Abstract]], [[Artefact]], etc.
– Each cluster is compared with similar clusters occurring with other verbs. Each combination of clusters constitutes a lexical set.
– Identically named clusters contain slightly different members (lexical items).
– Therefore, lexical sets "shimmer".
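A toy version of that bottom-up step, assuming parsed (verb, slot, noun) triples from a corpus. The triples and the frequency threshold are invented; a real analysis would derive both from corpus data.

```python
# Sketch of bottom-up cluster building: group the nouns that normally fill
# the same syntagmatic slot of a verb. Triples and threshold are invented.
from collections import Counter

# (verb, slot, noun) triples as they might come from a parsed corpus
triples = [
    ("attend", "obj", "meeting"), ("attend", "obj", "meeting"),
    ("attend", "obj", "funeral"), ("attend", "obj", "lecture"),
    ("fire", "obj", "gun"), ("fire", "obj", "bullet"), ("fire", "obj", "gun"),
]

def slot_cluster(verb: str, slot: str, min_freq: int = 1) -> set:
    """Nouns that NORMALLY occur in this slot for this verb; a real
    threshold would be corpus-derived, not a default of 1."""
    counts = Counter(n for v, s, n in triples if v == verb and s == slot)
    return {n for n, c in counts.items() if c >= min_freq}

print(slot_cluster("attend", "obj"))  # a cluster one might label [[Event]]
print(slot_cluster("fire", "obj"))    # a [[Firearm]] / [[Projectile]] mix
```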

The Predictive Power of Lexical Sets
EXAMPLE: A noun, meeting, has been classified with semantic type [[Event]] at both arrange and attend.
Suppose meeting is found in the direct-object slot after leave or run – but not frequently enough to have been included in a cluster for those verbs in the Ontology.
However, the patterns [[Human]] leave [[Event]] and [[Human]] run [[Event]] will be found in the Pattern Dictionary.
– Then there is a high probability that meeting belongs there (even though not listed as typical), activating probable implicatures:
– leave = "go away from"
– run = "organize and cause to function efficiently"
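This inference could be sketched as a fallback lookup: when a noun is absent from a verb's cluster but bears a semantic type that one of the verb's patterns accepts, match it anyway. All tables and names below are illustrative assumptions.

```python
# Sketch of the fallback inference: 'meeting' was typed [[Event]] at
# arrange/attend, so it may fill the [[Event]] slot of leave/run too.
NOUN_TYPES = {"meeting": "Event"}            # learned at arrange and attend
PATTERN_IMPLICATURES = {
    ("leave", "Event"): "go away from",
    ("run", "Event"): "organize and cause to function efficiently",
}

def probable_implicature(verb: str, obj: str) -> str:
    """Fall back on the noun's known semantic type when the noun is not
    listed as a typical object of this verb."""
    typ = NOUN_TYPES.get(obj)
    return PATTERN_IMPLICATURES.get((verb, typ), "no pattern found")

print(probable_implicature("leave", "meeting"))  # go away from
print(probable_implicature("run", "meeting"))    # organize and cause to ...
```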

Phraseology in Computational Linguistics
Computational linguists are turning away from word-by-word analysis (the Lego-bricks method, inherited from Frege) to phraseological analysis. E.g.:
– Marine Carpuat and Dekai Wu. 2007. How phrase sense disambiguation outperforms word sense disambiguation for statistical machine translation. In Proceedings, Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007). Skövde, Sweden.
The Pattern Dictionary provides an inventory of patterns:
– a benchmark for NLP researchers using patterns;
– a benchmark for introducing semantics to the Semantic Web.

The English Pattern Dictionary: current status
Focuses on verbs
– specifically, the correlations among the lexical and semantic values of the arguments of each sense of each verb.
700 verbs analysed so far:
– 400 verbs complete, finalized, checked, and released.
– 300 more are work in progress, awaiting checking.
– There are approximately 6,000 verbs in English, so we have done about 10%.
Shallow ontology in development.
New lexically driven theory of language, which is precise about the vague phenomenon of language:
– Hanks (forthcoming): Analysing the Lexicon: Norms and Exploitations. MIT Press.

The English Pattern Dictionary: the future
5,400 more verbs to analyse (then the adjectives)
Develop a different procedure for nouns (noun-y nouns)
Finalize the CPA shallow ontology and populate it
Pattern dictionaries for other languages:
– Czech
– German (A. Geyken, Berlin)
– Italian (E. Jezek, U. of Pavia)
Theoretical work:
– Typology of exploitations
– Implications of CPA for parsing theory
– Alternation of semantic types in arguments
– Relationship between semantic types and semantic roles
– Links between the Pattern Dictionary and FrameNet

Conclusions
To enable computers to manipulate data meaningfully (the raw data itself, not just tags added to the data), we need:
– an inventory of patterns of normal usage for each word: a pattern dictionary;
– a theory that distinguishes normal usage from exploitations of norms for rhetorical, poetic, and other purposes;
– pattern-matching procedures linking free text to the pattern dictionary;
– a statistical, probabilistic approach to identifying meaning.
Only then will computers be able to compute the meaning of texts, understand the implicatures, translate them, retrieve data from them, and manipulate them in other ways.
At that point, we shall be a little closer to realizing Berners-Lee's 2001 dream.