Introduction to Lexical Semantics
Vasileios Hatzivassiloglou
University of Texas at Dallas


What this course is about
  Recent advances in NLP
  Advances in the area of “lexical semantics”
  Semantics = meaning
  Lexical = related to words

Language Constraints
  Several mechanisms operate to control allowable messages in a language and their meaning
  Basic block: a letter / grapheme
  Letters combine to form morphemes (e.g., re-) and words

Types of constraints
  Men dogs walks (syntax)
  Colorless green ideas sleep furiously (semantics)
  The stock market made a gain (lexical preferences)
  Discourse/pragmatics
    – inference, missing information, implicature, appropriateness

Word meaning
  Partly compositional (derivations)
  Mostly arbitrary
  Also not unique, in many ways
  How to represent a word’s meaning?

Meaning representation
  Logical form
  Attributes / properties
  Relationships with other words
    – Specialization
    – Synonymy
    – Opposition
    – Meronymy

Polysemy
  Multiple meanings for a word
  A central issue for interpreting/understanding text

Contrastive Polysemy
  Weinreich (1964)
  (1) a. The bank of the river
      b. The richest bank in the city
  (2) a. The defendant approached the bar
      b. The defendant was in the pub at the bar
  25+ senses of bar

Complementary Polysemy
  (1) The bank raised interest rates yesterday.
      The store is next to the new bank.
  (2) Mary painted the door.
      Mary walked through the door.
  (3) Sam enjoyed the lamb.
      The lamb is running on the field.

Metaphor and Metonymy
  All the world's a stage,
  And all the men and women merely players
  They have their exits and their entrances
  The White House said...
  The pen is mightier than the sword

Synecdoche, Allegory, Hyperbole
  Synecdoche
    – Part for whole: head for cattle
    – Whole for part: the police, the Pentagon
    – Species for genus: kleenex
    – Genus for species: PC

Main Questions
  How can we model lexical semantics?
    – Discuss properties or attributes relating to word meaning, constraints on word use
  How can we learn those properties and constraints?
  What can we use them for?
    – Focus on applications in bioinformatics

Dictionaries
  Representing meaning via definitions, examples
  Core vocabulary
  The problem of circular reference
  Automated construction
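
The circular-reference problem can be made concrete with a small sketch. It assumes a toy dictionary in which each headword maps to the content words of its definition; the `definitions` data and the `is_circular` helper are hypothetical illustrations, not part of the course material.

```python
def is_circular(word, definitions):
    """Return True if following definition words from `word` leads back to `word`."""
    stack = list(definitions.get(word, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current == word:
            return True          # the definition chain returned to the headword
        if current in seen:
            continue             # already explored this word
        seen.add(current)
        stack.extend(definitions.get(current, ()))
    return False

# Hypothetical toy dictionary: headword -> content words used in its definition.
definitions = {
    "big": ["large"],
    "large": ["big"],
    "huge": ["very", "big"],
    "very": [],
}

circular_big = is_circular("big", definitions)
circular_huge = is_circular("huge", definitions)
```

Here "big" and "large" define each other, while "huge" merely depends on that cycle without being part of it.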

Ontologies
  Representing word meaning via inheritance/specialization
  Manual and automated construction
  Domain vs. general ontologies
  Specific ontologies (PenMan, SENSUS)
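
As a minimal sketch of meaning via specialization, the snippet below stores a single-parent "is-a" taxonomy and walks it upward. The `is_a` table and both helper names are assumptions made for illustration; real ontologies allow multiple parents and richer relations.

```python
def ancestors(concept, is_a):
    """Return the chain of more general concepts above `concept`."""
    chain = []
    seen = {concept}
    while concept in is_a:
        concept = is_a[concept]
        if concept in seen:      # guard against a malformed, cyclic taxonomy
            break
        seen.add(concept)
        chain.append(concept)
    return chain

def is_kind_of(concept, general, is_a):
    """True if `general` is reachable from `concept` through is-a links."""
    return general in ancestors(concept, is_a)

# Hypothetical toy taxonomy: child -> parent.
is_a = {"lamb": "sheep", "sheep": "mammal", "mammal": "animal"}

lamb_chain = ancestors("lamb", is_a)
```

A concept inherits whatever holds of its ancestors, so knowing that a lamb is a kind of animal follows from the chain rather than from a separate entry.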

Lexical Databases
  Representing meaning via intersections of concepts and links (semantic nets)
  WordNet, manual construction and verification
  Automating lexical relationship extraction
  Multiple languages

Context as a means for determining lexical relationships
  A word is known by the company it keeps
  Statistical tests for word use, compositional preferences
  Measures for coincidence, estimation issues
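
One common measure of word coincidence is pointwise mutual information (PMI). The slides do not name a specific measure, so PMI is chosen here only as an illustration. The sketch uses raw relative frequencies over adjacent word pairs in a toy token list, and `bigram_pmi` is a hypothetical helper name.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Return PMI (in bits) for every adjacent word pair in `tokens`."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (x, y), count_xy in bigrams.items():
        p_xy = count_xy / n_bi           # maximum-likelihood estimate of P(x, y)
        p_x = unigrams[x] / n_uni        # maximum-likelihood estimate of P(x)
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

tokens = "the stock market rose and the stock market fell".split()
scores = bigram_pmi(tokens)
```

In this toy sample the one-off pair ("rose", "and") scores higher than the repeated pair ("stock", "market"), which illustrates the estimation issue: PMI computed from raw counts tends to overrate rare events.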

Disambiguation
  Selecting among multiple meanings
  Dictionary and corpus-based approaches
  Training and avoiding training data
  Evaluations (SENSEVAL)
  Role of domain and discourse
  Multiple levels
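
A dictionary-based approach can be sketched with gloss overlap, in the style of the simplified Lesk algorithm: pick the sense whose definition shares the most words with the context. The slides do not specify a method, so this is one illustrative choice. The sense labels, glosses, and stopword list below are invented for the example.

```python
def gloss_overlap_sense(word, context, sense_glosses, stopwords=frozenset()):
    """Pick the sense of `word` whose gloss shares the most words with `context`."""
    context_words = {w.lower() for w in context.split()} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses[word].items():
        gloss_words = {w.lower() for w in gloss.split()} - stopwords
        overlap = len(context_words & gloss_words)
        if overlap > best_overlap:       # ties keep the earlier-listed sense
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical two-sense inventory for "bank".
sense_glosses = {
    "bank": {
        "bank/river": "sloping land beside a river or lake",
        "bank/finance": "financial institution that accepts deposits and lends money",
    }
}
stopwords = frozenset({"the", "a", "of", "in", "on", "or", "and", "that", "to"})

sense_1 = gloss_overlap_sense("bank", "the bank raised interest rates on deposits",
                              sense_glosses, stopwords)
sense_2 = gloss_overlap_sense("bank", "they sat on the bank of the river",
                              sense_glosses, stopwords)
```

No training data is needed here, but the result depends entirely on how well the glosses happen to overlap with the context.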

Non-compositional preferences
  Collocations
    – Non-compositional (kick the bucket)
    – Non-substitutable (white wine)
    – Non-transformable
  Types of collocations
  How to find them
  Domain specialization, translation

Lexical properties
  Lexical relationships (specialization, synonymy, antonymy, meronymy)
  Orientation
  Markedness
  Domain/register applicability

Semantic Similarity
  Used for classification, organization, clustering
  Vector representations of context
  Similarity based on vector comparison, probabilistic models, LSI
  Robustness and bias
  Clustering and content-based smoothing
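
Vector comparison over contexts can be sketched as follows: each word is represented by counts of the words around it, and two words are compared by the cosine of their count vectors. The window size, the toy corpus, and both function names are assumptions made for illustration.

```python
import math
from collections import Counter

def context_vector(target, tokens, window=2):
    """Count the words occurring within `window` positions of each `target` token."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus.
tokens = "we drink red wine we drink white wine we drink cold beer we drive the car".split()
wine = context_vector("wine", tokens)
beer = context_vector("beer", tokens)
car = context_vector("car", tokens)
```

On this corpus "wine" is closer to "beer" than to "car" because the first two share the contexts "we" and "drink". With so little data the scores are fragile, which is where the robustness and smoothing issues arise.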

Orientation and Ordering
  Semantic orientation or polarity
  Lexical vs. document level (review)
  Semantic strength
  Linguistic scales and implicature

Text mining
  Using large quantities of unannotated text for learning lexical properties
  The web as corpus

Mapping across languages
  Static mapping (bilingual dictionaries)
  Dynamic mapping in MT
  Interlingua representations
  Statistical transfer

Evaluation Issues
  Suitable reference standards
  Agreement between evaluators
  Avoiding bias
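
Agreement between two evaluators is often summarized with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The slides do not name a statistic, so kappa is offered only as one familiar option. The labels below are invented, and `cohens_kappa` is a hypothetical helper.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(count_a[l] * count_b[l] for l in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0   # assumption: kappa is undefined here, so report full agreement
    return (observed - expected) / (1 - expected)

# Hypothetical polarity judgments from two evaluators on eight items.
judge_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
judge_2 = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg"]
kappa = cohens_kappa(judge_1, judge_2)
```

The two judges agree on six of eight items (0.75), but with balanced labels half of that agreement is expected by chance, so kappa comes out at 0.5.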

Selectional constraints
  Preposition/Article selection
  Text generation
  Lexical cohesion (for rewriting, but also for selecting words)
    – math/statistics vs. math/food

Terminology
  Deciding what is a term
  Terminological databases
  Issues of consistency, reference concepts, currency, coverage
  Automatic detection of terms
  Constraining and classifying terms
  Definitions for terms

Bioinformatics
  Emerging field
  Meaning of technical terms
  Disambiguation (e.g., protein/gene)
  Classification
  Functional roles
  Abbreviations

List of topics
  Dictionaries, ontologies, databases
  Measures for word coincidence, similarity
  Disambiguation
  Collocations
  Word categorization and clustering
  Orientation and ordering
  Text mining, the web as corpus
  Evaluation
  Multilingual issues
  Selectional constraints and cohesion
  Terminology
  Bioinformatics