WSD for Applications
Bill Dolan, SenseEval 2004


Where is WSD useful?
- Lots of work in the field, but still no clear answer
  - Where WSD = classical, dictionary-sense resolution

Intuitive Motivations
- Automates something we already do with dictionaries
- Many applications seem to require WSD
  - Information Retrieval/Question Answering
  - Cross-language information retrieval
  - Information extraction
  - Proofing tools, e.g. synonym replacement
  - Translation

Pragmatic Motivations
- Splitting off WSD yields a pleasing division of the NLP problem space
  - manageable in size
  - clear success metrics
  - readily available training data: annotated and unannotated

But where are the applications?
- Why is it so hard to find a convincing app?
- Hopeful answer: the quality bar just hasn’t been met yet
  - But even experimentally, little/no evidence that WSD helps any application
- Alternatively: maybe we’re trying to automate the wrong task
  - Then what is the right task?

An Application-centric view
- What do apps actually need?
  - Information Retrieval/Question Answering
  - Cross-language information retrieval
  - Information extraction
  - Proofing tools, e.g. synonym replacement
  - Translation
- Not a sense, a cluster of related words, etc.
  - Instead: the ability to map one string into another that’s superficially distinct
  - Regardless of length or language
- Paraphrase

Question Answering
- The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
- Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
- Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
- The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees

Information Extraction
- The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
- Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
- Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
- The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees

Cross-lingual Information Retrieval
- The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
- Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
- Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
- The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees

Proofing: rewriting tool
- The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
- Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
- Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
- The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees
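
The four example sentences repeated on the application slides report the same event in very different words. As a rough illustration of how "superficially distinct" they are, the sketch below computes word-set (Jaccard) overlap for each pair. The tokenizer and the choice of Jaccard overlap are assumptions made for this example, not something the slides prescribe.

```python
import re
from itertools import combinations

# The four paraphrases shown on the application slides.
SENTENCES = [
    "The genome of the fungal pathogen that causes Sudden Oak Death "
    "has been sequenced by US scientists",
    "Researchers announced Thursday they've completed the genetic blueprint "
    "of the blight-causing culprit responsible for sudden oak death",
    "Scientists have figured out the complete genetic code of a virulent "
    "pathogen that has killed tens of thousands of California native oaks",
    "The East Bay-based Joint Genome Institute said Thursday it has unraveled "
    "the genetic blueprint for the diseases that cause the sudden death of oak trees",
]


def tokens(text):
    """Lowercase the text and return its set of word types (assumed tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-set overlap: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


# Overlap score for every unordered pair of sentences.
overlaps = {
    (i, j): jaccard(SENTENCES[i], SENTENCES[j])
    for i, j in combinations(range(len(SENTENCES)), 2)
}

if __name__ == "__main__":
    for (i, j), score in sorted(overlaps.items()):
        print(f"sentence {i + 1} vs sentence {j + 1}: {score:.2f}")
```

Every pair shares well under half of its combined vocabulary, and much of what is shared is function words, so a matcher that resolves one word at a time has little to work with here.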

A different take on the problem
- What’s missing is a basic enabling technology
  - Paraphrase identification/generation capability
- The applications for WSD that have been suggested over the years really need more general paraphrase identification/generation skills
  - Resolving lexical associations is just one aspect of this
- Problem begins to look more like an MT problem
  - Map one chunk of text to another, similar or not
  - Not clear that explicit WSD is useful

Some Apps
- Machine Translation
  - Data-driven techniques predominate, work pretty well
  - No explicit WSD, just learned associations between bilingual pairings
  - Lexical mappings learned through statistical association
    - not perfect, but given the right data, pretty good
  - Different language pairs require different sense breakdowns
  - Paraphrase/MT are the same problem
- Cross-language IR
  - What else but MT?
- Proofing tools, e.g. thesaurus-level replacements
  - But often not terribly useful; as any writer knows, there’s usually no good synonym, and a complete rewrite is necessary
- Question Answering/IR
  - Map a query to a piece of text: semantically similar but potentially formally distinct prose
- For all of these apps, the problem is less individual words than whole sequences
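
To make "lexical mappings learned through statistical association" concrete, here is a minimal sketch of IBM Model 1 trained with EM. The slides do not name a specific model, so Model 1 is swapped in as one standard example; the three-sentence German-English corpus is invented for illustration, and the NULL word is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus (source sentence, target sentence).
CORPUS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]


def train_ibm1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with EM (IBM Model 1, no NULL word)."""
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform translation table.
    t = {(tw, sw): 1.0 / len(tgt_vocab) for sw in src_vocab for tw in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        # E-step: spread each target word over the source words in its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts into probabilities.
        for tw, sw in t:
            t[(tw, sw)] = count[(tw, sw)] / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(target | source_word)."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


table = train_ibm1(CORPUS)

if __name__ == "__main__":
    for word in ("das", "haus", "buch", "ein"):
        print(word, "->", best_translation(table, word))
```

Nothing in this procedure picks a dictionary sense. The word pairings fall out of co-occurrence across sentence pairs, which is the sense in which such systems do "no explicit WSD".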

Direction?
- The applications that have been suggested for WSD are all just aspects of the larger paraphrase problem
  - Even MT is a paraphrase problem, though a bit more extreme than the monolingual case
- Focus on the broader paraphrase problem, rather than on individual words