Toward Semantics-Based Answer Pinpointing in Webclopedia. Eduard Hovy, Ulf Hermjakob, Chin-Yew Lin, Mike Junk, Laurie Gerber, Deepak Ravichandran. Information Sciences Institute, University of Southern California.

Presentation transcript:

1 Toward Semantics-Based Answer Pinpointing in Webclopedia. Eduard Hovy, Ulf Hermjakob, Chin-Yew Lin, Mike Junk, Laurie Gerber, Deepak Ravichandran. Information Sciences Institute, University of Southern California.

2 …need semantic and numerical correctness
Where are zebras most likely found?
— in the dictionary
Where do lobsters like to live?
— on the table
— at the same rate as regular lobsters
How many people live in Chile?
— nine

3 TREC-9 QA Track, top 5 (50-byte answers)
SMU (41.92 bytes): IR with feedback loops, parser with QA patterns, abduction
Waterloo (48.75 bytes): IR, Qword extraction from Q, window matcher
ISI (20.94 bytes): IR, parser for semantic Qtargets, QA patterns, window matcher
IBM (49.43 bytes) and IBM (49.28 bytes): Maximum Entropy models of corresponding Q-A ngrams, bags of words, POS tags, QA patterns

4 Overview: Getting semantic correctness
What semantic/numerical types are there? → QA Typology
How do you find them from the question? And in candidate answers? → parse Qtarget (plus Qargs)
How well can you match the two? → answer patterns, Qtarget matches, and fallback

5 QA Typology and semantic hierarchy
Q: Where do zebras like to live? / What's the typical home of the zebra? / Where do you usually find zebras?
A: In Africa / East Africa / The savannahs of Kenya
QA Typology: derived from 17,384 QA pairs; example nodes include WHY-FAMOUS, YES:NO, ABBREVIATION, LOCATION.
Semantic hierarchy: 10,000+ nodes (WordNet and other); allows backoff to more general classes (e.g., C-NAMED-ENTITY above C-LOCATION, which covers C-CITY, C-STATE-DISTRICT, C-STATE, C-COUNTRY).
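A minimal sketch, not ISI's code, of how a semantic hierarchy permits backoff when matching a Qtarget against a candidate answer's semantic label. The class names follow the slide's C-* convention, but the parent table and function names are assumptions made for this example.

PARENT = {
    "C-CITY": "C-LOCATION",
    "C-STATE-DISTRICT": "C-LOCATION",
    "C-STATE": "C-LOCATION",
    "C-COUNTRY": "C-LOCATION",
    "C-LOCATION": "C-NAMED-ENTITY",
}

def ancestors(label):
    """Yield the label itself and all of its more general classes."""
    while label is not None:
        yield label
        label = PARENT.get(label)

def matches(qtarget, candidate_label):
    """True if the candidate's class equals the Qtarget or generalizes to it."""
    return qtarget in ancestors(candidate_label)

# "Where do zebras like to live?" -> Qtarget C-LOCATION;
# a candidate tagged C-COUNTRY ("Kenya") still matches via backoff.
assert matches("C-LOCATION", "C-COUNTRY")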

6 Webclopedia QA Typology
Each node has associated patterns of expression for Q and A, stored as templates.

7 Semantic hierarchy
Not only classes: many instances of cities, countries, planets, metals, etc.

(THING
  ((AGENT
     (NAME (FEMALE-FIRST-NAME (EVE MARY ...))
           (MALE-FIRST-NAME (LAWRENCE SAM ...)))
     (COMPANY-NAME (BOEING AMERICAN-EXPRESS))
     JESUS ROMANOFF ...)
   (ANIMAL-HUMAN (ANIMAL (WOODCHUCK YAK ...)) PERSON)
   (ORGANIZATION (SQUADRON DICTATORSHIP ...))
   (GROUP-OF-PEOPLE (POSSE CHOIR ...))
   (STATE-DISTRICT (TIROL MISSISSIPPI ...))
   (CITY (ULAN-BATOR VIENNA ...))
   (COUNTRY (SULTANATE ZIMBABWE ...))))
(PLACE
  (STATE-DISTRICT (CITY COUNTRY ...))
  (GEOLOGICAL-FORMATION (STAR CANYON ...))
  AIRPORT COLLEGE CAPITOL ...)
(ABSTRACT
  (LANGUAGE (LETTER-CHARACTER (A B ...)))
  (QUANTITY (NUMERICAL-QUANTITY INFORMATION-QUANTITY MASS-QUANTITY MONETARY-QUANTITY
             TEMPORAL-QUANTITY ENERGY-QUANTITY TEMPERATURE-QUANTITY ILLUMINATION-QUANTITY
             (SPATIAL-QUANTITY (VOLUME-QUANTITY AREA-QUANTITY DISTANCE-QUANTITY)) ... PERCENTAGE)))
(UNIT
  ((INFORMATION-UNIT (BIT BYTE ... EXABYTE))
   (MASS-UNIT (OUNCE ...))
   (ENERGY-UNIT (BTU ...))
   (CURRENCY-UNIT (ZLOTY PESO ...))
   (TEMPORAL-UNIT (ATTOSECOND ... MILLENIUM))
   (TEMPERATURE-UNIT (FAHRENHEIT KELVIN CELCIUS))
   (ILLUMINATION-UNIT (LUX CANDELA))
   (SPATIAL-UNIT ((VOLUME-UNIT (DECILITER ...)) (DISTANCE-UNIT (NANOMETER ...))))
   (AREA-UNIT (ACRE)) ... PERCENT))
(TANGIBLE-OBJECT
  ((FOOD (HUMAN-FOOD (FISH CHEESE ...)))
   (SUBSTANCE ((LIQUID (LEMONADE GASOLINE BLOOD ...))
               (SOLID-SUBSTANCE (MARBLE PAPER ...))
               (GAS-FORM-SUBSTANCE (GAS AIR)) ...))
   (INSTRUMENT (DRUM DRILL (WEAPON (ARM GUN)) ...)
   (BODY-PART (ARM HEART ...))
   (MUSICAL-INSTRUMENT (PIANO))) ... *GARMENT *PLANT DISEASE)

8 Identifying Qtargets
CONTEX parser (Hermjakob 97, 00, 01):
- Deterministic shift-reduce parser
- Learns grammar (parse rules) from a treebank
- Produces syntactic and semantic labels
- Trained on English (augmented Penn Treebank), Korean, Japanese, and German, with good performance
For QA: train on questions, to identify Qtargets (a toy parsing sketch follows below).
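A minimal sketch of a deterministic shift-reduce loop, illustrating the style of parsing CONTEX performs. This is not CONTEX itself: the tiny rule table and the greedy "reduce whenever a rule matches" policy are assumptions for the example, whereas CONTEX learns its parse actions from a treebank and adds semantic labels.

RULES = {
    ("DT", "NN"): "NP",       # "the capital" -> NP
    ("NNP", "NNP"): "NP",     # "Burkina Faso" -> NP
    ("IN", "NP"): "PP",       # "of Burkina Faso" -> PP
    ("NP", "PP"): "NP",       # "the capital of Burkina Faso" -> NP
}

def shift_reduce(tagged_tokens):
    """tagged_tokens: list of (label, word) pairs, e.g. ("DT", "the")."""
    stack, buffer = [], list(tagged_tokens)
    while True:
        # Greedily reduce the top of the stack with the longest matching rule.
        reduced = False
        for n in (2, 1):
            if len(stack) >= n:
                labels = tuple(lab for lab, _ in stack[-n:])
                if labels in RULES:
                    text = " ".join(w for _, w in stack[-n:])
                    stack[-n:] = [(RULES[labels], text)]
                    reduced = True
                    break
        if reduced:
            continue
        if not buffer:          # nothing to reduce, nothing to shift: done
            break
        stack.append(buffer.pop(0))   # shift the next token
    return stack

print(shift_reduce([("DT", "the"), ("NN", "capital"),
                    ("IN", "of"), ("NNP", "Burkina"), ("NNP", "Faso")]))
# -> [('NP', 'the capital of Burkina Faso')]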

9 Qtargets identified by parser in TREC-9
(((I-EN-PROPER-PERSON S-PROPER-NAME))) [98]
  q4.1 Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
  q4.5 What is the name of the managing director of Apricot Computer?
(((C-DATE) (C-TEMP-LOC-WITH-YEAR) (C-DATE-RANGE)) ((EQ C-TEMP-LOC))) [66]
  q4.15 When was London's Docklands Light Railway constructed?
  q4.22 When did the Jurassic Period end?
(((I-EN-NUMERICAL-QUANTITY) (I-ENUM-CARDINAL))) [51]
  q4.70 How many lives were lost in the China Airlines' crash in Nagoya, Japan?
  q4.82 How many consecutive baseball games did Lou Gehrig play?
(((I-EN-MONETARY-QUANTITY))) [12]
  q4.2 What was the monetary value of the Nobel Peace Prize in 1989?
  q4.4 How much did Mercury spend on advertising in 1993?
(((Q-DEFINITION))) [35]
  q4.30 What are the Valdez Principles?
  q4.115 What is Head Start?
(((Q-WHY-FAMOUS-PERSON))) [35]
  q4.207 What is Francis Scott Key best known for?
  q4.222 Who is Anubis?
(((Q-ABBREVIATION-EXPANSION))) [16]
  q4.224 What does laser stand for?
  q4.238 What does the abbreviation OAS stand for?
The ontology enables backoff, for weaker/more general type matching.
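A minimal, rule-based sketch of mapping a question string to one of the Qtarget labels listed above. Webclopedia derives Qtargets from full CONTEX parses, not surface regexes; the patterns and the fallback behavior here are illustrative stand-ins, and some distinctions (e.g., "Who is Anubis?" as WHY-FAMOUS vs. a plain person name) cannot be made this way.

import re

QTARGET_RULES = [
    (r"\bstand for\b",                        "Q-ABBREVIATION-EXPANSION"),
    (r"\bbest known for\b",                   "Q-WHY-FAMOUS-PERSON"),
    (r"^How many\b",                          "I-EN-NUMERICAL-QUANTITY"),
    (r"^How much\b.*\b(spend|cost|value)\b",  "I-EN-MONETARY-QUANTITY"),
    (r"^When\b",                              "C-DATE"),
    (r"^Who\b|\bname of the\b",               "I-EN-PROPER-PERSON"),
    (r"^What (is|are)\b",                     "Q-DEFINITION"),
]

def qtarget(question):
    """Return the first matching Qtarget label, or None for fallback matching."""
    for pattern, label in QTARGET_RULES:
        if re.search(pattern, question, flags=re.IGNORECASE):
            return label
    return None   # fall back to Qword + window matching (see slide 12)

print(qtarget("What does the abbreviation OAS stand for?"))              # Q-ABBREVIATION-EXPANSION
print(qtarget("How many consecutive baseball games did Lou Gehrig play?"))  # I-EN-NUMERICAL-QUANTITY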

10 Pinpointing the answer
Qtarget and parse tree allow pinpointing of the exact syntactic expression (TREC-9 answer length < 25 bytes).
Where is Thimphu?
"Three other private tour agencies also have been established in Thimphu, the Bhutanese capital."
Candidate answer strings from the LA Times document:
  , the Bhutanese capital
  Bhutanese capital
  capital
  have been established in Thimphu, the Bhutanese
[Figure: parse tree of the answer sentence, with an S-VC node "have been established", an S-PP node "in Thimphu", and an S-NP node "the Bhutanese capital".]
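A minimal sketch of the pinpointing step: given a parse tree whose nodes carry a surface string and a semantic class, return the smallest constituent whose class matches the Qtarget. The dict-based node layout, the tiny is-a table, and the "shortest string wins" tie-break are assumptions, loosely modeled on the CONTEX output shown on slide 16.

ISA = {"I-EN-CAPITAL": "I-EN-PROPER-PLACE", "I-EN-PROPER-PLACE": "I-EN-NAMED-ENTITY"}

def is_a(label, target):
    while label is not None:
        if label == target:
            return True
        label = ISA.get(label)
    return False

def pinpoint(node, qtarget):
    """Return the shortest surface string among constituents whose class matches qtarget."""
    best = None
    for child in node.get("subs", []):
        hit = pinpoint(child, qtarget)
        if hit is not None and (best is None or len(hit) < len(best)):
            best = hit
    if best is None and is_a(node.get("class"), qtarget):
        best = node["surf"]
    return best

# Toy tree (structure assumed) for "Thimphu, the Bhutanese capital":
tree = {
    "surf": "Thimphu, the Bhutanese capital", "class": "I-EN-PROPER-PLACE",
    "subs": [
        {"surf": "Thimphu", "class": "I-EN-PROPER-PLACE", "subs": []},
        {"surf": "the Bhutanese capital", "class": "I-EN-CAPITAL", "subs": []},
    ],
}
print(pinpoint(tree, "I-EN-PROPER-PLACE"))   # -> "Thimphu"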

11 Weaknesses
Qtargets the parser did not find:
  q4.61 What brand of white rum is made in Cuba?
  q4.63 What nuclear-powered Russian submarine sank in the Norwegian Sea on April 7, 1989?
  q4.68 What does El Nino mean in Spanish?
  q4.79 What did Shostakovich write for Rostropovich?
  q4.3 What does the Peugeot company make?
  ...
  → Need world knowledge; need more Qtarget rules; need a new Qtarget for translation-of
Configurations of nodes (QA patterns):
  "If Van Dyke is the nominee…"
  "Juan Gomez, former President of Costa Rica…"
  "Who led the Branch Davidians?" — answers near "followers of"
  → Need patterns for negation, hypothesis, ...; need entailed-word patterns

12 QA Patterns
Built 500 QA patterns by hand. Performance in TREC-9 was disappointing:
  Qtarget from parser: Qtarget correct ~85%
  QA pattern: pattern succeeds 5%+
  Qtarget with match: match succeeds ~25%
  Qword + window fallback: match succeeds ~15%
So: learn QA patterns
- Many more answers than questions → noisy channel model, Bayes' Rule: argmax_A P(A|Q) = argmax_A P(Q|A) · P(A)
- Starting with Location Qs only
- Model I: single-node level (a scoring sketch follows below)
  P(A) = probability that the sentence contains a node that is a location
  P(Q|A) = probability that the location Q is 'generated' from the sentence
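A minimal sketch of the noisy-channel ranking above: score each candidate sentence A by P(Q|A) · P(A) and keep the argmax. Both probability functions here are crude placeholders, not the slide's trained models; in Model I, P(A) would be the probability that the sentence contains a location node and P(Q|A) the probability that the location question was 'generated' from the sentence.

def p_prior(sentence):                               # placeholder for P(A)
    return 1.0 if "LOCATION" in sentence["classes"] else 0.1

def p_question_given_answer(question, sentence):     # placeholder for P(Q|A)
    q_words = set(question.lower().split())
    s_words = set(sentence["text"].lower().split())
    return (len(q_words & s_words) + 1) / (len(q_words) + 1)   # crude overlap model

def best_answer(question, candidates):
    """argmax over candidates of P(Q|A) * P(A)."""
    return max(candidates,
               key=lambda s: p_question_given_answer(question, s) * p_prior(s))

candidates = [
    {"text": "Thimphu is the capital of Bhutan.", "classes": {"LOCATION"}},
    {"text": "Bhutan exports hydropower to India.", "classes": set()},
]
print(best_answer("Where is Thimphu?", candidates)["text"])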

13 Extending ontology
Testing the possibility of learning instantial knowledge.
Experiment 1: differentiate types of locations (city, state, territory, mountain, water, artifact) in text:
- Ran text through BBN's IdentiFinder
- Learned classifying features for each type of location, using a Bayesian classifier and a C4.5 decision tree
- Extended with MemRun: record the most confident guesses per text, and use those
- Decision Tree + MemRun gave the best results
Then extract locations from the TREC corpus, for the ontology.
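A minimal sketch of the MemRun idea described above: classify each location mention, then, within a document, reuse the most confident label for every mention of the same name. The per-mention classifier is a stub standing in for the Bayesian / C4.5 models trained over IdentiFinder output; names and thresholds are assumptions.

def classify_mention(mention, context):
    """Stub classifier: returns (label, confidence). Replace with a real model."""
    if "river" in context:
        return ("WATER", 0.9)
    if "mountain" in context:
        return ("MOUNTAIN", 0.8)
    return ("CITY", 0.4)

def memrun(mentions):
    """mentions: list of (name, context) pairs from one document."""
    best = {}                                   # name -> (label, confidence)
    for name, context in mentions:
        label, conf = classify_mention(name, context)
        if name not in best or conf > best[name][1]:
            best[name] = (label, conf)
    # Second pass: every mention of a name gets that name's most confident label.
    return [(name, best[name][0]) for name, _ in mentions]

doc = [("Everest", "climbed the mountain known as Everest"),
       ("Everest", "returned to Everest in spring")]
print(memrun(doc))   # both mentions labeled MOUNTAIN, via the confident first guess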

14 Questions?

15 Architecture
Question parsing
  Steps: parse question; find desired semantic type
  Engines: IdentiFinder (BBN), CONTEX (Hermjakob)
IR
  Steps: create query from question (WordNet-expand); retrieve top 1000 documents
  Engines: MG (Sydney) (Lin); AT&T (TREC) (Lin)
Segmentation
  Steps: segment each document into topical segments
  Engines: fixed-length (not used); TexTiling (Hearst 94) (Lin); C99 (Choi 00) (Lin); MAXNET (Lin 00, not used)
Ranking
  Steps: score each sentence in each segment, using WordNet expansion; rank segments
  Engines: FastFinder (Junk)
Segment parsing
  Steps: parse segment sentences
  Engines: CONTEX (Hermjakob)
Matching
  Steps: match general constraint patterns against parse trees; match desired semantic type against parse tree elements; match desired words against words in sentences
  Engines: matcher (Junk)
Ranking and answer extraction
  Steps: rank candidate answers; extract and format them
  Engines: part of matcher (Junk)
Supporting knowledge
  QA typology: categorize QA types in taxonomy (Gerber)
  Constraint patterns: identify likely answers in relation to other parts of the sentence (Gerber)
Data flow: input question → parse question; create query → retrieve documents → segment documents → rank segments → parse top segments → match segments against question → rank and prepare answers → output answers (a pipeline sketch follows below).
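A minimal sketch of the architecture above as plain function composition. Every stage is a stub standing in for the named engine (CONTEX, MG/AT&T retrieval, TexTiling/C99, FastFinder, the matcher); only the data flow mirrors the slide, and all return values are toy placeholders.

def parse_question(q):       return {"text": q, "qtarget": "C-LOCATION"}          # CONTEX + IdentiFinder
def create_query(pq):        return pq["text"].rstrip("?").split()                # WordNet-expanded in reality
def retrieve(query):         return ["Thimphu is the capital of Bhutan. It lies in the Himalayas."]  # MG / AT&T, top 1000
def segment(docs):           return [s.strip() for d in docs for s in d.split(".") if s.strip()]     # TexTiling / C99
def rank_segments(segs, pq): return segs                                          # FastFinder
def parse_segments(segs):    return [{"text": s} for s in segs]                   # CONTEX

def match(parsed, pq):
    # Matcher stand-in: score sentences by overlap with the question words.
    q_words = set(create_query(pq))
    return [(len(q_words & set(p["text"].split())), p["text"]) for p in parsed]

def rank_answers(cands):     return sorted(cands, reverse=True)

def webclopedia(question):
    pq = parse_question(question)
    segs = segment(retrieve(create_query(pq)))
    parsed = parse_segments(rank_segments(segs, pq))
    return [text for _, text in rank_answers(match(parsed, pq))][:5]

print(webclopedia("Where is Thimphu?"))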

16 Candidate answer sentence parse tree:

(:SURF "Ouagadougou, the capital of Burkina Faso"
 :CAT S-NP
 :LEX "Ouagadougou"
 :CLASS I-EN-PROPER-PLACE
 :SUBS (((HEAD) (:SURF "Ouagadougou"
                 :CAT S-NP
                 :LEX "Ouagadougou"
                 :CLASS I-EN-PROPER-PLACE
                 :SUBS (((HEAD) (:SURF "Ouagadougou"
                                 :CAT S-PROPER-NAME
                                 :LEX "Ouagadougou"
                                 :CLASS I-EN-PROPER-PLACE)))))
        ((DUMMY) (:SURF "," :CAT D-COMMA :LEX "," :SPAN ((35 36))))
        ((MOD) (:SURF "the capital of Burkina Faso"
                :CAT S-NP
                :LEX "capital"
                :CLASS I-EN-CAPITAL)))
 ...)

Question type constraints:
constraint_set(question,1.0) {
  (sem_equal [question OBJ] I-EN-INTERR-PRONOUN-WHAT)
  (set x? [question SUBJ])
  (sem_equal [question HEAD] I-EV-BE)
}

Answer type constraints:
constraint_set(qtarget2,1.0) {
  (set answer? [answer.*])
  (sem_equal answer? I-EN-PROPER-PLACE)
  (syn_equal answer? S-PROPER-NAME)
}
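A minimal sketch of checking constraint-set-style restrictions against a parse node like the one above: sem_equal tests the node's :CLASS and syn_equal its :CAT. The dict encoding of the node and the tuple encoding of constraints are assumptions made for this example, not the matcher's actual representation.

node = {"SURF": "Ouagadougou", "CAT": "S-PROPER-NAME", "CLASS": "I-EN-PROPER-PLACE"}

def sem_equal(n, cls):  return n["CLASS"] == cls
def syn_equal(n, cat):  return n["CAT"] == cat

def satisfies(n, constraints):
    """constraints: list of (predicate-name, expected-value) pairs; all must hold."""
    checks = {"sem_equal": sem_equal, "syn_equal": syn_equal}
    return all(checks[name](n, value) for name, value in constraints)

# The answer-type constraint set from the slide (qtarget2), encoded as tuples:
qtarget2 = [("sem_equal", "I-EN-PROPER-PLACE"), ("syn_equal", "S-PROPER-NAME")]
print(satisfies(node, qtarget2))   # True: this node can be pinpointed as the answer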