1 Toward Semantics-Based Answer Pinpointing in Webclopedia
Eduard Hovy, Ulf Hermjakob, Chin-Yew Lin, Mike Junk, Laurie Gerber, Deepak Ravichandran
Information Sciences Institute, University of Southern California
2 …need semantic and numerical correctness
Where are zebras most likely found? — in the dictionary
Where do lobsters like to live? — on the table — at the same rate as regular lobsters
How many people live in Chile? — nine
3 TREC-9 QA Track, top 5 (50-byte answers)
SMU 0.580 (41.92 bytes): IR with feedback loops, parser with QA patterns, abduction
Waterloo 0.321 (48.75 bytes): IR, Qword extraction from Q, window matcher
ISI 0.318 (20.94 bytes): IR, parser for semantic Qtargets, QA patterns, window matcher
IBM-1 0.315 (49.43 bytes) and IBM-2 0.309 (49.28 bytes): Maximum Entropy models of corresponding Q-A ngrams, bags of words, POS tags, QA patterns
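The scores above are mean reciprocal rank (MRR): each question contributes the reciprocal of the rank at which the first correct answer appears among a system's top five responses, or zero if none is correct. A minimal sketch of that computation (illustrative only, not the official TREC scoring code):

    # ranks[i] is the rank (1-5) of the first correct answer returned for
    # question i, or None if no correct answer appeared in the top five.
    def mean_reciprocal_rank(ranks):
        return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

    # Example: correct answers at ranks 1 and 3, none for the third question.
    print(mean_reciprocal_rank([1, 3, None]))  # (1 + 1/3 + 0) / 3 ≈ 0.444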
4 Overview: Getting semantic correctness
What semantic/numerical types are there? The QA Typology.
How do you find them from the question, and in candidate answers? Parse the question; derive the Qtarget (plus Qargs).
How well can you match the two? Answer patterns, Qtarget matches, and fallback.
5 QA Typology and semantic hierarchy
Q: Where do zebras like to live? / What's the typical home of the zebra? / Where do you usually find zebras?
A: In Africa / East Africa / The savannahs of Kenya
QA Typology: derived from 17,384 QA pairs; nodes include WHY-FAMOUS, YES:NO, ABBREVIATION, LOCATION
Semantic hierarchy: 10,000+ nodes (WordNet and other); allows backoff to more general classes (sketched below), e.g. C-LOCATION, C-CITY, C-STATE-DISTRICT, C-STATE, C-COUNTRY, C-NAMED-ENTITY
[Diagram: question variants and answer variants linked through QA Typology nodes and semantic hierarchy classes]
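A toy illustration of that backoff, using a small parent table over the class names shown on the slide (the table is a hypothetical fragment, not ISI's data): a candidate whose class is more specific than the Qtarget still matches by walking up the hierarchy.

    PARENT = {                      # hypothetical fragment of the 10,000-node hierarchy
        "C-CITY": "C-LOCATION",
        "C-STATE": "C-LOCATION",
        "C-COUNTRY": "C-LOCATION",
        "C-LOCATION": "C-NAMED-ENTITY",
    }

    def is_a(cls, target):
        """True if cls equals target or target is an ancestor of cls."""
        while cls is not None:
            if cls == target:
                return True
            cls = PARENT.get(cls)
        return False

    print(is_a("C-CITY", "C-LOCATION"))  # True: a city can answer a location question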
6 Webclopedia QA Typology
Each node has associated patterns of expression for questions and answers, stored as templates.
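As an illustration of what one such node might hold, here is a sketch; the field names and template strings are invented for the example and are not the actual Webclopedia data structures.

    # Hypothetical typology node: a Qtarget plus Q and A expression templates.
    why_famous_node = {
        "qtarget": "Q-WHY-FAMOUS-PERSON",
        "question_templates": [
            "Who is <person>?",
            "What is <person> best known for?",
        ],
        "answer_templates": [
            "<person>, the <role> of <organization>",
            "<person> is best known for <activity>",
        ],
    }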
7 Semantic hierarchy
Not only classes: many instances of cities, countries, planets, metals, etc.
(THING
  ((AGENT
     (NAME (FEMALE-FIRST-NAME (EVE MARY...))
           (MALE-FIRST-NAME (LAWRENCE SAM...))))
     (COMPANY-NAME (BOEING AMERICAN-EXPRESS)) JESUS ROMANOFF...)
   (ANIMAL-HUMAN (ANIMAL (WOODCHUCK YAK...)) PERSON)
   (ORGANIZATION (SQUADRON DICTATORSHIP...))
   (GROUP-OF-PEOPLE (POSSE CHOIR...))
   (STATE-DISTRICT (TIROL MISSISSIPPI...))
   (CITY (ULAN-BATOR VIENNA...))
   (COUNTRY (SULTANATE ZIMBABWE...))))
 (PLACE
   (STATE-DISTRICT (CITY COUNTRY...))
   (GEOLOGICAL-FORMATION (STAR CANYON...))
   AIRPORT COLLEGE CAPITOL...)
 (ABSTRACT
   (LANGUAGE (LETTER-CHARACTER (A B...)))
   (QUANTITY (NUMERICAL-QUANTITY INFORMATION-QUANTITY MASS-QUANTITY
              MONETARY-QUANTITY TEMPORAL-QUANTITY ENERGY-QUANTITY
              TEMPERATURE-QUANTITY ILLUMINATION-QUANTITY
              (SPATIAL-QUANTITY (VOLUME-QUANTITY AREA-QUANTITY DISTANCE-QUANTITY))... PERCENTAGE)))
 (UNIT
   ((INFORMATION-UNIT (BIT BYTE... EXABYTE))
    (MASS-UNIT (OUNCE...))
    (ENERGY-UNIT (BTU...))
    (CURRENCY-UNIT (ZLOTY PESO...))
    (TEMPORAL-UNIT (ATTOSECOND... MILLENIUM))
    (TEMPERATURE-UNIT (FAHRENHEIT KELVIN CELCIUS))
    (ILLUMINATION-UNIT (LUX CANDELA))
    (SPATIAL-UNIT ((VOLUME-UNIT (DECILITER...)) (DISTANCE-UNIT (NANOMETER...))))
    (AREA-UNIT (ACRE))... PERCENT))
 (TANGIBLE-OBJECT
   ((FOOD (HUMAN-FOOD (FISH CHEESE...)))
    (SUBSTANCE ((LIQUID (LEMONADE GASOLINE BLOOD...))
                (SOLID-SUBSTANCE (MARBLE PAPER...))
                (GAS-FORM-SUBSTANCE (GAS AIR))...))
    (INSTRUMENT (DRUM DRILL (WEAPON (ARM GUN))...)
      (BODY-PART (ARM HEART...))
      (MUSICAL-INSTRUMENT (PIANO)))... *GARMENT *PLANT DISEASE)
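Because the leaves are instances, the hierarchy can be searched for the class that dominates a given string. A minimal sketch over a nested-tuple encoding of a small fragment (the encoding and helper are assumptions for illustration; the real hierarchy is far larger):

    # Each node is (class-label, children...), where children are either
    # instance strings or nested class nodes.
    HIERARCHY = ("PLACE",
                 ("CITY", "ULAN-BATOR", "VIENNA"),
                 ("COUNTRY", "SULTANATE", "ZIMBABWE"),
                 ("GEOLOGICAL-FORMATION", "STAR", "CANYON"))

    def class_of(instance, node):
        """Return the immediate class dominating `instance`, or None."""
        label, children = node[0], node[1:]
        if instance in (c for c in children if isinstance(c, str)):
            return label
        for child in children:
            if isinstance(child, tuple):
                found = class_of(instance, child)
                if found:
                    return found
        return None

    print(class_of("VIENNA", HIERARCHY))  # CITY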
8 Identifying Qtargets
CONTEX parser (Hermjakob 97, 00, 01):
– Deterministic shift-reduce parser
– Learns grammar (parse rules) from a treebank
– Produces syntactic and semantic labels
– Trained on English (augmented Penn Treebank), Korean, Japanese, and German, with good performance
For QA: train on questions, to identify Qtargets
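A drastically reduced sketch of the shift-reduce control loop, with a stub in place of the decision function that CONTEX learns from a treebank; everything here is illustrative and is not the CONTEX implementation.

    def parse(tokens, choose_action):
        """Deterministic shift-reduce loop: at each step the learned
        choose_action picks "SHIFT" or a ("REDUCE", n, label) action."""
        stack, buffer = [], list(tokens)
        while buffer or len(stack) > 1:
            action = choose_action(stack, buffer)
            if action == "SHIFT":
                stack.append(buffer.pop(0))
            else:
                _, n, label = action
                children = stack[-n:]
                del stack[-n:]
                stack.append((label, children))
        return stack[0]

    def choose_action(stack, buffer):
        # Stub standing in for the classifier trained on treebank parses.
        return "SHIFT" if buffer else ("REDUCE", len(stack), "S-NP")

    print(parse(["the", "Bhutanese", "capital"], choose_action))
    # ('S-NP', ['the', 'Bhutanese', 'capital'])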
9 Qtargets identified by parser in TREC-9
(((I-EN-PROPER-PERSON S-PROPER-NAME))) [98]
  q4.1 Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
  q4.5 What is the name of the managing director of Apricot Computer?
(((C-DATE) (C-TEMP-LOC-WITH-YEAR) (C-DATE-RANGE)) ((EQ C-TEMP-LOC))) [66]
  q4.15 When was London's Docklands Light Railway constructed?
  q4.22 When did the Jurassic Period end?
(((I-EN-NUMERICAL-QUANTITY) (I-ENUM-CARDINAL))) [51]
  q4.70 How many lives were lost in the China Airlines' crash in Nagoya, Japan?
  q4.82 How many consecutive baseball games did Lou Gehrig play?
(((I-EN-MONETARY-QUANTITY))) [12]
  q4.2 What was the monetary value of the Nobel Peace Prize in 1989?
  q4.4 How much did Mercury spend on advertising in 1993?
(((Q-DEFINITION))) [35]
  q4.30 What are the Valdez Principles?
  q4.115 What is Head Start?
(((Q-WHY-FAMOUS-PERSON))) [35]
  q4.207 What is Francis Scott Key best known for?
  q4.222 Who is Anubis?
(((Q-ABBREVIATION-EXPANSION))) [16]
  q4.224 What does laser stand for?
  q4.238 What does the abbreviation OAS stand for?
The ontology enables backoff, for weaker/more general type matching.
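Webclopedia derives these Qtargets from full CONTEX parses of the questions, but the intuition can be caricatured with a few surface rules; the regular expressions below are only a shallow, assumed stand-in for that parser-based analysis.

    import re

    # Illustrative rules only; the real system reads the Qtarget off the
    # question's parse tree, not off keywords.
    QTARGET_RULES = [
        (r"^who\b",                              "I-EN-PROPER-PERSON"),
        (r"^when\b",                             "C-DATE"),
        (r"^how many\b",                         "I-EN-NUMERICAL-QUANTITY"),
        (r"^how much\b.*\b(cost|spend|worth)\b", "I-EN-MONETARY-QUANTITY"),
        (r"stand for\?$",                        "Q-ABBREVIATION-EXPANSION"),
        (r"^what (is|are)\b",                    "Q-DEFINITION"),
    ]

    def qtarget(question):
        q = question.lower()
        for pattern, target in QTARGET_RULES:
            if re.search(pattern, q):
                return target
        return None

    print(qtarget("How many consecutive baseball games did Lou Gehrig play?"))
    # I-EN-NUMERICAL-QUANTITY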
10 Pinpointing the answer
Qtarget and parse tree allow pinpointing of the exact syntactic expression (TREC-9 answer length < 25 bytes); a simplified sketch follows below.
Where is Thimphu?
"Three other private tour agencies also have been established in Thimphu, the Bhutanese capital."
Candidate answers (score, document, answer string):
  795.015 LA093090-0040  the Bhutanese capital
  798.016 LA093090-0040  Bhutanese capital
  798.016 LA093090-0040  capital
  810.246 LA093090-0040  have been established in Thimphu, the Bhutanese
[Parse tree fragment of the answer sentence: S-VC "have been established", S-PP "in Thimphu", S-NP "the Bhutanese capital"]
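A simplified sketch of the pinpointing step under stated assumptions: take the candidate sentence's constituents with their semantic classes, keep those matching the Qtarget class (the real matcher can also back off in the ontology), discard constituents that only repeat question words, and prefer the shortest remaining span. The class names and the scoring rule are assumptions for illustration, not the actual ISI matcher.

    def pinpoint(constituents, qtarget_class, question_words):
        """constituents: (surface, semantic_class) pairs from the parse tree."""
        hits = [(len(surf), surf) for surf, cls in constituents
                if cls == qtarget_class
                and not set(surf.lower().split()) <= question_words]
        return min(hits)[1] if hits else None

    constituents = [
        ("Thimphu, the Bhutanese capital",       "I-EN-PROPER-PLACE"),
        ("Thimphu",                              "I-EN-PROPER-PLACE"),
        ("the Bhutanese capital",                "I-EN-PROPER-PLACE"),
        ("three other private tour agencies",    "I-EN-ORGANIZATION"),  # assumed class
    ]
    print(pinpoint(constituents, "I-EN-PROPER-PLACE", {"where", "is", "thimphu"}))
    # the Bhutanese capital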
11 Weaknesses
Qtargets the parser did not find:
  q4.61 What brand of white rum is made in Cuba?
  q4.63 What nuclear-powered Russian submarine sank in the Norwegian Sea on April 7, 1989?
  q4.68 What does El Nino mean in Spanish?
  q4.79 What did Shostakovich write for Rostropovich?
  q4.3 What does the Peugeot company make? ...
  Need world knowledge; need more Qtarget rules; need a new Qtarget for translation-of
Configurations of nodes (QA patterns):
  "If Van Dyke is the nominee…"
  "Juan Gomez, former President of Costa Rica…"
  "Who led the Branch Davidians?" — answers near "followers of"
  Need patterns for negation, hypothesis, ...
  Need entailed-word patterns
12 QA Patterns
Built 500 QA patterns by hand; performance in TREC-9 was disappointing:
  Qtarget from parser: Qtarget correct ~85%
  QA pattern: pattern succeeds 5%+
  Qtarget with match: match succeeds ~25%
  Qword + window fallback: match succeeds ~15%
So: learn QA patterns
– Many more answers than questions, so use a noisy channel model and Bayes' rule: A* = argmax_A P(A|Q) = argmax_A P(Q|A) · P(A)
– Starting with Location questions only
– Model I: single-node level
  P(A) = probability that the sentence contains a node that is a location
  P(Q|A) = probability that the location question was 'generated' from the sentence
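A sketch of Model I under those definitions: the two probability functions below are toy stand-ins for the corpus-trained estimates, and the class name C-LOCATION is borrowed from the hierarchy examples earlier; none of this is the trained ISI model.

    def p_answer(parsed_sentence):
        """Model I, single-node level: probability that the sentence contains
        a node whose semantic class is a location (toy estimate)."""
        return 0.9 if any(cls == "C-LOCATION" for _, cls in parsed_sentence) else 0.1

    def p_question_given_answer(question, parsed_sentence):
        """Toy channel model: fraction of question content words the sentence
        covers; the real P(Q|A) is learned from Q-A pairs."""
        content = {w for w in question.lower().split() if len(w) > 3}
        covered = {w for surf, _ in parsed_sentence for w in surf.lower().split()}
        return (len(content & covered) + 0.5) / (len(content) + 1.0)

    def best_answer(question, candidate_sentences):
        # Bayes' rule: pick the sentence maximizing P(Q|A) * P(A).
        return max(candidate_sentences,
                   key=lambda s: p_question_given_answer(question, s) * p_answer(s))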
13 Extending ontology
Testing the possibility of learning instantial knowledge
Experiment 1: differentiate types of locations (city, state, territory, mountain, water, artifact) in text:
– Ran text through BBN's IdentiFinder
– Learned classifying features for each type of location, using a Bayesian classifier and a C4.5 decision tree
– Extended with MemRun: record the most confident guesses per text and reuse them (sketched below)
– Decision tree + MemRun gave the best results
Then extract locations from the TREC corpus, for the ontology
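A sketch of the MemRun idea as described on the slide, with assumed interfaces: the classifier is passed in as a function standing in for the Bayesian classifier or the C4.5 tree.

    from collections import defaultdict

    def memrun(mentions, classify):
        """mentions: list of (location_string, feature_dict) within one text.
        classify(features) -> (label, confidence).
        Keep the most confident label per location string, then relabel
        every mention of that string with it."""
        best = defaultdict(lambda: (None, 0.0))
        guesses = []
        for name, feats in mentions:
            label, conf = classify(feats)
            guesses.append(name)
            if conf > best[name][1]:
                best[name] = (label, conf)
        return [(name, best[name][0]) for name in guesses]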
14 Questions?
15 Architecture
Pipeline (sketched below): Input question → Parse question → Create query → Retrieve documents → Segment documents → Rank segments → Parse top segments → Match segments against question → Rank and prepare answers → Output answers
Question parsing. Steps: parse the question; find the desired semantic type. Engines: IdentiFinder (BBN), CONTEX (Hermjakob)
QA typology. Categorize QA types in a taxonomy (Gerber)
Constraint patterns. Identify likely answers in relation to other parts of the sentence (Gerber)
IR. Steps: create query from question (WordNet-expand); retrieve top 1000 documents. Engines: MG (Sydney) (Lin); AT&T (TREC) (Lin)
Segmentation. Steps: segment each document into topical segments. Engines: fixed-length (not used); TextTiling (Hearst 94) (Lin); C99 (Choi 00) (Lin); MAXNET (Lin 00, not used)
Segment parsing. Steps: parse segment sentences. Engine: CONTEX (Hermjakob)
Ranking. Steps: score each sentence in each segment, using WordNet expansion; rank segments. Engine: FastFinder (Junk)
Matching. Steps: match general constraint patterns against parse trees; match desired semantic type against parse tree elements; match desired words against words in sentences. Engine: matcher (Junk)
Ranking and answer extraction. Steps: rank candidate answers; extract and format them. Engine: part of matcher (Junk)
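A pipeline sketch only: the stage names mirror the boxes on the slide, and the lambda stubs exist solely to keep the sketch runnable; none of this is the actual interface of the ISI modules named above.

    def run_pipeline(question, stages):
        """Thread a data dict through the stages in order."""
        data = {"question": question}
        for name, stage in stages:
            data = stage(data)
        return data

    stages = [
        ("create query",             lambda d: dict(d, query=d["question"].split())),  # WordNet expansion
        ("retrieve documents",       lambda d: dict(d, docs=["LA093090-0040"])),        # MG / AT&T, top 1000
        ("segment documents",        lambda d: dict(d, segments=d["docs"])),            # TextTiling / C99
        ("rank segments",            lambda d: dict(d, ranked=d["segments"])),          # FastFinder
        ("parse question and top segments", lambda d: d),                               # CONTEX
        ("match against question",   lambda d: d),                                      # matcher
        ("rank and prepare answers", lambda d: dict(d, answers=["the Bhutanese capital"])),
    ]
    print(run_pipeline("Where is Thimphu?", stages)["answers"])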
16 Candidate answer sentence parse tree:
(:SURF "Ouagadougou, the capital of Burkina Faso"
 :CAT S-NP
 :LEX "Ouagadougou"
 :CLASS I-EN-PROPER-PLACE
 :SUBS (((HEAD) (:SURF "Ouagadougou"
                 :CAT S-NP
                 :LEX "Ouagadougou"
                 :CLASS I-EN-PROPER-PLACE
                 :SUBS (((HEAD) (:SURF "Ouagadougou"
                                 :CAT S-PROPER-NAME
                                 :LEX "Ouagadougou"
                                 :CLASS I-EN-PROPER-PLACE)))))
        ((DUMMY) (:SURF "," :CAT D-COMMA :LEX "," :SPAN ((35 36))))
        ((MOD) (:SURF "the capital of Burkina Faso"
                :CAT S-NP
                :LEX "capital"
                :CLASS I-EN-CAPITAL)))...)

Question type constraints:
constraint_set(question,1.0) {
  (sem_equal [question OBJ] I-EN-INTERR-PRONOUN-WHAT)
  (set x? [question SUBJ])
  (sem_equal [question HEAD] I-EV-BE)
}

Answer type constraints:
constraint_set(qtarget2,1.0) {
  (set answer? [answer.*])
  (sem_equal answer? I-EN-PROPER-PLACE)
  (syn_equal answer? S-PROPER-NAME)
}
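A sketch of how the answer-type constraint set above could be checked against a parse-tree node, treating a node as a dict with CAT and CLASS features as in the example. The operator names follow the slide; the evaluation code and the flat equality test (the real sem_equal can use ontology subsumption) are assumptions.

    def sem_equal(node, sem_class):
        # Exact class match; the real operator can also accept subclasses.
        return node.get("CLASS") == sem_class

    def syn_equal(node, syn_cat):
        return node.get("CAT") == syn_cat

    def satisfies_qtarget2(node):
        """Mirror of constraint_set(qtarget2): a proper-name place node."""
        return (sem_equal(node, "I-EN-PROPER-PLACE")
                and syn_equal(node, "S-PROPER-NAME"))

    node = {"SURF": "Ouagadougou", "CAT": "S-PROPER-NAME", "CLASS": "I-EN-PROPER-PLACE"}
    print(satisfies_qtarget2(node))  # True: this node can answer the question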