
1 AQUAINT BBN’s AQUA Project Ana Licuanan, Jonathan May, Scott Miller, Ralph Weischedel, Jinxi Xu 3 December 2002

2 AQUAINT BBN's Approach to QA
Theme: Use document retrieval, entity recognition, & proposition recognition
Analyze the question
–Reduce the question to propositions and a bag of words
–Predict the type of the answer
Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)
Other knowledge sources (e.g. the Web) are optionally used to rerank answers
Re-rank candidates based on propositions
Estimate confidence for answers
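
Read as pseudo-code, the flow above is: retrieve passages for the question, extract candidates, rerank, and attach a confidence. The following is a small, self-contained toy sketch of that flow; the toy corpus, the word-overlap retrieval, and the candidate extraction are illustrative stand-ins, not BBN's actual components.

# Minimal toy sketch of the QA flow described above. Everything here (corpus,
# scoring, function names) is illustrative, not BBN's code.
TOY_CORPUS = [
    "Dell, beating Compaq, sold the most PCs in 2001.",
    "The Taj Mahal is located in Agra, India.",
]

def bag_of_words(text):
    return {w.strip(".,?").lower() for w in text.split()}

def retrieve_passages(question, corpus, k=2):
    # Rank passages by simple word overlap with the question (stand-in for IR).
    q = bag_of_words(question)
    return sorted(corpus, key=lambda p: len(q & bag_of_words(p)), reverse=True)[:k]

def answer_question(question, corpus):
    passages = retrieve_passages(question, corpus)
    # Treat capitalized tokens in the best passage as candidate answers
    # (a crude stand-in for answer-type-driven candidate extraction).
    candidates = [w.strip(".,") for w in passages[0].split() if w[0].isupper()]
    best = candidates[0] if candidates else None
    # Crude confidence: overlap between the question and the supporting passage.
    q = bag_of_words(question)
    conf = len(q & bag_of_words(passages[0])) / max(len(q), 1)
    return best, round(conf, 2)

print(answer_question("Which company sold the most PCs in 2001?", TOY_CORPUS))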

3 AQUAINT System Diagram
[Diagram of the QA pipeline: components include Question Classification, Web Search, Document Retrieval, Passage Retrieval, Name Extraction, NP Labeling, Description Classification, Parsing, Regularization, Proposition Finding, and Confidence Estimation, supported by training resources (Treebank, Name Annotation, Proposition Bank); input is a Question, output is an Answer & Confidence Score.]

4 AQUAINT Question Classification

5 AQUAINT Question Classification
A hybrid approach based on rules, statistical parsing, and question templates
–Match question templates against statistical parses
–Back off to statistical bag-of-words classification
Example features used for classification
–The type of WHNP starting the question (e.g. "Who", "What", "When" ...)
–The headword of the core NP
–WordNet definition
–Bag of words
–Main verb of the question
Performance
–TREC 8 & 9 questions used for training
–~85% accuracy when testing on TREC 10
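
To make the back-off idea concrete, here is a toy sketch that first tries question templates and then falls back to headword cues; the templates, cue lists, and type names are assumptions for illustration, not BBN's actual rules or statistical classifier.

# Toy hybrid question classifier: templates first, then back-off cues.
import re

TEMPLATES = [
    (re.compile(r"^where\b", re.I), "LOCATION"),
    (re.compile(r"^when\b", re.I), "DATE"),
    (re.compile(r"^who\b", re.I), "PERSON"),
    (re.compile(r"^how (many|much)\b", re.I), "QUANTITY"),
]

# Back-off cues keyed on the headword of the core NP ("which pianist ..." -> pianist).
HEADWORD_CUES = {"pianist": "PERSON", "company": "ORGANIZATION", "city": "GPE"}

def classify_question(question):
    # 1. Template match against the start of the question (a stand-in for
    #    matching templates against a statistical parse).
    for pattern, qtype in TEMPLATES:
        if pattern.search(question):
            return qtype
    # 2. Back off to headword cues, a crude stand-in for the statistical
    #    bag-of-words classifier and WordNet definition features.
    for word in question.lower().replace("?", "").split():
        if word in HEADWORD_CUES:
            return HEADWORD_CUES[word]
    return "OTHER"

print(classify_question("Where is the Taj Mahal?"))                                            # LOCATION
print(classify_question("Which pianist won the last International Tchaikovsky Competition?"))  # PERSON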

6 AQUAINT Examples of Question Analysis
Where is the Taj Mahal?
–WHNP = where
–Answer type: Location or GPE
Which pianist won the last International Tchaikovsky Competition?
–Headword of core NP = pianist
–WordNet definition = person
–Answer type: Person

7 AQUAINT Question-Answer Types
Type (subtypes):
ORGANIZATION: CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
LOCATION: CONTINENT, LAKE_SEA_OCEAN, OTHER, REGION, RIVER, BORDER
FAC: AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
GAME
PRODUCT: DRUG, OTHER, VEHICLE, WEAPON
NATIONALITY: NATIONALITY, OTHER, POLITICAL, RELIGION, LANGUAGE
FAC_DESC: AIRPORT, ATTRACTION, BRIDGE, BUILDING, HIGHWAY_STREET, OTHER
MONEY
GPE_DESC: CITY, COUNTRY, OTHER, STATE_PROVINCE
ORG_DESC: CORPORATION, EDUCATIONAL, GOVERNMENT, HOSPITAL, HOTEL, MUSEUM, OTHER, POLITICAL, RELIGIOUS
CONTACT_INFO: ADDRESS, OTHER, PHONE
WORK_OF_ART: BOOK, OTHER, PAINTING, PLAY, SONG
*Thanks to the USC/ISI and IBM groups for sharing the conclusions of their analyses.

8 AQUAINT Question-Answer Types (cont'd)
Type (subtypes):
PRODUCT_DESC: OTHER, VEHICLE, WEAPON
PERSON
EVENT: HURRICANE, OTHER, WAR
SUBSTANCE: CHEMICAL, DRUG, FOOD, OTHER
PER_DESC
PRODUCT: OTHER
ORDINAL
ANIMAL
QUANTITY: 1D, 1D_SPACE, 2D, 2D_SPACE, 3D, 3D_SPACE, ENERGY, OTHER, SPEED, WEIGHT, TEMPERATURE
GPE: CITY, COUNTRY, OTHER, STATE_PROVINCE
DISEASE
CARDINAL
AGE
TIME
PLANT
PERCENT
LAW
DATE: AGE, DATE, DURATION, OTHER

9 AQUAINT Frequency of Q Types

10 AQUAINT Interpretation

11 AQUAINT IdentiFinder(TM) Status
Current IdentiFinder performance on types
IdentiFinder easily trainable for other languages, e.g., Arabic and Chinese

12 AQUAINT Proposition Indexing
A shallow semantic representation
–Deeper than bags of words
–But broad enough to cover all the text
Characterizes documents by
–The entities they contain
–Propositions involving those entities
Resolves all references to entities
–Whether named, described, or pronominal
Represents all propositions that are directly stated in the text
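
As a rough illustration of such an index, the sketch below stores one record per coreference-resolved entity and one record per proposition, with proposition arguments referring to entity ids; the schema and field names are assumptions for illustration, not BBN's actual representation.

# Toy proposition index for one document; schema is an illustrative assumption.
from dataclasses import dataclass, field

@dataclass
class Entity:
    eid: str
    mentions: list          # all coreferent mentions: names, descriptions, pronouns

@dataclass
class Proposition:
    predicate: str
    args: dict              # role -> entity id, e.g. {"subj": "e1", "obj": "e3"}

@dataclass
class DocumentIndex:
    entities: dict = field(default_factory=dict)
    propositions: list = field(default_factory=list)

# Index for: "Dell, beating Compaq, sold the most PCs in 2001."
doc = DocumentIndex(
    entities={
        "e1": Entity("e1", ["Dell"]),
        "e2": Entity("e2", ["Compaq"]),
        "e3": Entity("e3", ["the most PCs"]),
        "e4": Entity("e4", ["2001"]),
    },
    propositions=[
        Proposition("sold", {"subj": "e1", "obj": "e3", "in": "e4"}),
        Proposition("beating", {"subj": "e1", "obj": "e2"}),
    ],
)
print(len(doc.entities), "entities,", len(doc.propositions), "propositions")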

13 AQUAINT Proposition Finding Example
Question: Which company sold the most PCs in 2001?
Text: Dell, beating Compaq, sold the most PCs in 2001.
Example propositions:
(e1: "Dell") (e2: "Compaq") (e3: "the most PCs") (e4: "2001")
(sold subj:e1, obj:e3, in:e4)
(beating subj:e1, obj:e2)
Answer: Dell (e1)
Passage retrieval alone would select the wrong answer; the propositions identify e1 (Dell) as the subject of "sold".
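
A toy sketch of why proposition matching helps on this example: score each candidate by whether it fills the subject slot of a document proposition whose predicate and remaining arguments match the question's proposition. The scoring below is illustrative, not BBN's actual re-ranking formula.

# Toy proposition-matching score for the Dell vs. Compaq example.
doc_props = [
    {"pred": "sold", "subj": "Dell", "obj": "the most PCs", "in": "2001"},
    {"pred": "beating", "subj": "Dell", "obj": "Compaq"},
]
# Question proposition for "Which company sold the most PCs in 2001?",
# with the wh-phrase as the unknown subject.
q_prop = {"pred": "sold", "subj": None, "obj": "the most PCs", "in": "2001"}

def candidate_score(candidate, q_prop, doc_props):
    score = 0
    for p in doc_props:
        if p.get("pred") != q_prop["pred"] or p.get("subj") != candidate:
            continue
        # Count how many of the question's known arguments this proposition satisfies.
        score += sum(1 for role, val in q_prop.items()
                     if role not in ("pred", "subj") and p.get(role) == val)
    return score

for cand in ["Dell", "Compaq"]:
    print(cand, candidate_score(cand, q_prop, doc_props))
# Dell satisfies "sold ... the most PCs ... in 2001"; Compaq satisfies nothing.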

14 AQUAINT Proposition Recognition Strategy
Start with a lexicalized, probabilistic (LPCFG) parsing model
Distinguish names by replacing NP labels with NPP
Currently, rules normalize the parse tree to produce propositions
At a later date, extend the statistical model to
–Predict argument labels for clauses
–Resolve references to entities
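
A minimal sketch of the rule-based normalization step, under the assumption of simple subject/verb/object rules over a toy constituency tree; real normalization rules cover far more structure, and the tree encoding here is an illustrative choice, not BBN's.

# Toy parse of "Dell sold the most PCs": (label, children) tuples, words as leaves.
tree = ("S", [
    ("NPP", ["Dell"]),                     # NPP = NP label specialized for names
    ("VP", [
        ("VBD", ["sold"]),
        ("NP", ["the", "most", "PCs"]),
    ]),
])

def words(node):
    label, children = node
    return " ".join(c if isinstance(c, str) else words(c) for c in children)

def extract_propositions(node):
    label, children = node
    props = []
    if label == "S":
        subj = next((c for c in children if not isinstance(c, str) and c[0] in ("NP", "NPP")), None)
        vp = next((c for c in children if not isinstance(c, str) and c[0] == "VP"), None)
        if subj and vp:
            verb = next((c for c in vp[1] if c[0].startswith("V")), None)
            obj = next((c for c in vp[1] if c[0] in ("NP", "NPP")), None)
            prop = {"pred": words(verb) if verb else None, "subj": words(subj)}
            if obj:
                prop["obj"] = words(obj)
            props.append(prop)
    for c in children:
        if not isinstance(c, str):
            props.extend(extract_propositions(c))
    return props

print(extract_propositions(tree))
# [{'pred': 'sold', 'subj': 'Dell', 'obj': 'the most PCs'}]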

15 AQUAINT Confidence Estimation
Compute the probability P(correct|Q,A) from the following features:
P(correct|Q,A) ≈ P(correct | type(Q), m, n, PropSat)
–type(Q): question type
–m: question length
–n: number of matched question words in the answer context
–PropSat: whether the answer satisfies the propositions in the question
Confidence for answers found on the Web:
P(correct|Q,A) ≈ P(correct | Freq, InTrec)
–Freq = number of Web hits, using Google
–InTrec = whether the answer was also a top answer from the AQUAINT corpus
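
One simple way to realize such an estimate, sketched below, is to bin held-out question/answer judgments by the conditioning features and use the observed fraction correct in each bin as the confidence; the binning scheme and the toy training data are illustrative assumptions, not BBN's actual estimator.

# Toy bin-counting estimate of P(correct | features).
from collections import defaultdict

def features(qtype, n_matched, prop_satisfied):
    # Bin the number of matched question words so bins stay well populated.
    n_bin = "0" if n_matched == 0 else ("1-2" if n_matched <= 2 else "3+")
    return (qtype, n_bin, prop_satisfied)

# Held-out judgments: (question type, matched words, proposition satisfied, correct?)
held_out = [
    ("PERSON", 3, True, True), ("PERSON", 3, True, True), ("PERSON", 1, False, False),
    ("DATE", 2, False, True), ("DATE", 0, False, False), ("DATE", 4, True, True),
]

counts = defaultdict(lambda: [0, 0])        # bin -> [correct, total]
for qtype, n, sat, correct in held_out:
    bin_ = features(qtype, n, sat)
    counts[bin_][1] += 1
    counts[bin_][0] += int(correct)

def confidence(qtype, n_matched, prop_satisfied):
    correct, total = counts.get(features(qtype, n_matched, prop_satisfied), (0, 0))
    # Add-one smoothing so unseen bins get a noncommittal 0.5 rather than 0.
    return (correct + 1) / (total + 2)

print(round(confidence("PERSON", 3, True), 2))     # high: this bin was mostly correct
print(round(confidence("LOCATION", 0, False), 2))  # unseen bin -> 0.5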

16 AQUAINT Dependence of Answer Correctness on Question Type

17 AQUAINT Dependence on Proposition Satisfaction

18 AQUAINT Dependence on Number of Matched Words

19 AQUAINT Dependence of Answer Correctness on Web Frequency

20 AQUAINT Official Results of TREC 2002 QA
Run Tag     Unranked Avg. Precision   Ranked Avg. Precision   Upper Bound
BBN2002A    0.186                     0.257                   0.498
BBN2002B    0.288                     0.468                   0.646
BBN2002C    0.284                     0.499                   0.641
BBN2002A did not use the Web; BBN2002B & C used the Web
Unranked average precision = percentage of questions for which the first answer is correct
Ranked average precision = confidence-weighted score, the official metric for TREC 2002
Upper bound = confidence-weighted score given perfect confidence estimation
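
For reference, the confidence-weighted score used as the official TREC 2002 metric sorts the (one-per-question) answers by decreasing confidence and averages the running precision at each rank. The sketch below computes it on made-up judgments, purely for illustration.

# Confidence-weighted score: CWS = (1/N) * sum_i (# correct in first i answers) / i.
def confidence_weighted_score(answers):
    # answers: list of (confidence, is_correct) pairs, one per question.
    ranked = sorted(answers, key=lambda a: a[0], reverse=True)
    total, correct_so_far = 0.0, 0
    for i, (_, correct) in enumerate(ranked, start=1):
        correct_so_far += int(correct)
        total += correct_so_far / i
    return total / len(ranked)

toy = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.1, False)]
print(round(confidence_weighted_score(toy), 3))
# Placing correct answers at high confidence raises the score; the same answers
# with poor confidence ordering would score lower.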

21 AQUAINT Recent Progress
In the last six months, we have:
–Retrained our name tagger (IdentiFinder(TM)) for roughly 29 question types
–Distributed the retrained English version of IdentiFinder to other sites
–Participated in the Question Answering track of TREC 2002
–Participated in a pilot evaluation of automatically answering definitional/biographical questions
–Developed a demonstration of our question answering system, AQUA, against streaming news

