1
Learning Surface Text Patterns for a Question Answering System Deepak Ravichandran Eduard Hovy Information Sciences Institute University of Southern California
2
From Proceedings of the ACL Conference, 2002
3
Goal Explore power of surface text patterns for open-domain QA systems
4
Why This Paper Fall 2001 NLP project - QA system
5
Winning Team Matt Myers & Henry Longmore –"If we were asked to design another question answering system, we would keep the same basic system as a foundation. We would then use more patterns and variations of patterns in the NE recognizer. We would use Machine Learning techniques, particularly for learning patterns for the NE recognizer."
6
Meanwhile, back at the batcave... Automatic learning of surface text patterns for open-domain question answering
7
Recent Open Domain Systems External knowledge, tools –Named Entity taggers –WordNet –parsers –hand-tagged corpora –ontology lists
8
Recent O-D Systems (cont.) Recent TREC-10 evaluation –winning system used just 1 resource –extensive list of surface patterns –surprised many
9
Basic Idea Investigate potential of surface patterns –Learn patterns –Measure accuracy
10
Characteristic Phrases "When was <NAME> born?" –Typical answers "Mozart was born in 1756." "Gandhi (1869-1948)..." –Suggests phrases like "<NAME> was born in <BIRTHDATE>" "<NAME> ( <BIRTHDATE> -" –as Regular Expressions can help locate correct answer
11
Auto-learn Patterns from Web Tagged corpus using AltaVista Hand-crafted examples of each question type Bootstrapping to build large tagged corpus as in Information Extraction (Riloff, 96) Abundance of data on web - reliable statistical estimates
12
The System Assume sentence is a simple sequence of words Search for repeated word orderings Evidence for useful answer phrases
13
System (cont.) Suffix trees to extract substrings of optimal length Suffix trees from Computational Biology (Gusfield, 97) Used to detect DNA sequences Linear time on size of corpus Don't restrict length of substrings
14
Pattern Learning Algorithm Select example for question type –BIRTHYEAR questions select "Mozart 1756” "Mozart" is question term "1756" is answer term Submit Q & A terms to AltaVista Require both terms to be present
15
Pattern Learning (cont.) Download top 1000 documents returned Apply sentence breaker to documents Keep only those sentences with both terms present
16
Pattern Learning (cont.) Terms can be present in various forms –e.g. Mozart as: Wolfgang Amadeus Mozart; Mozart, Wolfgang Amadeus; Amadeus Mozart; Mozart
17
Pattern Learning (cont.) Specify the ways in which the Q term and A term can appear in text Easy to do for BIRTHDATE Not so for Q types like DEFINITION –Many acceptable answers; all of them need to be used to ensure high confidence in the precision
18
Pattern Learning (cont.) Process (tokenize, smooth whitespace, remove tags, etc.) –simplify input for egrep (or other regular expression tool) Pass sentence through suffix tree constructor –finds substrings (and counts) of all lengths
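A minimal sketch of the preprocessing step just described, in Python (the slides do not fix a language); the helper name and the exact cleanup steps are illustrative, not the authors' code:

import re

def preprocess(raw: str) -> str:
    """Simplify raw retrieved text before pattern extraction: strip markup
    and smooth whitespace (full tokenization is omitted in this sketch)."""
    text = re.sub(r"<[^>]+>", " ", raw)        # remove HTML tags
    text = re.sub(r"\s+", " ", text).strip()   # smooth whitespace
    return text

print(preprocess("<p>Mozart   (1756-1791)\nwas a genius</p>"))
# -> "Mozart (1756-1791) was a genius"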
19
Pattern Learning (cont.) Example: “The great composer Mozart (1756-1791) achieved fame at a young age” “Mozart (1756-1791) was a genius” “The whole world would always be indebted to the great music of Mozart (1756-1791)” –Longest matching substring for all 3 sentences is "Mozart (1756-1791)” –Suffix tree would extract "Mozart (1756-1791)" as an output, with score of 3
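The paper gets these counts from a suffix tree in linear time; the brute-force Python sketch below (illustrative, not the authors' code) makes the same computation explicit on the three example sentences, recovering "Mozart (1756-1791)" with a count of 3:

from collections import Counter

def repeated_ngrams(sentences, min_count=2):
    """Count every word n-gram across sentences (a stand-in for the
    suffix-tree pass in the slides) and keep the repeated ones."""
    counts = Counter()
    for s in sentences:
        words = s.split()
        seen = set()
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                seen.add(" ".join(words[i:j]))
        counts.update(seen)                  # count each n-gram once per sentence
    return {ng: c for ng, c in counts.items() if c >= min_count}

sentences = [
    "The great composer Mozart (1756-1791) achieved fame at a young age",
    "Mozart (1756-1791) was a genius",
    "The whole world would always be indebted to the great music of Mozart (1756-1791)",
]
repeated = repeated_ngrams(sentences, min_count=3)
print(max(repeated, key=len))                # -> "Mozart (1756-1791)", count 3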
20
Pattern Learning (cont.) Filter phrases in suffix tree Keep phrases containing Q & A terms Replace question term with <NAME> Replace answer term with <ANSWER>
21
Pattern Learning (cont.) Repeat with different examples of same question type –"Gandhi 1869", "Newton 1642", etc. Some patterns learned for BIRTHDATE –a. born in <ANSWER>, <NAME> –b. <NAME> was born on <ANSWER>, –c. <NAME> ( <ANSWER> - –d. <NAME> ( <ANSWER> - )
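Tying the last few slides together, a hedged Python sketch of the tag-substitution step: phrases kept from the suffix tree that contain both terms are generalized by substituting <NAME> and <ANSWER> (the function name is illustrative, not from the paper):

import re

def generalize(phrase: str, q_term: str, a_term: str) -> str:
    """Turn a phrase containing both the question and answer terms into a
    candidate surface pattern by substituting the <NAME>/<ANSWER> tags."""
    pattern = re.sub(re.escape(q_term), "<NAME>", phrase)
    pattern = re.sub(re.escape(a_term), "<ANSWER>", pattern)
    return pattern

print(generalize("Mozart (1756-1791)", "Mozart", "1756"))
# -> "<NAME> (<ANSWER>-1791)"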
22
Pattern Learning (last one!) Strings partly overlapping (c & d) saved separately –Separate counts of occurrence frequencies –Can distinguish (in this case) between pattern for person still living (d) and more general pattern (c)
23
Calculate Precision Submit query to AltaVista using only Q term ("Mozart") Download top 1000 returned documents Segment into sentences as in pattern learning algorithm Keep sentences containing Q term
24
Calculate Precision (cont.) For each pattern learned, check presence of pattern in sentence –pattern with <ANSWER> tag matched by any word –pattern with <ANSWER> tag matched by correct A term Mozart was born in <ANSWER> Mozart was born in 1756
25
Calculate Precision (cont.) Calculate precision of each pattern P = Ca/Co –Ca = total # of patterns w/answer term present –Co = total # of patterns w/answer term replaced by any word Keep only patterns matching sufficient # of examples (e.g. >5)
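A minimal sketch of the P = Ca/Co computation, under the assumption that each learned pattern is first compiled into a regular expression; to_regex and the example sentences are illustrative, not the paper's code:

import re

def to_regex(pattern: str, q_term: str, answer_re: str) -> re.Pattern:
    """Compile a learned pattern: literal text is escaped, <NAME> becomes
    the question term, <ANSWER> becomes the supplied sub-expression."""
    pieces = []
    for chunk in re.split(r"(<NAME>|<ANSWER>)", pattern):
        if chunk == "<NAME>":
            pieces.append(re.escape(q_term))
        elif chunk == "<ANSWER>":
            pieces.append(answer_re)
        else:
            pieces.append(re.escape(chunk))
    return re.compile("".join(pieces))

def pattern_precision(pattern, q_term, answer, sentences):
    """P = Ca / Co: Co counts sentences matched with <ANSWER> as any word,
    Ca counts sentences matched with <ANSWER> as the correct answer term."""
    co_re = to_regex(pattern, q_term, r"\S+")
    ca_re = to_regex(pattern, q_term, re.escape(answer))
    co = sum(1 for s in sentences if co_re.search(s))
    ca = sum(1 for s in sentences if ca_re.search(s))
    return ca / co if co else 0.0

sents = ["Mozart was born in 1756 in Salzburg", "Mozart was born in Salzburg"]
print(pattern_precision("<NAME> was born in <ANSWER>", "Mozart", "1756", sents))
# -> 0.5  (Co = 2, Ca = 1)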
26
Calculate Precision (cont.) Obtain table of Regular Expression patterns 1 table per question type –Precision of each pattern –precision is the probability of the pattern containing the answer –by the principle of maximum likelihood estimation
27
Calculate Precision (cont.) BIRTHDATE table:
1.0 <NAME> ( <ANSWER> - )
0.85 <NAME> was born on <ANSWER>,
0.6 <NAME> was born in <ANSWER>
0.59 <NAME> was born <ANSWER>
0.53 <ANSWER> <NAME> was born
0.50 - <NAME> ( <ANSWER>
0.36 <NAME> ( <ANSWER> -
28
Calculate Precision (cont.) Good range of patterns obtained with as few as 10 examples Rather long list difficult to come up with manually Largest number of examples the system required to get a good range of patterns?
29
Calculate Precision (cont.) Precision of patterns learned from one QA-pair calculated for other examples of same question type Helps eliminate dubious patterns –Contents of two or more sites are the same –Same document appears in search engine output for learning & precision stages
30
Finding Answers To new questions! Use existing QA system (Hovy et al., 2002b;2001) Determine type of new question Identify Question term
31
Finding Answers (cont.) Create query from Q term & do IR –use answer document corpus such as TREC-10 or web search Segment returned documents into sentences & process as before Replace Q term by Q tag –e.g. <NAME> in case of BIRTHYEAR type
32
Finding Answers (cont.) Using pattern table developed for Q type, search for presence of each pattern Select words matching <ANSWER> as potential answer Sort answers by pattern's precision scores Discard duplicate answers (string compare) Return top 5
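A hedged sketch of the answer-selection loop just described; the pattern-table format (precision, pattern) and all names are assumptions for illustration:

import re

def find_answers(q_term, sentences, pattern_table, top_k=5):
    """Apply each (precision, pattern) entry to the retrieved sentences,
    collect whatever the <ANSWER> slot matches, rank candidates by the
    precision of the pattern that found them, and return the top k."""
    candidates = {}
    for precision, pattern in pattern_table:
        regex = "".join(
            re.escape(q_term) if chunk == "<NAME>"
            else r"(\S+)" if chunk == "<ANSWER>"
            else re.escape(chunk)
            for chunk in re.split(r"(<NAME>|<ANSWER>)", pattern)
        )
        for sentence in sentences:
            for match in re.finditer(regex, sentence):
                answer = match.group(1)
                # keep the best precision seen for each distinct answer string
                candidates[answer] = max(candidates.get(answer, 0.0), precision)
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

table = [(1.0, "<NAME> ( <ANSWER> - )"), (0.6, "<NAME> was born in <ANSWER>")]
print(find_answers("Mozart", ["Mozart was born in 1756 ."], table))
# -> [('1756', 0.6)]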
33
Experiments 6 different Q types –from Webclopedia QA Typology (Hovy et al., 2002a) BIRTHDATE LOCATION INVENTOR DISCOVERER DEFINITION WHY-FAMOUS
34
Experiments (cont.) (BIRTHYEAR - previously shown) INVENTOR
1.0 <ANSWER> invents <NAME>
1.0 the <NAME> was invented by <ANSWER>
1.0 <ANSWER> invented the <NAME> in
– all have precision of 1.0
35
Experiments (cont.) DISCOVERER
1.0 when <ANSWER> discovered <NAME>
1.0 <ANSWER>'s discovery of <NAME>
0.9 <NAME> was discovered by <ANSWER> in
DEFINITION
1.0 <NAME> and related <ANSWER>
1.0 form of <ANSWER>, <NAME>
0.94 as <NAME>, <ANSWER> and
36
Experiments (cont.) WHY-FAMOUS
1.0 <ANSWER> <NAME> called
1.0 laureate <ANSWER> <NAME>
0.71 <NAME> is the <ANSWER> of
LOCATION
1.0 <ANSWER>'s <NAME>
1.0 regional : <ANSWER> : <NAME>
0.92 near <NAME> in <ANSWER>
37
Experiments (cont.) For each Q type, extract questions from TREC-10 set Run through testing phase (precision) Two sets of experiments
38
Experiments (cont.) Set one –TREC corpus is input –IR done by IR component of their QA system (Lin, 2002) Set two –Web is input –IR performed by AltaVista
39
Results Measured by Mean Reciprocal Rank (?)
TREC
Question type    # of Q's    MRR
BIRTHYEAR        8           0.48
INVENTOR         6           0.17
DISCOVERER       4           0.13
DEFINITION       102         0.34
WHY-FAMOUS       3           0.33
LOCATION         16          0.75
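The "(?)" above refers to the evaluation metric: Mean Reciprocal Rank scores each question as 1/rank of the first correct answer among the (here, top 5) returned candidates, 0 if none is correct, averaged over all questions. A tiny sketch:

def mean_reciprocal_rank(ranks):
    """ranks[i] is the 1-based rank of the first correct answer for
    question i, or None if no correct answer was returned."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)

# e.g. correct answer ranked first, third, and missing for three questions:
print(mean_reciprocal_rank([1, 3, None]))   # -> (1 + 1/3 + 0) / 3 ≈ 0.44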
40
Results (cont.) Web
Question type    # of Q's    MRR
BIRTHYEAR        8           0.69
INVENTOR         6           0.58
DISCOVERER       4           0.88
DEFINITION       102         0.39
WHY-FAMOUS       3           0.00
LOCATION         16          0.86
41
Results (cont.) System performs better on web data than on TREC corpus Abundant web data makes it easier for system to locate answers with high precision scores TREC corpus does not have enough candidate answers with high precision score –must settle for answers from low precision patterns WHY-FAMOUS exception - may be due to small # of test Q's
42
Shortcomings & Extensions Need for POS &/or semantic types "Where are the Rocky Mountains?” "Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty” in NE tagger &/or ontology could enable system to determine "background" is not a location
43
Shortcomings... (cont.) DEFINITION Q's - matched term too general, though technically correct "What is nepotism?", "...in the form of widespread bureaucratic abuses: graft, nepotism..." "What is sonar?" pattern "<NAME> and related <ANSWER>" matches "...while its sonar and related underseas systems are built..."
44
Shortcomings... (cont.) Long distance dependencies "Where is London?" "London, which has one of the most busiest airports in the world, lies on the banks of the river Thames" would require pattern like: <QUESTION>, (<any_word>)*, lies on <ANSWER> –Abundance & variety of Web data helps system to find an instance of patterns w/o losing answers to long distance dependencies
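To make the reconstructed pattern concrete, a hedged regex sketch with the (<any_word>)* gap written as a repeated group (the sentence is taken from the slide; all names and the exact regex are illustrative):

import re

# Rough rendering of "<QUESTION>, (<any_word>)*, lies on <ANSWER>"
question_term = "London"
pattern = re.compile(re.escape(question_term) + r",(?:\s+[^\s,]+)*\s*,\s*lies on (.+)")

sentence = ("London, which has one of the most busiest airports in the world, "
            "lies on the banks of the river Thames")
match = pattern.search(sentence)
print(match.group(1) if match else None)   # -> "the banks of the river Thames"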
45
Shortcomings... (cont.) More info in patterns regarding length of expected answer phrase –Searches in range of 50 bytes of answer phrase to capture pattern –fails under some conditions "When was Lyndon B. Johnson born?” "...lost to democratic Sen. Lyndon B. Johnson, who ran for both re-election and the vice presidency” -
46
Shortcomings... (cont.) Lacks info that <ANSWER> in this case should be replaced by exactly 1 word Could extend system to search for answer in range of 1-2 chunks –basic English phrases: NP, VP, PP, etc.
47
Shortcomings... (cont.) System doesn't work for Q types requiring multiple words from question to be in answer "In which county does the city of Long Beach lie?" "Long Beach is situated in Los Angeles County" required pattern: <NAME> situated in <ANSWER>
48
Shortcomings... (cont.) Performance of system depends greatly on having only 1 anchor word Multiple anchor points –would help eliminate candidate answers –require all anchor words be present in candidate answer sentence
49
Shortcomings... (cont.) Does not use case "What is micron?" "...a spokesman for Micron, a maker of semiconductors, said SIMMs are..." If "micron" had been capitalized in the question, "a maker of semiconductors" would be a perfect answer
50
Shortcomings... (cont.) Canonicalization of words BIRTHDATE for Gandhi: 1869; Oct. 2, 1869; 2nd October 1869; October 2 1869; 02 October 1869; etc. –Use date tagger to cluster all variations and tag with same term –Extend idea to smooth out variations in Q term for names: Gandhi, Mahatma Gandhi, Mohandas Karamchand Gandhi, etc.
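A hedged sketch of the canonicalization idea for dates, using Python's standard strptime with a small, assumed list of formats (the paper's date tagger is not specified; this is purely illustrative):

import re
from datetime import datetime

FORMATS = ["%b. %d, %Y", "%B %d, %Y", "%d %B %Y", "%B %d %Y", "%Y"]

def canonicalize_date(text: str) -> str:
    """Map variants like 'Oct. 2, 1869', '2nd October 1869', '02 October 1869'
    to one canonical tag such as '1869-10-02' (or '1869-XX-XX' for year only)."""
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text).strip()  # drop ordinals
    for fmt in FORMATS:
        try:
            d = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
        if fmt == "%Y":
            return f"{d.year}-XX-XX"
        return d.strftime("%Y-%m-%d")
    return text                                # leave unknown forms untouched

for v in ["1869", "Oct. 2, 1869", "2nd October 1869", "October 2 1869", "02 October 1869"]:
    print(v, "->", canonicalize_date(v))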
51
Conclusion Web results easily outperform TREC results Suggests need to integrate outputs from Web & TREC Word count to help eliminate unlikely answers + BIRTHDATE, LOCATION ? DEFINITION
52
Conclusion (cont.) But what about DEFINITION? 102 Q's in TREC corpus and on Web –most Q's of any type MRR-TREC == 0.34 MRR-Web == 0.39 All other Q types have # < 20, most < 10 If enough Q's are asked, will difference in performance on Web data vs. TREC data diminish?
53
Conclusion (cont.) Simplicity - "perfect" for multilingual QA systems –Low resource requirement - no NE taggers, no parsers, no ontologies, etc. –No adaptation of these to a new language required –Only need to create manual training terms & use an appropriate web search engine
54
Regular Expressions from ask_iggy
"place called\\s+($cap_pattern+)"
"home called\\s+($cap_pattern+)"
"at\\s((the)?\\s+($cap_pattern+))"
"to\\s+($cap_pattern+)"
"place\\s+in\\s+($cap_pattern+)called\\s+($cap_pattern+)"
"in\\s+($cap_pattern+)"
"up\\s+($cap_pattern+)"
"left\\s+($cap_pattern+)"
"(($cap_pattern+)[Ii]slands)"
"(northern|southern|eastern|western)\\s+($cap_pattern+)"
"from\\s+($cap_pattern+)"
"far\\s+as\\s+($cap_pattern+)"
"place\\s+in\\s+($cap_pattern+)"
"home\\s+town"
"city\\s+of\\s+($cap_pattern+)"
"middle\\s+of\\s+((the)?\\s+($cap_pattern+))"
"(($cap_pattern+)[Ii]slands\\s+of\\s+($cap_pattern+))"
"place{1,1}d?\\s+near\\s+($cap_pattern+)"
"((above|over)\\s+($cap_pattern+))"
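These look like Perl-style regexes in which $cap_pattern presumably stands for a capitalized-word sub-expression. A hedged Python rendering of the "city of" LOCATION pattern, with cap_pattern defined here only as an assumption (not taken from ask_iggy):

import re

# Assumed stand-in for $cap_pattern: one capitalized word, optionally
# followed by further capitalized words
cap_pattern = r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"

city_of = re.compile(rf"city\s+of\s+({cap_pattern})")

m = city_of.search("He grew up in the city of Long Beach before moving east.")
print(m.group(1) if m else None)   # -> "Long Beach"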