Discovery of Inference Rules for Question Answering
Dekang Lin and Patrick Pantel (Univ. Alberta, Canada) [Lin now at Google, Pantel now at ISI]
as (mis-)interpreted by Peter Clark (Nov 2007)
Preview
This is a most remarkable piece of work!
–Extent:
  Hasegawa paraphrase database: 4,600 rules
  Sekine paraphrase database: 211,000 rules
  DIRT: 12,000,000 rules
–Quality: ~50% are sensible (personal opinion)
–Content: not just paraphrases but world knowledge
Overview
Problem: there are many ways of saying the same thing.
“Who is the author of the Star Spangled Banner?”
…Francis Scott Key wrote the “Star Spangled Banner” in 1814.
…comedian-actress Roseanne Barr sang her famous shrieking rendition of the “Star Spangled Banner” before a San Diego Padres–Cincinnati Reds game.
Some Manual “Paraphrases” (from ISI)
Problem… there are lots of paraphrases! Can we learn them automatically?
DIRT Method
1. Take a binary relation, e.g., X buys Y.
2. Collect examples of “X buys…” and “…buys Y” to build a “tuple” database:
   The following can buy: things (116), people (37), companies (34), Chicago (33), buyers (18), groups (6), …
   The following can be bought: shares (145), managers (43), stakes (27), power (25), securities (17), people (13), amounts (12), …
3. Find other relations with a similar filler distribution:
   The following can purchase: things (97), companies (23), people (21), businesses (13), governments (5), …
   The following can be purchased: shares (56), companies (23), securities (21), power (20), dogs (14), Macs (10), firms (2), …
4. Collect as rules: “X buys Y” ↔ “X purchases Y”
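Step 2 above (building the “tuple” database) can be sketched as follows. This is a minimal illustration, not DIRT's implementation: the triples and counts are made up, standing in for (X, verb, Y) extractions a parser such as Minipar would produce from a corpus.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus of (subject, verb, object) triples, as a
# dependency parser might extract them. Counts are illustrative only.
triples = [
    ("people", "buy", "shares"), ("companies", "buy", "stakes"),
    ("people", "buy", "shares"), ("companies", "purchase", "shares"),
    ("people", "purchase", "securities"), ("things", "buy", "power"),
]

# Tuple database: for each verb, count the fillers of each slot.
slot_x = defaultdict(Counter)   # "The following can buy: ..."
slot_y = defaultdict(Counter)   # "The following can be bought: ..."
for x, verb, y in triples:
    slot_x[verb][x] += 1
    slot_y[verb][y] += 1

print(slot_x["buy"])   # fillers of "X buys ..." with their counts
print(slot_y["buy"])   # fillers of "... buys Y" with their counts
```

Step 3 then compares these per-verb filler distributions across verbs (e.g., `slot_x["buy"]` vs. `slot_x["purchase"]`).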
The Math
Can’t simply compare counts, as some words are more frequent than others:
The following can buy: things (116), people (37), companies (34), Chicago (33), buyers (18), groups (6), …
The following can purchase: things (97), companies (23), people (21), businesses (13), governments (5), …
The “people” counts are not very indicative of similarity, as “people” occurs with many relations.
The Math
Mutual information gives a better notion of “importance”:
The following can buy: things (116) 0.7, people (37) 0.5, companies (34) 8.4, Chicago (33) 8.1, buyers (18) 4.5, groups (6) 2.3, …
mi(x, y) = log [ p(x, y) / (p(x) p(y)) ]
e.g., mi(“company buy”) = log [ p(“company buy”) / (p(“company”) p(“buy”)) ]
This can be computed from frequency counts:
mi(“company buy”) = log [ (freq(“company buy *”) × freq(“* * *”)) / (freq(“company * *”) × freq(“* buy *”)) ]
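The frequency-count form of the MI estimate can be sketched directly. All the counts below are made-up placeholders, chosen only to show the shape of the computation:

```python
import math

# Illustrative frequency counts over (X, verb, Y) triples; "*" is a wildcard.
# None of these numbers come from the paper -- they are assumptions.
freq_xv = {("company", "buy"): 34}   # freq("company buy *")
freq_x = {"company": 120}            # freq("company * *")
freq_v = {"buy": 900}                # freq("* buy *")
total = 50000                        # freq("* * *")

def mi(x, verb):
    """mi(x verb) = log[ freq(x verb *) * freq(* * *)
                         / (freq(x * *) * freq(* verb *)) ]"""
    return math.log((freq_xv[(x, verb)] * total) / (freq_x[x] * freq_v[verb]))

print(round(mi("company", "buy"), 2))  # positive: "company" and "buy" co-occur
                                       # more than chance would predict
```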
The Math
Now use a similarity measure, summing over all words:
The following can buy: things (116) 0.7, people (37) 0.5, companies (34) 8.4, Chicago (33) 8.1, buyers (18) 4.5, groups (6) 2.3, …
The following can purchase: things (97) 0.3, companies (23) 8.2, people (21) 0.2, businesses (13) 4.3, governments (5) 7.6, …
sim(“X buy”, “X purchase”) =
  [ Sum over Xs that can “buy” AND “purchase” of ( mi(“X buy”) + mi(“X purchase”) ) ]
  / [ Sum over Xs that can “buy” of mi(“X buy”) + Sum over Xs that can “purchase” of mi(“X purchase”) ]
Here:
= [ (.7+.3) “things” + (8.4+8.2) “companies” + (.5+.2) “people” ]
  / [ (.7+.5+8.4+8.1+4.5+2.3) + (.3+8.2+.2+4.3+7.6) ]
= 18.3 / 45.1 = 0.41 (ignoring other words not shown on this page)
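The worked example above can be reproduced in a few lines. The MI values are the ones shown on the slide; the function is a sketch of the slot-similarity ratio (shared-filler MI mass over total MI mass):

```python
# MI values from the slide for fillers of "X buy" and "X purchase".
mi_buy = {"things": 0.7, "people": 0.5, "companies": 8.4,
          "Chicago": 8.1, "buyers": 4.5, "groups": 2.3}
mi_purchase = {"things": 0.3, "companies": 8.2, "people": 0.2,
               "businesses": 4.3, "governments": 7.6}

def slot_sim(mi1, mi2):
    """Similarity of two slots: sum of (mi1 + mi2) over SHARED fillers,
    divided by the sum of all mi1 values plus all mi2 values."""
    shared = set(mi1) & set(mi2)           # things, people, companies
    num = sum(mi1[w] + mi2[w] for w in shared)
    den = sum(mi1.values()) + sum(mi2.values())
    return num / den

print(round(slot_sim(mi_buy, mi_purchase), 2))  # 0.41, matching the slide
```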
The Math
Then combine the “X verb *” and “* verb Y” scores by taking the geometric mean:
sim(“X buy Y”, “X purchase Y”) = √[ sim(“X buy”, “X purchase”) × sim(“buy Y”, “purchase Y”) ]
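The combination step is just a geometric mean of the two slot similarities. The X-slot value is the 0.41 worked out above; the Y-slot value is an assumed placeholder for illustration:

```python
import math

sim_x = 0.41   # sim("X buy", "X purchase"), from the worked example
sim_y = 0.62   # sim("buy Y", "purchase Y") -- assumed value, for illustration

# Geometric mean of the two per-slot similarity scores.
sim_path = math.sqrt(sim_x * sim_y)
print(round(sim_path, 3))
```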
The Results
Use for Question-Answering
T: “…China provided Iran with decontamination materials…”
Question: “Iran received decontamination materials?”
Answer: Yes! I have general knowledge that:
  IF X is provided with Y THEN X obtains Y   [DIRT paraphrase]
Here: X = Iran, Y = materials.
Thus, here:
  We are told in T: Iran is provided with materials
  Thus it follows that: Iran obtains materials
In addition, I know: “obtain” and “receive” mean roughly the same thing.   [WordNet]
Hence: Iran received decontamination materials.
;;; T: "William Doyle works for an auction house in Manhattan."
;;; H: "The auction house employs William Doyle?"
;;; DIRT rule "N:for:V V:subj:N -> N:subj:V V:obj:N" fired
Yes! I have general knowledge that:
  IF Y works for X THEN X hires Y
Here: X = the house, Y = Doyle.
Thus, here:
  We are told in T: Doyle works for the house
  Thus it follows that: the house hires Doyle
In addition, I know: "hire" and "employ" mean roughly the same thing.
Hence: The auction house employs William Doyle. SCORE!!!!
;;; T: "The technician cooled the room."
;;; H: "The technician was cooled by the room?"
;;; DIRT rule "N:obj:V V:subj:N -> N:subj:V V:obj:N" fired
Yes! I have general knowledge that:
  IF Y cools X THEN X cools Y
Here: X = the room, Y = the technician.
Thus, here:
  We are told in T: the technician cools the room
  Thus it follows that: the room cools the technician
Hence: The technician was cooled by the room. Rats, WRONG again
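The two traces above boil down to applying a rule that rewrites the verb and possibly swaps the argument slots. A minimal sketch, with a made-up rule encoding (a `(lhs_verb, rhs_verb, swap)` tuple rather than DIRT's actual dependency-path notation), shows both the good rule and why a spurious symmetric rule misfires:

```python
def apply_rule(triple, rule):
    """Rewrite an (X, verb, Y) triple with a toy paraphrase rule.
    rule = (lhs_verb, rhs_verb, swap): if swap, X and Y change places."""
    x, verb, y = triple
    lhs_verb, rhs_verb, swap = rule
    if verb != lhs_verb:
        return None          # rule does not match this triple
    return (y, rhs_verb, x) if swap else (x, rhs_verb, y)

# Good rule: "Y works for X" -> "X hires Y" (slots swap across the rewrite).
print(apply_rule(("Doyle", "works_for", "house"), ("works_for", "hires", True)))
# Bad rule learned from noisy data: "X cools Y" -> "Y cools X". It applies
# cleanly but yields the wrong conclusion -- DIRT has no check against this.
print(apply_rule(("technician", "cools", "room"), ("cools", "cools", True)))
```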
Comments: The good…
I simplified to just “X verb Y” patterns
–In fact DIRT handles general “X … Y” dependency paths (uses the Minipar parser to find the paths)
Finds world knowledge, not just paraphrases
–Also finds junk…
Amazingly large and (relatively) high quality
It helped us (a bit) in our question-answering task
–In our (limited) system, on 800 test questions, DIRT rules were used 59 times (41 right + 18 wrong)
Critique
Limited to “X pattern1 Y” → “X pattern2 Y” rules
–E.g., out of scope: “John sold a car” → “John received money”
No word senses:
  “X turns Y” → “X transforms Y”, “X rolls Y”, “X sells Y”
No constraints on the classes (types) of X and Y:
  “driven to X by Y” → “Y has a child in X”, “Y provides medical care for X”
–“The president was driven to the memorial by the general.”
–Lin and Pantel point to this as a possible future direction
~50% is noisy
–Antonyms are often found to correlate: “X loves Y” → “X hates Y”
No notion of time, tense, etc.:
  “Fred bought a car” → “Fred owns a car”
  “Fred sold a car” → “Fred owns a car”
Summary and Comments
Is this The Answer to AI/Machine Reading?
Highly impressive knowledge collection
Can suggest implicit (unstated) knowledge in text
but:
–Learns only one inference pattern (can imagine extending to others)
–Noisy (can imagine using more data for more reliability)
More significantly…
–Rules only tell us atomic things which might happen…
Summary and Comments (cont)
Rules only tell us atomic things which might happen
–Can reduce the noise, but in the end the rules are only possibilities
“Illinois Senator Barack Obama visited the state's most diverse high school to deliver a comprehensive K-12 education plan…”
  “Obama arrived in the high school”
  “Obama toured the high school” ?
  “Obama flew to the high school” ?
  “Obama told a reporter in the high school” !!
  “Obama inaugurated the high school” ?!
  “Obama headed a delegation to the high school” !!
  “Obama was arrested in the high school” ?!
  “Obama thanked the high school’s government”
  “Obama stopped in the high school” !!
  “Obama held a summit in the high school” !!
  “Obama was evacuated from the high school”
  “Obama was accompanied to the high school”
  …
Summary and Comments (cont)
What is still needed? Constructing a “most coherent representation” (model) from the pieces which the knowledge base suggests. This requires notions of:
–coherence;
–macro-sized structures of how things combine (e.g., “scripts”);
–reasoning about the implications;
–knowledge integration;
–search;
–plausibility assessment and uncertainty reasoning.
Summary and Comments (cont)
The bottom line:
Rule learning from text:
–A key piece of the puzzle of AI; overcomes part of the knowledge acquisition bottleneck
–Provides much-needed bottom-up input to language understanding
BUT: only part of the solution:
–Provides low-level “knowledge fodder” but not the mechanism for processing it into coherent and useful representations
–Lacks the larger-level structures (scripts, typical scenarios) also required for understanding and for “organizing the fodder”