Interactive Reasoning in Large and Uncertain RDF Knowledge Bases — Martin Theobald. Joint work with: Maximilian Dylla, Timm Meiser, Ndapa Nakashole, Christina Teflioudi, Yafang Wang, Mohamed Yahya, Mauro Sozio, and Fabian Suchanek.


Interactive Reasoning in Large and Uncertain RDF Knowledge Bases Martin Theobald Joint work with: Maximilian Dylla, Timm Meiser, Ndapa Nakashole, Christina Tefliuodi, Yafang Wang, Mohamed Yahya, Mauro Sozio, and Fabian Suchanek Max Planck Institute Informatics

French Marriage Problem
marriedTo: person × person
marriedTo_French: person × person
∀x,y,z: marriedTo(x,y) ∧ marriedTo(x,z) ⇒ y=z

French Marriage Problem
Facts in KB and new facts or fact candidates:
marriedTo(Hillary, Bill), marriedTo(Carla, Nicolas), marriedTo(Angelina, Brad), marriedTo(Cecilia, Nicolas), marriedTo(Carla, Benjamin), marriedTo(Carla, Mick), marriedTo(Michelle, Barack), marriedTo(Yoko, John), marriedTo(Kate, Leonardo), marriedTo(Carla, Sofie), marriedTo(Larry, Google)
1) For recall: pattern-based harvesting
2) For precision: consistency reasoning
∀x,y,z: marriedTo(x,y) ∧ marriedTo(x,z) ⇒ y=z

Agenda
– URDF: Reasoning in Uncertain Knowledge Bases
  - Resolving uncertainty at query time
  - Lineage of answers
  - Propositional vs. probabilistic reasoning
  - Temporal reasoning extensions
– UViz: The URDF Visualization Frontend
  - Demo!

URDF: Reasoning in Uncertain KBs
Knowledge harvesting from the Web may yield knowledge bases which are
– Incomplete: bornIn(Albert_Einstein, ?x) → {}
– Incorrect: bornIn(Albert_Einstein, ?x) → {Stuttgart}
– Inconsistent: bornIn(Albert_Einstein, ?x) → {Ulm, Stuttgart}
Combine grounding of first-order logic rules with an additional step of consistency reasoning
– Propositional: constrained weighted MaxSat
– Probabilistic: lineage & possible-worlds semantics
⇒ At query time!
[Theobald, Sozio, Suchanek, Nakashole: MPII Tech-Report ’10]

Soft Rules vs. Hard Constraints
(Soft) inference rules:
– People may live in more than one place
  livesIn(x,y) ⇐ marriedTo(x,z) ∧ livesIn(z,y) [0.6]
  livesIn(x,y) ⇐ hasChild(x,z) ∧ livesIn(z,y) [0.2]
(Hard) consistency constraints:
– People are not born in different places/on different dates
  bornIn(x,y) ∧ bornIn(x,z) ⇒ y=z
– People are not married to more than one person (at the same time, in most countries)
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y≠z ⇒ disjoint(t1,t2)

Soft Rules vs. Hard Constraints (ct’d)
Enforce FDs (e.g., mutual exclusion) as hard constraints:
  hasAdvisor(x,y) ∧ hasAdvisor(x,z) ⇒ y=z
Generalize to other forms of constraints:
– Hard constraint: hasAdvisor(x,y) ∧ graduatedInYear(x,t) ∧ graduatedInYear(y,s) ⇒ s < t
– Soft constraint: firstPaper(x,p) ∧ firstPaper(y,q) ∧ author(p,x) ∧ author(p,y) ∧ inYear(p) > inYear(q)+5years ⇒ hasAdvisor(x,y) [0.6]
Datalog-style grounding (deductive & potentially recursive soft rules):
  livesIn(x,y) ∧ type(y,City) ∧ locatedIn(y,z) ∧ type(z,Country) ⇒ livesIn(x,z)
Combining soft and hard constraints: no longer regular MaxSat — constrained (weighted) MaxSat instead.

Deductive Grounding (SLD Resolution / Datalog)
RDF base facts:
  F1: marriedTo(Bill, Hillary)
  F2: represents(Hillary, New_York)
  F3: governorOf(Bill, Arkansas)
First-order rules (Horn clauses):
  R1: livesIn(?x, ?y) :- marriedTo(?x, ?z), livesIn(?z, ?y)
  R2: livesIn(?x, ?y) :- represents(?x, ?y)
  R3: livesIn(?x, ?y) :- governorOf(?x, ?y)
Query: livesIn(Bill, ?x)
Answers (derived facts): livesIn(Bill, Arkansas), livesIn(Bill, New_York)
[Figure: SLD resolution tree expanding the query over R1–R3 down to the base facts]
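The grounding step can be sketched as a tiny, depth-bounded SLD resolver over the slide's facts and rules. This is an illustrative toy (the lowercase-variable convention, the depth bound, and the solver itself are my own assumptions), not the actual URDF implementation:

```python
import itertools

FACTS = {
    ("marriedTo", "Bill", "Hillary"),
    ("represents", "Hillary", "New_York"),
    ("governorOf", "Bill", "Arkansas"),
}
RULES = [  # (head, body) — R1 is recursive, hence the depth bound below
    (("livesIn", "x", "y"), [("marriedTo", "x", "z"), ("livesIn", "z", "y")]),
    (("livesIn", "x", "y"), [("represents", "x", "y")]),
    (("livesIn", "x", "y"), [("governorOf", "x", "y")]),
]
_fresh = itertools.count()

def is_var(t):
    return t[0].islower()  # convention: variables start lowercase

def walk(t, s):
    while t in s:  # follow substitution chains to the final binding
        t = s[t]
    return t

def unify(args1, args2, s):
    s = dict(s)
    for a, b in zip(args1, args2):
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(a):
            s[a] = b
        elif is_var(b):
            s[b] = a
        else:
            return None  # clash of two distinct constants
    return s

def solve(goals, s, depth):
    """SLD resolution: yield substitutions proving all goals."""
    if not goals:
        yield s
        return
    if depth == 0:
        return
    goal, rest = goals[0], goals[1:]
    for fact in FACTS:  # resolve against base facts
        if fact[0] == goal[0]:
            s2 = unify(goal[1:], fact[1:], s)
            if s2 is not None:
                yield from solve(rest, s2, depth)
    for head, body in RULES:  # resolve against rule heads
        if head[0] != goal[0]:
            continue
        tag = f"_{next(_fresh)}"  # standardize rule variables apart
        head_r = [v + tag for v in head[1:]]
        body_r = [(b[0],) + tuple(v + tag if is_var(v) else v for v in b[1:])
                  for b in body]
        s2 = unify(goal[1:], head_r, s)
        if s2 is not None:
            yield from solve(body_r + rest, s2, depth - 1)

query = ("livesIn", "Bill", "q")
answers = {walk("q", s) for s in solve([query], {}, depth=6)}
print(sorted(answers))  # ['Arkansas', 'New_York']
```

The recursive rule R1 fires via marriedTo(Bill, Hillary) and then proves livesIn(Hillary, New_York) through R2, which is exactly the derivation the slide's resolution tree depicts.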

URDF: Reasoning Example
Rules:
  hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z) [0.4]
  graduatedFrom(x,y) ∧ graduatedFrom(x,z) ⇒ y=z
[Figure: example KB graph with weighted edges, e.g. type(Jeff, Computer_Scientist) [1.0], worksAt(Jeff, Stanford) [0.9], hasAdvisor(Surajit, Jeff) [0.8], hasAdvisor(David, Jeff) [0.7], and graduatedFrom edges to Princeton/Stanford with confidences 0.6, 0.7, and 0.9]
Derived facts: graduatedFrom(Surajit, Stanford), graduatedFrom(David, Stanford)

URDF: CNF Construction & MaxSat Solving
1) Deductive grounding: yields only the facts and rules which are relevant for answering the query (dependency graph D)
2) Boolean formula in CNF consisting of
   – grounded hard rules
   – grounded soft rules (weighted)
   – base facts (weighted)
3) Propositional reasoning: compute a truth assignment for all facts in D such that the sum of weights of satisfied clauses is maximized ⇒ compute the “most likely” possible world
Query: graduatedFrom(?x, ?y)
CNF:
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))
∧ (¬graduatedFrom(David, Stanford) ∨ ¬graduatedFrom(David, Princeton))
∧ (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))
∧ (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))
∧ worksAt(Jeff, Stanford) ∧ hasAdvisor(Surajit, Jeff) ∧ hasAdvisor(David, Jeff)
∧ graduatedFrom(Surajit, Princeton) ∧ graduatedFrom(Surajit, Stanford)
∧ graduatedFrom(David, Princeton) ∧ graduatedFrom(David, Stanford)
[Theobald, Sozio, Suchanek, Nakashole: MPII Tech-Report ’10]
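Step 3 can be checked by brute force on the Surajit half of the example. The clause encoding below is a sketch of mine; the fact weights follow the slide's example:

```python
from itertools import product

VARS = ["gradFrom_S_Stanford", "gradFrom_S_Princeton",
        "hasAdvisor_S_Jeff", "worksAt_Jeff_Stanford"]

# A literal is (variable, polarity); a clause is a list of literals.
HARD = [  # mutual exclusion: not graduated from both places
    [("gradFrom_S_Stanford", False), ("gradFrom_S_Princeton", False)],
]
SOFT = [
    # grounded soft rule: hasAdvisor ∧ worksAt ⇒ gradFrom(S, Stanford) [0.4]
    ([("hasAdvisor_S_Jeff", False), ("worksAt_Jeff_Stanford", False),
      ("gradFrom_S_Stanford", True)], 0.4),
    # weighted unit clauses for the base facts
    ([("gradFrom_S_Princeton", True)], 0.7),
    ([("gradFrom_S_Stanford", True)], 0.6),
    ([("hasAdvisor_S_Jeff", True)], 0.8),
    ([("worksAt_Jeff_Stanford", True)], 0.9),
]

def satisfied(clause, world):
    return any(world[v] == pol for v, pol in clause)

best, best_w = None, -1.0
for bits in product([False, True], repeat=len(VARS)):
    world = dict(zip(VARS, bits))
    if not all(satisfied(c, world) for c in HARD):
        continue  # hard constraints prune "impossible worlds"
    w = sum(wt for c, wt in SOFT if satisfied(c, world))
    if w > best_w:
        best, best_w = world, w

print(best_w, best)  # the "most likely" world keeps gradFrom(S, Stanford)
```

The winning assignment keeps graduatedFrom(Surajit, Stanford) and drops graduatedFrom(Surajit, Princeton), with total satisfied weight 0.4 + 0.6 + 0.8 + 0.9 = 2.7.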

URDF: Lineage & Possible Worlds
1) Deductive grounding: same as before, but trace the lineage of query answers
2) Lineage DAG (not CNF!) consisting of grounded hard rules, grounded soft rules, and base facts, plus the derivation structure
3) Probabilistic inference
   – Marginalization: aggregate the probabilities of all possible worlds where the answer is “true”
   – Drop “impossible worlds”
Query: graduatedFrom(Surajit, ?y)
Lineage of graduatedFrom(Surajit, Stanford): base fact [0.6] ∨ (hasAdvisor(Surajit, Jeff) [0.8] ∧ worksAt(Jeff, Stanford) [0.9]), mutually exclusive with graduatedFrom(Surajit, Princeton) [0.7]
  0.8 × 0.9 = 0.72; 1 − (1 − 0.72) × (1 − 0.6) = 0.888
  (1 − 0.7) × 0.888 ≈ 0.266 (Stanford); 0.7 × (1 − 0.888) ≈ 0.078 (Princeton)
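The marginalization in step 3 can be verified by enumerating all 2⁴ possible worlds of the Surajit example (confidences from the slide; predicate names abbreviated):

```python
from itertools import product

# independent base facts and their confidences
FACTS = {
    "gradFrom_S_Princeton": 0.7,
    "gradFrom_S_Stanford": 0.6,
    "hasAdvisor_S_Jeff": 0.8,
    "worksAt_Jeff_Stanford": 0.9,
}

def derived_stanford(w):
    # lineage: base fact OR (hasAdvisor ∧ worksAt)
    return w["gradFrom_S_Stanford"] or (
        w["hasAdvisor_S_Jeff"] and w["worksAt_Jeff_Stanford"])

def consistent(w):
    # hard constraint: not graduated from both places
    return not (derived_stanford(w) and w["gradFrom_S_Princeton"])

names = list(FACTS)
p_stanford = p_princeton = 0.0
for bits in product([False, True], repeat=len(names)):
    w = dict(zip(names, bits))
    prob = 1.0
    for n in names:  # product of independent fact probabilities
        prob *= FACTS[n] if w[n] else 1.0 - FACTS[n]
    if not consistent(w):
        continue  # drop impossible worlds
    if derived_stanford(w):
        p_stanford += prob
    if w["gradFrom_S_Princeton"]:
        p_princeton += prob

print(round(p_stanford, 4), round(p_princeton, 4))  # 0.2664 0.0784
```

The sums reproduce the slide's numbers: (1 − 0.7) × 0.888 = 0.2664 for Stanford and 0.7 × (1 − 0.888) = 0.0784 for Princeton.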

Classes & Complexities
Grounding first-order Horn formulas (Datalog)
– Decidable
– EXPTIME-complete, PSPACE-complete (including recursion, but in P w/o recursion)
MaxSat (constrained & weighted)
– NP-complete
Probabilistic inference in graphical models
– #P-complete
[Figure: language classes — Horn ⊆ OWL-DL/lite ⊆ OWL ⊆ FOL]

Monte Carlo Simulation (I) [Karp, Luby, Madras: J. Alg. ’89]
Boolean formula: F = X1X2 ∨ X1X3 ∨ X2X3
Naïve sampling (works for any F; not in PTIME):
  cnt = 0
  repeat N times
    randomly choose X1, X2, X3 ∈ {0,1}
    if F(X1, X2, X3) = 1 then cnt = cnt + 1
  P = cnt / N
  return P /* Pr'(F) */
Zero/one-estimator theorem: if N ≥ (1/Pr(F)) × (4 ln(2/δ) / ε²), then Pr[ |P/Pr(F) − 1| > ε ] < δ
N may be very big for small Pr(F).
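The naïve sampler, written out for the majority formula above. Taking Pr(Xi = 1) = 0.5 is an assumption for the example; the true Pr(F) is then 4/8 = 0.5:

```python
import random

def naive_mc(f, n_vars, N, rng):
    """Zero/one estimator: Pr'(F) = (# satisfying samples) / N."""
    cnt = 0
    for _ in range(N):
        xs = [rng.random() < 0.5 for _ in range(n_vars)]
        if f(xs):
            cnt += 1
    return cnt / N

# F = X1X2 ∨ X1X3 ∨ X2X3 (majority of three fair bits)
F = lambda x: (x[0] and x[1]) or (x[0] and x[2]) or (x[1] and x[2])

est = naive_mc(F, 3, 100_000, random.Random(42))
print(est)  # close to 0.5
```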

Monte Carlo Simulation (II) [Karp, Luby, Madras: J. Alg. ’89]
Boolean formula in DNF: F = C1 ∨ C2 ∨ … ∨ Cm
Improved sampling (only for F in DNF; in PTIME):
  cnt = 0; S = Pr(C1) + … + Pr(Cm)
  repeat N times
    randomly choose i ∈ {1, 2, …, m} with probability Pr(Ci)/S
    randomly choose X1, …, Xn ∈ {0,1} s.t. Ci = 1
    if C1 = 0 and C2 = 0 and … and Ci−1 = 0 then cnt = cnt + 1
  P = S × cnt / N
  return P /* Pr'(F) */
Theorem: if N ≥ m × (4 ln(2/δ) / ε²), then Pr[ |P/Pr(F) − 1| > ε ] < δ
Now the required N no longer depends on Pr(F).
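A sketch of the improved (Karp-Luby) sampler for a DNF over independent variables. The dict-based clause representation is my own; each clause maps a variable to its required truth value:

```python
import random

def karp_luby(clauses, probs, N, rng):
    """Karp-Luby estimator for Pr(C1 ∨ … ∨ Cm) over independent variables."""
    def clause_prob(c):  # Pr(Ci) for a conjunction of independent literals
        p = 1.0
        for v, val in c.items():
            p *= probs[v] if val else 1.0 - probs[v]
        return p

    weights = [clause_prob(c) for c in clauses]
    S = sum(weights)
    cnt = 0
    for _ in range(N):
        # choose a clause with probability Pr(Ci)/S
        i = rng.choices(range(len(clauses)), weights=weights)[0]
        # sample a world in which Ci is true
        world = {v: (clauses[i][v] if v in clauses[i] else rng.random() < probs[v])
                 for v in probs}
        # count the world only if Ci is the FIRST clause it satisfies
        first = next(j for j, c in enumerate(clauses)
                     if all(world[v] == val for v, val in c.items()))
        if first == i:
            cnt += 1
    return S * cnt / N

# F = X1X2 ∨ X1X3 ∨ X2X3 with Pr(Xi = 1) = 0.5; true Pr(F) = 0.5
clauses = [{"x1": True, "x2": True},
           {"x1": True, "x3": True},
           {"x2": True, "x3": True}]
probs = {"x1": 0.5, "x2": 0.5, "x3": 0.5}
est = karp_luby(clauses, probs, 50_000, random.Random(7))
print(est)  # close to 0.5
```

The "first satisfied clause" check is what makes the estimator unbiased: each satisfying world is counted exactly once even though it may satisfy several clauses.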

Learning “Soft” Rules
Extend Inductive Logic Programming (ILP) techniques to large and incomplete knowledge bases
Goal: learn livesIn(?x,?y) ⇐ bornIn(?x,?y)
– Positive examples: livesIn(?x,?y) ∧ bornIn(?x,?y)
– Negative examples: ¬livesIn(?x,?y) ∧ bornIn(?x,?y) ∧ livesIn(?x,?z)
– Background knowledge
Software tools: alchemy.cs.washington.edu
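A toy illustration of estimating such a rule's confidence from positive and negative examples: among people with a known birthplace, count the fraction who also live there. The mini-KB below is fabricated for illustration:

```python
# Hypothetical data — not from YAGO or any real KB
bornIn = {"Einstein": "Ulm", "Merkel": "Hamburg", "Goethe": "Frankfurt"}
livesIn = {"Einstein": "Princeton", "Merkel": "Berlin", "Goethe": "Frankfurt"}

# positive examples: livesIn(x, y) ∧ bornIn(x, y)
matches = sum(1 for p, city in bornIn.items() if livesIn.get(p) == city)
# support: people for whom both predicates are known
support = sum(1 for p in bornIn if p in livesIn)

confidence = matches / support
print(confidence)  # 1 of 3 people lives where they were born
```

The resulting ratio would serve as the weight of the soft rule livesIn(?x,?y) ⇐ bornIn(?x,?y); real ILP systems additionally search the space of candidate rule bodies.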

More Variants of Consistency Reasoning
– Propositional reasoning: constrained weighted MaxSat solver
– Lineage & possible worlds (independent base facts): Monte Carlo simulations (Karp-Luby)
– First-order logic & probabilistic graphical models
  - Markov Logic (currently via an interface to Alchemy*) [Richardson & Domingos: ML’06]
  - Even more general: factor graphs [McCallum et al. 2008]
  - MCMC sampling for probabilistic inference
*Alchemy – Open-Source AI: alchemy.cs.washington.edu

Experiments
YAGO knowledge base: 2 million entities, 20 million facts
URDF: SLD grounding & MaxSat solving
– Basic query answering: SLD grounding & MaxSat solving of 10 queries over 16 soft rules (partly recursive) & 5 hard rules (bornIn, diedIn, marriedTo, …)
– Asymptotic runtime checks: runtime comparisons for synthetic soft-rule expansions
URDF vs. Markov Logic (MAP inference & MC-SAT)
|C| – # literals in soft rules; |S| – # literals in hard rules

French Marriage Problem (Revisited)
Facts in KB and new fact candidates (numbered):
  1: marriedTo(Hillary, Bill)
  2: marriedTo(Carla, Nicolas)
  3: marriedTo(Angelina, Brad)
  4: marriedTo(Cecilia, Nicolas)
  5: marriedTo(Carla, Benjamin)
  6: marriedTo(Carla, Mick)
  7: divorced(Madonna, Guy)
  8: domPartner(Angelina, Brad)
Temporal scopes: validFrom(2, 2008), validFrom(4, 1996), validUntil(4, 2007), validFrom(5, 2010), validFrom(6, 2006), validFrom(7, 2008)
[Figure: timeline of the facts' validity intervals]

Challenge: Temporal Knowledge Harvesting
For all people in Wikipedia (100,000s), gather all spouses, including divorced & widowed, and the corresponding time periods!
>95% accuracy, >95% coverage, in one night!

Difficult Dating

(Even More Difficult) Implicit Dating
Explicit dates vs. implicit dates relative to other dates

(Even More Difficult) Implicit Dating
Vague dates, relative dates
Narrative text, relative order

TARSQI: Extracting Time Annotations [Verhagen et al: ACL’05]
Hong Kong is poised to hold the first election in more than half a century that includes a democracy advocate seeking high office in territory controlled by the Chinese government in Beijing. A pro-democracy politician, Alan Leong, announced Wednesday that he had obtained enough nominations to appear on the ballot to become the territory’s next chief executive. But he acknowledged that he had no chance of beating the Beijing-backed incumbent, Donald Tsang, who is seeking re-election. Under electoral rules imposed by Chinese officials, only 796 people on the election committee – the bulk of them with close ties to mainland China – will be allowed to vote in the March 25 election. It will be the first contested election for chief executive since Britain returned Hong Kong to China in 1997. Mr. Tsang, an able administrator who took office during the early stages of a sharp economic upturn in 2005, is popular with the general public. Polls consistently indicate that three-fifths of Hong Kong’s people approve of the job he has been doing. “It is of course a foregone conclusion – Donald Tsang will be elected and will hold office for another five years,” said Mr. Leong, the former chairman of the Hong Kong Bar Association.
Note: extraction errors!

13 Relations between Time Intervals [Allen, 1984; Allen & Hayes, 1989]
A Before B / B After A
A Meets B / B MetBy A
A Overlaps B / B OverlappedBy A
A Starts B / B StartedBy A
A During B / B Contains A
A Finishes B / B FinishedBy A
A Equal B
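Classifying a pair of intervals into one of the 13 relations is straightforward to code. The sketch below uses closed intervals given as (start, end) pairs; the function name and relation labels are my own:

```python
def allen(a, b):
    """Classify two intervals (start, end) into one of Allen's 13 relations."""
    (a1, a2), (b1, b2) = a, b
    assert a1 < a2 and b1 < b2, "intervals must be non-degenerate"
    if a2 < b1: return "before"
    if b2 < a1: return "after"
    if a2 == b1: return "meets"
    if b2 == a1: return "met-by"
    if a1 == b1 and a2 == b2: return "equal"
    if a1 == b1: return "starts" if a2 < b2 else "started-by"
    if a2 == b2: return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "contains"
    if a1 < b1 < a2 < b2: return "overlaps"
    return "overlapped-by"  # remaining case: b1 < a1 < b2 < a2

print(allen((1, 3), (2, 4)))  # overlaps
```

Exactly one branch fires for any pair of non-degenerate intervals, which is the point of Allen's algebra: the 13 relations are mutually exclusive and jointly exhaustive.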

Possible Worlds in Time (I) [Wang, Yahya, Theobald: VLDB/MUD Workshop ’10]
Base facts (state relations): playsFor(Beckham, Real), playsFor(Ronaldo, Real)
Derived fact: playsFor(Beckham, Real, T1) ∧ playsFor(Ronaldo, Real, T2) ∧ overlaps(T1, T2) ⇒ teamMates(Beckham, Ronaldo, T3)
[Figure: timeline ’00–’07 showing the two playsFor intervals and the derived teamMates interval over their overlap]

Possible Worlds in Time (II) [Wang, Yahya, Theobald: VLDB/MUD Workshop ’10]
Base facts: playsFor(Beckham, United) (state), wonCup(United, ChampionsLeague) (event) — independent
Derived fact (non-independent): playsFor(Beckham, United, T1) ∧ wonCup(United, ChampionsL, T2) ∧ overlaps(T1, T2) ⇒ won(Beckham, ChampionsL, T3)
⇒ Need lineage!
– Closed and complete representation model (incl. lineage) ⇒ Stanford Trio project [Widom: CIDR’05, Benjelloun et al: VLDB’06]
– Interval computation remains linear in the number of bins
– Confidence computation per bin is #P-complete ⇒ in general requires possible-worlds-based sampling techniques (Karp-Luby, Gibbs sampling, etc.)
[Figure: timeline ’95–’02 with per-bin confidences, e.g. 0.06 and 0.12]
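The temporal join behind such rules boils down to intersecting validity intervals and, for independent base facts, multiplying confidences. A minimal sketch — the intervals and confidences below are made-up illustration values, not the slide's data:

```python
def intersect(t1, t2):
    """Overlap of two [from, until) intervals, or None if disjoint."""
    lo, hi = max(t1[0], t2[0]), min(t1[1], t2[1])
    return (lo, hi) if lo < hi else None

def derive(f1, f2):
    """Join two temporal facts (interval, confidence), assuming independence."""
    t = intersect(f1[0], f2[0])
    if t is None:
        return None  # intervals do not overlap: rule does not fire
    return (t, f1[1] * f2[1])

# assumed illustration values
plays = ((1995, 2003), 0.8)  # playsFor(Beckham, United, T1)
won = ((1998, 1999), 0.3)    # wonCup(United, ChampionsLeague, T2)
result = derive(plays, won)  # won(Beckham, ChampionsLeague, T3)
print(result)                # ((1998, 1999), ≈0.24)
```

This simple product only works because the two base facts are independent; once a derived fact shares base facts with others, the slide's point applies — confidence computation needs lineage and, in general, sampling.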

Agenda
– URDF: Reasoning in Uncertain Knowledge Bases
  - Resolving uncertainty at query time
  - Lineage of answers
  - Propositional vs. probabilistic reasoning
  - Temporal reasoning extensions
– UViz: The URDF Visualization Frontend
  - Demo!

UViz: The URDF Visualization Engine
UViz system architecture:
– Flash client
– Tomcat server (JRE)
– Relational backend (JDBC)
– Remote Method Invocation & object serialization (BlazeDS)

UViz: The URDF Visualization Engine
Demo!