
10 Years of Probabilistic Querying – What Next? Martin Theobald University of Antwerp Joint work with Maximilian Dylla, Sairam Gurajada, Angelika Kimmig, Andre Melo, Iris Miliaraki, Luc de Raedt, Mauro Sozio, Fabian Suchanek

"The important thing is to not stop questioning... One cannot help but be in awe when contemplating the mysteries of eternity, of life, of the marvelous structure of reality. It is enough if one tries merely to comprehend a little of this mystery every day." - Albert Einstein, 1936
The Marvelous Structure of Reality. Joseph M. Hellerstein, Keynote at WebDB 2003, San Diego

Look, There is Structure! The important thing is to not stop questioning

Look, There is Structure!
A plethora of natural-language-processing techniques & tools:
- Part-Of-Speech (POS) Tagging
- Named-Entity Recognition & Disambiguation (NERD)
- Dependency Parsing
- Semantic Role Labeling
Text is not just unstructured data.

Look, There is Structure!
A plethora of natural-language-processing techniques & tools:
- Part-Of-Speech (POS) Tagging
- Named-Entity Recognition & Disambiguation (NERD)
- Dependency Parsing
- Semantic Role Labeling
Text is not just unstructured data.
But: Even the best NLP tools frequently yield errors; facts found on the Web are logically inconsistent; Web-extracted knowledge bases are inherently incomplete.

Information Extraction
Existing knowledge base (YAGO/DBpedia et al.):
  bornOn(Jeff, 09/22/42), gradFrom(Jeff, Columbia), hasAdvisor(Jeff, Arthur), hasAdvisor(Surajit, Jeff), knownFor(Jeff, Theory)
New fact candidates:
  type(Jeff, Author) [0.9], author(Jeff, Drag_Book) [0.8], author(Jeff, Cind_Book) [0.6], worksAt(Jeff, Bell_Labs) [0.7], type(Jeff, CEO) [0.4]
>120 M facts for YAGO2 (mostly from Wikipedia infoboxes); 100s M additional facts from Wikipedia free-text

YAGO Knowledge Base
[Diagram: an excerpt of the YAGO knowledge graph, connecting entities such as Max_Planck, Erwin_Planck, Kiel, Angela_Merkel, Schleswig-Holstein, Germany, the Nobel Prize, and the Max_Planck Society via relations such as instanceOf, subclass, means, bornOn, diedOn, bornIn, fatherOf, hasWon, citizenOf, and locatedIn to classes such as Person, Scientist, Physicist, Biologist, Politician, City, State, Country, Location, and Organization.]
3 M entities, 120 M facts, 100 relations, 200k classes; accuracy ~95%

Linked Open Data
As of Sept. 2011: >200 linked-data sources, >30 billion RDF triples, >400 million owl:sameAs links

Maybe Even More Importantly: Linked Vocabularies!
Instance & class links between DBpedia, WordNet, OpenCyc, GeoNames, and many more… (Source: LinkedData.org)
Schema.org: a common vocabulary released by Google, Yahoo!, and Bing to annotate Web pages, incl. links to DBpedia.
Micro-formats: RDFa (W3C), e.g.:
  <html xmlns="http://www.w3.org/1999/xhtml"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        version="XHTML+RDFa 1.0" xml:lang="en">
  … Martin's Home Page …

As of Sept. 2011: >5 million owl:sameAs links between DBpedia/YAGO/Freebase

Application I: Enrichment of Search Results
Recent Advances in Structured Data and the Web. Alon Y. Halevy, Keynote at ICDE 2013, Brisbane

Application II: Machine Reading
"It's about the disappearance forty years ago of Harriet Vanger, a young scion of one of the wealthiest families in Sweden, and about her uncle, determined to know the truth about what he believes was her murder. Blomkvist visits Henrik Vanger at his estate on the tiny island of Hedeby. The old man draws Blomkvist in by promising solid evidence against Wennerström. Blomkvist agrees to spend a year writing the Vanger family history as a cover for the real assignment: the disappearance of Vanger's niece Harriet some 40 years earlier. Hedeby is home to several generations of Vangers, all part owners in Vanger Enterprises. Blomkvist becomes acquainted with the members of the extended Vanger family, most of whom resent his presence. He does, however, start a short-lived affair with Cecilia, the niece of Henrik. After discovering that Salander has hacked into his computer, he persuades her to assist him with research. They eventually become lovers, but Blomkvist has trouble getting close to Lisbeth, who treats virtually everyone she meets with hostility. Ultimately the two discover that Harriet's brother Martin, CEO of Vanger Industries, is secretly a serial killer. A 24-year-old computer hacker sporting an assortment of tattoos and body piercings supports herself by doing deep background investigations for Dragan Armansky, who, in turn, worries that Lisbeth Salander is the perfect victim for anyone who wished her ill."
[Annotations in the original figure mark relations extracted from the text: same, uncleOf, owns, hires, headOf, affairWith, enemyOf]
Etzioni, Banko, Cafarella: Machine Reading. AAAI'06
Mitchell, Carlson et al.: Toward an Architecture for Never-Ending Language Learning. AAAI'10

Application III: Natural-Language Question Answering
evi.com (formerly trueknowledge.com)

Application III: Natural-Language Question Answering
wolframalpha.com: >10 trillion(!) facts, >50,000 search algorithms, >5,000 visualizations

IBM Watson: Deep Question Answering
Example Jeopardy! clues:
- "99 cents got me a 4-pack of Ytterlig coasters from this Swedish chain"
- "This town is known as 'Sin City' & its downtown is 'Glitter Gulch'"
- "William Wilkinson's 'An Account of the Principalities of Wallachia and Moldavia' inspired this author's most famous novel"
- "As of 2010, this is the only former Yugoslav republic in the EU"
Question classification & decomposition over multiple knowledge back-ends.
D. Ferrucci et al.: Building Watson: An Overview of the DeepQA Project. AI Magazine, Fall 2010.

Natural-Language QA over Linked Data
Multilingual Question Answering over Linked Data (QALD-3), CLEF — bielefeld.de/~cunger/qald/
Which river does the Brooklyn Bridge cross?
Welchen Fluss überspannt die Brooklyn Bridge?
¿Por qué río cruza la Brooklyn Bridge?
Quale fiume attraversa il ponte di Brooklyn?
Quelle cours d'eau est traversé par le pont de Brooklyn?
Welke rivier overspant de Brooklyn Bridge?
Extracted keywords: river, cross, Brooklyn Bridge / Fluss, überspannen, Brooklyn Bridge / río, cruza, Brooklyn Bridge / fiume, attraversare, ponte di Brooklyn / cours d'eau, pont de Brooklyn / rivier, Brooklyn Bridge, overspant
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?uri WHERE { res:Brooklyn_Bridge dbo:crosses ?uri . }

Natural-Language QA over Linked Data
INEX Linked Data Track, CLEF — saarland.de/tracks/lod/
Which German politician is a successor of another politician who stepped down before his or her actual term was over, and what is the name of their political ancestor?
Keywords: German politicians, successor, other, stepped down before actual term, name, ancestor
SELECT ?s ?s1 WHERE {
  ?s rdf:type <…> .
  ?s1 <…> ?s .
  FILTER FTContains(?s, "stepped down early") .
}

Outline
- Probabilistic Databases
  - Stanford's Trio System: Data, Uncertainty & Lineage
  - Handling Uncertain RDF Data: URDF (Max-Planck-Institute/U-Antwerp)
- Probabilistic & Temporal Databases
  - Sequenced vs. Non-Sequenced Semantics
  - Interval Alignment & Probabilistic Inference
- Probabilistic Programming
  - Statistical Relational Learning
  - Learning Interesting Deduction Rules
- Summary & Challenges

Probabilistic Databases: A Panacea to All of the Aforementioned Tasks?
Probabilistic databases combine first-order logic and probability theory in an elegant way:
- Declarative: queries formulated in SQL/Relational Algebra/Datalog; support for updates, transactions, etc.
- Deductive: well-studied resolution algorithms for SQL/Relational Algebra/Datalog (top-down/bottom-up), indexes, automatic query optimization
- Scalable (?): polynomial data complexity (SQL), but #P-complete probabilistic inference

Probabilistic Database
A probabilistic database D^p (compactly) encodes a probability distribution over a finite set of deterministic database instances D_i.
Query semantics ("marginal probabilities"): run query Q against each instance D_i; for each answer tuple t, sum up the probabilities of all instances D_i in which t exists.
Special cases: (I) tuple-independent PDBs and (II) block-independent PDBs. Note: (I) and (II) are not equivalent!
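For illustration, a brute-force Python sketch of this semantics over a tiny, hypothetical tuple-independent PDB (exponential enumeration of all worlds; real systems avoid this, as discussed later):

from itertools import product

# A tiny, hypothetical tuple-independent PDB: each tuple exists
# independently with its marginal probability.
saw = {("Cathy", "Honda"): 0.6, ("Cathy", "Mazda"): 0.4}
drives = {("Jimmy", "Mazda"): 0.8, ("Billy", "Honda"): 0.7}

tuples = list(saw.items()) + list(drives.items())
answers = {}
# Enumerate all 2^n possible worlds (for illustration only).
for bits in product([0, 1], repeat=len(tuples)):
    p, present = 1.0, set()
    for bit, (t, pr) in zip(bits, tuples):
        p *= pr if bit else (1.0 - pr)
        if bit:
            present.add(t)
    # Evaluate Q = pi_person(Saw JOIN_car Drives) on this instance.
    result = {person for (w, c1) in saw if (w, c1) in present
              for (person, c2) in drives
              if c1 == c2 and (person, c2) in present}
    for t in result:  # sum probabilities of all worlds containing answer t
        answers[t] = answers.get(t, 0.0) + p

print(answers)  # ≈ {'Billy': 0.42, 'Jimmy': 0.32}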

Stanford Trio System
1. Alternatives
2. '?' (Maybe) annotations
3. Confidence values
4. Lineage
→ Uncertainty-Lineage Databases (ULDBs) [Widom: CIDR 2005]

Trio's Data Model
1. Alternatives: uncertainty about value
Saw (witness, color, car)
  Amy: { red, Honda || red, Toyota || orange, Mazda }
→ Three possible instances

Trio's Data Model
1. Alternatives
2. '?' (Maybe): uncertainty about presence
Saw (witness, color, car)
  Amy: { red, Honda || red, Toyota || orange, Mazda }
  Betty: { blue, Acura } ?
→ Six possible instances

Trio's Data Model
1. Alternatives
2. '?' (Maybe) annotations
3. Confidences: weighted uncertainty
Saw (witness, color, car)
  Amy: { red, Honda [0.5] || red, Toyota [0.3] || orange, Mazda [0.2] }
  Betty: { blue, Acura [0.6] } ?
→ Still six possible instances, each with a probability
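As a quick check, a minimal Python sketch (hypothetical data structures, not Trio's actual API) that enumerates these six instances and their probabilities:

from itertools import product

# Each x-tuple is a list of (alternative, confidence) pairs; for a '?'
# tuple the confidences sum to < 1, the rest is the probability of absence.
saw = [
    [(("Amy", "red", "Honda"), 0.5),
     (("Amy", "red", "Toyota"), 0.3),
     (("Amy", "orange", "Mazda"), 0.2)],
    [(("Betty", "blue", "Acura"), 0.6),
     (None, 0.4)],          # 'maybe' tuple: absent with probability 0.4
]

for choice in product(*saw):
    instance = [alt for alt, _ in choice if alt is not None]
    prob = 1.0
    for _, c in choice:
        prob *= c
    print(round(prob, 2), instance)
# Prints the 6 possible instances; their probabilities sum to 1.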

So Far: Model is Not Closed
Saw (witness, car): Cathy: { Honda || Mazda }
Drives (person, car): { Jimmy, Toyota || Jimmy, Mazda }, { Billy, Honda || Frank, Honda }, Hank, Honda
Suspects = π_person(Saw ⋈ Drives): Jimmy ?, Billy ?, Frank ?, Hank ?
→ Alternatives with '?' annotations alone CANNOT correctly capture the possible instances in the result.

Example with Lineage
ID  Saw (witness, car)
11  Cathy: { Honda || Mazda }
ID  Drives (person, car)
21  { Jimmy, Toyota || Jimmy, Mazda }
22  { Billy, Honda || Frank, Honda }
23  Hank, Honda
ID  Suspects
31  Jimmy ?
32  { Billy || Frank } ?
33  Hank ?
Suspects = π_person(Saw ⋈ Drives)
λ(31) = (11,2) ∧ (21,2)
λ(32,1) = (11,1) ∧ (22,1); λ(32,2) = (11,1) ∧ (22,2)
λ(33) = (11,1) ∧ 23


Operational Semantics
D^p → possible instances D_1, D_2, …, D_n → query Q on each instance → instances D'_1, D'_2, …, D'_m → representation of the instances D'^p (direct implementation)
- Closure: the resulting representation D'^p always exists.
- Completeness: any (finite) set of possible instances can be represented.
- But: data complexity is #P-complete!

Summary on Trio's Data Model
1. Alternatives
2. '?' (Maybe) annotations
3. Confidence values
4. Lineage
→ Uncertainty-Lineage Databases (ULDBs)
Theorem: ULDBs are closed and complete.
Formally studied properties like minimization, equivalence, approximation, and membership based on lineage.
[Benjelloun, Das Sarma, Halevy, Widom, Theobald: VLDB-J. 2008]

Basic Complexity Issue
Theorem [Valiant 1979]: For a Boolean expression E, computing Pr(E) is #P-complete.
NP = class of problems of the form "is there a witness?" (SAT)
#P = class of problems of the form "how many witnesses?" (#SAT)
The decision problem for 2CNF is in PTIME, yet the counting problem for 2CNF is already #P-complete. (We will come back to this later…)
[Suciu & Dalvi: SIGMOD'05 Tutorial on "Foundations of Probabilistic Answers to Queries"]

…back to Information Extraction: bornIn(Barack, Honolulu) vs. bornIn(Barack, Kenya)

Uncertain RDF (URDF): Facts & Rules
Extensional knowledge (the facts):
- High-confidence facts: existing knowledge base (ground truth)
- New fact candidates: extracted fact candidates with confidences
- Linked Data & integration of various knowledge sources: ontology merging or explicitly linked facts (owl:sameAs, owl:equivalentProperty)
→ Large probabilistic database of RDF facts
Intensional knowledge (the rules):
- Soft rules: deductive grounding & lineage (Datalog/SLD resolution)
- Hard rules: consistency constraints (more general FOL rules)
→ Propositional & probabilistic inference — at query time!

Soft Rules vs. Hard Rules
(Soft) deduction rules vs. (hard) consistency constraints:
People may live in more than one place:
  livesIn(x,y) ∧ marriedTo(x,z) ⇒ livesIn(z,y)   [0.8]
  livesIn(x,y) ∧ hasChild(x,z) ⇒ livesIn(z,y)    [0.5]
People are not born in different places/on different dates:
  bornIn(x,y) ∧ bornIn(x,z) ⇒ y = z
  bornOn(x,y) ∧ bornOn(x,z) ⇒ y = z
People are not married to more than one person (at the same time, in most countries?):
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y ≠ z ⇒ disjoint(t1,t2)

Soft Rules vs. Hard Rules
(Soft) deduction rules — a deductive database: Datalog, the core of SQL & Relational Algebra, RDF/S, OWL2-RL, etc.:
  livesIn(x,y) ∧ marriedTo(x,z) ⇒ livesIn(z,y)   [0.8]
  livesIn(x,y) ∧ hasChild(x,z) ⇒ livesIn(z,y)    [0.5]
(Hard) consistency constraints — more general FOL constraints: Datalog with constraints, X-tuples in PDBs, owl:FunctionalProperty, owl:disjointWith, etc.:
  bornIn(x,y) ∧ bornIn(x,z) ⇒ y = z
  bornOn(x,y) ∧ bornOn(x,z) ⇒ y = z
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y ≠ z ⇒ disjoint(t1,t2)

URDF Running Example
KB: RDF base facts (diagram): worksAt(Jeff, Stanford) [0.9]; graduatedFrom(Surajit, Princeton) [0.7]; graduatedFrom(Surajit, Stanford) [0.6]; graduatedFrom(David, Princeton) [0.9]; hasAdvisor(Surajit, Jeff) [0.8]; hasAdvisor(David, Jeff) [0.7]; type(Stanford, University) [1.0]; type(Jeff, Computer_Scientist) [1.0]; type(Surajit, Computer_Scientist) [1.0]; type(David, Computer_Scientist) [1.0]
Derived facts: gradFr(Surajit, Stanford) [?]; gradFr(David, Stanford) [?]
Rules:
  hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z)   [0.4]
  graduatedFrom(x,y) ∧ graduatedFrom(x,z) ⇒ y = z

Basic Types of Inference
MAP inference: find the most likely assignment to the query variables y under given evidence x.
  Compute: arg max_y P(y | x)   (NP-complete for MaxSAT)
Marginal/success probabilities: the probability that query y is true in a random world under given evidence x.
  Compute: P(y | x)   (#P-complete already for conjunctive queries)

General Route: Grounding & MaxSAT Solving
Query: graduatedFrom(x, y)
1) Grounding: consider only facts (and rules) that are relevant for answering the query.
2) Propositional formula in CNF, consisting of grounded soft & hard rules and weighted base facts:
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))
  (¬graduatedFrom(David, Stanford) ∨ ¬graduatedFrom(David, Princeton))
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))
  (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))
  worksAt(Jeff, Stanford); hasAdvisor(Surajit, Jeff); hasAdvisor(David, Jeff);
  graduatedFrom(Surajit, Princeton); graduatedFrom(Surajit, Stanford); graduatedFrom(David, Princeton)
3) Propositional reasoning: find a truth assignment to the facts such that the total weight of the satisfied clauses is maximized.
→ MAP inference: compute the most likely possible world.

URDF: MaxSAT Solving with Soft & Hard Rules
Find arg max_y P(y | x) — resolves to a variant of MaxSAT for propositional formulas.
Special case: Horn clauses as soft rules & mutex constraints as hard rules.
S: mutex constraints
  { graduatedFrom(Surajit, Stanford), graduatedFrom(Surajit, Princeton) }
  { graduatedFrom(David, Stanford), graduatedFrom(David, Princeton) }
C: weighted Horn clauses (CNF)
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))
  (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))
  worksAt(Jeff, Stanford); hasAdvisor(Surajit, Jeff); hasAdvisor(David, Jeff);
  graduatedFrom(Surajit, Princeton); graduatedFrom(Surajit, Stanford); graduatedFrom(David, Princeton)
MaxSAT algorithm:
  Compute W_0 = Σ_{clauses C} w(C) · P(C is satisfied);
  For each hard constraint S_t {
    For each fact f in S_t {
      Compute W_{f+}^t = Σ_{clauses C} w(C) · P(C is sat. | f = true);
    }
    Compute W_{S-}^t = Σ_{clauses C} w(C) · P(C is sat. | S_t = false);
    Choose the truth assignment to the facts f in S_t that maximizes W_{f+}^t, W_{S-}^t;
    Remove satisfied clauses C; t++;
  }
Runtime: O(|S|·|C|); approximation guarantee of 1/2.
[Theobald, Sozio, Suchanek, Nakashole: VLDS'12]
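A compact Python sketch of this greedy scheme for the special case above (simplified data structures and plain satisfied-weight scoring instead of the conditional expectations; an illustration, not URDF's actual implementation):

def greedy_maxsat(clauses, mutex_sets, assign):
    """clauses: list of (weight, {var: required_value}); a clause is satisfied
    if ANY of its literals takes its required value. mutex_sets: hard
    constraints allowing at most one variable per set to be true."""
    def sat_weight():
        return sum(w for w, lits in clauses
                   if any(assign.get(v) == val for v, val in lits.items()))
    for S in mutex_sets:
        best, best_w = None, -1.0
        for choice in list(S) + [None]:      # one of S true, or all false
            for v in S:
                assign[v] = (v == choice)
            if (w := sat_weight()) > best_w:
                best, best_w = choice, w
        for v in S:                          # commit the best local choice
            assign[v] = (v == best)
    return assign

# Running example; facts outside the mutex sets are fixed to true.
clauses = [
    (0.9, {"worksAt(Jeff,Stanford)": True}),
    (0.8, {"hasAdvisor(Surajit,Jeff)": True}),
    (0.6, {"gradFrom(Surajit,Stanford)": True}),
    (0.7, {"gradFrom(Surajit,Princeton)": True}),
    # soft rule clause: ¬hasAdvisor ∨ ¬worksAt ∨ gradFrom(Surajit,Stanford)
    (0.4, {"hasAdvisor(Surajit,Jeff)": False, "worksAt(Jeff,Stanford)": False,
           "gradFrom(Surajit,Stanford)": True}),
]
mutex = [{"gradFrom(Surajit,Stanford)", "gradFrom(Surajit,Princeton)"}]
start = {"worksAt(Jeff,Stanford)": True, "hasAdvisor(Surajit,Jeff)": True}
print(greedy_maxsat(clauses, mutex, start))
# Picks gradFrom(Surajit,Stanford): the soft rule tips the balance (2.7 > 2.4).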

Experiment (I): MAP Inference
URDF (grounding & MaxSAT solving) vs. Markov Logic (MAP inference & MC-SAT).
|C| = # literals in grounded soft rules; |S| = # literals in grounded hard rules.
YAGO knowledge base: 2 Mio entities, 20 Mio facts.
Query answering: deductive grounding & MaxSAT solving for 10 queries over 16 soft rules (partly recursive) & 5 hard rules (bornIn, diedIn, marriedTo, …).
Asymptotic runtime checks via synthetic (random) soft-rule expansions.

Basic Types of Inference
MAP inference: find the most likely assignment to the query variables y under given evidence x.
  Compute: arg max_y P(y | x)   (NP-complete for MaxSAT)
Marginal/success probabilities: the probability that query y is true in a random world under given evidence x.
  Compute: P(y | x)   (#P-complete already for conjunctive queries)

Deductive Grounding with Lineage (SLD Resolution in Datalog/Prolog)
Query: graduatedFrom(Surajit, y)
Rules:
  hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z)   [0.4]
  graduatedFrom(x,y) ∧ graduatedFrom(x,z) ⇒ y = z
Base facts:
  graduatedFrom(Surajit, Princeton) [0.7]   (A)
  graduatedFrom(Surajit, Stanford) [0.6]    (B)
  graduatedFrom(David, Princeton) [0.9]
  hasAdvisor(Surajit, Jeff) [0.8]           (C)
  hasAdvisor(David, Jeff) [0.7]
  worksAt(Jeff, Stanford) [0.9]             (D)
  type(Princeton, University) [1.0]; type(Stanford, University) [1.0]
  type(Jeff, Computer_Scientist) [1.0]; type(Surajit, Computer_Scientist) [1.0]; type(David, Computer_Scientist) [1.0]
Answers and their lineage:
  Q1 = graduatedFrom(Surajit, Princeton): lineage A
  Q2 = graduatedFrom(Surajit, Stanford): lineage B ∨ (C ∧ D)
[Yahya, Theobald: RuleML'11; Dylla, Miliaraki, Theobald: ICDE'13]
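A toy grounding routine in Python that mimics this SLD-style derivation and records lineage as nested and/or terms (hypothetical and hard-wired to the single soft rule above, not a general resolution engine):

facts = {
    ("gradFrom", "Surajit", "Princeton"): 0.7,   # A
    ("gradFrom", "Surajit", "Stanford"): 0.6,    # B
    ("hasAdvisor", "Surajit", "Jeff"): 0.8,      # C
    ("worksAt", "Jeff", "Stanford"): 0.9,        # D
}

def ground_gradFrom(x):
    """Answers for gradFrom(x, ?) with lineage: a base fact, or ('and', ...)
    for a rule application; multiple derivations are joined by ('or', ...)."""
    answers = {}
    for (rel, s, o) in facts:                    # base facts
        if rel == "gradFrom" and s == x:
            answers.setdefault(o, []).append((rel, s, o))
    for (r1, s1, y) in facts:                    # rule: hasAdvisor ∧ worksAt
        if r1 == "hasAdvisor" and s1 == x:
            for (r2, s2, z) in facts:
                if r2 == "worksAt" and s2 == y:
                    answers.setdefault(z, []).append(
                        ("and", ("hasAdvisor", x, y), ("worksAt", y, z)))
    return {z: ds[0] if len(ds) == 1 else ("or", *ds)
            for z, ds in answers.items()}

print(ground_gradFrom("Surajit"))
# Princeton -> A;  Stanford -> ('or', B, ('and', C, D))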

Lineage & Possible Worlds
1) Deductive grounding: dependency graph of the query; trace the lineage of individual query answers.
2) Lineage DAG (not in CNF), consisting of grounded soft & hard rules and probabilistic base facts.
3) Probabilistic inference — compute marginals:
  P(Q): sum up the probabilities of all possible worlds that entail the query answer's lineage.
  P(Q|H): drop the impossible worlds.
Query graduatedFrom(Surajit, y), with A, B, C, D as above:
  lineage(Q2) = B ∨ (C ∧ D): P(C ∧ D) = 0.8 × 0.9 = 0.72; P(B ∨ (C ∧ D)) = 1 − (1 − 0.72) × (1 − 0.6) = 0.888
  P(A ∧ ¬(B ∨ (C ∧ D))) = 0.7 × (1 − 0.888) = 0.0784; P(¬A ∧ (B ∨ (C ∧ D))) = (1 − 0.7) × 0.888 = 0.2664
[Das Sarma, Theobald, Widom: ICDE'08; Dylla, Miliaraki, Theobald: ICDE'13]

Possible Worlds Semantics
Hard rule H = ¬(A ∧ (B ∨ (C ∧ D))): the two answers are mutually exclusive.
P(Q1) = P(A ∧ ¬(B ∨ (C ∧ D))) = 0.0784;  P(Q1|H) = 0.0784 / 0.3784 ≈ 0.207
P(Q2) = P(¬A ∧ (B ∨ (C ∧ D))) = 0.2664;  P(Q2|H) = 0.2664 / 0.3784 ≈ 0.704
with P(H) = 1 − 0.7 × 0.888 = 0.3784.
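These values are easy to verify by brute force over the 16 worlds of A, B, C, D; a short check in Python:

from itertools import product

p = {"A": 0.7, "B": 0.6, "C": 0.8, "D": 0.9}

def prob(world):
    pr = 1.0
    for v, val in world.items():
        pr *= p[v] if val else (1.0 - p[v])
    return pr

pQ1 = pQ2 = pH = 0.0
for bits in product([0, 1], repeat=4):
    w = dict(zip("ABCD", bits))
    q2 = w["B"] or (w["C"] and w["D"])      # lineage of Q2
    h = not (w["A"] and q2)                 # hard rule: not both answers
    pr = prob(w)
    if h:
        pH += pr
        if w["A"]: pQ1 += pr
        if q2:     pQ2 += pr

print(round(pQ1, 4), round(pQ2, 4), round(pH, 4))   # 0.0784 0.2664 0.3784
print(round(pQ1 / pH, 3), round(pQ2 / pH, 3))       # 0.207 0.704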

Inference in Probabilistic Databases
- Safe query plans [Dalvi, Suciu: VLDB-J'07]: propagate confidences along with the relational operators.
- Read-once functions [Sen, Deshpande, Getoor: PVLDB'10]: factorize the Boolean formula (in polynomial time) into read-once form, where every variable occurs at most once.
- Knowledge compilation [Olteanu et al.: ICDT'10, ICDT'11]: decompose the Boolean formula into an ordered binary decision diagram (OBDD), such that inference resolves to independent-and and independent-or operations over the decomposed formula.
- Top-k pruning [Ré, Dalvi, Suciu: ICDE'07; Karp, Luby, Madras: J. Alg. '89]: return top-k answers based on lower and upper bounds, even without knowing their exact marginal probabilities. Multi-simulation: run multiple Markov-Chain-Monte-Carlo (MCMC) simulations in parallel.

Monte Carlo Simulation (I)
Naïve sampling for a Boolean formula, e.g. E = X1X2 ∨ X1X3 ∨ X2X3:
  cnt = 0
  repeat N times
    randomly choose X1, X2, X3 ∈ {0,1}
    if E(X1, X2, X3) = 1 then cnt = cnt + 1
  P = cnt / N
  return P   /* estimate for the true Pr(E) */
Zero/One-Estimator Theorem: if N ≥ (1 / Pr(E)) × (4 ln(2/δ) / ε²), then Pr[ |P / Pr(E) − 1| > ε ] < δ.
Works for any E, but N may be very big for small Pr(E) (not in PTIME).
[Suciu & Dalvi: SIGMOD'05 Tutorial on "Foundations of Probabilistic Answers to Queries"; Karp, Luby, Madras: J. Alg. '89]
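A direct Python rendering of the naïve estimator (illustrative; E is the 2-out-of-3 formula from the slide):

import random

def naive_mc(pr, E, N=100_000):
    """Estimate Pr(E) by sampling each Xi independently with prob pr[i]."""
    cnt = 0
    for _ in range(N):
        x = [random.random() < p for p in pr]
        if E(x):
            cnt += 1
    return cnt / N

# E = X1X2 ∨ X1X3 ∨ X2X3 with Pr(Xi) = 0.5 each => exact Pr(E) = 0.5
E = lambda x: (x[0] and x[1]) or (x[0] and x[2]) or (x[1] and x[2])
print(naive_mc([0.5, 0.5, 0.5], E))   # ~0.5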

Monte Carlo Simulation (II)
Importance sampling for a Boolean formula in DNF, E = C1 ∨ C2 ∨ … ∨ Cm:
  cnt = 0; S = Pr(C1) + … + Pr(Cm)
  repeat N times
    randomly choose i ∈ {1, 2, …, m} with probability Pr(Ci)/S
    randomly choose X1, …, Xn ∈ {0,1} s.t. Ci = 1
    if C1 = 0 and C2 = 0 and … and C(i−1) = 0 then cnt = cnt + 1
  P = (cnt / N) × S
  return P   /* estimate for the true Pr(E) */
Theorem: if N ≥ m × (4 ln(2/δ) / ε²), then Pr[ |P / Pr(E) − 1| > ε ] < δ.
This is better! But it only works for E in DNF (in PTIME).
[Suciu & Dalvi: SIGMOD'05 Tutorial on "Foundations of Probabilistic Answers to Queries"; Karp, Luby, Madras: J. Alg. '89]
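A runnable sketch of this Karp-Luby estimator for positive DNF over independent variables (clauses given as sets of variable indices; an illustration, not an optimized implementation):

import random

def karp_luby(clauses, pr, N=100_000):
    """Estimate Pr(C1 ∨ … ∨ Cm); clauses are sets of variable indices
    (positive literals only, as in the lineage DNFs above); pr[i] = Pr(Xi)."""
    weights = []
    for C in clauses:
        w = 1.0
        for v in C:
            w *= pr[v]
        weights.append(w)
    S = sum(weights)
    cnt = 0
    for _ in range(N):
        # pick clause i with probability Pr(Ci)/S
        i = random.choices(range(len(clauses)), weights=weights)[0]
        # sample a world conditioned on Ci = true
        x = [random.random() < p for p in pr]
        for v in clauses[i]:
            x[v] = True
        # count only if no earlier clause is satisfied (avoids double counting)
        if not any(all(x[v] for v in C) for C in clauses[:i]):
            cnt += 1
    return cnt / N * S

# E = X1X2 ∨ X1X3 ∨ X2X3 with Pr(Xi) = 0.5: exact Pr(E) = 0.5
print(karp_luby([{0, 1}, {0, 2}, {1, 2}], [0.5, 0.5, 0.5]))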

Top-k Ranking by Marginal Probabilities
Query: graduatedFrom(Surajit, y)
Datalog/SLD resolution: top-down grounding allows us to compute lower and upper bounds on the marginal probabilities of answer candidates before the rules are fully grounded; subgoals may represent entire sets of answer candidates.
First-order lineage formulas:
  Φ(Q1) = A
  Φ(Q2) = B ∨ ∃y gradFrom(Surajit, y)   (subgoal not yet grounded; later refined to B ∨ (C ∧ D) for y = Stanford)
→ Prune an entire set of answer candidates represented by Φ.
[Dylla, Miliaraki, Theobald: ICDE'13]

Bounds for First-Order Formulas
Theorem 1: Given a (partially grounded) first-order lineage formula Φ, e.g. Φ(Q2) = B ∨ ∃y gradFrom(S,y):
Lower bound P_low (for all query answers obtainable from grounding Φ): substitute ∃y gradFrom(S,y) with false (or true if negated).
  P_low(Q2) = P(B ∨ false) = P(B) = 0.6
Upper bound P_up (for all query answers obtainable from grounding Φ): substitute ∃y gradFrom(S,y) with true (or false if negated).
  P_up(Q2) = P(B ∨ true) = P(true) = 1.0
Proof (sketch): substituting a subformula with false reduces the number of models (possible worlds) that satisfy Φ; substituting with true increases them.
[Dylla, Miliaraki, Theobald: ICDE'13]

Convergence of Bounds
Theorem 2: Let Φ1, …, Φn be a series of first-order lineage formulas obtained from grounding Φ via SLD resolution, and let φ be the propositional lineage formula of an answer obtained from this grounding procedure. Then rewriting each Φi according to Theorem 1 into P_i,low and P_i,up creates a monotonic series of lower and upper bounds that converges to P(φ):
  0 = P(false) ≤ P(B ∨ false) = 0.6 ≤ P(B ∨ (C ∧ D)) = 0.888 ≤ P(B ∨ true) = P(true) = 1
Proof (sketch, via induction): substituting true by a subformula reduces the number of models that satisfy Φ; substituting false by a subformula increases this number.
[Dylla, Miliaraki, Theobald: ICDE'13]

Top-k Pruning (Fagin's Algorithm)
Maintain two disjoint queues: a Top-k queue sorted by P_low and a Candidates queue sorted by P_up.
Return the Top-k queue at the t-th grounding step when:
  min { P_t,low(Q_k) | Q_k ∈ Top-k } > max { P_t,up(Q_j) | Q_j ∈ Candidates }
Drop Q_j from the Candidates queue once its upper bound falls below the k-th lower bound.
[Figure: lower/upper bound intervals [P_i,low(Q_j), P_i,up(Q_j)] narrowing over the number of SLD steps t; y-axis: marginal probability in [0,1].]
[Fagin et al. '01; Balke, Kießling '02; Dylla, Miliaraki, Theobald: ICDE'13]

Top-k Stopping Condition (k = 2)
Maintain two disjoint queues: a Top-k queue sorted by P_low and a Candidates queue sorted by P_up.
Return the Top-k queue at the t-th grounding step when:
  min { P_t,low(Q_k) | Q_k ∈ Top-k } > max { P_t,up(Q_j) | Q_j ∈ Candidates }
→ Stop and return the top-2 query answers.
[Figure: at step t, the 2nd lower bound P_t,low(Q2) already exceeds the upper bounds P_t,up(Q_j) of all remaining candidates Q3, …, Qm.]
[Fagin et al. '01; Balke, Kießling '02; Dylla, Miliaraki, Theobald: ICDE'13]
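Schematically, the pruning loop looks as follows in Python (refine() is a hypothetical callback that performs one SLD grounding step for an answer and returns its tightened bounds; termination relies on the convergence guaranteed by Theorem 2):

def top_k(bounds, refine, k):
    """bounds: {answer: (low, up)}; returns the top-k answers by marginal."""
    while True:
        for a in list(bounds):                  # one grounding step each
            bounds[a] = refine(a)
        ranked = sorted(bounds, key=lambda a: bounds[a][0], reverse=True)
        topk = ranked[:k]
        kth_low = min(bounds[a][0] for a in topk)
        # drop candidates whose upper bound cannot beat the k-th lower bound
        for a in ranked[k:]:
            if bounds[a][1] <= kth_low:
                del bounds[a]
        rest = [a for a in bounds if a not in topk]
        # stopping condition: k-th lower bound beats all remaining upper bounds
        if not rest or kth_low > max(bounds[a][1] for a in rest):
            return topk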

Experiment (II): Computing Marginals
IMDB data with 26 Mio facts about movies, directors, actors, etc.
4 query patterns, each instantiated to 1,000 queries (showing runtime averages):
Q1 – safe, non-repeating hierarchical; Q2 – unsafe, repeating hierarchical; Q3 – unsafe, head-hierarchical; Q4 – general unsafe

Experiment (II): Computing Marginals
Runtime vs. number of top-k results (single-join query); percentage of tuples scanned from the input relations. IMDB data set, 26 Mio facts.

Basic Types of Inference
MAP inference: find the most likely assignment to the query variables y under given evidence x.
  Compute: arg max_y P(y | x)   (NP-complete for MaxSAT)
Marginal/success probabilities: the probability that query y is true in a random world under given evidence x.
  Compute: P(y | x)   (#P-complete already for conjunctive queries)

Probabilistic & Temporal Databases
A temporal-probabilistic database D^Tp (compactly) encodes a probability distribution over a finite set of deterministic database instances D_i and a finite time domain T.
Sequenced semantics & snapshot reducibility [Dignös, Gamper, Böhlen: SIGMOD'12]:
- Built-in semantics: reduce temporal-relational operators to their non-temporal counterparts at each snapshot of the database.
- Coalesce/split tuples with consecutive time intervals based on their lineages.
Non-sequenced semantics [Dylla, Miliaraki, Theobald: PVLDB'13]:
- Queries can freely manipulate timestamps just like regular attributes.
- A single temporal operator ≤T supports all of Allen's 13 temporal relations.
- Deduplicate tuples with overlapping time intervals based on their lineages.

Temporal Alignment & Deduplication
Non-sequenced semantics, e.g. for Wedding(DeNiro, Abbott) and Divorce(DeNiro, Abbott):
  MarriedTo(X,Y)[Tb1,tmax) ← Wedding(X,Y)[Tb1,Te1) ∧ ¬Divorce(X,Y)[Tb2,Te2)
  MarriedTo(X,Y)[Tb1,Te2) ← Wedding(X,Y)[Tb1,Te1) ∧ Divorce(X,Y)[Tb2,Te2) ∧ Te1 ≤T Tb2
[Figure: timeline from tmin to tmax with base facts f1, f2 (Wedding) and f3 (Divorce); deduced MarriedTo facts carry lineages f1 ∧ f3, f1 ∧ ¬f3, f2 ∧ f3, f2 ∧ ¬f3; after deduplication, the aligned intervals carry the disjunctions (f1 ∧ f3) ∨ (f1 ∧ ¬f3) ∨ (f2 ∧ f3) ∨ (f2 ∧ ¬f3) and (f1 ∧ f3) ∨ (f2 ∧ ¬f3).]
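A small sketch of the interval arithmetic behind such rules (hypothetical representation: half-open integer intervals [b, e); assuming standard intersect/precedes operations, not the paper's alignment operator):

def intersect(i1, i2):
    """Intersection of two half-open intervals [b, e), or None if disjoint."""
    b, e = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (b, e) if b < e else None

def before(i1, i2):
    """Allen-style precedence check: i1 ends at or before i2 begins (Te1 ≤T Tb2)."""
    return i1[1] <= i2[0]

T_MAX = 10**9  # stand-in for t_max

# MarriedTo[Tb1,Te2) <- Wedding[Tb1,Te1) ∧ Divorce[Tb2,Te2) ∧ Te1 ≤T Tb2
def married_interval(wedding, divorce):
    if divorce is None:
        return (wedding[0], T_MAX)      # no divorce: married until t_max
    if before(wedding, divorce):
        return (wedding[0], divorce[1])
    return None

print(married_interval((1976, 1977), (1988, 1989)))  # (1976, 1989)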

Inference in Temporal-Probabilistic Databases
Base facts: playsFor(Beckham, Real, T1), playsFor(Ronaldo, Real, T2)
Derivation: playsFor(Beckham, Real, T1) ∧ playsFor(Ronaldo, Real, T2) ∧ overlaps(T1, T2, T3) ⇒ teamMates(Beckham, Ronaldo, T3)
Derived fact: teamMates(Beckham, Ronaldo, T3)
[Wang, Yahya, Theobald: MUD'10; Dylla, Miliaraki, Theobald: PVLDB'13]

Inference in Temporal-Probabilistic Databases
Base facts (independent): playsFor(Beckham, Real, T1), playsFor(Ronaldo, Real, T2), playsFor(Zidane, Real, T3)
Derived facts (non-independent, as they share base facts): teamMates(Beckham, Ronaldo, T4), teamMates(Beckham, Zidane, T5), teamMates(Ronaldo, Zidane, T6)
[Wang, Yahya, Theobald: MUD'10; Dylla, Miliaraki, Theobald: PVLDB'13]

Inference in Temporal-Probabilistic Databases
Base facts (independent): playsFor(Beckham, Real, T1), playsFor(Ronaldo, Real, T2), playsFor(Zidane, Real, T3)
Derived facts (non-independent): teamMates(Beckham, Ronaldo, T4), teamMates(Beckham, Zidane, T5), teamMates(Ronaldo, Zidane, T6) → we need lineage!
- Closed and complete representation model (incl. lineage)
- Temporal alignment is linear in the number of input intervals
- Confidence computation per interval remains #P-hard
- In general requires Monte Carlo approximations (Luby-Karp for DNF, MCMC-style sampling), decompositions, or top-k pruning
[Wang, Yahya, Theobald: MUD'10; Dylla, Miliaraki, Theobald: PVLDB'13]

Experiment (III): Temporal Alignment & Probabilistic Inference
1,827 base facts with temporal annotations, extracted from free-text biographies from Wikipedia, IMDB.com, and biography.com.
11 handcrafted temporal deduction rules, e.g.:
  MarriedTo(X,Y)[Tb1,Te2) ← Wedding(X,Y)[Tb1,Te1) ∧ Divorce(X,Y)[Tb2,Te2) ∧ Te1 ≤T Tb2
21 handcrafted temporal consistency constraints, e.g.:
  BornIn(X,Y)[Tb1,Te1) ∧ MarriedTo(X,Y)[Tb2,Te2) ⇒ Te1 ≤T Tb2

Statistical Relational Learning & Probabilistic Programming
SRL combines first-order logic and probabilistic inference:
- Employs relational data as input, but with a focus also on learning the relations (facts, rules & weights)
- Knowledge compilation for probabilistic inference, including recent techniques for lifted inference
Markov Logic Networks (U. Washington): grounding of weighted first-order rules over a function-free Herbrand base into an undirected graphical model (a Markov Random Field).
Probabilistic programming (ProbLog, KU Leuven): deductive grounding over a set of base facts into a directed graphical model (SLD proofs → Bayesian network).

Learning Soft Deduction Rules
Goal: inductively learn a soft rule S, e.g. livesIn(x,y) :- bornIn(x,y)
- Inductive learning algorithm based on dynamic programming
- A-priori-style pre-filtering & pruning of low-support join patterns
- Adaptation of confidence and support measures from data mining
- Learning interesting rules with constants and type constraints
- Specificity: avoid producing overly general rules; refine overly general rules by types
[Diagram: ground truth G for livesIn (only partially known); knowledge base KB for livesIn (known positive examples); facts R inferred for livesIn from the body of the rule, bornIn (only partially correct); ground truth for bornIn (partially known).]
A sketch of the support/confidence computation follows below.
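A minimal sketch of support/confidence for a candidate rule head(x,y) :- body(x,y), in the spirit of association-rule mining (hypothetical toy data and simplified measures, not the exact definitions used here):

def rule_stats(body, head):
    """body, head: sets of (x, y) pairs observed in the KB."""
    inferred = body                     # facts predicted by the rule
    correct = inferred & head           # predictions confirmed by known facts
    support = len(correct)
    confidence = support / len(inferred) if inferred else 0.0
    return support, confidence

born_in = {("Einstein", "Ulm"), ("Planck", "Kiel"), ("Merkel", "Hamburg")}
lives_in = {("Einstein", "Princeton"), ("Planck", "Kiel")}  # partially known

# Candidate rule: livesIn(x,y) :- bornIn(x,y)
print(rule_stats(born_in, lives_in))    # (1, 0.333...)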

Learning Interesting Deduction Rules (I)
Candidate rule bodies: income(x, y) ∧ quarterOfBirth(x, z) vs. income(x, y) ∧ educationLevel(x, z)
[Plots: relative frequency of income for the overall population vs. the QOB-1st/2nd/3rd/4th-quarter subpopulations, and vs. subpopulations by education level.]
Plots show the distribution of income versus quarterOfBirth and educationLevel over actual US census data from Oct. (>1 billion RDF facts). Divergence from the overall population shows a strong correlation of income with educationLevel, but not with quarterOfBirth.

Learning Interesting Deduction Rules (II)
Divergence measured using Kullback-Leibler or χ² between the overall population, "Nursery school to Grade 4", and "Professional school degree" over the discretized income domain (low/medium/high).
[Plot: relative income frequency for the overall population vs. the two education-level subpopulations.]
  income(x, y) :- educationLevel(x, z)
  income(x, low) :- educationLevel(x, "Nursery school to Grade 4")
  income(x, medium) :- educationLevel(x, "Professional school degree")
  income(x, high) :- educationLevel(x, "Professional school degree")
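The interestingness test boils down to comparing a conditional distribution against the population distribution; a minimal sketch with made-up numbers over the discretized income domain (illustrative only):

import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over a discrete domain."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

overall   = [0.4, 0.4, 0.2]   # P(income = low/medium/high), made-up numbers
prof_deg  = [0.1, 0.4, 0.5]   # P(income | educationLevel = professional degree)
qob_first = [0.4, 0.4, 0.2]   # P(income | quarterOfBirth = 1st quarter)

print(kl(prof_deg, overall))   # large divergence: an interesting rule body
print(kl(qob_first, overall))  # ~0: quarterOfBirth tells us nothing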

Summary & Challenges (I): Web-Scale Information Extraction
[Diagram: design space spanned by ontological rigor (from names & patterns to canonicalized entities & relations) and human effort (from open-domain & unsupervised to domain-oriented training data/facts).]
Names & patterns, e.g.:
  <N. Portman, honored with, Academy Award>, <Jeff Bridges, expected to win, Oscar>, <Bridges, nominated for, Academy Award>
Entities & relations, e.g.:
  wonAward: Person × Prize
  type(Meryl_Streep, Actor), wonAward(Meryl_Streep, Academy_Award), wonAward(Natalie_Portman, Academy_Award), wonAward(Ethan_Coen, Palme_d'Or)

Summary & Challenges (I): Web-Scale Information Extraction
[Diagram: the same design space, populated with systems: TextRunner, ReadTheWeb/NELL, Probase, and WebTables/FusionTables on the open-domain & unsupervised side; Freebase, YAGO2, DBpedia 3.8, Sofie/Prospera, and StatSnowball/EntityCube toward entities & relations with domain-oriented training; the open upper corner marked "?".]

Summary & Challenges (II): RDF is Not Enough!
HMMs, CRFs, PCFGs (not in this talk) yield much richer output structures than just triplets:
- Extraction of facts, beliefs, modifiers, modalities, etc., as well as intensional knowledge (rules)
- More expressive but canonical representations of natural language: trees, graphs, objects, frames (F-logic, KL-ONE, CycL, OWL, etc.)
- All combined with structured probabilistic inference

Summary & Challenges (III): Scalable Probabilistic Inference
Domain-liftable FO formula: ∀X,Y ∈ People: smokes(X) ∧ friends(X,Y) ⇒ smokes(Y)
Exact lifted inference via Weighted First-Order Model Counting (WFOMC): the probability of a query depends only on the size(s) of the domain(s), a weight function for the first-order predicates, and the weighted model count over the corresponding FO d-DNNF circuit.
[Van den Broeck '11]: compilation rules and inference algorithms for FO d-DNNFs
[Jha & Suciu '11]: classes of SQL queries which admit polynomial-size (propositional) d-DNNFs
Approximate inference via belief propagation, MCMC-style sampling, etc.
Scale-out via distributed grounding & inference: TrinityRDF (MSR), GraphLab2 (MIT)
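To see why the smokers formula is domain-liftable: for a domain of size n with k smokers, every friends(X,Y) atom from a smoker to a non-smoker is forced to false and all other atoms are free, so the model count depends only on n and k. A sketch of this standard counting argument (unweighted for simplicity):

from math import comb

def smokers_model_count(n):
    """# of models of  ∀X,Y: smokes(X) ∧ friends(X,Y) ⇒ smokes(Y)
    over smokes/1 and friends/2 with a domain of size n."""
    total = 0
    for k in range(n + 1):                 # k = number of smokers
        forced_false = k * (n - k)         # friends(smoker, non-smoker) atoms
        free_atoms = n * n - forced_false  # all other friends atoms are free
        total += comb(n, k) * 2 ** free_atoms
    return total

print(smokers_model_count(2))   # 48; polynomial in n, no grounding needed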

Final Summary
- Text is not just unstructured data.
- Probabilistic databases combine first-order logic and probability theory in an elegant way.
- Natural-Language-Processing people, Database guys, and Machine-Learning folks: it's about time to join your forces!

Demo! urdf.mpi-inf.mpg.de

References
- Maximilian Dylla, Iris Miliaraki, Martin Theobald: A Temporal-Probabilistic Database Model for Information Extraction. PVLDB 6(14), 2013 (to appear)
- Maximilian Dylla, Iris Miliaraki, Martin Theobald: Top-k Query Processing in Probabilistic Databases with Non-Materialized Views. ICDE 2013
- Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek, Martin Theobald: Query-Time Reasoning in Uncertain RDF Knowledge Bases with Soft and Hard Rules. VLDS 2012
- Mohamed Yahya, Martin Theobald: D2R2: Disk-Oriented Deductive Reasoning in a RISC-Style RDF Engine. RuleML America 2011
- Timm Meiser, Maximilian Dylla, Martin Theobald: Interactive Reasoning in Uncertain RDF Knowledge Bases. CIKM 2011
- Ndapandula Nakashole, Martin Theobald, Gerhard Weikum: Scalable Knowledge Harvesting with High Precision and High Recall. WSDM 2011
- Maximilian Dylla, Mauro Sozio, Martin Theobald: Resolving Temporal Conflicts in Inconsistent RDF Knowledge Bases. BTW 2011
- Yafang Wang, Mohamed Yahya, Martin Theobald: Time-aware Reasoning in Uncertain Knowledge Bases. MUD 2010
- Ndapandula Nakashole, Martin Theobald, Gerhard Weikum: Find your Advisor: Robust Knowledge Gathering from the Web. WebDB 2010
- Anish Das Sarma, Martin Theobald, Jennifer Widom: LIVE: A Lineage-Supported Versioned DBMS. SSDBM 2010
- Anish Das Sarma, Martin Theobald, Jennifer Widom: Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases. ICDE 2008
- Omar Benjelloun, Anish Das Sarma, Alon Y. Halevy, Martin Theobald, Jennifer Widom: Databases with Uncertainty and Lineage. VLDB J. 17(2), 2008