Non-Standard-Datenbanken

Slides:



Advertisements
Similar presentations
10 Years of Probabilistic Querying – What Next? Martin Theobald University of Antwerp Joint work with Maximilian Dylla, Sairam Gurajada, Angelika Kimmig,
Advertisements

Distributed Snapshots: Determining Global States of Distributed Systems - K. Mani Chandy and Leslie Lamport.
LIVE A lineage-supported, versioned DBMS  Anish Das Sarma  Martin Theobald  Jennifer Widom.
Interactive Reasoning in Large and Uncertain RDF Knowledge Bases Martin Theobald Joint work with: Maximilian Dylla, Timm Meiser, Ndapa Nakashole, Christina.
Research Internships Advanced Research and Modeling Research Group.
Modeling and Querying Possible Repairs in Duplicate Detection George Beskales Mohamed A. Soliman Ihab F. Ilyas Shai Ben-David.
CS4432: Database Systems II
Queries with Difference on Probabilistic Databases Sanjeev Khanna Sudeepa Roy Val Tannen University of Pennsylvania 1.
PAPER BY : CHRISTOPHER R’E NILESH DALVI DAN SUCIU International Conference on Data Engineering (ICDE), 2007 PRESENTED BY : JITENDRA GUPTA.
BCDM Temporal Domains - Time is linear and totally ordered - Chronons are the basic time unit - Time domains are isomorphic to subsets of the domain of.
“Lineage/Provenance” Workgroup Report Birgitta, Amol, Ihab, Thomas, Anish, Martin, Matthias.
Efficient Query Evaluation on Probabilistic Databases
E FFICIENT T OP - K Q UERY E VALUATION ON P ROBABILISTIC D ATA P APER B Y C HRISTOPHER R´ E N ILESH D ALVI D AN S UCIU Presented By Chandrashekar Vijayarenu.
Querying Probabilistic XML Databases Asma Souihli Oct. 24 th 2012 Network and Computer Science Department.
URDF Query-Time Reasoning in Uncertain RDF Knowledge Bases Ndapandula Nakashole Mauro Sozio Fabian Suchanek Martin Theobald.
Ming Hua, Jian Pei Simon Fraser UniversityPresented By: Mahashweta Das Wenjie Zhang, Xuemin LinUniversity of Texas at Arlington The University of New South.
Trio: A System for Data, Uncertainty, and Lineage Search “stanford trio”
5/25/2005EE562 EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS Lecture 16, 6/1/2005 University of Washington, Department of Electrical Engineering Spring 2005.
Computer vision: models, learning and inference Chapter 10 Graphical Models.
Population Proportion The fraction of values in a population which have a specific attribute p = Population proportion X = Number of items having the attribute.
Efficient Query Evaluation over Temporally Correlated Probabilistic Streams Bhargav Kanagal, Amol Deshpande ΗΥ-562 Advanced Topics on Databases Αλέκα Σεληνιωτάκη.
Correlation testing for affine invariant properties on Shachar Lovett Institute for Advanced Study Joint with Hamed Hatami (McGill)
1  Special Cases:  Query Semantics: (“Marginal Probabilities”)  Run query Q against each instance D i ; for each answer tuple t, sum up the probabilities.
ITCS 6163 Lecture 5. Indexing datacubes Objective: speed queries up. Traditional databases (OLTP): B-Trees Time and space logarithmic to the amount of.
A Survey Based Seminar: Data Cleaning & Uncertain Data Management Speaker: Shawn Yang Supervisor: Dr. Reynold Cheng Prof. David Cheung
1 Chapter 14 Probabilistic Reasoning. 2 Outline Syntax of Bayesian networks Semantics of Bayesian networks Efficient representation of conditional distributions.
Christopher Re and Dan Suciu University of Washington Efficient Evaluation of HAVING Queries on a Probabilistic Database.
DAQ: A New Paradigm for Approximate Query Processing Navneet Potti Jignesh Patel VLDB 2015.
Webdamlog and Contradictions Daniel Deutch Tel Aviv University Joint work with Serge Abiteboul, Meghyn Bienvenu, Victor Vianu.
1 CMSC 671 Fall 2001 Class #20 – Thursday, November 8.
Daphne Koller Overview Conditional Probability Queries Probabilistic Graphical Models Inference.
Approximation by Constraints Why do we need approximation/interpolation? –Record values at some finite set of points in space and time. –Does not capture.
Web-Mining Agents Data Mining Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme Karsten Martiny (Übungen)
Learning Deep Generative Models by Ruslan Salakhutdinov
Qian Liu CSE spring University of Pennsylvania
Inference in Bayesian Networks
Distributed database approach,
Probabilistic Data Management
CS 4/527: Artificial Intelligence
Chapter 15 QUERY EXECUTION.
Query Execution Presented by Khadke, Suvarna CS 257
Adversarially Tuned Scene Generation
Queries with Difference on Probabilistic Databases
CAP 5636 – Advanced Artificial Intelligence
Lecture 16: Probabilistic Databases
Chap 2: A Prelude to Parametric Data
Software Verification 2 Automated Verification
Lecture 2 – Monte Carlo method in finance
Inference Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: Most likely explanation: B.
Instructors: Fei Fang (This Lecture) and Dave Touretzky
CAP 5636 – Advanced Artificial Intelligence
CS 188: Artificial Intelligence
The Trio System for Data, Uncertainty, and Lineage: Overview and Demo
CS4705 Natural Language Processing
A Temporal Probabilistic Database Model for Information Extraction
Einführung in Web- und Data-Science
CS 188: Artificial Intelligence
Logical Agents Chapter 7.
Chapter 2: Intro to Relational Model
CS 188: Artificial Intelligence Fall 2008
DATABASE HISTOGRAMS E0 261 Jayant Haritsa
Probabilistic Databases
Example of a Relation attributes (or columns) tuples (or rows)
Chapter 2: Intro to Relational Model
Clique Tree Algorithm: Computation
Non-Standard-Datenbanken
Probabilistic Ranking of Database Query Results
Probabilistic Databases with MarkoViews
Representations & Reasoning Systems (RRS) (2.2)
Lecture 2-6 Complexity for Computing Influence Spread
Presentation transcript:

Non-Standard-Datenbanken Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme Felix Kuhr (Übungen)

Acknowledgements: This presentation is based on the following two presentations 10 Years of Probabilistic Querying – What Next? Martin Theobald University of Antwerp Temporal Alignment Anton Dignös1 Michael H. Böhlen1 Johann Gamper2 1University of Zürich, Switzerland 2Free University of Bozen-Bolzano, Italy

Recap: Probabilistic Databases A probabilistic database Dp (compactly) encodes a probability distribution over a finite set of deterministic database instances Di. D1: 0.42 D2: 0.18 D3: 0.28 D4: 0.12 WorksAt(Sub, Obj) Jeff Stanford Princeton WorksAt(Sub, Obj) Jeff Stanford WorksAt(Sub, Obj) Jeff Princeton WorksAt(Sub, Obj) Special Cases: Query Semantics: (“Marginal Probabilities”) Run query Q against each instance Di; for each answer tuple t, sum up the probabilities of all instances Di where t is a result. (1) Dp tuple-independent (II) Dp block-independent WorksAt(Sub, Obj) p Jeff Stanford 0.6 Princeton 0.7 WorksAt(Sub, Obj) p Jeff Stanford 0.6 Princeton 0.4 Note: (I) and (II) are not equivalent!

Probabilistic & Temporal Databases A temporal-probabilistic database DTp (compactly) encodes a probability distribution over a finite set of deterministic database instances Di at each time point of a finite time domain T. BornIn(Sub,Obj) T p DeNiro Green- which [1943, 1944) 0.9 Tribeca [1998, 1999) 0.6 Wedding(Sub,Obj) T p DeNiro Abbott [1936, 1940) 0.3 [1976, 1977) 0.7 Divorce(Sub,Obj) T p DeNiro Abbott [1988, 1989) 0.8 Sequenced Semantics & Snapshot Reducibility: Built-in semantics: reduce temporal-relational operators to their non- temporal counterparts at each snapshot (i.e., time point) of the database. Coalesce/split tuples with consecutive time intervals based on their lineages. Non-Sequenced Semantics Queries can freely manipulate timestamps just like regular attributes. Single temporal operator ≤T supports all of Allen’s 13 temporal relations. Deduplicate tuples with overlapping time intervals based on their lineages. [Dignös, Gamper, Böhlen: SIGMOD’12] marriedTo(x,y)[tb1,Tmax)  wedding(x,y)[tb1,te1)  ¬divorce(x,y)[tb2,te2) [Dylla,Miliaraki,Theobald: PVLDB’13]

Sequenced Semantics [Dignös, Gamper, Böhlen: SIGMOD’12]

Temporal Splitter / Snapshot Reduction [Dignös, Gamper, Böhlen: SIGMOD’12]

Temporal Alignment & Deduplication Example (f1  f3)  (f1  ¬f3)  (f2  f3)  (f2  ¬f3) Dedupl. Facts (f1  f3)  (f1  ¬f3) (f1  ¬f3)  (f2  ¬f3) f1  f3 Derived Facts f1  ¬f3 f2  f3 f2  ¬f3 f3 Base Facts wedding(DeNiro,Abbott) f2 divorce(DeNiro,Abbott) f1 wedding(DeNiro,Abbott) T Tmin 1936 1976 1988 Tmax Non-Sequenced Semantics: marriedTo(x,y)[tb1,Tmax)  wedding(x,y)[tb1,te1)  ¬divorce(x,y)[tb2,te2) marriedTo(x,y)[tb1,te2)  wedding(x,y)[tb1,te1)  divorce(x,y)[tb2,te2)  te1 ≤T tb2

Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Derived Facts teamMates(Beckham, Ronaldo, T3)  playsFor(Beckham, Real, T1) playsFor(Ronaldo, Real, T2) overlaps(T1, T2, T3) ‘03 ‘04 ‘07 ‘05 0.16 0.12 0.08 0.4 0.6 ‘03 ‘05 ‘07 playsFor(Beckham, Real, T1) 0.2 0.1 0.4 ‘05 ‘00 ‘02 ‘07 playsFor(Ronaldo, Real, T2) ‘04 Base Facts Example using the Allen predicate overlaps

Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Derived Facts teamMates(Beckham, Ronaldo, T4) teamMates(Beckham, Zidane, T5) 0.08 0.12 0.16 ‘03 ‘04 ‘07 ‘05 teamMates(Ronaldo, Zidane, T6) Non-independent Independent 0.4 0.6 ‘03 ‘05 ‘07 0.2 0.1 ‘05 ‘00 ‘02 ‘07 ‘04 0.4 playsFor(Zidane, Real, T3) Base Facts playsFor(Beckham, Real, T1) playsFor(Ronaldo, Real, T2)

Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Derived facts stored in views teamMates(Beckham, Ronaldo, T4) teamMates(Beckham, Zidane, T5) teamMates(Ronaldo, Zidane, T6) Need Lineage! Non-independent Independent Closed and complete representation model (incl. lineage) Temporal alignment is polyn. in the number of input intervals Confidence computation per interval remains #P-hard In general requires Monte Carlo approximations (Luby-Karp for DNF, MCMC-style sampling), decompositions, or top-k pruning playsFor(Zidane, Real, T3) Base Facts playsFor(Beckham, Real, T1) playsFor(Ronaldo, Real, T2)

Lineage & Possible Worlds [Das Sarma,Theobald,Widom: ICDE’08 Dylla,Miliaraki,Theobald: ICDE’13] Query graduatedFrom(Surajit, y) 1) Deductive Grounding Dependency graph of the query Trace lineage of individual query answers 2) Lineage DAG (not in CNF), consisting of Grounded soft & hard views Probabilistic base facts 3) Probabilistic Inference  Compute marginals: P(Q): sum up the probabilities of all possible worlds that entail the query answers’ lineage P(Q|H): drop “impossible worlds” 0.7x(1-0.888)=0.078 (1-0.7)x0.888=0.266 1-(1-0.72)x(1-0.6) =0.888 0.8x0.9 =0.72 graduatedFrom (Surajit, Princeton) graduatedFrom (Surajit, Stanford) Q1 Q2 A(B (CD))  A(B (CD))  \/ C D A B graduatedFrom (Surajit, Princeton)[0.7] graduatedFrom (Surajit, Stanford)[0.6] /\ hasAdvisor (Surajit,Jeff)[0.8] worksAt (Jeff,Stanford)[0.9]