Non-Standard-Datenbanken

Slides:



Advertisements
Similar presentations
Uncertainty in Data Integration Ai Jing
Advertisements

Slide 1 of 18 Uncertainty Representation and Reasoning with MEBN/PR-OWL Kathryn Blackmond Laskey Paulo C. G. da Costa The Volgenau School of Information.
10 Years of Probabilistic Querying – What Next? Martin Theobald University of Antwerp Joint work with Maximilian Dylla, Sairam Gurajada, Angelika Kimmig,
LIVE A lineage-supported, versioned DBMS  Anish Das Sarma  Martin Theobald  Jennifer Widom.
Interactive Reasoning in Large and Uncertain RDF Knowledge Bases Martin Theobald Joint work with: Maximilian Dylla, Timm Meiser, Ndapa Nakashole, Christina.
Research Internships Advanced Research and Modeling Research Group.
Probabilistic models Jouni Tuomisto THL. Outline Deterministic models with probabilistic parameters Hierarchical Bayesian models Bayesian belief nets.
Modeling and Querying Possible Repairs in Duplicate Detection George Beskales Mohamed A. Soliman Ihab F. Ilyas Shai Ben-David.
Jianxin Li, Chengfei Liu, Rui Zhou Swinburne University of Technology, Australia Wei Wang University of New South Wales, Australia Top-k Keyword Search.
CS4432: Database Systems II
Cleaning Uncertain Data with Quality Guarantees Reynold Cheng, Jinchuan Chen, Xike Xie 2008 VLDB Presented by SHAO Yufeng.
Online Filtering, Smoothing & Probabilistic Modeling of Streaming Data In short, Applying probabilistic models to Streams Bhargav Kanagal & Amol Deshpande.
Queries with Difference on Probabilistic Databases Sanjeev Khanna Sudeepa Roy Val Tannen University of Pennsylvania 1.
PAPER BY : CHRISTOPHER R’E NILESH DALVI DAN SUCIU International Conference on Data Engineering (ICDE), 2007 PRESENTED BY : JITENDRA GUPTA.
1 Program Slicing Purvi Patel. 2 Contents Introduction What is program slicing? Principle of dependences Variants of program slicing Slicing classifications.
“Lineage/Provenance” Workgroup Report Birgitta, Amol, Ihab, Thomas, Anish, Martin, Matthias.
Querying Probabilistic XML Databases Asma Souihli Oct. 24 th 2012 Network and Computer Science Department.
Uncertainty Lineage Data Bases Very Large Data Bases
URDF Query-Time Reasoning in Uncertain RDF Knowledge Bases Ndapandula Nakashole Mauro Sozio Fabian Suchanek Martin Theobald.
Probabilistic Fingerprints for Shapes Niloy J. MitraLeonidas Guibas Joachim GiesenMark Pauly Stanford University MPII SaarbrückenETH Zurich.
Plan Recognition with Multi- Entity Bayesian Networks Kathryn Blackmond Laskey Department of Systems Engineering and Operations Research George Mason University.
Trio: A System for Data, Uncertainty, and Lineage Search “stanford trio”
Top-k Queries on Uncertain Data: On score Distribution and Typical Answers Presented by Qian Wan, HKUST Based on [1][2]
Trio: A System for Data, Uncertainty, and Lineage Search “stanford trio”
Computer vision: models, learning and inference Chapter 10 Graphical Models.
Population Proportion The fraction of values in a population which have a specific attribute p = Population proportion X = Number of items having the attribute.
Statistical Relational Learning Pedro Domingos Dept. Computer Science & Eng. University of Washington.
Efficient Query Evaluation over Temporally Correlated Probabilistic Streams Bhargav Kanagal, Amol Deshpande ΗΥ-562 Advanced Topics on Databases Αλέκα Σεληνιωτάκη.
Cost-based Optimization of Graph Queries Silke Trißl Humboldt-Universität zu Berlin Knowledge Management in Bioinformatics IDAR 2007.
1  Special Cases:  Query Semantics: (“Marginal Probabilities”)  Run query Q against each instance D i ; for each answer tuple t, sum up the probabilities.
A Survey Based Seminar: Data Cleaning & Uncertain Data Management Speaker: Shawn Yang Supervisor: Dr. Reynold Cheng Prof. David Cheung
Data-Centric Human Computation Jennifer Widom Stanford University.
AP STATISTICS LESSON COMPARING TWO PROPORTIONS.
Wireless Sensor Networks In-Network Relational Databases Jocelyn Botello.
1 Chapter 14 Probabilistic Reasoning. 2 Outline Syntax of Bayesian networks Semantics of Bayesian networks Efficient representation of conditional distributions.
DAQ: A New Paradigm for Approximate Query Processing Navneet Potti Jignesh Patel VLDB 2015.
OLAP Recap 3 characteristics of OLAP cubes: Large data sets ~ Gb, Tb Expected Query : Aggregation Infrequent updates Star Schema : Hierarchical Dimensions.
Web-site Building Methodologies Current Research.
Lineage Tracing for General Data Warehouse Transformations Yingwei Cui and Jennifer Widom Computer Science Department, Stanford University Presentation.
Daphne Koller Overview Conditional Probability Queries Probabilistic Graphical Models Inference.
Welcome to the Course of Web and Document Databases
Information Management for Digital Humanities and Diplomatics
Distributed database approach,
Using Graph Summarization for Join-ahead Pruning in a Distributed RDF Store Sairam Gurajada (MPI Informatics, Germany), Stephan Seufert (MPI Informatics,
Probabilistic Data Management
CS 4/527: Artificial Intelligence
Web-Mining Agents Cooperating Agents for Information Retrieval
Chapter 15 QUERY EXECUTION.
Queries with Difference on Probabilistic Databases
CAP 5636 – Advanced Artificial Intelligence
Data Integration with Dependent Sources
Lecture 16: Probabilistic Databases
Non-Standard-Datenbanken
Software Verification 2 Automated Verification
Emergence of Intelligent Machines: Challenges and Opportunities
Lecture 2 – Monte Carlo method in finance
Inference Inference: calculating some useful quantity from a joint probability distribution Examples: Posterior probability: Most likely explanation: B.
Instructors: Fei Fang (This Lecture) and Dave Touretzky
CS 188: Artificial Intelligence
The Trio System for Data, Uncertainty, and Lineage: Overview and Demo
A Temporal Probabilistic Database Model for Information Extraction
A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*
CS 188: Artificial Intelligence Fall 2008
Probabilistic Databases
Data Engineering Research Group
Probabilistic Ranking of Database Query Results
Probabilistic Databases with MarkoViews
Representations & Reasoning Systems (RRS) (2.2)
Generalized Diagnostics with the Non-Axiomatic Reasoning System (NARS)
Presentation transcript:

Non-Standard-Datenbanken Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme

Acknowledgements: This presentation is based on the following two presentations 10 Years of Probabilistic Querying – What Next? Martin Theobald University of Antwerp Temporal Alignment Anton Dignös1 Michael H. Böhlen1 Johann Gamper2 1University of Zürich, Switzerland 2Free University of Bozen-Bolzano, Italy

Recap: Probabilistic Databases A probabilistic database Dp (compactly) encodes a probability distribution over a finite set of deterministic database instances Di. D1: 0.42 D2: 0.18 D3: 0.28 D4: 0.12 WorksAt(Sub, Obj) Jeff Stanford Princeton WorksAt(Sub, Obj) Jeff Stanford WorksAt(Sub, Obj) Jeff Princeton WorksAt(Sub, Obj) Special Cases: Query Semantics: (“Marginal Probabilities”) Run query Q against each instance Di; for each answer tuple t, sum up the probabilities of all instances Di where t is a result. (1) Dp tuple-independent (II) Dp block-independent WorksAt(Sub, Obj) p Jeff Stanford 0.6 Princeton 0.7 WorksAt(Sub, Obj) p Jeff Stanford 0.6 Princeton 0.4 Note: (I) and (II) are not equivalent!

Probabilistic & Temporal Databases A temporal-probabilistic database DTp (compactly) encodes a probability distribution over a finite set of deterministic database instances Di at each time point of a finite time domain T. BornIn(Sub,Obj) T p DeNiro Green- which [1943, 1944) 0.9 Tribeca [1998, 1999) 0.6 Wedding(Sub,Obj) T p DeNiro Abbott [1936, 1940) 0.3 [1976, 1977) 0.7 Divorce(Sub,Obj) T p DeNiro Abbott [1988, 1989) 0.8 Sequenced Semantics & Snapshot Reducibility: Built-in semantics: reduce temporal-relational operators to their non- temporal counterparts at each snapshot (i.e., time point) of the database. Coalesce/split tuples with consecutive time intervals based on their lineages. Non-Sequenced Semantics Queries can freely manipulate timestamps just like regular attributes. Single temporal operator ≤T supports all of Allen’s 13 temporal relations. Deduplicate tuples with overlapping time intervals based on their lineages. [Dignös, Gamper, Böhlen: SIGMOD’12] marriedTo(x,y)[tb1,Tmax)  wedding(x,y)[tb1,te1)  ¬divorce(x,y)[tb2,te2) [Dylla,Miliaraki,Theobald: PVLDB’13]

Sequenced Semantics: Example [Dignös, Gamper, Böhlen: SIGMOD’12]

Temporal Splitter / Snapshot Reduction [Dignös, Gamper, Böhlen: SIGMOD’12]

Temporal Alignment & Deduplication Example (f1  f3)  (f1  ¬f3)  (f2  f3)  (f2  ¬f3) Dedupl. Facts (f1  f3)  (f1  ¬f3) (f1  ¬f3)  (f2  ¬f3) f1  f3 Derived Facts f1  ¬f3 f2  f3 f2  ¬f3 f3 Base Facts wedding(DeNiro,Abbott) f2 divorce(DeNiro,Abbott) f1 wedding(DeNiro,Abbott) T Tmin 1936 1976 1988 Tmax Non-Sequenced Semantics: marriedTo(x,y)[tb1,Tmax)  wedding(x,y)[tb1,te1)  ¬divorce(x,y)[tb2,te2) marriedTo(x,y)[tb1,te2)  wedding(x,y)[tb1,te1)  divorce(x,y)[tb2,te2)  te1 ≤T tb2

Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Derived Facts teamMates(Beckham, Ronaldo, T3)  playsFor(Beckham, Real, T1) playsFor(Ronaldo, Real, T2) overlaps(T1, T2, T3) ‘03 ‘04 ‘07 ‘05 0.16 0.12 0.08 0.4 0.6 ‘03 ‘05 ‘07 playsFor(Beckham, Real, T1) 0.2 0.1 0.4 ‘05 ‘00 ‘02 ‘07 playsFor(Ronaldo, Real, T2) ‘04 Base Facts Example using the Allen predicate overlaps

Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Derived Facts teamMates(Beckham, Ronaldo, T4) teamMates(Beckham, Zidane, T5) 0.08 0.12 0.16 ‘03 ‘04 ‘07 ‘05 teamMates(Ronaldo, Zidane, T6) Non-independent Independent 0.4 0.6 ‘03 ‘05 ‘07 0.2 0.1 ‘05 ‘00 ‘02 ‘07 ‘04 0.4 playsFor(Zidane, Real, T3) Base Facts playsFor(Beckham, Real, T1) playsFor(Ronaldo, Real, T2)

Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Derived facts stored in views teamMates(Beckham, Ronaldo, T4) teamMates(Beckham, Zidane, T5) teamMates(Ronaldo, Zidane, T6) Need Lineage! Non-independent Independent Closed and complete representation model (incl. lineage) Temporal alignment is polyn. in the number of input intervals Confidence computation per interval remains #P-hard In general requires Monte Carlo approximations (Luby-Karp for DNF, MCMC-style sampling), decompositions, or top-k pruning playsFor(Zidane, Real, T3) Base Facts playsFor(Beckham, Real, T1) playsFor(Ronaldo, Real, T2)

Lineage & Possible Worlds [Das Sarma,Theobald,Widom: ICDE’08 Dylla,Miliaraki,Theobald: ICDE’13] Query graduatedFrom(Surajit, y) 1) Deductive Grounding Dependency graph of the query Trace lineage of individual query answers 2) Lineage DAG (not in CNF), consisting of Grounded soft & hard views Probabilistic base facts 3) Probabilistic Inference  Compute marginals: P(Q): sum up the probabilities of all possible worlds that entail the query answers’ lineage P(Q|H): drop “impossible worlds” 0.7x(1-0.888)=0.078 (1-0.7)x0.888=0.266 1-(1-0.72)x(1-0.6) =0.888 0.8x0.9 =0.72 graduatedFrom (Surajit, Princeton) graduatedFrom (Surajit, Stanford) Q1 Q2 A(B (CD))  A(B (CD))  \/ C D A B graduatedFrom (Surajit, Princeton)[0.7] graduatedFrom (Surajit, Stanford)[0.6] /\ hasAdvisor (Surajit,Jeff)[0.8] worksAt (Jeff,Stanford)[0.9]

Literature [Das Sarma,Theobald,Widom: ICDE’08] Das Sarma, Anish and Theobald, Martin and Widom, Jennifer, Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases. In: 24th International Conference on Data Engineering (ICDE 2008), IEEE Computer Society Press, pp. 1023-1032. 2008 [Wang,Yahya,Theobald: MUD’10] Wang, Yafang and Yahya, Mohamed and Theobald, Martin, Time-aware Reasoning in Uncertain Knowledge Bases. In: 4th International VLDB Workshop on Management of Uncertain Data, Vol. WP 10-, pp. 51-65, 2010 [Dignös, Gamper, Böhlen: SIGMOD’12] A. Dignös, M.H. Böhlen, J. Gamper. Temporal alignment. In Proc. of the SIGMOD-12, pages 433-444, Scottsdale, AZ, USA, May 20-24, 2012 [Dylla,Miliaraki,Theobald: PVLDB’13] Maximilian Dylla, Iris Miliaraki, Martin Theobald, A Temporal-Probabilistic Database Model for Information Extraction, Proceedings of the VLDB Endowment, Volume 6, Issue 14, 2013

Historical Facts vs. Future Facts Processing uncertain historical data Estimating probabilities of future facts (increasing Tmax)? marriedTo(x,y)[tb1, te2)  wedding(x,y)[tb1,te1)  ¬divorce(x,y)[tb2,te2) marriedTo(x,y)[tb1,te2)  wedding(x,y)[tb1,te1)  divorce(x,y)[tb2,te2)  te1 ≤T tb2 marriedTo(x,y)[tb1,Tmax)  wedding(x,y)[tb1,te1)  ¬divorce(x,y)[tb2,te2) marriedTo(x,y)[tb1,te2)  wedding(x,y)[tb1,te1)  divorce(x,y)[tb2,te2)  te1 ≤T tb2