1
Probabilistic Databases with MarkoViews
Abhay Jha, Dan Suciu
Presented by: Alon Vizel, 15/1/2017
Soft-Logic Seminar in Computer Science, Technion
2
Lecture layout:
- Definitions & Background
- INDB - Tuple independent database
- MLN - Markov Logic Network
- MarkoViews & MVDB
- Translating MVDB to INDB
- Experimental Evaluation
- Summary
3
Definitions
A database instance $I$ of a relational schema $R$ is a $k$-tuple $(R_1^I, \ldots, R_k^I)$, where $R_i^I$ is an instance of the relation $R_i$.
A probabilistic database is a pair $D = (W, P)$, where $W = \{I_1, \ldots, I_n\}$ is a set of instances, called possible worlds, and $P : W \to [0, 1]$ is a probability function, i.e., $\sum_j P(I_j) = 1$. We denote by $\mathit{Tup}$ the set of possible tuples, i.e., the set of all tuples occurring in the possible worlds $I_1, \ldots, I_n$.
4
Definitions cont.
A conjunctive query (CQ) is a query $Q$ of the form $(\exists \bar{y})(R_1(\bar{x}_1) \wedge \cdots \wedge R_t(\bar{x}_t))$.
A union of conjunctive queries (UCQ) is a query $Q$ of the form $Q_1 \vee \cdots \vee Q_m$, where each $Q_i \in$ CQ.
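To make the CQ/UCQ definitions concrete, here is a minimal Python sketch that decides whether a Boolean CQ or UCQ holds in a single instance by backtracking over the atoms. The encoding is a hypothetical one chosen for illustration: an instance is a dict from relation names to sets of tuples, an atom is a pair (relation, terms), and a term starting with "?" is a variable (all variables are treated as existentially quantified).

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: a term starting with "?" is a variable, anything else a constant.
Term = str
Atom = Tuple[str, Tuple[Term, ...]]          # (relation name, terms)
Instance = Dict[str, Set[Tuple[str, ...]]]   # relation name -> set of tuples


def is_var(term: Term) -> bool:
    return term.startswith("?")


def satisfies_cq(instance: Instance, atoms: List[Atom],
                 binding: Optional[Dict[str, str]] = None) -> bool:
    """Boolean CQ: true iff some assignment of the variables maps every atom to a tuple of the instance."""
    binding = {} if binding is None else binding
    if not atoms:
        return True
    (rel, terms), rest = atoms[0], atoms[1:]
    for tup in instance.get(rel, set()):
        if len(tup) != len(terms):
            continue
        extended = dict(binding)
        matches = True
        for term, value in zip(terms, tup):
            # A variable must agree with its earlier binding; a constant must match exactly.
            expected = extended.setdefault(term, value) if is_var(term) else term
            if expected != value:
                matches = False
                break
        if matches and satisfies_cq(instance, rest, extended):
            return True
    return False


def satisfies_ucq(instance: Instance, cqs: List[List[Atom]]) -> bool:
    """UCQ Q_1 v ... v Q_m: true iff at least one of its CQs is true."""
    return any(satisfies_cq(instance, cq) for cq in cqs)


inst = {"R": {("a",)}, "S": {("b",)}}
print(satisfies_cq(inst, [("R", ("?x",)), ("S", ("?x",))]))  # False: no common x
print(satisfies_ucq(inst, [[("R", ("?x",)), ("S", ("?x",))], [("S", ("b",))]]))  # True
```

Backtracking is exponential in the worst case; it is only meant to pin down what "Q holds in I" means, which the probabilistic semantics below build on.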
5
Background
Many query processing techniques exist for probabilistic databases: they have short running times and deal successfully with large databases.
Problem: most scalable query processing techniques assume that the tuples are independent, and most are restricted to UCQs. This is insufficient for complex knowledge extraction tasks.
6
What do we want?
- Represent complex correlations
- Efficient query evaluation:
  - Easy translation (our main goal today)
  - Fast evaluation
8
Tuple independent database (INDB)
A probabilistic database is tuple-independent if, for any set of possible tuples $t_1, \ldots, t_m$, the events $t_1 \in I, \ldots, t_m \in I$ are independent. We write $D_0 = (\mathit{Tup}, p)$, where $\mathit{Tup}$ is the set of possible tuples and $p : \mathit{Tup} \to [0, 1]$. The possible worlds are all subsets $I \subseteq \mathit{Tup}$, and their probabilities are $P(I) = \prod_{t \in I} p(t) \cdot \prod_{t \in \mathit{Tup} \setminus I} (1 - p(t))$.
9
INDB example S Prob. B A 0.6 1 m s1 0.5 n s2 T Prob. D C 0.4 p 1 t1
Possible worlds probability Instance 0.12 {s1, s2, t1} 0.18 {s1, s2} {s1, t1} {s1} 0.08 {s2, t1} {s2} {t1} β
S Prob. B A 0.6 1 m s1 0.5 n s2 T Prob. D C 0.4 p 1 t1
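The possible-worlds table can be reproduced by brute force. The sketch below uses exact arithmetic with `fractions.Fraction`, enumerates every subset $I$ of $\mathit{Tup}$ and applies the INDB formula; the function name `possible_worlds` is ours, and the enumeration is exponential in $|\mathit{Tup}|$, so it is only meant for toy examples like this one.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet


def possible_worlds(p: Dict[str, Fraction]) -> Dict[FrozenSet[str], Fraction]:
    """INDB semantics: P(I) = prod of p(t) for t in I, times prod of (1 - p(t)) for t not in I."""
    tup = sorted(p)
    worlds: Dict[FrozenSet[str], Fraction] = {}
    for k in range(len(tup) + 1):
        for subset in combinations(tup, k):
            world = frozenset(subset)
            prob = Fraction(1)
            for t in tup:
                prob *= p[t] if t in world else 1 - p[t]
            worlds[world] = prob
    return worlds


# Tuple probabilities from the example tables S and T.
p = {"s1": Fraction("0.6"), "s2": Fraction("0.5"), "t1": Fraction("0.4")}
for world, prob in possible_worlds(p).items():
    print(sorted(world), float(prob))
```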
10
Alternative INDB definition
$D_0 = (\mathit{Tup}_0, w_0)$, where $\mathit{Tup}_0$ is a set of possible tuples and $w_0(t)$ associates a real number (a weight) to each tuple $t$. This definition is equivalent to the one given earlier, by setting the tuple probability to $p(t) = \frac{w_0(t)}{1 + w_0(t)}$. In a tuple-independent database, a weight represents the odds: $w = \frac{p}{1 - p}$.
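Converting between the two definitions is a one-liner in each direction; a small sketch with hypothetical helper names:

```python
from fractions import Fraction


def odds_to_prob(w: Fraction) -> Fraction:
    """p(t) = w(t) / (1 + w(t)); a weight of 0 gives probability 0."""
    return w / (1 + w)


def prob_to_odds(p: Fraction) -> Fraction:
    """w = p / (1 - p); undefined for p = 1 (the weight would be infinite)."""
    return p / (1 - p)


print(odds_to_prob(Fraction(3, 2)))  # 3/5
print(prob_to_odds(Fraction(3, 5)))  # 3/2
```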
12
Markov Logic Networks (MLNs)
A Markov Logic Network is a set $L = \{(F_1, w_1), \ldots, (F_m, w_m)\}$, where each $F_i$ is a formula in first-order logic over a relational schema $R$, called a feature, and $w_i$ is its weight. A grounding of a formula $F_i$ is a formula obtained by substituting the free variables $\bar{x}$ of $F_i$ with constants $\bar{a}$; we denote the set of groundings of $F_i$ by $G(F_i)$. The grounded MLN is $G(L) = \{(G, w_i) \mid \exists (F_i, w_i) \in L : G \in G(F_i)\}$.
13
The semantics of an MLN $L$ is the probabilistic database $D_L = (W, P)$, where $W = \{I \mid I \subseteq \mathit{Tup}\}$ and $P(I) = \Phi(I)/Z$ for all $I \subseteq \mathit{Tup}$:
- $\Phi(I)$ is the weight of a possible world: $\Phi(I) = \prod_{(G, w) \in G(L) : I \models G} w$
- $Z$ is the partition function: $Z = \sum_{I \subseteq \mathit{Tup}} \Phi(I)$
- $w > 1$ means that worlds where the feature holds are more likely; $w < 1$ means that worlds where the feature holds are less likely; $w = 1$ means indifference. $w = \infty$ is interpreted as a hard constraint.
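A brute-force sketch of this semantics, assuming the MLN is already grounded and each ground feature is given as a Python predicate over worlds (a world is a set of tuple names). This is only a reference implementation for tiny examples: it enumerates all $2^{|\mathit{Tup}|}$ worlds and does not handle $w = \infty$ (hard constraints). The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Iterator, List, Tuple

World = FrozenSet[str]
# A ground feature G with its weight w; G is encoded as a predicate deciding "I |= G".
GroundFeature = Tuple[Callable[[World], bool], Fraction]


def all_worlds(tup: Iterable[str]) -> Iterator[World]:
    """Every subset I of Tup."""
    names = sorted(tup)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            yield frozenset(subset)


def world_weight(world: World, features: List[GroundFeature]) -> Fraction:
    """Phi(I): product of the weights of the ground features that I satisfies."""
    phi = Fraction(1)
    for holds, w in features:
        if holds(world):
            phi *= w
    return phi


def mln_probability(query: Callable[[World], bool], tup: Iterable[str],
                    features: List[GroundFeature]) -> Fraction:
    """P(Q) = (sum of Phi(I) over worlds I |= Q) / Z."""
    z = Fraction(0)
    phi_q = Fraction(0)
    for world in all_worlds(tup):
        phi = world_weight(world, features)
        z += phi
        if query(world):
            phi_q += phi
    return phi_q / z


# The two-feature MLN (R(a), w1), (S(a), w2) with w1 = 3 and w2 = 1.
features = [(lambda I: "R(a)" in I, Fraction(3)), (lambda I: "S(a)" in I, Fraction(1))]
print(mln_probability(lambda I: "R(a)" in I, {"R(a)", "S(a)"}, features))  # 3/4
```

The printed $3/4$ equals $w_1/(1 + w_1)$, matching the claim that independent single-tuple features give a tuple-independent database.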
14
MLN examples
1. $w = 4.5$: notSame$(x_1, x_2)$ :- Person$(x_1, n_1, c_1, a_1) \wedge$ Person$(x_2, n_2, c_2, a_2) \wedge \neg$SameCountry$(c_1, c_2)$
$w = 0.5$: Same$(x_1, x_2)$ :- Person$(x_1, n_1, c_1, a_1) \wedge$ Person$(x_2, n_2, c_2, a_2) \wedge$ Similar$(n_1, n_2) \wedge$ Similar$(c_1, c_2) \wedge$ Close$(a_1, a_2)$
With the first feature, we are more likely to have a world (an instance of R) where, if two persons are not from the same country, they are not the same person. With the second, we are less likely to have a world (an instance of R) with two persons who have the same name and live close to each other in the same city.
15
2. Consider the MLN consisting of the features $(R(a), w_1), (S(a), w_2)$. Recall that $w = p/(1 - p)$ and $p = w/(1 + w)$. This MLN defines a tuple-independent database, so the probabilities are:
Possible world $I_j$ | Weight $\Phi(I_j)$ | $P(I_j)$
$\{R(a), S(a)\}$ | $w_1 w_2$ | $\frac{w_1 w_2}{(1 + w_1)(1 + w_2)} = p_1 p_2$
$\{S(a)\}$ | $w_2$ | $\frac{w_2}{(1 + w_1)(1 + w_2)} = (1 - p_1) p_2$
$\{R(a)\}$ | $w_1$ | $\frac{w_1}{(1 + w_1)(1 + w_2)} = p_1 (1 - p_2)$
$\emptyset$ | $1$ | $\frac{1}{(1 + w_1)(1 + w_2)} = (1 - p_1)(1 - p_2)$
Partition function: $Z = 1 + w_1 + w_2 + w_1 w_2 = (1 + w_1)(1 + w_2)$
17
Markov View (MarkoView)
$V(\bar{x})[w_{\mathit{expr}}]$ :- $Q$
- $V$ is the view name
- $Q$ is a union of conjunctive queries (UCQ)
- $\bar{x}$ are the view's variables
- $w_{\mathit{expr}}$ is an expression representing a non-negative weight
MarkoViews are defined over a probabilistic database, and each view tuple introduces a correlation between all the tuples in its lineage expression.
18
Example: $V_1$(id1, id2)[w = count(pid)/2] :- Advisor$^p$(id1, id2), Student$^p$(id1, year), Wrote(id1, pid), Wrote(id2, pid), Pub(pid, title, year)
The more papers id1 and id2 published together while id1 was a student, the more likely it is that id2 was id1's advisor.
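One way the weight expression count(pid)/2 could be evaluated is sketched below. This is our reading of the view, not code from the paper: we assume the weight of a view tuple $V_1$(id1, id2) is half the number of distinct publications that id1 and id2 co-authored in a year for which Student$^p$(id1, year) is a possible tuple, restricted to pairs for which Advisor$^p$(id1, id2) is a possible tuple. The function name and the toy data (alice, bob, p1 to p3) are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Set, Tuple


def advisor_view_weights(
    wrote: Set[Tuple[str, str]],               # Wrote(aid, pid)
    pub: Set[Tuple[str, str, int]],            # Pub(pid, title, year)
    student_years: Set[Tuple[str, int]],       # possible Student^p(aid, year) tuples
    advisor_candidates: Set[Tuple[str, str]],  # possible Advisor^p(aid1, aid2) tuples
) -> Dict[Tuple[str, str], Fraction]:
    """Hypothetical evaluation of V1(id1, id2)[count(pid)/2]: shared pids while id1 was a student, halved."""
    authors_of: Dict[str, Set[str]] = defaultdict(set)
    for aid, pid in wrote:
        authors_of[pid].add(aid)
    shared: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for pid, _title, year in pub:
        for id1, id2 in advisor_candidates:
            # Join conditions of the view body: both wrote pid, and id1 was a student in pid's year.
            if (id1, year) in student_years and {id1, id2} <= authors_of[pid]:
                shared[(id1, id2)].add(pid)
    return {pair: Fraction(len(pids), 2) for pair, pids in shared.items()}


wrote = {("alice", "p1"), ("bob", "p1"), ("alice", "p2"), ("bob", "p2"), ("alice", "p3"), ("bob", "p3")}
pub = {("p1", "T1", 2010), ("p2", "T2", 2011), ("p3", "T3", 2015)}
weights = advisor_view_weights(wrote, pub, {("alice", 2010), ("alice", 2011)}, {("alice", "bob")})
print(weights)  # {('alice', 'bob'): Fraction(1, 1)}
```

Here p3 is not counted because it appeared after alice's student years, so two shared papers give weight $2/2 = 1$.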
19
MarkoView Database (MVDB)
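Let $R$ be a relational schema. An MVDB is a triple $(\mathit{Tup}, w, V)$, where:
- $\mathit{Tup}$ is a set of possible tuples over the schema $R$
- $w : \mathit{Tup} \to [0, \infty]$ is a weight function
- $V$ is a set of MarkoViews
Its semantics is given by the probabilistic database $D_L$ associated to the MLN $L = \{(F_t, w_t) \mid t \in \mathit{Tup} \cup \mathit{Tup}_V\}$, where $\mathit{Tup}_V$ is the set of all possible tuples in all views. For a base tuple $t \in \mathit{Tup}$, the feature $F_t$ is $t$ itself, with weight $w(t)$; for a view tuple $t = V_i(\bar{a})$, $F_t$ is the view's lineage $Q_i(\bar{a})$, weighted by the view's weight expression.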
20
MarkoView Database - example
Consider the MVDB consisting of the features $(R(a), w_1), (S(a), w_2), (V(a), w_3)$, where $V(x)[w_3]$ :- $R(x), S(x)$:
Possible world $I_j$ | Weight $\Phi(I_j)$ | $P(I_j)$
$\{R(a), S(a)\}$ | $w_3 w_1 w_2$ | $w_3 w_1 w_2 / Z$
$\{S(a)\}$ | $w_2$ | $w_2 / Z$
$\{R(a)\}$ | $w_1$ | $w_1 / Z$
$\emptyset$ | $1$ | $1 / Z$
Partition function: $Z = 1 + w_1 + w_2 + w_3 w_1 w_2$
22
Example
Consider the MVDB $D$ with $(R(a), w_1), (S(a), w_2)$ and the MarkoView $V(x)[w]$ :- $R(x), S(x)$, where $w$ is a constant. The four possible worlds have weights $1, w_1, w_2, w w_1 w_2$. Writing $\Phi(Q)$ for the total weight of the worlds that satisfy $Q$: if $Q = R(a) \vee S(a)$, then $\Phi(Q) = w_1 + w_2 + w w_1 w_2$, and $P(Q) = \frac{w_1 + w_2 + w w_1 w_2}{1 + w_1 + w_2 + w w_1 w_2}$.
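This computation can be checked numerically with a brute-force enumeration over the four worlds. In the sketch below (hypothetical names), each world is encoded by the truth values of $R(a)$ and $S(a)$, and the view contributes its factor $w$ exactly when its lineage $R(a) \wedge S(a)$ holds.

```python
from fractions import Fraction
from itertools import product
from typing import Callable


def mvdb_query_probability(w1: Fraction, w2: Fraction, w: Fraction,
                           query: Callable[[bool, bool], bool]) -> Fraction:
    """P(Q) = Phi(Q) / Z for base features (R(a), w1), (S(a), w2) and V(x)[w] :- R(x), S(x)."""
    phi_q = Fraction(0)
    z = Fraction(0)
    for r, s in product([False, True], repeat=2):
        # A world gets w1 if R(a) is in it, w2 if S(a) is, and w if the lineage R(a) AND S(a) holds.
        phi = (w1 if r else 1) * (w2 if s else 1) * (w if r and s else 1)
        z += phi
        if query(r, s):
            phi_q += phi
    return phi_q / z


print(mvdb_query_probability(Fraction(1), Fraction(1), Fraction(1, 2), lambda r, s: r or s))  # 5/7
```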
23
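Example cont.
The INDB associated to $D$ is $D_0$ over $R, S, NV$: $(R(a), w_1), (S(a), w_2), (NV(a), w_0)$, where $w_0 = (1 - w)/w$, i.e., $w = 1/(1 + w_0)$.
Defining $W = R(a) \wedge S(a) \wedge NV(a)$, we get the hard constraint $\neg W$, which reads $\neg W \equiv R(a) \wedge S(a) \Rightarrow V(a)$, where $V(a) = \neg NV(a)$: whenever the view's lineage $R(a) \wedge S(a)$ holds, $NV(a)$ must be false. Summing out $NV(a)$, a world where $R(a) \wedge S(a)$ is false contributes $(1 + w_0)$ times its MVDB weight, and a world where it is true contributes $w_1 w_2 = (1 + w_0) \cdot w w_1 w_2$; hence every MVDB world weight is scaled by the same factor $1 + w_0$.
Seven out of the eight possible worlds of the INDB satisfy $\neg W$, and their weights are:
 | $\{R(a), S(a)\}$ | $\{R(a)\}$ | $\{S(a)\}$ | $\emptyset$
$\neg NV(a)$ | $w_1 w_2$ | $w_1$ | $w_2$ | $1$
$NV(a)$ | (excluded) | $w_0 w_1$ | $w_0 w_2$ | $w_0$
Total | $w_1 w_2$ | $(1 + w_0) w_1$ | $(1 + w_0) w_2$ | $1 + w_0$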
24
Example cont. For this INDB:
- $\Phi_0$ is the weight function
- $Z_0$ ($= \Phi_0(\mathit{true})$) is the partition function
- $P_0$ is the probability function
We want to compute $P(Q)$, for a query $Q$ over the schema $R, S$ of the MVDB, by translating it into a query over the INDB.
25
Example cont. For example, take $Q = R(a) \vee S(a)$:
$\Phi_0(Q \wedge \neg W) = (1 + w_0) w_1 + (1 + w_0) w_2 + w_1 w_2 = (1 + w_0)(w_1 + w_2 + w w_1 w_2) = (1 + w_0) \cdot \Phi(Q)$, using $w = 1/(1 + w_0)$.
Likewise, $\Phi_0(\neg W) = (1 + w_0) \cdot Z$. Therefore:
$P(Q) = \frac{\Phi(Q)}{Z} = \frac{\Phi_0(Q \wedge \neg W)}{\Phi_0(\neg W)} = \frac{P_0(Q \wedge \neg W)}{P_0(\neg W)} = \frac{P_0(Q \vee W) - P_0(W)}{1 - P_0(W)}$
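As a numeric sanity check of this derivation, take (values of our choosing) $w_1 = w_2 = 1$ and $w = 1/2$, so that $w_0 = 1$. Both sides agree:

```latex
\begin{aligned}
\text{MVDB:}\quad & P(Q) = \frac{w_1 + w_2 + w\,w_1 w_2}{1 + w_1 + w_2 + w\,w_1 w_2}
  = \frac{1 + 1 + \tfrac{1}{2}}{1 + 1 + 1 + \tfrac{1}{2}} = \frac{5}{7} \\
\text{INDB:}\quad & w_0 = \frac{1 - w}{w} = 1, \qquad
  p_0(R(a)) = p_0(S(a)) = p_0(NV(a)) = \tfrac{1}{2} \\
& P_0(W) = \tfrac{1}{2}\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{8}, \qquad
  P_0(Q \vee W) = P_0(Q) = 1 - \tfrac{1}{4} = \tfrac{3}{4} \quad (W \Rightarrow Q) \\
& \frac{P_0(Q \vee W) - P_0(W)}{1 - P_0(W)} = \frac{\tfrac{3}{4} - \tfrac{1}{8}}{1 - \tfrac{1}{8}}
  = \frac{5/8}{7/8} = \frac{5}{7}
\end{aligned}
```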
26
Translating MVDB to INDB
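Let $D = (\mathit{Tup}, w, V)$ be an MVDB, and let $NV = \{NV_i \mid V_i \in V\}$. The INDB associated to $D$ is the following database over the schema $R \cup NV$: $D_0 = (\mathit{Tup}_0, w_0)$, where
- $\mathit{Tup}_0 = \mathit{Tup} \cup \mathit{Tup}_{NV}$
- $\mathit{Tup}_{NV} = \{NV_i(\bar{a}) \mid V_i(\bar{a}) \in \mathit{Tup}_{V_i}\}$
- $w_0(t) = w(t)$ if $t \in \mathit{Tup}$, and $w_0(t) = \frac{1 - w_i(\bar{a})}{w_i(\bar{a})}$ if $t = NV_i(\bar{a}) \in \mathit{Tup}_{NV}$, where $w_i(\bar{a})$ is the weight of the view tuple $V_i(\bar{a})$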
27
Translating MVDB to INDB cont.
Let $Q_i$ be the UCQ defining the view $V_i$. Then each $W_i$ is $W_i = \exists \bar{x}_i \, (NV_i(\bar{x}_i) \wedge Q_i(\bar{x}_i))$, and $W = \bigvee_i W_i$. For every Boolean query $Q$, the following holds:
$P(Q) = \frac{P_0(Q \vee W) - P_0(W)}{1 - P_0(W)}$
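The general translation can be sketched in Python by brute force over a ground MVDB. This is an illustration under assumptions, not the paper's implementation (which evaluates the INDB queries with OBDDs rather than world enumeration): base tuples and ground view tuples are given by name, each view tuple carries its lineage $Q_i(\bar{a})$ as a Python predicate and a positive weight $w_i(\bar{a})$, and the `NV:` prefix used to name the new tuples is hypothetical. The example at the end compares the translated answer with the direct MVDB semantics.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, Iterator, List, Tuple

World = FrozenSet[str]
# A ground view tuple V_i(a): (name, lineage Q_i(a) as a predicate on worlds, weight w_i(a) > 0).
ViewTuple = Tuple[str, Callable[[World], bool], Fraction]


def worlds(names: Iterable[str]) -> Iterator[World]:
    ordered = sorted(names)
    for bits in product([False, True], repeat=len(ordered)):
        yield frozenset(n for n, present in zip(ordered, bits) if present)


def mvdb_prob(query: Callable[[World], bool], base: Dict[str, Fraction],
              views: List[ViewTuple]) -> Fraction:
    """Direct MVDB semantics: Phi(I) = prod of w(t) for t in I, times w_i(a) for each view tuple whose lineage holds."""
    phi_q = z = Fraction(0)
    for world in worlds(base):
        phi = Fraction(1)
        for t in world:
            phi *= base[t]
        for _name, lineage, w in views:
            if lineage(world):
                phi *= w
        z += phi
        if query(world):
            phi_q += phi
    return phi_q / z


def indb_prob(event: Callable[[World], bool], p0: Dict[str, Fraction]) -> Fraction:
    """P_0(event) in a tuple-independent database; p0 values may be negative."""
    total = Fraction(0)
    for world in worlds(p0):
        prob = Fraction(1)
        for t, p in p0.items():
            prob *= p if t in world else 1 - p
        if event(world):
            total += prob
    return total


def translated_prob(query: Callable[[World], bool], base: Dict[str, Fraction],
                    views: List[ViewTuple]) -> Fraction:
    """P(Q) = (P_0(Q or W) - P_0(W)) / (1 - P_0(W)) over the INDB with the extra NV tuples."""
    p0 = {t: w / (1 + w) for t, w in base.items()}
    for name, _lineage, w in views:
        w0 = (1 - w) / w                      # w_0(NV_i(a)) = (1 - w_i(a)) / w_i(a)
        p0["NV:" + name] = w0 / (1 + w0)      # hypothetical naming; equals 1 - w

    def w_event(world: World) -> bool:        # W = OR over view tuples of (NV_i(a) AND Q_i(a))
        return any("NV:" + name in world and lineage(world) for name, lineage, _ in views)

    p_w = indb_prob(w_event, p0)
    return (indb_prob(lambda I: query(I) or w_event(I), p0) - p_w) / (1 - p_w)


base = {"R(a)": Fraction(1), "S(a)": Fraction(1)}
views = [("V(a)", lambda I: "R(a)" in I and "S(a)" in I, Fraction(1, 2))]


def q(world: World) -> bool:
    return "R(a)" in world or "S(a)" in world


print(translated_prob(q, base, views), mvdb_prob(q, base, views))  # 5/7 5/7
```

Exact `Fraction` arithmetic is used so that the two sides can be compared for equality rather than approximately.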
29
Constructing and compiling the MV-index
An MV-index consists of a set of OBDDs augmented with certain precomputations and indices.
CUDD: a widely used package for OBDDs. More details: F. Somenzi. CUDD: CU Decision Diagram Package Release
OBDD**: An Ordered Binary Decision Diagram is a rooted DAG whose internal nodes are labeled with Boolean variables and have two outgoing edges, labeled 0 and 1, and whose sink nodes (leaves) are labeled 0 or 1; every root-to-leaf path tests the variables in the same fixed order.
**More details: R. E. Bryant. Symbolic manipulation of boolean functions using a graphical representation. In DAC, pages 688–694, 1985.
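For intuition about why OBDDs help, here is a toy Python sketch of a reduced OBDD (a unique table plus Bryant's Apply operation) and of computing the probability of the represented Boolean function over independent variables by Shannon expansion, $P(f) = (1 - p_x) P(f|_{x=0}) + p_x P(f|_{x=1})$, which costs one pass over the nodes. It is neither CUDD's API nor the MV-index algorithm; the class and method names are hypothetical, and variables are identified by their position in the order.

```python
import operator
from fractions import Fraction
from typing import Callable, Dict, Optional, Tuple


class OBDD:
    """Toy reduced OBDD: variables are integers, ordered by value; node ids 0 and 1 are the sinks."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Tuple[int, int, int]] = {}   # id -> (var, low child, high child)
        self.unique: Dict[Tuple[int, int, int], int] = {}  # (var, low, high) -> id, for sharing

    def mk(self, var: int, low: int, high: int) -> int:
        if low == high:                 # both edges lead to the same node: the test is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:      # otherwise reuse an existing identical node
            node_id = len(self.nodes) + 2
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def var(self, i: int) -> int:
        return self.mk(i, 0, 1)

    def _top(self, u: int) -> float:
        return self.nodes[u][0] if u > 1 else float("inf")

    def _cofactors(self, u: int, var: float) -> Tuple[int, int]:
        if u > 1 and self.nodes[u][0] == var:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u                     # u does not test var at its root

    def apply(self, op: Callable[[bool, bool], bool], u: int, v: int,
              memo: Optional[Dict[Tuple[int, int], int]] = None) -> int:
        """Combine two OBDDs with a binary Boolean operator."""
        memo = {} if memo is None else memo
        if (u, v) not in memo:
            if u <= 1 and v <= 1:
                memo[(u, v)] = int(op(bool(u), bool(v)))
            else:
                var = min(self._top(u), self._top(v))
                u0, u1 = self._cofactors(u, var)
                v0, v1 = self._cofactors(v, var)
                memo[(u, v)] = self.mk(int(var), self.apply(op, u0, v0, memo),
                                       self.apply(op, u1, v1, memo))
        return memo[(u, v)]

    def probability(self, u: int, p: Dict[int, Fraction],
                    memo: Optional[Dict[int, Fraction]] = None) -> Fraction:
        """Shannon expansion over independent variables, memoized per node."""
        memo = {} if memo is None else memo
        if u <= 1:
            return Fraction(u)
        if u not in memo:
            var, low, high = self.nodes[u]
            memo[u] = (1 - p[var]) * self.probability(low, p, memo) + p[var] * self.probability(high, p, memo)
        return memo[u]


bdd = OBDD()
r, s = bdd.var(0), bdd.var(1)          # e.g. variable 0 for R(a), variable 1 for S(a)
f = bdd.apply(operator.or_, r, s)       # OBDD for R(a) OR S(a)
print(bdd.probability(f, {0: Fraction(1, 2), 1: Fraction(1, 2)}))  # 3/4
```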
30
Experimental Evaluation
For the experimental evaluation, an MV-index for an MVDB was constructed on top of an extended CUDD package. The new approach was compared with Alchemy, the de facto standard inference engine for MLNs. MV-index construction was also compared with native CUDD.
31
Recall the MVDB from our running example:
Author(aid, name), FirstPub(aid, year), Wrote(aid, pid), DBLPAffiliation(aid, inst), Pub(pid, title, year), HomePage(aid, url)
Student$^p$(aid, year) [$w_1$], Advisor$^p$(aid1, aid2) [$w_2$], Affiliation$^p$(aid, inst) [$w_3$]
$V_1$(aid1, aid2)[count(pid)/2] :- Advisor$^p$(aid1, aid2), Student$^p$(aid1, year), Wrote(aid1, pid), Wrote(aid2, pid), Pub(pid, title, year)
32
Experimental Evaluation
Two main questions were asked:
1. How do MarkoViews and the MV-index compare to other approaches for probabilistic inference on large Markov networks?
2. How effective is the MV-index construction algorithm compared to the standard approach for constructing OBDDs?
33
Alchemy vs. MV: querying the advisor of a student
34
Alchemy vs. MV: querying all students of an advisor
35
CUDD vs. MV: OBDD construction time
37
Summary
We made two contributions that allow queries to be processed very efficiently on probabilistic databases with MarkoViews:
- First, and the main contribution: a translation from MarkoViews into tuple-independent databases.
- Second: the compilation of the MarkoViews into OBDDs (the MV-index), which dramatically speeds up query execution.
38
Questions?
39
Some of the probabilities in $D_0$ may be negative: if $w > 1$, then $w_0 = (1 - w)/w < 0$, and the probability $p_0 = w_0/(1 + w_0) = 1 - w$ is negative. Negative probabilities have been considered before: it has been shown that probability theory can be consistently extended to allow negative probabilities, and there is interest in applying them to quantum mechanics and financial modeling. Every query answer $P(Q)$ is still a correct probability in $[0, 1]$, even if the probabilities $P_0$ on the right-hand side of the translation formula are negative.
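The following sketch checks this claim on the running example with a view weight $w = 2 > 1$ (values chosen for illustration; names are hypothetical). Here $p_0(NV(a)) = -1$ and $P_0(W) = -1/4$, yet the translation formula returns $4/5$, the same value as the MVDB formula $(w_1 + w_2 + w w_1 w_2)/(1 + w_1 + w_2 + w w_1 w_2)$.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict

w1, w2, w = Fraction(1), Fraction(1), Fraction(2)  # view weight w > 1
w0 = (1 - w) / w                                     # w_0 = (1 - w) / w = -1/2
p = {"R": w1 / (1 + w1), "S": w2 / (1 + w2), "NV": w0 / (1 + w0)}  # p["NV"] = 1 - w = -1


def p0(event: Callable[[Dict[str, bool]], bool]) -> Fraction:
    """P_0(event) by summing the (possibly negative) INDB world probabilities."""
    total = Fraction(0)
    for bits in product([False, True], repeat=len(p)):
        world = dict(zip(p, bits))
        prob = Fraction(1)
        for t, present in world.items():
            prob *= p[t] if present else 1 - p[t]
        if event(world):
            total += prob
    return total


def W(world: Dict[str, bool]) -> bool:
    return world["R"] and world["S"] and world["NV"]


def Q(world: Dict[str, bool]) -> bool:
    return world["R"] or world["S"]


answer = (p0(lambda I: Q(I) or W(I)) - p0(W)) / (1 - p0(W))
print(p["NV"], p0(W), answer)  # -1 -1/4 4/5
```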
40
Link to the paper