
Business Intelligence on Complex Graph Data Dritan Bleco Yannis Kotidis Department.


1 Business Intelligence on Complex Graph Data
Dritan Bleco (dritanbleco@aueb.gr), Yannis Kotidis (kotidis@aueb.gr)
Department of Informatics, Athens University of Economics and Business
BEWEB 2012, Berlin

2 Outline
Motivation
Graph Data Model
Operators on Graph Data
Querying Graph Records
Query Rewrites
Experiments
Conclusions

3 Motivational Example
A Supply Chain Management (SCM) application tracks the different routes that the articles of a customer order follow, from the production lines to the consumer's hands.
Multiple warehouses are located between the production lines and the shipping points, and they can stage the products while the order is being assembled.
RFID readers are used to keep track of the location of the articles.
An order follows one or more paths, so our web supply chain application produces graph-like data.

4 [Figure: the order graph, with nodes grouped into Production Lines, Warehouses, and Shipping Points; edges are labeled with the measures listed below.]
Start Node   End Node   Measure
A            B          15
A            C          11
C            E          7
B            E          5
B            D          9
E            E          8
E            D          4
E            F          2
C            G          43
D            H          6
F            G          9
F            H          4
C            C          1

Node   Location
A      Thessaloniki
C      Trikala
B      Lamia
D      Athens
E      Athens
F      Athens
G      Iraklion
H      Kalamata
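The record above can be held as a plain mapping from each edge (start node, end node) to its measure. The sketch below is a minimal illustration, not the authors' storage model: it assumes that a self-loop such as (C, C) carries a node's internal measure, and it identifies production lines and shipping points as the nodes with no incoming, respectively no outgoing, edge to another node. All names (RECORD, internal_measure, production_lines, shipping_points) are hypothetical.

```python
# Hypothetical encoding of the slide 4 record: (start, end) -> measure.
# Self-loops such as ("C", "C") are assumed to hold internal node measures.
RECORD = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "E"): 8, ("E", "D"): 4, ("E", "F"): 2,
    ("C", "G"): 43, ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
    ("C", "C"): 1,
}


def internal_measure(record, node):
    """Internal measure of a node: the measure on its self-loop (0 if none)."""
    return record.get((node, node), 0)


def production_lines(record):
    """Nodes with no incoming edge from another node (where the order starts)."""
    nodes = {n for edge in record for n in edge}
    targets = {v for (u, v) in record if u != v}
    return nodes - targets


def shipping_points(record):
    """Nodes with no outgoing edge to another node (where the order leaves)."""
    nodes = {n for edge in record for n in edge}
    sources = {u for (u, v) in record if u != v}
    return nodes - sources


print(sorted(production_lines(RECORD)))  # ['A']
print(sorted(shipping_points(RECORD)))   # ['G', 'H']
print(internal_measure(RECORD, "E"))     # 8
```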

5 [Figure: the order graph of slide 4, annotated with the node locations.]
Q1: What is the total order completion time? This is the longest path between node A and nodes G, H.
Q2: What is the total processing time for parts that are shipped through warehouses located in Athens? This is the longest path between node A and nodes G, H, considering only paths that traverse at least one location in Athens.
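For a single small record, Q1 and Q2 can be answered by brute force: enumerate every simple path from A to G or H and keep the one with the largest total. This hypothetical sketch assumes that a path's length is the sum of its edge measures plus the internal (self-loop) measures of all nodes on it, and that D, E and F are the warehouses located in Athens (the nodes coalesced by aggregate node U on slide 6). It only illustrates the query semantics; it is not the rewrite-based evaluation developed in the later slides.

```python
# Hypothetical encoding of the slide 4 record: (start, end) -> measure.
RECORD = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "E"): 8, ("E", "D"): 4, ("E", "F"): 2,
    ("C", "G"): 43, ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
    ("C", "C"): 1,
}
# Assumption: D, E and F are the Athens warehouses (aggregate node U).
ATHENS = {"D", "E", "F"}
START, ENDS = "A", {"G", "H"}


def successors(node):
    """Neighbours reachable by a real move (self-loops are internal measures)."""
    return [v for (u, v) in RECORD if u == node and v != node]


def closed_path_value(path):
    """Edge measures plus the internal (self-loop) measure of every node."""
    edges = sum(RECORD[(u, v)] for u, v in zip(path, path[1:]))
    nodes = sum(RECORD.get((n, n), 0) for n in path)
    return edges + nodes


def simple_paths(node, path):
    """Yield every simple path from START that ends at a shipping point."""
    if node in ENDS:
        yield tuple(path)
    for nxt in successors(node):
        if nxt not in path:
            yield from simple_paths(nxt, path + [nxt])


all_paths = list(simple_paths(START, [START]))
q1 = max((closed_path_value(p), p) for p in all_paths)
q2 = max((closed_path_value(p), p) for p in all_paths if ATHENS & set(p))
print(q1)  # (55, ('A', 'C', 'G'))
print(q2)  # (39, ('A', 'B', 'E', 'F', 'G'))
```

Under these assumptions, Q2's answer (39 via A, B, E, F, G) is the same value the query-rewrite slides obtain.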

6 Aggregate Nodes
[Figure: the order graph with aggregate node U.]
Aggregate node U coalesces the warehouses located in Athens.
Set In(U) contains the nodes of U that have at least one incoming edge from a node that does not belong to U: In(U) = {D, E}.
Set Out(U) contains the nodes of U that have at least one outgoing edge towards a node that does not belong to U: Out(U) = {D, F}.
A single node can be abstracted as an aggregate node whose internal structure is not revealed to the query: E = [in(E), out(E)], with internal measure E:8.
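In(U) and Out(U) follow directly from the edge list. A minimal sketch, assuming U = {D, E, F} (the Athens warehouses; this choice reproduces In(U) = {D, E} and Out(U) = {D, F}). Calling the same functions on the singleton {E} returns {E} for both sets, matching E = [in(E), out(E)]. The function names are hypothetical.

```python
# Edges of the slide 4 record (measures omitted: they do not affect In/Out).
EDGES = {
    ("A", "B"), ("A", "C"), ("C", "E"), ("B", "E"), ("B", "D"), ("E", "E"),
    ("E", "D"), ("E", "F"), ("C", "G"), ("D", "H"), ("F", "G"), ("F", "H"),
    ("C", "C"),
}


def in_nodes(group, edges):
    """Nodes of the aggregate node that receive an edge from outside it."""
    return {v for (u, v) in edges if v in group and u not in group}


def out_nodes(group, edges):
    """Nodes of the aggregate node that send an edge to a node outside it."""
    return {u for (u, v) in edges if u in group and v not in group}


U = {"D", "E", "F"}  # assumption: the Athens warehouses coalesced by U
print(sorted(in_nodes(U, EDGES)), sorted(out_nodes(U, EDGES)))  # ['D', 'E'] ['D', 'F']
print(in_nodes({"E"}, EDGES), out_nodes({"E"}, EDGES))          # {'E'} {'E'}
```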

7 Path
Different simple paths:
(ACE) starts from out(A) and ends at in(E); the internal measure 8 of E is not included.
(ACE] starts from out(A) and ends at out(E); the internal measure 8 of E is included.
[Figure: path A → C:1 → E:8 with edge measures 11 and 7, whose two variants are also labeled (AE) and (AE] in short form.]

8 Node
A path that starts from in(E) and ends at out(E) is node E itself: [in(E), out(E)] = E, the E-path [EE] with measure E:8.
By contrast, (ACE) ends at in(E) and excludes the internal measure 8, while (ACE] ends at out(E) and includes it.
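The bracket convention can be turned into a small measure function. This is a sketch under stated assumptions: '(' starts at out(first node) and ')' ends at in(last node), so those end-points' internal measures are skipped; by symmetry we assume '[' starts at in(first node) and ']' ends at out(last node), so they are counted; inner nodes are always fully traversed. Paths are passed as node strings such as "ACE"; RECORD, internal and path_value are hypothetical names.

```python
# Hypothetical encoding of the slide 4 record: (start, end) -> measure;
# self-loops hold internal measures (C:1, E:8).
RECORD = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "E"): 8, ("E", "D"): 4, ("E", "F"): 2,
    ("C", "G"): 43, ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
    ("C", "C"): 1,
}


def internal(node, record=RECORD):
    """Internal measure of a node (0 if it has no self-loop)."""
    return record.get((node, node), 0)


def path_value(nodes, closed_start, closed_end, record=RECORD):
    """Sum of edge measures plus the internal measures the path covers."""
    if len(nodes) == 1:  # node path [in(X), out(X)], e.g. [EE]
        return internal(nodes[0], record) if closed_start and closed_end else 0
    total = sum(record[(u, v)] for u, v in zip(nodes, nodes[1:]))
    total += sum(internal(n, record) for n in nodes[1:-1])  # inner nodes
    if closed_start:  # '[': enter the first node at in(first)
        total += internal(nodes[0], record)
    if closed_end:    # ']': leave the last node at out(last)
        total += internal(nodes[-1], record)
    return total


print(path_value("ACE", False, False))  # (ACE): 11 + 1 + 7 = 19
print(path_value("ACE", False, True))   # (ACE]: 19 + 8 = 27
print(path_value("E", True, True))      # [EE]: 8
```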

9 Composite Path [A,E]*
Composite paths: sets of paths with the same starting and ending node.
[A,E]* = { [ACE], [ABE] }
[Figure: paths A → C:1 → E:8 (edge measures 11, 7) and A → B → E:8 (edge measures 15, 5).]

10 Composite Path [A, in(U))*
[A, in(U))* = { [ACE), [ABE), [ABD) }
[Figure: paths A → C:1 → E (edge measures 11, 7), A → B → E (15, 5), and A → B → D (15, 9).]

11 Composite Path [in(U), out(U)]*
[in(U), out(U)]* = { [EF], [ED], [DD] }
[Figure: paths E:8 → F (edge measure 2) and E:8 → D (edge measure 4).]
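Composite paths can be enumerated with a depth-first search over the record's edges. In this hypothetical sketch the predicate inner_ok decides which nodes a path may pass through before reaching its end-point; it is our device for reproducing the three sets of slides 9 to 11 (for [A, in(U))*, a path must stay outside U until it enters it). Paths are written as node strings, so "D" stands for the single-node path [DD]; the brackets, and hence which internal measures count, are left out here. U = {D, E, F} is assumed from slide 6.

```python
# Successors of each node in the slide 4 record; self-loops are left out
# because they carry internal measures, not moves to another node.
SUCC = {
    "A": ["B", "C"], "B": ["E", "D"], "C": ["E", "G"],
    "D": ["H"], "E": ["D", "F"], "F": ["G", "H"],
}


def composite(starts, ends, inner_ok):
    """All simple paths from a node in `starts` to a node in `ends` whose
    intermediate nodes satisfy `inner_ok` (hypothetical helper)."""
    found = set()

    def extend(path):
        last = path[-1]
        if last in ends:
            found.add("".join(path))
        if len(path) > 1 and not inner_ok(last):
            return  # `last` may only be an end-point, not a pass-through node
        for nxt in SUCC.get(last, []):
            if nxt not in path:
                extend(path + [nxt])

    for s in starts:
        extend([s])
    return found


U, IN_U, OUT_U = {"D", "E", "F"}, {"D", "E"}, {"D", "F"}
print(sorted(composite({"A"}, {"E"}, lambda n: True)))       # ['ABE', 'ACE']
print(sorted(composite({"A"}, IN_U, lambda n: n not in U)))  # ['ABD', 'ABE', 'ACE']
print(sorted(composite(IN_U, OUT_U, lambda n: n in U)))      # ['D', 'ED', 'EF']
```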

12 Operators on Graph Data
The path-join operator $\bowtie$ concatenates two paths $p_1$ and $p_2$ when:
1. the ending node of $p_1$ is the same as the starting node of $p_2$, and
2. one of the two paths is open-ended at the common end-point.
$[ACE) \bowtie [EFG] = [ACEFG]$
[Figure: the joined path A → C:1 → E:8 → F → G with edge measures 11, 7, 2, 9.]

13 Operators on Graph Data
The path-join operator also composes a query path from the production lines (Pr) through aggregate node U to the shipping points (Sr):
$[Pr, in(U)) \bowtie [in(U), out(U)] \bowtie (out(U), Sr]$
[Figure: the order graph divided into these three parts, from Pr through U to Sr.]
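A sketch of the path-join operator on (nodes, closed_start, closed_end) triples. We read condition 2 as "exactly one of the two paths is open at the shared node", an assumption that makes that node's internal measure count exactly once, so the joined path's SUM equals the sum of its parts. Path, value and path_join are hypothetical names; the record encoding matches the earlier slides' edge table.

```python
from typing import NamedTuple

# Hypothetical encoding of the slide 4 record: (start, end) -> measure.
RECORD = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "E"): 8, ("E", "D"): 4, ("E", "F"): 2,
    ("C", "G"): 43, ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
    ("C", "C"): 1,
}


class Path(NamedTuple):
    nodes: str          # node sequence, e.g. "ACE"
    closed_start: bool  # True for '[', False for '('
    closed_end: bool    # True for ']', False for ')'


def value(p, record=RECORD):
    """SUM over a path: edges, inner nodes, and any closed end-points."""
    def internal(n):
        return record.get((n, n), 0)
    if len(p.nodes) == 1:  # node path such as [DD]
        return internal(p.nodes) if p.closed_start and p.closed_end else 0
    total = sum(record[e] for e in zip(p.nodes, p.nodes[1:]))
    total += sum(internal(n) for n in p.nodes[1:-1])
    total += internal(p.nodes[0]) if p.closed_start else 0
    total += internal(p.nodes[-1]) if p.closed_end else 0
    return total


def path_join(p1, p2):
    """p1 ⋈ p2: concatenate when p1 ends where p2 starts and exactly one
    of them is open at the shared node."""
    if p1.nodes[-1] != p2.nodes[0]:
        raise ValueError("p1 must end at the node where p2 starts")
    if p1.closed_end == p2.closed_start:
        raise ValueError("exactly one path must be open at the shared node")
    return Path(p1.nodes + p2.nodes[1:], p1.closed_start, p2.closed_end)


joined = path_join(Path("ACE", True, False), Path("EFG", True, True))
print(joined)         # Path(nodes='ACEFG', closed_start=True, closed_end=True)
print(value(joined))  # 38, i.e. [ACE):19 + [EFG]:19
```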

14 Operators on Graph Data
The path projection operator $\pi_p(r)$ projects record $r$ on the edges defined in path $p$, while retaining their measures:
$\pi_{[ACE)}(r) = \{(A,C){:}11,\ (C,C){:}1,\ (C,E){:}7\}$
The projection of a record on a composite path is computed as a set containing the projections on the constituent paths:
$\pi_{[AE)^*}(r) = \{\{(A,C){:}11,\ (C,C){:}1,\ (C,E){:}7\},\ \{(A,B){:}15,\ (B,E){:}5\}\}$
[Figure: the projected edges highlighted on the order graph.]
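A sketch of $\pi_p(r)$ over the dictionary encoding of the record, assuming that a path covers its consecutive edges, the self-loops of its inner nodes, and the self-loop of any closed ('[' or ']') end-point; edges absent from r are skipped. The names project, project_composite and AE_STAR are hypothetical.

```python
# Hypothetical encoding of the slide 4 record: (start, end) -> measure.
RECORD = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "E"): 8, ("E", "D"): 4, ("E", "F"): 2,
    ("C", "G"): 43, ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
    ("C", "C"): 1,
}


def project(record, nodes, closed_start, closed_end):
    """pi_p(r): the edges of record r covered by path p, with their measures."""
    covered = list(zip(nodes, nodes[1:]))          # consecutive edges
    covered += [(n, n) for n in nodes[1:-1]]       # inner nodes' self-loops
    if closed_start:                               # '[' includes the first node
        covered.append((nodes[0], nodes[0]))
    if closed_end and len(nodes) > 1:              # ']' includes the last node
        covered.append((nodes[-1], nodes[-1]))
    return {edge: record[edge] for edge in covered if edge in record}


def project_composite(record, paths):
    """Projection on a composite path: one projection per constituent path."""
    return [project(record, *p) for p in paths]


AE_STAR = [("ACE", True, False), ("ABE", True, False)]  # [AE)* = {[ACE), [ABE)}
for projection in project_composite(RECORD, AE_STAR):
    print(projection)
```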

15 BI on Graph Data
Intra-path aggregate function $F_p(r)$: applied on the measures resulting from the projection of record $r$ on path $p$.
$\mathrm{Sum}_{[ACE)}(r) = [ACE){:}19$
$\mathrm{Sum}_{[AE)^*}(r) = \{[ACE){:}19,\ [ABE){:}20\}$
Inter-path aggregate function $G(F_p(r))$: consolidates the result(s) obtained via intra-path aggregation.
$\mathrm{Max}(\mathrm{Sum}_{[AE)^*}(r)) = \mathrm{Max}(\{[ACE){:}19,\ [ABE){:}20\}) = [ABE){:}20$
$\mathrm{Max}(\mathrm{Sum}_{[Pr,Sr]^*}(r))$ returns the order completion time for the order depicted in record $r$.
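The two levels of aggregation can be sketched as two small higher-order functions: intra applies an aggregate (here Python's sum) to the measures of a projection, and inter picks among the labeled per-path results (here max or min). This is an illustration under the same projection assumptions as before; intra, inter and AE_STAR are hypothetical names.

```python
# Hypothetical encoding of the slide 4 record: (start, end) -> measure.
RECORD = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "E"): 8, ("E", "D"): 4, ("E", "F"): 2,
    ("C", "G"): 43, ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
    ("C", "C"): 1,
}


def project(record, nodes, closed_start, closed_end):
    """pi_p(r): edges of r covered by path p, with their measures."""
    covered = list(zip(nodes, nodes[1:])) + [(n, n) for n in nodes[1:-1]]
    if closed_start:
        covered.append((nodes[0], nodes[0]))
    if closed_end and len(nodes) > 1:
        covered.append((nodes[-1], nodes[-1]))
    return {e: record[e] for e in covered if e in record}


def intra(f, record, path):
    """F_p(r): apply aggregate f to the measures of pi_p(r)."""
    return f(project(record, *path).values())


def inter(g, results):
    """G: consolidate labeled intra-path results, e.g. g=max or g=min."""
    return g(results.items(), key=lambda item: item[1])


# [AE)* = { [ACE), [ABE) } as (nodes, closed_start, closed_end) triples.
AE_STAR = {"[ACE)": ("ACE", True, False), "[ABE)": ("ABE", True, False)}

sums = {label: intra(sum, RECORD, p) for label, p in AE_STAR.items()}
print(sums)              # {'[ACE)': 19, '[ABE)': 20}
print(inter(max, sums))  # ('[ABE)', 20)
```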

16 Queries Using Operators
[Figure: the order graph with aggregate node U, production lines Pr, and shipping points Sr.]

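17 Query Rewrite
[Figure: the order graph with aggregate node U, production lines Pr, and shipping points Sr.]
$$\mathrm{MAX}\big(\mathrm{SUM}_{[Pr,\,in(U)) \bowtie [in(U),\,out(U)] \bowtie (out(U),\,Sr]}(r)\big) = \mathrm{MAX}\big(\mathrm{SUM}_{[Pr,\,in(U))}(r) \bowtie_{\mathrm{SUM}} \mathrm{SUM}_{[in(U),\,out(U)]}(r) \bowtie_{\mathrm{SUM}} \mathrm{SUM}_{(out(U),\,Sr]}(r)\big)$$
Generally, pushing the intra-path aggregation on a path:
$$G\big(F_{p = p_1 \bowtie p_2}(r)\big) = G\big(F_{p_1}(r) \bowtie_{H} F_{p_2}(r)\big)$$
where $H$ combines the intra-path results of the two constituent paths (SUM in the example above).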

18 Query Rewrite
[Figure: the order graph with the intermediate results of each constituent path.]
$$\mathrm{MAX}\big(\mathrm{SUM}_{[Pr,\,in(U)) \bowtie [in(U),\,out(U)] \bowtie (out(U),\,Sr]}(r)\big) = \mathrm{MAX}\big(\mathrm{SUM}_{[Pr,\,in(U))}(r) \bowtie_{\mathrm{SUM}} \mathrm{SUM}_{[in(U),\,out(U)]}(r) \bowtie_{\mathrm{SUM}} \mathrm{SUM}_{(out(U),\,Sr]}(r)\big)$$
$$= \mathrm{MAX}\big(\{[ABE){:}20,\ [ACE){:}19,\ [ABD){:}24\} \bowtie_{\mathrm{SUM}} \{[EF]{:}10,\ [ED]{:}12,\ [DD]{:}0\} \bowtie_{\mathrm{SUM}} \{(FG]{:}9,\ (FH]{:}4,\ (DH]{:}6\}\big)$$
$$= \mathrm{MAX}(\{[ABEFG]{:}39,\ [ACEFG]{:}38,\ [ABEFH]{:}34,\ [ACEFH]{:}33,\ [ABEDH]{:}38,\ [ACEDH]{:}37,\ [ABDH]{:}30\}) = [ABEFG]{:}39$$
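The rewrite can be checked numerically on this record: aggregate each constituent path first, join compatible constituents while adding their values, and compare against aggregating each complete path directly. This sketch hard-codes the constituent paths listed on this slide and reuses the bracket assumptions of the earlier sketches; value, LEFT, MIDDLE and RIGHT are hypothetical names.

```python
from itertools import product

# Hypothetical encoding of the slide 4 record: (start, end) -> measure.
RECORD = {
    ("A", "B"): 15, ("A", "C"): 11, ("C", "E"): 7, ("B", "E"): 5,
    ("B", "D"): 9, ("E", "E"): 8, ("E", "D"): 4, ("E", "F"): 2,
    ("C", "G"): 43, ("D", "H"): 6, ("F", "G"): 9, ("F", "H"): 4,
    ("C", "C"): 1,
}


def value(nodes, closed_start, closed_end):
    """SUM of a path: edges, inner nodes, and any closed end-points."""
    def internal(n):
        return RECORD.get((n, n), 0)
    if len(nodes) == 1:  # node path such as [DD]
        return internal(nodes) if closed_start and closed_end else 0
    total = sum(RECORD[e] for e in zip(nodes, nodes[1:]))
    total += sum(internal(n) for n in nodes[1:-1])
    total += internal(nodes[0]) if closed_start else 0
    total += internal(nodes[-1]) if closed_end else 0
    return total


# Constituent composite paths, as (nodes, closed_start, closed_end).
LEFT = [("ACE", True, False), ("ABE", True, False), ("ABD", True, False)]
MIDDLE = [("EF", True, True), ("ED", True, True), ("D", True, True)]
RIGHT = [("FG", False, True), ("FH", False, True), ("DH", False, True)]

# Rewritten plan: SUM each constituent path first, then join, adding values.
rewritten = {}
for left, mid, right in product(LEFT, MIDDLE, RIGHT):
    if left[0][-1] == mid[0][0] and mid[0][-1] == right[0][0]:
        full = left[0] + mid[0][1:] + right[0][1:]
        rewritten[full] = value(*left) + value(*mid) + value(*right)

# Original plan: SUM each complete path [Pr, Sr] directly.
direct = {full: value(full, True, True) for full in rewritten}

assert rewritten == direct
print(max(rewritten.items(), key=lambda kv: kv[1]))  # ('ABEFG', 39)
```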

19 Query Rewrite (I)
For the MAX and MIN aggregate functions, the inter-path aggregation can also be pushed into the constituent paths:
$$\mathrm{MAX}\big(\mathrm{SUM}_{[Pr,\,in(U))}(r) \bowtie_{\mathrm{SUM}} \mathrm{SUM}_{[in(U),\,out(U)]}(r) \bowtie_{\mathrm{SUM}} \mathrm{SUM}_{(out(U),\,Sr]}(r)\big) = \mathrm{MAX}\big(\mathrm{MAX}_\delta(\mathrm{SUM}_{[Pr,\,in(U))}(r)) \bowtie_{\mathrm{SUM}} \mathrm{MAX}_\delta(\mathrm{SUM}_{[in(U),\,out(U)]}(r)) \bowtie_{\mathrm{SUM}} \mathrm{MAX}_\delta(\mathrm{SUM}_{(out(U),\,Sr]}(r))\big)$$
Generally, pushing the inter-path aggregation on a path:
$$G\big(F_{p = p_1 \bowtie p_2}(r)\big) = G\big(G_\delta(F_{p_1}(r)) \bowtie_{H} G_\delta(F_{p_2}(r))\big)$$

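20 Query Rewrite (II)
$\mathrm{SUM}_{[Pr,\,in(U))}(r) = \{[ACE){:}19,\ [ABE){:}20,\ [ABD){:}24\} \Rightarrow \mathrm{MAX}_\delta(\mathrm{SUM}_{[Pr,\,in(U))}(r)) = \{[ABE){:}20,\ [ABD){:}24\}$
$\mathrm{SUM}_{[in(U),\,out(U)]}(r) = \mathrm{MAX}_\delta(\mathrm{SUM}_{[in(U),\,out(U)]}(r)) = \{[EF]{:}10,\ [ED]{:}12,\ [DD]{:}0\}$
$\mathrm{SUM}_{(out(U),\,Sr]}(r) = \mathrm{MAX}_\delta(\mathrm{SUM}_{(out(U),\,Sr]}(r)) = \{(FG]{:}9,\ (FH]{:}4,\ (DH]{:}6\}$
$\mathrm{MAX}_\delta$ keeps, among paths with the same end-points, only the one with the maximum value: [ACE):19 is pruned because [ABE):20 also runs from A to E, and the final answer is unchanged.
$$\mathrm{MAX}\big(\{[ABE){:}20,\ [ABD){:}24\} \bowtie_{\mathrm{SUM}} \{[EF]{:}10,\ [ED]{:}12,\ [DD]{:}0\} \bowtie_{\mathrm{SUM}} \{(FG]{:}9,\ (FH]{:}4,\ (DH]{:}6\}\big) = \mathrm{MAX}(\{[ABEFG]{:}39,\ [ABEFH]{:}34,\ [ABEDH]{:}38,\ [ABDH]{:}30\}) = [ABEFG]{:}39$$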
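The effect of $\mathrm{MAX}_\delta$ can be reproduced from the per-constituent SUM values on this slide. Reading δ as "group by the (start node, end node) pair of each path" is our inference from the example (it keeps (FH] and (DH] apart, which grouping by end node alone would not). Because SUM is monotone, swapping a constituent for a higher-valued one with the same end-points cannot lower a joined total, so the pruned join yields the same MAX with fewer candidates. max_delta and join_sum are hypothetical names.

```python
from itertools import product

# Per-constituent SUM results (path -> value); "D" is the node path [DD].
LEFT = {"ACE": 19, "ABE": 20, "ABD": 24}  # SUM over [Pr, in(U))*
MIDDLE = {"EF": 10, "ED": 12, "D": 0}     # SUM over [in(U), out(U)]*
RIGHT = {"FG": 9, "FH": 4, "DH": 6}       # SUM over (out(U), Sr]*


def max_delta(results):
    """Keep, for every (start node, end node) pair, only the max-valued path."""
    best = {}
    for path, val in results.items():
        key = (path[0], path[-1])
        if key not in best or val > results[best[key]]:
            best[key] = path
    return {p: results[p] for p in best.values()}


def join_sum(*parts):
    """Path-join constituents end-to-start, adding their values."""
    out = {}
    for combo in product(*(p.items() for p in parts)):
        paths = [path for path, _ in combo]
        if all(a[-1] == b[0] for a, b in zip(paths, paths[1:])):
            full = paths[0] + "".join(p[1:] for p in paths[1:])
            out[full] = sum(val for _, val in combo)
    return out


plain = join_sum(LEFT, MIDDLE, RIGHT)
pushed = join_sum(max_delta(LEFT), max_delta(MIDDLE), max_delta(RIGHT))
print(len(plain), len(pushed))                    # 7 4
print(max(pushed.items(), key=lambda kv: kv[1]))  # ('ABEFG', 39)
assert max(plain.values()) == max(pushed.values())
```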

21 Experiments (I)
Two real schema graphs:
1. BAY*: depicts the San Francisco Bay Area roads.
2. Gnutella**: describes connections among Gnutella hosts from August 2002.
120 million records are synthesized, with random real values assigned to the labels of each record.
Experimental evaluation uses the PBS (Pick By Size) view-selection algorithm.
Queries: 50% intra-path and 50% inter-path, chosen with a Zipf or a uniform distribution.
The cost is evaluated independently, via the total number of tuples that need to be retrieved.
* http://www.dis.uniroma1.it/~challenge9/download.shtml
** http://snap.stanford.edu/data/p2p-Gnutella05.html

22 Experiments (II)
[Figure panels: PBS, BAY data set, uniform, 100 queries; PBS, BAY data set, Zipf, 100 queries.]
PBS-1 considers only intra-path materialized aggregates.
PBS-2 considers only inter-path materialized aggregates.
PBS selects and materializes both types of views, depending on the query workload.

23 Experiments (III)
[Figure panels: PBS, Gnutella data set, uniform, 100 queries; PBS, Gnutella data set, Zipf, 100 queries.]
PBS-1 considers only intra-path materialized aggregates.
PBS-2 considers only inter-path materialized aggregates.
PBS selects and materializes both types of views, depending on the query workload.

24 Experiments (IV)
[Figure panels: varying query mix, BAY data set, uniform queries; varying query mix, BAY data set, Zipf queries.]
Mix of intra-/inter-path queries in the BAY data set, for a fixed budget of 20%.
For workloads with only inter-path queries, PBS and PBS-2 have the same performance.
For workloads with only intra-path queries, PBS-1 and PBS give the best performance.
PBS, which considers both types of views, consistently provides the largest reduction in query cost.

25 Conclusions
A framework for modeling analytical queries in a graph database, independent of the underlying storage representation of the records and of the query language used.
Our framework:
permits rewriting of complex aggregations into smaller computational units;
enables cost-based query optimization and pre-computation of frequently used calculations.
Experimental results show that a proper selection of materialized views can provide substantial gains in a large data warehouse containing millions of graph records.

26 Thank you. Questions?

