Unified Algorithm to Solve Several Graph Problems with Relational Queries Wellington Cabrera, Carlos Ordonez (presenter) University of Houston, USA
Motivation: Graph problems are among the most challenging problems in big data analytics (social networks, the WWW, transportation networks). Are specialized graph systems (e.g. Giraph) required to analyze big graphs? A lot of data is already stored in relational databases, and query processing has been studied for a long time.
Definitions: Let G = (V, E) be a graph with n = |V| vertices. E is stored in a relational table E(i, j, v), which corresponds to the adjacency matrix E with zeroes omitted; weights/distances are represented by v. If |E| = O(n), we say that E is sparse. Let S be a vector of n graph vertices, stored in table S(j, v), where v is a distance, reachability, order, or probability, depending on the algorithm. We omit v values that carry no information (such as infinity for distances or 0 for probabilities).
Example: Directed Graph [figure: a directed graph with numbered vertices and weighted edges]
Graph Algorithms: Bellman-Ford, Reachability, Topological Sort, PageRank. Main idea: these algorithms can be expressed as a sequence of vector-matrix multiplications. How can they work in a relational database?
Graph algorithms over a semi-ring:
Algorithm Pattern:
Example: Vector-Matrix Multiplication with SQL queries
Vector-matrix multiplication, (+, *) semiring:
SELECT E.j, sum(S.v * E.v) FROM Sd-1 AS S JOIN E ON S.j = E.i GROUP BY E.j
Vector-matrix multiplication, (min, +) semiring:
SELECT E.j, min(S.v + E.v) FROM Sd-1 AS S JOIN E ON S.j = E.i GROUP BY E.j
In general, with aggregation g() and scalar operation ⨂:
SELECT E.j, g(S.v ⨂ E.v) FROM Sd-1 AS S JOIN E ON S.j = E.i GROUP BY E.j
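The two queries on this slide can be tried on a small in-memory database. The following is a minimal sketch using Python's standard `sqlite3` module; the 4-edge graph, the vector values, and the helper name `vector_matrix` are illustrative assumptions, not taken from the talk. The result is grouped by the destination column E.j, since entry j of the product combines all edges that end in j.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE E (i INTEGER, j INTEGER, v REAL);  -- sparse matrix, zeroes omitted
    CREATE TABLE S (j INTEGER, v REAL);             -- sparse vector from iteration d-1
    INSERT INTO E VALUES (1, 2, 2), (1, 3, 6), (2, 3, 3), (3, 4, 1);
    INSERT INTO S VALUES (1, 1), (2, 5);
""")

AGGREGATES = {"sum", "min", "max"}  # g(): aggregation of the semiring
OPERATORS = {"*", "+"}              # scalar operation of the semiring


def vector_matrix(conn, agg, op):
    """Return the product of S and E under the (agg, op) semiring as (j, v) rows."""
    # Only whitelisted names are spliced into the SQL text.
    if agg not in AGGREGATES or op not in OPERATORS:
        raise ValueError("unsupported semiring")
    sql = (
        f"SELECT E.j, {agg}(S.v {op} E.v) "
        "FROM S JOIN E ON S.j = E.i "
        "GROUP BY E.j ORDER BY E.j"
    )
    return conn.execute(sql).fetchall()


plus_times = vector_matrix(conn, "sum", "*")  # [(2, 2.0), (3, 21.0)]
min_plus = vector_matrix(conn, "min", "+")    # [(2, 3.0), (3, 7.0)]
```

Vertex 3 receives contributions from both vertex 1 and vertex 2, so it shows the difference between the semirings: 1*6 + 5*3 = 21 under (+, *), and min(1+6, 5+3) = 7 under (min, +).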
Bellman-Ford. Input: Table E. Output: Table Sd (vector with shortest distances from a source)
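As an illustration of how Bellman-Ford can be iterated with the (min, +) query, here is a hedged sketch in Python with `sqlite3`. It is not the authors' exact algorithm: the function name `bellman_ford`, the scratch table `S_next`, and the n - 1 round bound are assumptions made for this example, and negative cycles are not handled.

```python
import sqlite3


def bellman_ford(edges, source):
    """Shortest distances from `source`; `edges` are (i, j, weight) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER, v REAL)")
    conn.executemany("INSERT INTO E VALUES (?, ?, ?)", edges)
    # S0 holds only the source; infinite distances are simply not stored.
    conn.execute("CREATE TABLE S (j INTEGER, v REAL)")
    conn.execute("INSERT INTO S VALUES (?, 0)", (source,))
    n = conn.execute(
        "SELECT count(*) FROM (SELECT i FROM E UNION SELECT j FROM E)"
    ).fetchone()[0]

    def snapshot():
        return conn.execute("SELECT j, v FROM S ORDER BY j").fetchall()

    # Without negative cycles, n - 1 rounds always suffice.
    for _ in range(n - 1):
        before = snapshot()
        conn.executescript("""
            CREATE TABLE S_next AS
            SELECT j, min(v) AS v FROM (
                SELECT j, v FROM S            -- keep distances already known
                UNION ALL
                SELECT E.j, S.v + E.v         -- relax every outgoing edge
                FROM S JOIN E ON S.j = E.i
            ) GROUP BY j;
            DROP TABLE S;
            ALTER TABLE S_next RENAME TO S;
        """)
        if snapshot() == before:  # early termination: nothing improved
            break
    return dict(snapshot())


dist = bellman_ford([(1, 2, 2), (1, 3, 6), (2, 3, 3), (3, 4, 1)], source=1)
```

In this hypothetical graph the direct edge 1→3 costs 6, but the path through vertex 2 costs 5, so the second round lowers the distance of vertex 3 and the third round propagates the improvement to vertex 4.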
Reachability. Input: Table E. Output: Table Rd
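Reachability can be sketched as a frontier that is repeatedly joined with E, with each new frontier unioned into the result table R. The code below is an assumption-laden illustration (the function name `reachable_from` and the frontier table S are choices made here, not the talk's definitions); it reports vertices reachable through a path of at least one edge.

```python
import sqlite3


def reachable_from(edges, source):
    """Vertices reachable from `source` via a path of one or more edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER)")
    conn.executemany("INSERT INTO E VALUES (?, ?)", edges)
    conn.execute("CREATE TABLE R (j INTEGER PRIMARY KEY)")  # reached so far
    conn.execute("CREATE TABLE S (j INTEGER PRIMARY KEY)")  # current frontier
    conn.execute("INSERT INTO S VALUES (?)", (source,))
    while True:
        # Next frontier: one hop from the current one, not reached before.
        rows = conn.execute("""
            SELECT DISTINCT E.j
            FROM S JOIN E ON S.j = E.i
            WHERE E.j NOT IN (SELECT j FROM R)
        """).fetchall()
        if not rows:  # early termination: no new vertex was found
            break
        conn.execute("DELETE FROM S")
        conn.executemany("INSERT INTO S VALUES (?)", rows)
        conn.executemany("INSERT INTO R VALUES (?)", rows)  # R := R union S
    return sorted(j for (j,) in conn.execute("SELECT j FROM R"))


reached = reachable_from([(1, 2), (2, 3), (3, 1), (4, 1)], source=1)
```

Because each frontier contains only vertices that were not reached before, the loop stops after at most n rounds, even when the graph has cycles.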
PageRank. Input: Table E. Output: Table Sd (vector with the PageRank probability of each vertex)
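PageRank fits the same pattern with the (+, *) semiring applied to a transition matrix. The sketch below is one possible formulation, not necessarily the one used in the talk: the damping factor 0.85, the L1 stopping rule, the table name T, and the function name `pagerank` are assumptions, and every vertex is assumed to have at least one outgoing edge. For brevity the damping step is applied in Python rather than in SQL.

```python
import sqlite3


def pagerank(edges, damping=0.85, eps=1e-8, max_iter=200):
    """PageRank of each vertex; assumes every vertex has an outgoing edge."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER)")
    conn.executemany("INSERT INTO E VALUES (?, ?)", edges)
    # Transition matrix: each edge carries 1 / outdegree of its start vertex.
    conn.execute("""
        CREATE TABLE T AS
        SELECT E.i, E.j, 1.0 / D.cnt AS v
        FROM E JOIN (SELECT i, count(*) AS cnt FROM E GROUP BY i) AS D
          ON E.i = D.i
    """)
    conn.execute("CREATE TABLE S (j INTEGER PRIMARY KEY, v REAL)")
    vertices = [r[0] for r in conn.execute("SELECT i FROM E UNION SELECT j FROM E")]
    n = len(vertices)
    rank = {u: 1.0 / n for u in vertices}
    for _ in range(max_iter):
        conn.execute("DELETE FROM S")
        conn.executemany("INSERT INTO S VALUES (?, ?)", list(rank.items()))
        # Vector-matrix multiplication under the (+, *) semiring.
        contrib = dict(conn.execute(
            "SELECT T.j, sum(S.v * T.v) FROM S JOIN T ON S.j = T.i GROUP BY T.j"
        ).fetchall())
        new_rank = {
            u: (1.0 - damping) / n + damping * contrib.get(u, 0.0)
            for u in vertices
        }
        delta = sum(abs(new_rank[u] - rank[u]) for u in vertices)
        rank = new_rank
        if delta < eps:  # stop once the vector has stabilized
            break
    return rank


ranks = pagerank([(1, 2), (2, 1), (3, 1)])
```

In this hypothetical graph, vertex 3 has no incoming edges, so it keeps only the baseline share (1 - damping) / n, while vertex 1 collects rank from both other vertices.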
Topological Sort. Input: Table E. Output: Table Sd (vector with the order of each vertex)
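One way to obtain a topological order with the same join-and-aggregate pattern is to give each vertex the length of the longest path that reaches it from a vertex with no incoming edges, which corresponds to a (max, +) semiring. This is an assumption about how the problem can be cast, not a transcription of the talk's algorithm; the function name `topological_order` and the cycle checks are illustrative.

```python
import sqlite3


def topological_order(edges):
    """Map each vertex of a DAG to a rank such that every edge increases it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER)")
    conn.executemany("INSERT INTO E VALUES (?, ?)", edges)
    n = conn.execute(
        "SELECT count(*) FROM (SELECT i FROM E UNION SELECT j FROM E)"
    ).fetchone()[0]
    # S0: vertices without incoming edges get order 0.
    conn.execute(
        "CREATE TABLE S AS SELECT DISTINCT i AS j, 0 AS v FROM E "
        "WHERE i NOT IN (SELECT j FROM E)"
    )
    order = dict(conn.execute("SELECT j, v FROM S").fetchall())
    # In a DAG no path has more than n - 1 edges, so the frontier must empty.
    for _ in range(n + 1):
        rows = conn.execute(
            "SELECT E.j, max(S.v + 1) FROM S JOIN E ON S.j = E.i GROUP BY E.j"
        ).fetchall()
        if not rows:
            break
        order.update(rows)  # later rounds carry larger values and overwrite
        conn.execute("DELETE FROM S")
        conn.executemany("INSERT INTO S VALUES (?, ?)", rows)
    else:
        raise ValueError("cycle detected")
    if len(order) != n:  # some vertex was never reached from a start vertex
        raise ValueError("cycle detected")
    return order


order = topological_order([(1, 2), (1, 3), (2, 3), (3, 4)])
```

Sorting the vertices by the returned value gives a valid topological order; vertices with equal values have no path between them and can be listed in any relative order.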
Comparison of 4 algorithms:
Unified Algorithm. Input: E, S0, R0, f(), g(), ⨂, ε, unionFlag. Optional input: s. Output: Rd
Conclusions: Graph algorithms are expressed as an iteration of SPJA (select-project-join-aggregate) queries. External algorithms: not limited by RAM. Strengths: sparse storage; early termination, when possible; lightweight relational queries. Unified algorithm: solves 4 important and diverse graph problems. Future work: more graph algorithms.
References C. Ordonez, W. Cabrera, and A. Gurram. Comparing columnar, row and array DBMSs to process recursive queries on graphs, Information Systems journal, 2016 (accepted).