Finding Skyline Nodes in Large Networks
Motivation
Evaluation metrics: (1) distance from the query node (e.g., John); (2) coverage of the query topics (e.g., Big Data, Cloud Computing, MapReduce).
Homogeneous Approach?
$\text{Score} = \lambda \cdot \text{Distance} + (1 - \lambda) \cdot \text{Coverage}$. How should λ be chosen?
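The sketch below shows why the choice of λ matters. It assumes distance is first turned into a higher-is-better proximity, 1/distance, so the two terms point the same way. The slide does not say how the terms are normalized. The candidate names and values are hypothetical.

```python
def homogeneous_score(distance, coverage, lam):
    """Scalarized score; distance is inverted to a proximity (assumption) so higher is better."""
    proximity = 1.0 / distance
    return lam * proximity + (1 - lam) * coverage


# Hypothetical candidates: name -> (hop distance from q, fraction of query topics covered)
CANDIDATES = {
    "near_partial": (1, 1 / 3),
    "far_full": (3, 1.0),
}


def best_node(lam):
    """Return the candidate with the highest score for the given lambda."""
    return max(CANDIDATES, key=lambda n: homogeneous_score(*CANDIDATES[n], lam))


for lam in (0.1, 0.9):
    print(lam, best_node(lam))
```

The top candidate flips from far_full to near_partial as λ increases, which is the tuning problem the slide raises.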
Weighted Set Cover?
Find the nodes with the smallest aggregate distance from the query node such that together they cover all query topics. Drawbacks: it ignores some interesting nodes, and it cannot rank the results.
[Figure: running example. Input network with query node u0 = q and nodes u1 to u8 labeled with topic sets; query Q = {a, b, c}.]
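Weighted set cover is NP-hard, so a common stand-in is the greedy approximation. The slide does not say how the cover would be computed, so this greedy step is a swapped-in technique. The sketch repeatedly picks the node with the lowest distance per newly covered topic. The nodes and their topics are hypothetical.

```python
def greedy_weighted_cover(query, nodes):
    """Greedy approximation for weighted set cover.

    query: set of query topics.
    nodes: dict name -> (distance from q, set of topics).
    Returns the chosen node names in pick order, or None if Q cannot be covered.
    """
    uncovered = set(query)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, (dist, topics) in nodes.items():
            gain = len(topics & uncovered)  # already-chosen nodes have gain 0
            if gain == 0:
                continue
            ratio = dist / gain  # cost per newly covered topic
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            return None  # some remaining topic appears on no node
        chosen.append(best)
        uncovered -= nodes[best][1]
    return chosen


nodes = {
    "x": (1, {"a", "b"}),
    "y": (1, {"c"}),
    "z": (2, {"a", "b", "c"}),
    "w": (1, {"a"}),
}
print(greedy_weighted_cover({"a", "b", "c"}, nodes))  # ['x', 'y']
```

The result is an unordered cover. z, which alone covers all of Q, and w are never reported, and nothing ranks x against y. These are the two drawbacks listed on the slide.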
Graph Skyline
Dominance on coverage, $u \succ_c v$: the query topics covered by $u$ form a strict superset of those covered by $v$ ($u \succeq_c v$ also allows equality).
Dominance on distance, $u \succ_d v$: the distance of $u$ from $q$ is less than that of $v$ from $q$ ($u \succeq_d v$ also allows equal distance).
Dominance, $u \succ v$: (1) $u \succ_c v$ and $u \succeq_d v$; or (2) $u \succeq_c v$ and $u \succ_d v$. The skyline consists of the nodes that no other node dominates.
[Figure: running example network, Q = {a, b, c}, u0 = q.]
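The dominance rules translate directly into a pairwise test. In the sketch below, Python's `>` and `>=` on sets act as strict superset and superset. A brute-force skyline scan is included alongside the test. The node data is hypothetical, and the quadratic scan illustrates the definition only. It is not the DAG-based algorithm from the later slides.

```python
def dominates(cov_u, dist_u, cov_v, dist_v):
    """True if u dominates v.

    cov_*: sets of query topics covered; dist_*: distances from q.
    """
    # Clause (1): strict superset coverage, distance no larger.
    if cov_u > cov_v and dist_u <= dist_v:
        return True
    # Clause (2): superset-or-equal coverage, strictly smaller distance.
    return cov_u >= cov_v and dist_u < dist_v


def naive_skyline(nodes, query):
    """Quadratic scan: keep every node that no other node dominates.

    nodes: dict name -> (distance from q, set of topics).
    """
    cov = {n: topics & query for n, (_, topics) in nodes.items()}
    return [
        u for u in nodes
        if not any(
            dominates(cov[v], nodes[v][0], cov[u], nodes[u][0])
            for v in nodes if v != u
        )
    ]


nodes = {
    "x": (1, {"a", "b"}),
    "y": (1, {"c"}),
    "z": (2, {"a", "b", "c", "d"}),
    "w": (2, {"a"}),
}
print(naive_skyline(nodes, {"a", "b", "c"}))  # ['x', 'y', 'z']
```

w drops out because x covers a strict superset of its topics ({a, b} vs. {a}) at a smaller distance.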
Ranking of Skyline Nodes
There can be too many skyline nodes, so we rank them by Dominance Count (DC), the number of nodes dominated by a skyline node [Lin et al., ICDE '07]. A higher dominance count means more pruning from the candidate set.
[Figure: running example network, Q = {a, b, c}, u0 = q.]
1. u4 dominates {u5, u6, u7}, so DC(u4) = 3.
2. u1 dominates {u5}, so DC(u1) = 1.
3. u2 dominates no node (∅), so DC(u2) = 0.
4. u3 dominates no node (∅), so DC(u3) = 0.
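The brute-force sketch below ranks skyline nodes by dominance count. It counts every node each candidate dominates, keeps the undominated ones, and sorts them by count. The node data is hypothetical. The all-pairs cost is the naïve baseline that the DAG-based algorithm is meant to avoid.

```python
def dominates(cov_u, dist_u, cov_v, dist_v):
    """(1) strict superset and not farther, or (2) superset and strictly closer."""
    return (cov_u > cov_v and dist_u <= dist_v) or (cov_u >= cov_v and dist_u < dist_v)


def rank_skyline(nodes, query, k):
    """Return the top-k skyline nodes ordered by Dominance Count, plus their counts.

    nodes: dict name -> (distance from q, set of topics).
    """
    cov = {n: topics & query for n, (_, topics) in nodes.items()}
    dist = {n: d for n, (d, _) in nodes.items()}
    # For each node, the list of nodes it dominates (all pairs, quadratic).
    dominated = {
        u: [v for v in nodes if v != u and dominates(cov[u], dist[u], cov[v], dist[v])]
        for u in nodes
    }
    skyline = [u for u in nodes if not any(u in dominated[v] for v in nodes)]
    dc = {u: len(dominated[u]) for u in skyline}
    return sorted(skyline, key=lambda u: -dc[u])[:k], dc


nodes = {
    "x": (1, {"a", "b"}),
    "y": (1, {"c"}),
    "z": (2, {"a", "b", "c"}),
    "w": (2, {"a"}),
    "p": (3, {"b"}),
    "t": (3, {"c"}),
}
top, counts = rank_skyline(nodes, {"a", "b", "c"}, 2)
print(top)  # ['z', 'x']
```

Here z dominates w, p, and t (DC = 3), x dominates w and p (DC = 2), and y dominates only t (DC = 1), so the top-2 answer is z, then x.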
Algorithm
Construct a Query DAG whose nodes are the non-empty subsets of Q. Each DAG node has three variables: Count (C), Dominance (D), and Traversal (T).
[Figure: Input Network and Query DAG (abc; ab, ac, bc; a, b, c). Initially only C is set: the seven DAG nodes show C = 0, 2, 0, 2, 0, 1, 2, with D and T undefined.]
Naïve complexity: $O(n \cdot 2^r)$. Complexity with preprocessing: $O(n r^2)$, for n nodes and r = |Q| query topics.
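The sketch below builds the Query DAG. It assumes C(S) counts the nodes whose query-topic coverage is exactly S. That reading agrees with the running example, where the C values sum to 7, the largest D value on the next slides. The input format `node_topics` is hypothetical.

```python
from itertools import combinations


def build_query_dag(query, node_topics):
    """Build a Query DAG over the non-empty subsets of Q and fill in Count (C).

    query: iterable of query topics.
    node_topics: dict node name -> set of topics (hypothetical input format).
    Returns (count, children): count[S] is the number of nodes whose query-topic
    coverage is exactly S; children[S] lists the subsets of S with one topic removed.
    """
    query = frozenset(query)
    topics = sorted(query)
    subsets = [
        frozenset(c)
        for size in range(len(topics), 0, -1)
        for c in combinations(topics, size)
    ]
    count = {s: 0 for s in subsets}
    for node_set in node_topics.values():
        cov = frozenset(node_set) & query
        if cov:  # nodes covering no query topic get no DAG node
            count[cov] += 1
    children = {s: [s - {t} for t in s] if len(s) > 1 else [] for s in subsets}
    return count, children


node_topics = {
    "x": {"a", "b"}, "y": {"c"}, "z": {"a", "b", "c", "d"},
    "w": {"a"}, "p": {"b"}, "t": {"c"},
}
count, children = build_query_dag({"a", "b", "c"}, node_topics)
print(len(count), count[frozenset("c")])  # 7 2
```

With r = |Q| topics, the DAG has 2^r - 1 nodes, so it stays small as long as queries are short, even when the network is large.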
Query DAG Construction
[Figure: running example (Q = {a, b, c}, u0 = q); partial Query DAG with DAG nodes ab, a, b, c, shown alongside input nodes u1 to u7.]
Query DAG Construction (cont.)
[Figure: running example; DAG node abc added.]
Query DAG Construction (cont.)
[Figure: running example; DAG nodes ac and bc added, completing the Query DAG.]
Find Dominance Variable
Process the DAG nodes in topological order to evaluate the Dominance variable (D) of each DAG node: the number of nodes whose coverage it dominates or equals.
[Figure: Input Network and Query DAG; DAG nodes now show (C, D) = (0, 3), (2, 2), (0, 4), (2, 7), (0, 3), (1, 1), (2, 2); T is still undefined.]
Naïve complexity: $O(n \cdot 2^r)$. Complexity with topological ordering: $O(3^r)$.
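This sketch replaces the slide's topological traversal with direct submask enumeration over bitmasks. It computes D(S) as the sum of C(T) over every non-empty subset T of S. Summed over all S, that work is 3^r, the same bound the slide states. The topic-to-bit assignment and the counts are hypothetical, chosen so that the D values match the multiset shown on the slide.

```python
def dominance_variables(count):
    """Compute D(S) = sum of C(T) over all non-empty subsets T of S.

    count: dict mapping a topic bitmask S to its Count C(S).
    Enumerating every subset of every S costs O(3^r) in total for r query topics.
    """
    dom = {}
    for mask in count:
        total, sub = 0, mask
        while sub:  # walk all non-empty submasks of `mask`
            total += count.get(sub, 0)
            sub = (sub - 1) & mask
        dom[mask] = total
    return dom


# Hypothetical topic bits; which singleton gets C = 1 is an assumption.
BIT_A, BIT_B, BIT_C = 1, 2, 4
count = {
    BIT_A: 2, BIT_B: 1, BIT_C: 2,
    BIT_A | BIT_B: 0, BIT_A | BIT_C: 0, BIT_B | BIT_C: 0,
    BIT_A | BIT_B | BIT_C: 2,
}
dom = dominance_variables(count)
print(dom[BIT_A | BIT_B | BIT_C], dom[BIT_A | BIT_C], dom[BIT_A | BIT_B], dom[BIT_B])  # 7 4 3 1
```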
Find Traversal Variable
Perform a breadth-first search (BFS) from the query node to evaluate the Traversal variable (T) of each DAG node: the number of nodes not dominated by distance.
[Figure: Input Network and Query DAG at BFS level h = 2; DAG nodes show (C, D, T) = (0, 3, 0), (2, 2, 2), (0, 4, 0), (2, 7, 1), (0, 3, 0), (1, 1, 1), (2, 2, 2).]
Complexity of BFS: $O(n + e)$, where e is the number of edges.
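The BFS step is sketched below. It assumes T(S) counts the nodes whose coverage is exactly S and that are reached within h hops of q. That is our reading of "not dominated by distance"; the slide gives no more precise definition. The graph and topics are hypothetical. Each node and edge is handled at most once, so the cost is O(n + e).

```python
from collections import deque


def traversal_counts(adj, topics, query, q, h):
    """Count, per coverage set, the nodes reached within h hops of q (q excluded).

    adj: dict node -> list of neighbours; topics: dict node -> set of topics.
    Returns (T, dist) where T maps a frozenset coverage to its count.
    """
    query = frozenset(query)
    dist = {q: 0}
    queue = deque([q])
    T = {}
    while queue:
        u = queue.popleft()
        if dist[u] == h:
            continue  # do not expand past level h
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
                cov = frozenset(topics.get(v, ())) & query
                if cov:
                    T[cov] = T.get(cov, 0) + 1
    return T, dist


adj = {"q": ["x", "z"], "x": ["q", "y"], "y": ["x"], "z": ["q"]}
topics = {"x": {"a"}, "y": {"a"}, "z": {"b", "c"}}
T1, _ = traversal_counts(adj, topics, {"a", "b", "c"}, "q", 1)
T2, _ = traversal_counts(adj, topics, {"a", "b", "c"}, "q", 2)
print(T1[frozenset("a")], T2[frozenset("a")])  # 1 2
```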
Find Skyline Nodes
Store the DAG nodes in a Lookup Table with a Skyline Bit for each DAG node. The bit helps prune non-skyline nodes directly.
[Figure: Input Network and Query DAG at BFS level h = 1.]
Lookup Table (h = 1): abc 0, ab 0, ac 0, bc 0, a 1, b 1, c 1.
Find Skyline Nodes (cont.)
[Figure: Input Network and Query DAG at BFS level h = 2.]
Lookup Table (h = 2): abc 1, ab 1, ac 1, bc 1, a 1, b 1, c 1.
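The slides do not spell out when a Skyline Bit flips, so the sketch below rests on an assumption. A bit is set to 1 once some node at a smaller distance covers that subset. A later node whose coverage maps to a set bit is then dominated by clause (2) and can be pruned with one table lookup. Within a single BFS level, only a strict-superset coverage can dominate, so that case is checked separately. The graph and topics are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(s):
    """All non-empty subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def skyline_by_levels(adj, topics, query, q):
    """Level-by-level BFS from q that uses one skyline bit per DAG node to prune.

    adj: dict node -> list of neighbours; topics: dict node -> set of topics.
    Assumption: bit[S] == 1 means a node at a smaller distance already covers S.
    """
    query = frozenset(query)
    bit = {s: 0 for s in nonempty_subsets(query)}  # the lookup table
    seen, level, skyline = {q}, [q], []
    while level:
        # Collect the next BFS level (all nodes one hop farther from q).
        nxt = []
        for u in level:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        cov = {v: frozenset(topics.get(v, ())) & query for v in nxt}
        for v in nxt:
            c = cov[v]
            if not c or bit[c]:
                continue  # covers no query topic, or pruned directly by the lookup table
            if any(cov[other] > c for other in nxt):
                continue  # a node at the same distance covers a strict superset
            skyline.append(v)
        # Only after the whole level is judged, mark every subset of each coverage.
        for v in nxt:
            for s in nonempty_subsets(cov[v]):
                bit[s] = 1
        level = nxt
    return skyline


adj = {"q": ["x", "z"], "x": ["q", "y"], "z": ["q", "w"], "y": ["x"], "w": ["z"]}
topics = {"x": {"a"}, "z": {"b"}, "y": {"a", "b"}, "w": {"b", "d"}}
print(skyline_by_levels(adj, topics, {"a", "b", "c"}, "q"))  # ['x', 'z', 'y']
```

Node w is pruned by the lookup alone. Its coverage {b} already has its bit set by z, which is one hop closer to q.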
Dominance Count of Skyline Nodes
[Figure: Input Network and Query DAG at h = 2; Lookup Table abc 1, ab 1, ac 1, bc 1, a 1, b 1, c 1; DAG nodes show (C, D, T) = (0, 3, 0), (2, 2, 1), (0, 4, 0), (2, 7, 0), (0, 3, 0), (1, 1, 1), (2, 2, 1).]
$DC(u_4) = D(abc) - T(abc) - T(ab) - T(ac) - T(bc) - T(a) - T(b) - T(c) - 1 = 3$, i.e., $7 - 3 - 1 = 3$ with $D(abc) = 7$ and the T values summing to 3.
A Top-k Buffer stores the top-k skyline nodes.
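The formula and the buffer can be sketched together. `dominance_count` applies the slide's expression, subtracting T over every DAG node at or below the node's coverage. `TopKBuffer` is a hypothetical min-heap that keeps the k highest counts seen so far. The D and T values are our reading of the h = 2 figure, and the DC values fed to the buffer come from the ranking slide.

```python
import heapq
from itertools import combinations


def nonempty_subsets(s):
    """All non-empty subsets of the frozenset s."""
    items = sorted(s)
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def dominance_count(cov, D, T):
    """DC of a skyline node with coverage `cov`, following the slide's formula:
    D(cov) minus T over every DAG node at or below cov, minus 1."""
    return D[cov] - sum(T.get(s, 0) for s in nonempty_subsets(cov)) - 1


class TopKBuffer:
    """Keeps the k skyline nodes with the highest DC seen so far (min-heap)."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # (dc, node) pairs; smallest DC at heap[0]

    def offer(self, node, dc):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (dc, node))
        elif dc > self.heap[0][0]:
            heapq.heapreplace(self.heap, (dc, node))  # evict the current minimum

    def ranked(self):
        """Nodes ordered from highest to lowest DC."""
        return [node for dc, node in sorted(self.heap, reverse=True)]


fs = frozenset
D = {fs("abc"): 7}
T = {fs("a"): 1, fs("b"): 1, fs("c"): 1}  # all other T values are 0
print(dominance_count(fs("abc"), D, T))  # 3

buf = TopKBuffer(2)
for node, dc in [("u4", 3), ("u1", 1), ("u2", 0), ("u3", 0)]:
    buf.offer(node, dc)
print(buf.ranked())  # ['u4', 'u1']
```

A min-heap keeps each offer at O(log k), and its smallest entry is the threshold a new skyline node must beat to enter the top k.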
Pruning and Early Termination
Experimental Results
Efficiency
Conclusion and Future Work
An efficient algorithm to find the top-k skyline nodes in a large attributed network. Its time complexity is linear in the number of nodes and edges in the network.
Future work: experimental evaluation on real and synthetic datasets is still required; distance-based indexing might improve efficiency; a top-k skyline set, instead of top-k skyline nodes, might be more effective.
Questions?
Thank you!