
1 CPT-S 580-06 Advanced Databases. Yinghui Wu, EME 49

2 Scalable machine learning - part II: graph algorithms. Adapted from the KDD tutorial by Ron Bekkerman et al. (http://hunch.net/~large_scale_survey) and Eric Xing's talk (https://petuum.github.io/research.html).

3 Learning/Mining in big graphs
Learning/mining in big graphs arises in:
–networks of companies & board-of-directors members
–'viral' marketing
–web-log ('blog') news propagation
–computer network security: email/IP traffic and anomaly detection
–…
A case study in large-scale graph ML/mining:
–pattern mining
–loopy belief propagation

4 Parallel graph pattern mining

5 Network and graph mining: what does the Internet/Web look like? What is 'normal'/'abnormal'? Which patterns/laws hold?

6 Graph mining is expensive!
Graph data is large and complex:
–data-driven, unstructured data, poor locality, highly exploratory
Mining tasks are complex:
–fuzzy queries, label ambiguity, topology constraints
–can go beyond FO (reachability and regular paths with recursion)
–computationally expensive (NP-hard)
Non-trivial metrics (support, confidence, significance, similarity, transformation, …):
–nice features such as anti-monotonicity and submodularity may not hold
Inherently hard to parallelize: DFS, cycle detection

7 What are the options?
Hadoop/MapReduce:
–simple queries; not very good for iterative algorithms
–examples: node/edge aggregation, ranking
Vertex-centric:
–mining tasks with high locality
–examples: neighborhood mining, kNN, local classifiers, bounded exploration, random-walk based methods
Graph-centric:
–mining tasks that require "global" information
–a generalized model for vertex-centric
–examples: SSSP, pattern matching
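To make the vertex-centric option concrete, here is a minimal Pregel-style sketch that computes connected components by propagating minimum labels. It runs the supersteps sequentially in plain Python (a sketch only, not an actual Pregel/Giraph program), and the toy dictionary-based graph encoding is an assumption for illustration.

```python
# Minimal "think like a vertex" sketch: connected components by label propagation.
def vertex_centric_cc(adj, max_supersteps=50):
    """adj: dict vertex -> list of neighbors (assumed symmetric/undirected)."""
    label = {v: v for v in adj}
    # superstep 0: every vertex sends its initial label to its neighbors
    inbox = {v: [] for v in adj}
    for v in adj:
        for u in adj[v]:
            inbox[u].append(label[v])
    for _ in range(max_supersteps):
        outbox = {v: [] for v in adj}
        any_msg = False
        for v in adj:                              # per-vertex compute()
            if inbox[v] and min(inbox[v]) < label[v]:
                label[v] = min(inbox[v])           # adopt the smallest label seen
                for u in adj[v]:
                    outbox[u].append(label[v])     # notify neighbors of the change
                    any_msg = True
        inbox = outbox                             # barrier between supersteps
        if not any_msg:                            # no messages: vertices halt
            break
    return label

print(vertex_centric_cc({"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}))
```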

8 Parallel/distributed graph mining
Given a big graph G and n processors S1, …, Sn:
–G is partitioned into fragments (G1, …, Gn), dividing the big G into small fragments of manageable size
–G is distributed to the n processors: fragment Gi is stored at processor Si
–each processor Si processes the operations of a (sub-)mining task on its local fragment Gi, in parallel
Key questions: is it parallel scalable? How to balance load? How to minimize communication? How to minimize makespan? What about skewed graphs?
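A minimal sketch of the partition-distribute-process-locally pattern above, assuming a toy edge-list graph, a naive hash partitioner, and a placeholder per-fragment task (edge counting). The names partition_graph and mine_fragment are hypothetical; a real system would use a proper graph partitioner and distributed workers rather than a local process pool.

```python
from multiprocessing import Pool

def partition_graph(graph, n):
    """Split an edge list into n fragments by hashing the source vertex
    (a naive stand-in for a real graph partitioner)."""
    fragments = [[] for _ in range(n)]
    for (u, v) in graph:
        fragments[hash(u) % n].append((u, v))
    return fragments

def mine_fragment(fragment):
    """Placeholder local mining task: count outgoing edges per vertex."""
    counts = {}
    for (u, _v) in fragment:
        counts[u] = counts.get(u, 0) + 1
    return counts

if __name__ == "__main__":
    G = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
    n = 2
    with Pool(n) as pool:                           # one worker per fragment
        partials = pool.map(mine_fragment, partition_graph(G, n))
    result = {}                                     # coordinator assembles partial results
    for part in partials:
        for k, v in part.items():
            result[k] = result.get(k, 0) + v
    print(result)
```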

9 Association via graph patterns (cont.)
These rules are more involved than association rules for itemsets! Example applications: detecting fake accounts (a pattern linking a "fake" account, "claim a prize" keywords, an article, and a blog) and identifying customers for a released album (e.g., a pattern linking a user x, Ecuador, "Shakira", and her album).
Question 1: How to define association rules via graph patterns?
Question 2: How to discover interesting rules?
Question 3: How to use the rules to identify customers?

10 Graph Pattern Association Rules (GPARs)
A graph-pattern association rule (GPAR) R(x, y) has the form R(x, y): Q(x, y) ⇒ q(x, y), where
–Q(x, y) is a graph pattern in which x and y are two designated nodes
–q(x, y) is an edge labeled q from x to y (a predicate)
Q and q are the antecedent and consequent of R, respectively.
Example: R(x, French restaurant): Q(x, French restaurant) ⇒ like(x, French restaurant), where the pattern Q relates x to a friend x' in the same city who likes French restaurants.

11 Support and Confidence
Support of R(x, y): Q(x, y) ⇒ q(x, y): how often the rule applies in the graph, e.g., matches of x in New York (city) liking French restaurants such as Le Bernardin and Per Se. Should support be the number of isomorphic subgraphs in the single graph?

12 Support and Confidence
Confidence of R(x, y) is measured against candidates: a candidate is an x with at least one edge of type q that is not a match for q(x, y) (local closed-world assumption). In the album example, an x matching the pattern with Shakira's album is "positive", one with only a different album (e.g., MJ's album) is "negative", and the rest are "unknown".
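A minimal sketch of the support/confidence bookkeeping under the local closed-world assumption, assuming the hard part (computing the match sets) is done elsewhere. The exact support and confidence definitions in the GPAR literature are more specific, so treat this as illustrative only; all function and variable names are hypothetical.

```python
def gpar_stats(q_matches, q_edge_holders, antecedent_matches):
    """q_matches: set of x matching Q(x,y) and having the q(x,y) edge ("positive")
    q_edge_holders: set of x with at least one edge of type q
    antecedent_matches: set of x matching the antecedent pattern Q(x,y)."""
    positives = antecedent_matches & q_matches
    # local closed-world assumption: x with a q-typed edge that is not a match
    # of q(x,y) counts as "negative"; everything else stays "unknown"
    negatives = (antecedent_matches & q_edge_holders) - q_matches
    support = len(positives)
    denom = len(positives) + len(negatives)
    confidence = support / denom if denom else 0.0
    return support, confidence

print(gpar_stats({"u1", "u2"}, {"u1", "u2", "u3"}, {"u1", "u2", "u3", "u4"}))
```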

13 Association rule discovery
The rule discovery problem: given a graph G, an objective function F(·), a number k, and pattern constraints/conditions C, compute Σ = Discover(G, F, k, C) such that Σ = argmax F(Σ) subject to |Σ| = k.
Examples:
–top-k pattern mining: find the k most interesting patterns
–top-k closed pattern mining
–top-k diversified pattern mining (with a bi-criteria diversification function)

14 Discovering GPARs
The diversified mining problem: given a graph G, a predicate q(x, y), a support bound α, and positive integers k and d, find a set S of k nontrivial GPARs pertaining to q(x, y) such that (a) F(S) is maximized, and (b) for each GPAR R ∈ S, supp(R, G) ≥ α and r(PR, x) ≤ d.
Mining GPARs for a particular event often leads to the same group of entities, so F combines a difference function with relevance: a bi-criteria diversification function.
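A greedy sketch of bi-criteria diversified top-k selection, assuming hypothetical scoring callbacks conf(r) (relevance) and diff(r1, r2) (pairwise difference). The actual diversification function and approximation algorithm used for GPARs are more involved; this only illustrates the greedy selection idea.

```python
def diversified_topk(rules, k, conf, diff, lam=0.5):
    """Greedily pick k rules balancing relevance and pairwise difference."""
    selected = []
    while rules and len(selected) < k:
        def marginal(r):
            # relevance plus average difference to the rules chosen so far
            d = sum(diff(r, s) for s in selected) / max(len(selected), 1)
            return (1 - lam) * conf(r) + lam * d
        best = max(rules, key=marginal)        # rule with the best marginal gain
        selected.append(best)
        rules = [r for r in rules if r is not best]
    return selected
```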

15 Diversified association rule discovery
Example: two diversified GPARs for like(x, French restaurant): one pattern links x to a friend x' in the same city who likes French restaurants, the other links x to a friend who also likes an Asian restaurant. Example matches include Le Bernardin and Per Se in New York (city) and Patina in LA (city).

16 Apriori-Based Approach
Level-wise search: join frequent k-edge patterns to generate (k+1)-edge candidates, prune candidates, and check the frequency of each remaining candidate against the graph. The frequency check relies on subgraph isomorphism tests, which are NP-complete.
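A level-wise (Apriori-style) mining skeleton corresponding to the join/prune/verify loop above. Here extend_by_one_edge and support are assumed callbacks (in a real miner, support() would run the NP-complete subgraph isomorphism tests), so this is a structural sketch rather than a working miner.

```python
def apriori_mine(graph, seed_patterns, min_support, max_edges,
                 extend_by_one_edge, support):
    """Level-wise search: grow frequent patterns one edge at a time."""
    frequent = []
    level = [p for p in seed_patterns if support(p, graph) >= min_support]
    k = 1
    while level and k < max_edges:
        frequent.extend(level)
        candidates = set()
        for p in level:                        # join/extend: (k+1)-edge candidates
            candidates.update(extend_by_one_edge(p, graph))
        # prune + verify: keep only candidates that are actually frequent
        level = [c for c in candidates if support(c, graph) >= min_support]
        k += 1
    frequent.extend(level)
    return frequent
```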

17 Parallel GPAR Discovery
A parallel discovery algorithm:
–the coordinator Sc divides G into n-1 fragments, each assigned to a processor Si
–GPARs are discovered in parallel by bulk synchronous processing, in d rounds
–in each round, Sc posts a set M of GPARs to each processor (1. distribute Mi); each processor generates GPARs locally by extending those in M (2. locally expand Mi)
–the new GPARs are collected and assembled by Sc in the barrier synchronization phase, and Sc incrementally updates the top-k GPAR set Lk (3. synchronize/update Lk)
Key questions: is it parallel scalable? How to balance load? What is the communication cost?
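A minimal sketch of the coordinator-side BSP loop described above, with hypothetical callbacks expand_locally (per-fragment extension of the posted rules) and update_topk (incremental maintenance of Lk). A real implementation would run the workers as separate processes and exchange messages at the barrier.

```python
def parallel_gpar_discovery(fragments, seed_rules, k, rounds,
                            expand_locally, update_topk):
    topk = []                  # L_k, maintained by the coordinator
    M = list(seed_rules)       # rules posted to every worker this round
    for _ in range(rounds):
        # superstep: every worker extends the posted rules on its fragment
        locally_generated = [expand_locally(M, frag) for frag in fragments]
        # barrier synchronization: coordinator assembles and deduplicates
        new_rules = {r for worker_rules in locally_generated for r in worker_rules}
        if not new_rules:
            break
        topk = update_topk(topk, new_rules, k)   # incremental top-k update
        M = list(new_rules)
    return topk
```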

18 Strategies to improve the scalability
Bottlenecks:
–number of candidates
–cost of verification
–number of rounds of supersteps
–messages/communication cost
–makespan/skewness
Strategies:
–pattern filtering/reduction
–incremental evaluation/indexes
–bounded search
–message grouping
–load balancing
Advantage: "fragment"-level optimization

19 Recall: incremental query answering
Real-life data is dynamic; the graph constantly changes by ∆G. Rather than re-computing Q(G ⊕ ∆G) from scratch, compute Q(G) once and then incrementally maintain it, minimizing unnecessary recomputation.
Incremental query processing:
–Input: Q, G, Q(G) (the old output), and ∆G (changes to the input)
–Output: ∆M (changes to the output) such that Q(G ⊕ ∆G) = Q(G) ⊕ ∆M (the new output)
When the changes ∆G to the graph G are small, typically so are the changes ∆M to the output Q(G ⊕ ∆G).
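A toy illustration of the Q(G ⊕ ∆G) = Q(G) ⊕ ∆M interface for a deliberately simple query ("vertices with degree ≥ t") under edge insertions. Real incremental algorithms for graph pattern queries are far more involved; only the shape of the computation carries over.

```python
def initial_answer(degrees, t):
    """Q(G): vertices whose degree is at least t."""
    return {v for v, d in degrees.items() if d >= t}

def apply_delta(degrees, answer, inserted_edges, t):
    """Return ∆M = (delta_plus, delta_minus), the changes to the old answer."""
    delta_plus = set()
    for (u, v) in inserted_edges:              # ∆G: edge insertions only
        for x in (u, v):
            degrees[x] = degrees.get(x, 0) + 1
            if degrees[x] >= t and x not in answer:
                delta_plus.add(x)
    answer |= delta_plus                       # Q(G ⊕ ∆G) = Q(G) ⊕ ∆M
    return delta_plus, set()                   # no deletions for insert-only ∆G

degrees = {"a": 2, "b": 1, "c": 1}
ans = initial_answer(degrees, 2)               # {"a"}
print(apply_delta(degrees, ans, [("b", "c")], 2), ans)
```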

20 Reduce verification cost
A (revised) incremental query processing problem for incremental verification:
–Input: Q, G, Q(G), ∆Q, ∆G
–Output: ∆M such that (Q ⊕ ∆Q)(G ⊕ ∆G) = Q(G) ⊕ ∆M
Each worker keeps its fragment Gi and the cached result Q(Gi); given (∆Q, ∆G), it computes ∆M incrementally rather than re-verifying (Q ⊕ ∆Q)(G ⊕ ∆G) from scratch.

21 Recall: graph compression
Input: a directed graph G and a graph pattern Q; output: the maximum simulation relation R, computed over a compressed graph Gc rather than G.
Bisimulation: a binary relation B over the vertices V of G such that, for each node pair (u, v) ∈ B:
–L(u) = L(v)
–for each edge (u, u') ∈ E there exists (v, v') ∈ E such that (u', v') ∈ B
–for each edge (v, v') ∈ E there exists (u, u') ∈ E such that (u', v') ∈ B
The unique maximum bisimulation relation Rb is an equivalence relation. Bisimulation preserves the query results, so G can be compressed by merging nodes that are equivalent under Rb.
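A naive partition-refinement sketch that computes the bisimulation equivalence classes used for compression, under the out-edge definition given above. It is quadratic and purely illustrative (a production implementation would use the Paige-Tarjan algorithm), and the dictionary-based graph encoding is an assumption.

```python
def bisimulation_blocks(nodes, label, succ):
    """nodes: node ids; label: dict node -> label; succ: dict node -> successor list."""
    block = {v: label[v] for v in nodes}      # initial blocks: group by label
    changed = True
    while changed:
        # signature = (own block, set of successor blocks)
        sig = {v: (block[v], frozenset(block[w] for w in succ.get(v, [])))
               for v in nodes}
        new_block = {v: sig[v] for v in nodes}
        # stop when the induced partition no longer gets finer
        changed = any((block[u] == block[v]) != (new_block[u] == new_block[v])
                      for u in nodes for v in nodes)
        block = new_block
    classes = {}                              # group nodes into equivalence classes
    for v in nodes:
        classes.setdefault(block[v], set()).add(v)
    return list(classes.values())

print(bisimulation_blocks(
    ["a", "b", "c", "d"],
    {"a": "P", "b": "P", "c": "Q", "d": "Q"},
    {"a": ["c"], "b": ["d"], "c": [], "d": []}))
```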

22 Load Balancing
A generalized assignment problem:
–given n workers and k jobs, where assigning job j to worker i incurs a cost Cij and a time Tij
–find the best assignment that minimizes the total cost and the makespan
Efficient approximation algorithms exist.
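A minimal greedy sketch for the load-balancing step: jobs (e.g., candidate rules to verify) are assigned by estimated time, longest first, to the currently least-loaded worker. The full generalized assignment problem with both costs Cij and makespan needs LP-based approximation algorithms, which this does not attempt.

```python
import heapq

def greedy_assign(job_times, n_workers):
    """job_times: dict job -> estimated time. Returns dict job -> worker."""
    heap = [(0.0, w) for w in range(n_workers)]    # (current load, worker)
    heapq.heapify(heap)
    assignment = {}
    # place the longest jobs first on the least-loaded worker (LPT heuristic)
    for job, t in sorted(job_times.items(), key=lambda kv: -kv[1]):
        load, w = heapq.heappop(heap)
        assignment[job] = w
        heapq.heappush(heap, (load + t, w))
    return assignment

print(greedy_assign({"R1": 5, "R2": 3, "R3": 3, "R4": 2}, 2))
```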

23 Parallel message passing-based algorithms

24 Loopy belief propagation
Invented in 1982 [Pearl] to calculate marginals in Bayes nets; also used to estimate marginals (= beliefs) or most likely states.
An iterative process in which neighboring variables "talk" to each other, passing messages: "I (variable x1) believe you (variable x2) belong in these states with various likelihoods…". When consensus is reached, calculate the belief.

25 Sum-Product Algorithm (aka belief update)
Suppose the factor graph is a tree. For the tree to the left, we have:
P(X) = f1(x1, x2) f2(x2, x3, x4) f3(x3, x5) f4(x4, x6)
Then marginalization (for example, computing P(x1)) can be sped up by exploiting the factorization:
P(x1) = Σ_{x2, x3, x4, x5, x6} f1(x1, x2) f2(x2, x3, x4) f3(x3, x5) f4(x4, x6)
      = Σ_{x2, x3, x4} f1(x1, x2) f2(x2, x3, x4) (Σ_{x5} f3(x3, x5)) (Σ_{x6} f4(x4, x6))
This quickly computes every single-variable marginal P(xn) of a tree-structured graph.

26 Message Passing for Sum-Product
We can compute every marginal P(xn) quickly using a system of message passing:
–message from variable node n to factor node m: v_{n,m}(xn) = Π_{i ∈ N(n) \ {m}} μ_{i,n}(xn)
–message from factor node m to variable node n: μ_{m,n}(xn) = Σ_{x_{N(m) \ {n}}} [ f_m(x_{N(m)}) Π_{k ∈ N(m) \ {n}} v_{k,m}(xk) ]
–marginal: P(xn) ∝ Π_{m ∈ N(n)} μ_{m,n}(xn)
A node n can pass a message to a neighbor m only once it has received messages from all of its other neighbors. Intuitively, each message from n to m represents P(xm | Sn), where Sn is the set of all children of node n.
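A minimal sum-product sketch on a chain x1 - x2 - ... - xT with unary factors g_t(x_t) and pairwise factors f_t(x_t, x_{t+1}); on a chain (a tree), message passing is exact and reduces to forward-backward. The numeric potentials below are made up for illustration.

```python
import numpy as np

def chain_marginals(unary, pairwise):
    """unary: list of T arrays of shape (K,); pairwise: list of T-1 arrays of shape (K, K)."""
    T = len(unary)
    fwd = [None] * T          # message passed forward into x_t
    bwd = [None] * T          # message passed backward into x_t
    fwd[0] = np.ones_like(unary[0])
    bwd[-1] = np.ones_like(unary[-1])
    for t in range(1, T):              # forward pass
        fwd[t] = pairwise[t - 1].T @ (fwd[t - 1] * unary[t - 1])
    for t in range(T - 2, -1, -1):     # backward pass
        bwd[t] = pairwise[t] @ (bwd[t + 1] * unary[t + 1])
    marginals = []
    for t in range(T):                 # belief = product of incoming messages and unary
        b = fwd[t] * unary[t] * bwd[t]
        marginals.append(b / b.sum())
    return marginals

unary = [np.array([0.9, 0.1]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
pairwise = [np.array([[0.8, 0.2], [0.2, 0.8]])] * 2
print(chain_marginals(unary, pairwise))
```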

27 Loopy Belief Propagation (Loopy BP)
Iteratively estimate the "beliefs" about vertices:
–read in messages
–update the marginal estimate (belief)
–send updated outgoing messages
Repeat for all variables until convergence.

28 Bulk Synchronous Loopy BP
Often considered embarrassingly parallel:
–associate a processor with each vertex
–receive all messages
–update all beliefs
–send all messages
Proposed by Brunton et al. CRV'06, Mendiburu et al. GECC'07, Kang et al. LDMTA'10, …
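A minimal synchronous (bulk) loopy BP sketch for a pairwise MRF, assuming one shared pairwise potential and omitting damping and convergence tests. The small numeric potentials and the dictionary-based encoding are made up for illustration; a real system would distribute the per-vertex updates across processors, with the barrier between iterations.

```python
import numpy as np

def loopy_bp(edges, unary, pairwise, n_iters=20):
    """edges: list of (u, v); unary: dict v -> (K,) array; pairwise: (K, K) array."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    K = len(next(iter(unary.values())))
    msgs = {e: np.ones(K) for e in directed}
    for _ in range(n_iters):
        new_msgs = {}
        for (u, v) in directed:
            # product of the unary at u and all incoming messages except the one from v
            prod = unary[u].copy()
            for (w, x) in directed:
                if x == u and w != v:
                    prod = prod * msgs[(w, x)]
            m = pairwise.T @ prod            # sum out u's states
            new_msgs[(u, v)] = m / m.sum()   # normalize for numerical stability
        msgs = new_msgs                      # synchronous update at the barrier
    beliefs = {}
    for v in unary:                          # belief = unary times all incoming messages
        b = unary[v].copy()
        for (w, x) in directed:
            if x == v:
                b = b * msgs[(w, x)]
        beliefs[v] = b / b.sum()
    return beliefs

edges = [("a", "b"), ("b", "c"), ("c", "a")]     # a small graph with a loop
unary = {v: np.array([0.6, 0.4]) for v in "abc"}
print(loopy_bp(edges, unary, np.array([[0.9, 0.1], [0.1, 0.9]])))
```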

29 Sequential Computational Structure

30 Hidden Sequential Structure

31 Hidden Sequential Structure. Running time = (time for a single parallel iteration) × (number of iterations needed to propagate the evidence across the chain).

32 Optimal Sequential Algorithm
Running time on a chain of length n:
–forward-backward (sequential, p = 1): 2n
–bulk synchronous (p ≤ 2n processors): 2n²/p
–optimal parallel (p = 2): n
The gap: the bulk synchronous schedule needs many processors just to match the sequential forward-backward algorithm.

33 The Splash Operation
Generalize the optimal chain algorithm to arbitrary cyclic graphs:
1) grow a BFS spanning tree of fixed size
2) forward pass: compute all messages at each vertex
3) backward pass: compute all messages at each vertex

34 Data-Parallel Algorithms can be Inefficient
The limitations of the Map-Reduce abstraction can lead to inefficient parallel algorithms (compare optimized in-memory bulk synchronous execution with asynchronous Splash BP).

35 Many more graph-parallel algorithms
Collaborative filtering: alternating least squares, stochastic gradient descent, tensor factorization
Structured prediction: loopy belief propagation, max-product linear programs, Gibbs sampling
Semi-supervised ML: graph SSL, CoEM
Community detection: triangle counting, k-core decomposition, k-truss
Graph analytics: PageRank, personalized PageRank, shortest path, graph coloring
Classification: neural networks

36 Use case: collective classification inference

37 Collective classification (CC)
Anomaly detection as a classification problem: spam/non-spam email, malicious/benign web page, fraud/legitimate transaction, etc.
Objects are often connected: guilt by association. The label of object o in a network may depend on:
–the attributes (features) of o
–the labels of objects in o's neighborhood
–the attributes of objects in o's neighborhood
CC: simultaneous classification of interlinked objects using the correlations above.

38 Problem sketch
A graph (V, E): nodes are variables (X: observed, Y: to be determined); edges are observed relations. Goal: label the Y nodes.
In the example, nodes are web pages, edges are hyperlinks, labels are SH or CH (student or course homepage), and node features are keywords (ST: student, CO: course, CU: curriculum, AI: artificial intelligence).

39 Collective classification applications
–document classification (Chakrabarti+'98, Taskar+'02)
–part-of-speech tagging (Lafferty+'01)
–link prediction (Taskar+'03)
–optical character recognition (Taskar+'03)
–image/3D data segmentation (Anguelov+'05, Chechetka+'10)
–entity resolution in sensor networks (Chen+'03)
–spam and fraud detection (Pandit+'07, Kang+'11)

40 Inferring anomalies
Given observed variables Y, hidden variables X, and some model of P(X, Y), perform some analysis of P(X | Y):
–estimate the marginal P(S) for a subset S of X
–the minimal mean squared error (MMSE) configuration, which is just E[X | Y]
–the maximum a posteriori (MAP) configuration
–the N most likely configurations
–minimum variance (MVUE)

41 Collective classification inference
Exact inference is NP-hard for arbitrary networks. Approximate inference techniques (all of them iterative):
–relational classifiers (Macskassy & Provost '03, '07)
–iterative classification algorithm (ICA) (Neville & Jensen '00, Lu & Getoor '03, McDowell+'07)
–Gibbs sampling (Gilks et al. '96)
–loopy belief propagation (Yedidia et al. '00)

42 Iterative classification
Main idea: classify node Yi based on its own attributes as well as the labels of its neighbor set Ni.
Convert each node Yi to a flat vector ai. Since nodes have varying numbers of neighbors, aggregate the neighbor labels, e.g., count, mode, proportion, mean, exists.

43 Iterative classification
Main idea: classify Yi based on Ni (see the sketch after this slide):
–bootstrap: convert each node Yi to a flat vector ai (aggregating over its neighbors) and use a local classifier f(ai) (e.g., SVM, kNN, …) to compute the best value for yi
–iterate: for each node Yi, reconstruct the feature vector ai and update the label to f(ai) (hard assignment)
–until class labels stabilize or the maximum number of iterations is reached
Note: convergence is not guaranteed.
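A minimal sketch of the ICA loop above, assuming a pre-trained local classifier clf with a scikit-learn-style predict() and a single toy aggregation (count of neighbors currently labeled 1). Feature design, training of the local classifier, and node-ordering heuristics are all omitted; names and the graph encoding are hypothetical.

```python
def ica(adj, node_feat, clf, max_iters=10):
    """adj: dict node -> list of neighbors; node_feat: dict node -> float; clf: trained classifier."""
    def features(v, labels):
        ones = sum(1 for u in adj[v] if labels.get(u) == 1)   # "count" aggregation
        return [node_feat[v], ones]

    # bootstrap: classify every node ignoring neighbor labels
    labels = {v: int(clf.predict([features(v, {})])[0]) for v in adj}
    for _ in range(max_iters):
        # reconstruct each feature vector with current neighbor labels, then reclassify
        new_labels = {v: int(clf.predict([features(v, labels)])[0]) for v in adj}
        if new_labels == labels:          # class labels stabilized
            break
        labels = new_labels               # hard re-assignment, then iterate
    return labels
```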

44 Iterative classification (example): at t = 0, the feature vectors a1 and a2 are built using the count aggregation.

45 Iterative classification (example, cont.): the classifier assigns f(a2 at t = 0) = CH and, after recomputing a1 at t = 1, f(a1 at t = 1) = SH.

46 Iterative classification http://eliassi.org/papers/ai-mag-tr08.pdf

47 Conclusion
Scalable graph mining and learning is challenging, and research in this domain is still quite new.
Parallel graph mining:
–MapReduce vs. vertex-centric vs. block-centric models
–strategies that work in the single-machine case can be readily integrated (compression, indexing, incremental evaluation, …)
–message passing-based models with good locality, e.g., loopy BP and iterative classification, fit vertex-centric models
–mining tasks requiring global information can (arguably) be implemented more effectively over fragment-centric/block-centric models (especially when the bottleneck is messages and supersteps)
Parallelizing ML/mining tasks as a tuple

