MapReduce Guided Approximate Inference Over Graphical Models. Ahsanul Haque*, Swarup Chandra*, Latifur Khan*, and Michael Baron+. * Department of Computer Science, University of Texas at Dallas. + Department of Mathematical Sciences, University of Texas at Dallas.
Agenda Brief overview of inference techniques; Problem; Proposed Approaches; Experiments; Discussion
Graphical Models A probabilistic graphical model G is a collection of functions over a set of random variables. It is generally represented as a network of nodes: each node denotes a random variable (e.g., a data feature), and each edge denotes a relationship between two random variables. Two types of representation: a Bayesian network is represented by a directed graph; a Markov network is represented by an undirected graph.
Example Graphical Model Inference is needed to evaluate Probability of Evidence, Prior and Posterior Marginal, Most Probable Explanation (MPE), and Maximum a Posteriori (MAP) queries. Probability of Evidence needs to be evaluated in classification problems. [Figure: example network over variables A, B, C, D, E, F with factors (A,B), (A,C), (B,D), (C,D), (C,E), (D,F), (E,F); the sample factor shown is the one over (A,C).]
Exact Inference Exact inference algorithms, e.g., Variable Elimination, provide accurate results for Probability of Evidence. Challenges: exponential time and space complexity; computationally intractable on large graphs. Approximate inference algorithms are widely used in practice to evaluate queries within resource limits. Sampling based: e.g., Gibbs Sampling, Importance Sampling. Propagation based: e.g., Iterative Join Graph Propagation.
Adaptive Importance Sampling (AIS)
RB-AIS We focus on a special type of AIS in this paper, called Rao-Blackwellized Adaptive Importance Sampling (RB-AIS). In RB-AIS, a set of variables X_w ⊂ X \ X_e (called w-cutset variables) is sampled. X_w is chosen so that exact inference over the remaining variables, X \ (X_w ∪ X_e), is tractable. A large |X_w| gives quicker evaluation of the query but a less accurate result. A small |X_w| gives a more accurate result but takes more time. This is a trade-off! V. Gogate and R. Dechter, "Approximate inference algorithms for hybrid Bayesian networks with discrete constraints," in UAI. AUAI Press, 2005, pp. 209–216.
RB-AIS: Steps [Flowchart] Start → Initialize Q on X_w → Generate samples → Calculate sample weights → Update Q and Z → Converged? If no, return to Generate samples; if yes, End.
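The loop in the flowchart can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the proposal Q is taken to be a product of independent per-variable distributions, the Q update is a simple weighted-count mix with a hypothetical learning rate `lr`, convergence is a relative-change test on Z with a hypothetical tolerance `tol`, and `toy_joint` is a made-up stand-in for the exact-inference step that would return P(x_w, e) for a cutset sample.

```python
import random

def toy_joint(x):
    """Hypothetical stand-in for exact inference: returns P(x_w, e).

    Summing this over all assignments gives 0.1, so the true Z is 0.1.
    """
    p = 0.1
    for value in x.values():
        p *= 0.8 if value == 1 else 0.2
    return p

def rb_ais(variables, domain_size, joint_with_evidence,
           n_samples=500, max_iters=20, lr=0.5, tol=1e-3, seed=0):
    rng = random.Random(seed)
    # Initial Q on X_w: uniform and independent per variable (an assumption).
    q = {v: [1.0 / domain_size] * domain_size for v in variables}
    total_weight, total_count, prev_z = 0.0, 0, None
    for _ in range(max_iters):
        counts = {v: [0.0] * domain_size for v in variables}
        for _ in range(n_samples):
            # Generate one sample of the cutset variables from Q.
            x = {v: rng.choices(range(domain_size), weights=q[v])[0]
                 for v in variables}
            q_x = 1.0
            for v in variables:
                q_x *= q[v][x[v]]
            # Sample weight: P(x_w, e) / Q(x_w).
            w = joint_with_evidence(x) / q_x
            total_weight += w
            total_count += 1
            for v in variables:
                counts[v][x[v]] += w
        # Update Q by mixing in the weighted sample frequencies.
        for v in variables:
            s = sum(counts[v])
            if s > 0:
                q[v] = [(1 - lr) * q[v][k] + lr * counts[v][k] / s
                        for k in range(domain_size)]
        # Update Z (running average of all weights) and test convergence.
        z = total_weight / total_count
        if prev_z is not None and abs(z - prev_z) <= tol * abs(z):
            break
        prev_z = z
    return z, q

z_hat, q_final = rb_ais(["A", "B", "C"], 2, toy_joint)
print("estimated Z:", z_hat)
```

Because Q and Z at one iteration are computed from the previous iteration's values, the outer loop is inherently sequential; only the inner sampling and weighting loop is a candidate for parallel execution.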
Problem Real-world applications require good-quality results within a time constraint. Typically, real-world networks are large and complex (i.e., they have large treewidth). For instance, a graphical model of Facebook users would have billions of nodes! Even RB-AIS may fail to provide a quality estimate within the time limit. For instance, RB-AIS takes more than 6 hours to compute a single probability of evidence on a network with only 67 nodes and 271 factors.
Challenges To design a parallel and distributed approach for RB-AIS, the following challenges need to be addressed: RB-AIS updates Q periodically. Since the values of Q and Z at iteration i depend on their values at iteration i-1, a proper synchronization mechanism is needed. The task of sample generation on X_w must be distributed over the worker nodes.
Proposed Approaches We design and implement two MapReduce-based approaches for distributed and parallel computation of inference queries using RB-AIS. Distributed Sampling in Mappers (DSM): parallel sampling, sequential weight calculation; each MapReduce Job Unit (MJU) contains only one MapReduce job. Distributed Weight Calculation in Reducers (DWCR): parallel sampling, parallel weight calculation; each MJU contains two MapReduce jobs.
Distributed Sampling in Mappers (DSM) [Diagram] Input to the i-th MJU: X_w, Q_i. Map 1, Map 2, Map 3, ..., Map m: mapper j receives X_j and Q_i[X_j] and emits n key-value pairs, s: (X_j, x_js, Q_i[X_j]) for s = 1, ..., n. Shuffle and sort: aggregate values by key, so that key s collects (X_1, x_1s, Q[X_1]), (X_2, x_2s, Q[X_2]), (X_3, x_3s, Q[X_3]), ..., (X_m, x_ms, Q[X_m]). Reducer: combine x_1s, x_2s, ..., x_ms to form x_s, where s ∈ {1, 2, ..., n}; update Z, and Q_i to Q_i+1. Output: Q_i+1[X_1], Q_i+1[X_2], Q_i+1[X_3], ..., Q_i+1[X_m], and Z.
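One DSM job unit can be simulated in plain Python to make the data flow concrete. This is only a sketch under assumptions, not Hadoop code: mappers are ordinary function calls, each mapper gets its own hypothetical seed, `toy_joint` is a made-up stand-in for the exact-inference weight numerator P(x_w, e), and the Q update rule with learning rate `lr` is an illustrative choice.

```python
import random
from collections import defaultdict

def toy_joint(x):
    """Hypothetical stand-in for exact inference: returns P(x_w, e)."""
    p = 0.1
    for value in x.values():
        p *= 0.8 if value == 1 else 0.2
    return p

def dsm_map(var, q_var, n, rng):
    """Mapper for one cutset variable: emit n pairs keyed by sample id s."""
    values = rng.choices(range(len(q_var)), weights=q_var, k=n)
    return [(s, (var, val, q_var[val])) for s, val in enumerate(values)]

def shuffle(pairs):
    """Shuffle and sort: aggregate values by key (the sample id)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def dsm_reduce(groups, q, joint_with_evidence, lr=0.5):
    """Single reducer: combine partial samples, weigh them one by one,
    then update Z and Q."""
    counts = {v: [0.0] * len(p) for v, p in q.items()}
    total = 0.0
    for parts in groups.values():
        x = {var: val for var, val, _ in parts}   # full sample x_s
        q_x = 1.0
        for _, _, prob in parts:
            q_x *= prob
        w = joint_with_evidence(x) / q_x          # sequential weight step
        total += w
        for var, val, _ in parts:
            counts[var][val] += w
    z = total / len(groups)
    new_q = {}
    for v, p in q.items():
        s = sum(counts[v])
        new_q[v] = [(1 - lr) * p[k] + lr * counts[v][k] / s
                    for k in range(len(p))]
    return z, new_q

def run_dsm_mju(q, joint_with_evidence, n, seed):
    pairs = []
    for j, (var, q_var) in enumerate(q.items()):
        # A distinct seed per mapper keeps the variables independent.
        pairs.extend(dsm_map(var, q_var, n, random.Random(seed + j)))
    return dsm_reduce(shuffle(pairs), q, joint_with_evidence)

q0 = {v: [0.5, 0.5] for v in ("X1", "X2", "X3")}
z1, q1 = run_dsm_mju(q0, toy_joint, n=2000, seed=1)
print("Z after one MJU:", z1)
```

The sketch makes the bottleneck visible: sampling is spread over m mappers, but every weight is computed inside the one reducer.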
Distributed Weight Calculation in Reducers (DWCR) [Diagram] Input to the i-th MJU: X_w, Q_i. First job, Map 1, Map 2, ..., Map m: mapper j takes input X_j ⊂ X_w and outputs partial samples x_j ∈ X_j. First job, Reducer 1, Reducer 2, ..., Reducer r: each reducer combines the partial samples for sample s, x_i → x with i ∈ {1, ..., m}, and calculates the weight Ψ_x. Second job, Map 1, Map 2, ..., Map j: each mapper outputs Ψ_x. Second job, Reducer: update Z, and Q_i to Q_i+1.
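The two-job structure of a DWCR job unit can be sketched in the same way. This is an illustrative simulation under assumptions rather than Hadoop code: a thread pool stands in for the r reducers of the first job, samples are assigned to reducers by `s % r`, the second job's pass-through mappers are folded into the final update step, and `toy_joint` and the learning rate `lr` are hypothetical.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def toy_joint(x):
    """Hypothetical stand-in for exact inference: returns P(x_w, e)."""
    p = 0.1
    for value in x.values():
        p *= 0.8 if value == 1 else 0.2
    return p

def sample_map(var, q_var, n, rng):
    """Job 1 mapper: emit n partial samples for one cutset variable."""
    values = rng.choices(range(len(q_var)), weights=q_var, k=n)
    return [(s, (var, val, q_var[val])) for s, val in enumerate(values)]

def weight_reduce(partition, joint_with_evidence):
    """Job 1 reducer: combine partial samples and compute each weight."""
    out = []
    for parts in partition:
        x = {var: val for var, val, _ in parts}
        q_x = 1.0
        for _, _, prob in parts:
            q_x *= prob
        out.append((x, joint_with_evidence(x) / q_x))
    return out

def update_reduce(weighted, q, lr=0.5):
    """Job 2 reducer: update Z and Q from the weighted samples."""
    counts = {v: [0.0] * len(p) for v, p in q.items()}
    for x, w in weighted:
        for var, val in x.items():
            counts[var][val] += w
    z = sum(w for _, w in weighted) / len(weighted)
    new_q = {}
    for v, p in q.items():
        s = sum(counts[v])
        new_q[v] = [(1 - lr) * p[k] + lr * counts[v][k] / s
                    for k in range(len(p))]
    return z, new_q

def run_dwcr_mju(q, joint_with_evidence, n, r, seed):
    groups = defaultdict(list)
    for j, (var, q_var) in enumerate(q.items()):
        for s, part in sample_map(var, q_var, n, random.Random(seed + j)):
            groups[s].append(part)
    # Partition sample ids across r reducers.
    partitions = [[] for _ in range(r)]
    for s, parts in groups.items():
        partitions[s % r].append(parts)
    # Weight calculation runs in parallel across the reducers.
    with ThreadPoolExecutor(max_workers=r) as pool:
        chunks = pool.map(
            lambda part: weight_reduce(part, joint_with_evidence), partitions)
        weighted = [pair for chunk in chunks for pair in chunk]
    return update_reduce(weighted, q)

q0 = {v: [0.5, 0.5] for v in ("X1", "X2", "X3")}
z1, q1 = run_dwcr_mju(q0, toy_joint, n=2000, r=4, seed=1)
print("Z after one MJU:", z1)
```

Compared with a single-reducer design, the costly weight step is now divided among r workers, and only the cheap aggregation of Z and Q remains sequential.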
Setup Performance metrics: Speedup = T_sq / T_d, where T_sq is the execution time of the sequential approach and T_d is the execution time of the distributed approach. Scaleup = T_s / T_p, where T_s is the execution time using a single machine and T_p is the execution time using multiple machines. Cluster: Hadoop version data nodes, 1 name node. Each machine has a 2.2 GHz processor and 4 GB of RAM. Networks (columns: Network, Number of Nodes, Number of Factors): 54.wcsp [1], wcsp [1], wcsp [1]. [1] "The probabilistic inference challenge (PIC2011)," , last updated on
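The two metrics are plain ratios of execution times, as the short sketch below shows. The function names and the timings are hypothetical and chosen only for illustration; they are not measurements from the experiments.

```python
def speedup(t_sequential, t_distributed):
    """Speedup = T_sq / T_d."""
    return t_sequential / t_distributed

def scaleup(t_single, t_multi):
    """Scaleup = T_s / T_p."""
    return t_single / t_multi

# Made-up timings in seconds, for illustration only.
print("speedup:", speedup(600.0, 150.0))
print("scaleup:", scaleup(100.0, 125.0))
```

A speedup above 1 means the distributed run finished sooner than the sequential one.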
Speedup
Scaleup
Discussion Both approaches achieve substantial speedup and scaleup compared with sequential execution. DWCR has better speedup and scalability than DSM: weight calculation is computationally more expensive than sample generation, and DWCR parallelizes both weight calculation and sampling, so it outperforms DSM. Asymptotically, both approaches show accuracy similar to that of sequential execution.
