
1 Distributed Adaptive Importance Sampling on Graphical Models using MapReduce
Ahsanul Haque*, Swarup Chandra*, Latifur Khan* and Charu Aggarwal+
* Department of Computer Science, University of Texas at Dallas
+ IBM T. J. Watson Research Center, Yorktown, NY, USA
This material is based upon work supported by the University of Texas at Dallas.

2 Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

3 Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

4 Graphical Models
A probabilistic graphical model G is a collection of functions over a set of random variables, generally represented as a network of nodes:
- Each node denotes a random variable (e.g., a data feature).
- Each edge denotes a relationship between two random variables.
Two types of representation:
- A Bayesian network is represented by a directed graph.
- A Markov network is represented by an undirected graph.
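For instance, here is a minimal sketch of a Markov network stored as a collection of factor tables; the variable names and factor values are illustrative, not taken from the paper:

```python
# A Markov network as a dict mapping each factor's scope to its table,
# assuming binary variables. Values below are made up for illustration.
factors = {
    ("A", "B"): {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 5.0},
    ("A", "C"): {(0, 0): 5.0, (0, 1): 100.0, (1, 0): 15.0, (1, 1): 20.0},
}

def joint_unnormalized(assignment):
    """Product of all factor values under a full assignment dict."""
    result = 1.0
    for scope, table in factors.items():
        result *= table[tuple(assignment[v] for v in scope)]
    return result

# Unnormalized score of one assignment of (A, B, C): 2.0 * 20.0 = 40.0
print(joint_unnormalized({"A": 1, "B": 0, "C": 1}))
```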

5 Example Graphical Model
The example Markov network has nodes A, B, C, D, E, F and factors φ(A,B), φ(A,C), φ(B,D), φ(C,D), φ(C,E), φ(D,F), φ(E,F).

Sample factor φ(A,C):
A  C  φ(A,C)
0  0  5
0  1  100
1  0  15
1  1  20

Inference is needed to evaluate Probability of Evidence, Prior and Posterior Marginal, Most Probable Explanation (MPE), and Maximum a Posteriori (MAP) queries. Probability of Evidence needs to be evaluated in classification problems.
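For concreteness, here is a brute-force probability-of-evidence computation on a two-factor fragment of this network; φ(A,C) is taken from the slide, while the values of φ(C,E) are made up for illustration:

```python
from itertools import product

# phi_AC is the slide's sample factor; phi_CE is a hypothetical table.
phi_AC = {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20}
phi_CE = {(0, 0): 2, (0, 1): 8, (1, 0): 6, (1, 1): 1}

def unnormalized(a, c, e):
    return phi_AC[(a, c)] * phi_CE[(c, e)]

# Partition function Z: sum over all joint assignments.
Z = sum(unnormalized(a, c, e) for a, c, e in product([0, 1], repeat=3))

# Probability of evidence P(E = 1): sum the consistent assignments, divide by Z.
p_evidence = sum(unnormalized(a, c, 1) for a, c in product([0, 1], repeat=2)) / Z
print(f"P(E=1) = {p_evidence:.4f}")   # 280 / 1040 ≈ 0.2692
```

This enumeration is exponential in the number of variables, which is exactly why the inference algorithms discussed next are needed.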

6 Exact Inference
Exact inference algorithms, e.g., Variable Elimination, provide accurate results for Probability of Evidence.
Challenges:
- Exponential time and space complexity.
- Computationally intractable on large graphs.
Approximate inference algorithms are widely used in practice to evaluate queries within resource limits:
- Sampling based, e.g., Gibbs Sampling, Importance Sampling.
- Propagation based, e.g., Iterative Join Graph Propagation.
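A minimal variable elimination sketch follows, assuming binary variables and table factors (the two factors reuse the toy fragment above); it sums variables out one at a time rather than enumerating all joint assignments:

```python
from itertools import product

def multiply_and_sum_out(factors, var):
    """Multiply all factors mentioning `var`, then sum `var` out."""
    touching = [(scope, tbl) for scope, tbl in factors if var in scope]
    rest = [(scope, tbl) for scope, tbl in factors if var not in scope]
    new_scope = tuple(sorted({v for s, _ in touching for v in s} - {var}))
    new_tbl = {}
    for assn in product([0, 1], repeat=len(new_scope)):
        env = dict(zip(new_scope, assn))
        total = 0.0
        for x in (0, 1):
            env[var] = x
            prod_val = 1.0
            for scope, tbl in touching:
                prod_val *= tbl[tuple(env[v] for v in scope)]
            total += prod_val
        new_tbl[assn] = total
    return rest + [(new_scope, new_tbl)]

# Eliminate A, then C, then E from phi(A,C) * phi(C,E) to get Z.
factors = [(("A", "C"), {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20}),
           (("C", "E"), {(0, 0): 2, (0, 1): 8, (1, 0): 6, (1, 1): 1})]
for v in ("A", "C", "E"):
    factors = multiply_and_sum_out(factors, v)
print("Z =", factors[-1][1][()])   # 1040, matching brute force
```

The intermediate tables produced here can grow exponentially with the graph's treewidth, which is the source of the complexity challenges listed above.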

7 Adaptive Importance Sampling (AIS)
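The slide's equations were not captured in the transcript. As a refresher on the mechanics, here is a minimal adaptive importance sampling sketch; the two-variable target below is an illustrative assumption, not the paper's model. Samples are drawn from a proposal Q, weighted by w(x) = P̃(x)/Q(x) so that the mean weight estimates Z, and Q is adapted toward the weighted samples:

```python
import random

# Unnormalized target over two binary variables (illustrative values).
TARGET = {(0, 0): 5.0, (0, 1): 100.0, (1, 0): 15.0, (1, 1): 20.0}

def sample_q(q):
    """Draw one sample from a product-of-Bernoullis proposal."""
    return tuple(1 if random.random() < qi else 0 for qi in q)

def q_prob(x, q):
    p = 1.0
    for xi, qi in zip(x, q):
        p *= qi if xi == 1 else 1.0 - qi
    return p

random.seed(0)
q = [0.5, 0.5]                                 # initial proposal
for _ in range(20):                            # adaptive loop
    samples = [sample_q(q) for _ in range(500)]
    weights = [TARGET[x] / q_prob(x, q) for x in samples]
    Z_hat = sum(weights) / len(weights)        # importance estimate of Z
    # Adapt: move q toward the weighted marginals (clamped for stability).
    W = sum(weights)
    q = [min(0.95, max(0.05,
         sum(w for x, w in zip(samples, weights) if x[i] == 1) / W))
         for i in range(2)]
print(f"Z estimate: {Z_hat:.1f} (exact: 140)")
```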

8 RB-AIS
We focus on a special type of AIS in this paper, called Rao-Blackwellized Adaptive Importance Sampling (RB-AIS).
In RB-AIS, a set of variables X_w ⊂ X \ X_e (called the w-cutset variables) is sampled. X_w is chosen such that exact inference over X \ {X_w, X_e} is tractable.
A large |X_w| results in quicker evaluation of the query but a more erroneous result; a small |X_w| results in a more accurate result but takes more time. Trade-off!
V. Gogate and R. Dechter, "Approximate inference algorithms for hybrid Bayesian networks with discrete constraints," in UAI. AUAI Press, 2005, pp. 209–216.

9 RB-AIS: Steps
Start → Initialize Q on X_w → Generate samples → Calculate sample weights → Update Q and Z → Converged? If no, repeat from sample generation; if yes, end.
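A minimal sketch of this loop on the toy two-factor network from earlier (the factors and the choice X_w = {C} are illustrative assumptions): the Rao-Blackwellized weight of each sample is computed by exact summation over the non-cutset variables, here A and E:

```python
import random

# Toy network phi(A,C) * phi(C,E); w-cutset X_w = {C}.
phi_AC = {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20}
phi_CE = {(0, 0): 2, (0, 1): 8, (1, 0): 6, (1, 1): 1}

def exact_weight_given_c(c):
    """Exact sum over A and E with C clamped (the Rao-Blackwellized part)."""
    return sum(phi_AC[(a, c)] for a in (0, 1)) * \
           sum(phi_CE[(c, e)] for e in (0, 1))

random.seed(1)
q_c1 = 0.5                                   # initial proposal Q(C=1)
for _ in range(10):
    # Generate samples of the cutset variable only.
    samples = [1 if random.random() < q_c1 else 0 for _ in range(200)]
    # Calculate sample weights = exact mass given C / proposal probability.
    weights = [exact_weight_given_c(c) / (q_c1 if c else 1 - q_c1)
               for c in samples]
    Z_hat = sum(weights) / len(weights)      # update Z
    # Update Q toward the weighted marginal of C (clamped for stability).
    w1 = sum(w for c, w in zip(samples, weights) if c == 1)
    q_c1 = min(0.95, max(0.05, w1 / sum(weights)))
print(f"Z estimate: {Z_hat:.1f} (exact: 1040)")
```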

10 Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

11 Problem
Real-world applications require good-quality results within a time constraint. Typically, real-world networks are large and complex (i.e., large treewidth). For instance, if we model Facebook users with a graphical model, it will have billions of nodes!
Even RB-AIS may run out of time to provide a quality estimate within the time limit. For instance, RB-AIS takes more than 6 hours to compute a single probability of evidence on a network having only 67 nodes and 271 factors.

12 Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

13 Challenges
To design a parallel and distributed approach for RB-AIS, the following challenges need to be addressed:
- RB-AIS updates Q periodically. Since the values of Q and Z at iteration i depend on those values at iteration i-1, a proper synchronization mechanism is needed.
- The task of generating samples on X_w must be distributed over the worker nodes.

14 Proposed Approaches
We design and implement two MapReduce-based approaches for distributed and parallel computation of inference queries using RB-AIS:
- Distributed Sampling in Mappers (DSM): parallel sampling, sequential weight calculation.
- Distributed Weight Calculation in Mappers (DWCM): sequential sampling, parallel weight calculation.

15 Distributed Sampling in Mappers (DSM)
(Dataflow diagram.) Input to the i-th MR job: X_w and Q_i. Mappers Map 1 … Map m each own one w-cutset variable X_j and emit tuples (X_j, x_js, Q_i[X_j]) for samples s = 1 … n. Shuffle and sort aggregates values by key. The reducer combines x_1s, x_2s, …, x_ms into joint samples x_s (s = 1 … n), then updates Z and Q_i to Q_i+1.
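As a concrete illustration of this dataflow, here is a single-process Python simulation of one DSM iteration; Hadoop specifics are omitted, and the variable names, proposal values, and sample count are assumptions for illustration:

```python
import random
from collections import defaultdict

random.seed(2)
n_samples = 4
Q = {"X1": 0.3, "X2": 0.7, "X3": 0.5}   # per-variable proposal (illustrative)

def mapper(var, q):
    # Each "mapper" owns one w-cutset variable and draws its component
    # of every sample: parallel sampling.
    for s in range(n_samples):
        value = 1 if random.random() < q else 0
        yield s, (var, value, q)

# Shuffle/sort: group emitted records by sample index.
groups = defaultdict(list)
for var, q in Q.items():
    for s, record in mapper(var, q):
        groups[s].append(record)

def reducer(groups):
    # Combine per-variable components into joint samples; in DSM the
    # (sequential) weight calculation would also happen here.
    for s in sorted(groups):
        yield s, {var: value for var, value, _ in groups[s]}

for s, joint in reducer(groups):
    print(s, joint)
```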

16 Distributed Weight Calculation in Mappers (DWCM)
(Dataflow diagram.) Input to the i-th MR job: X_w and the sample list List[x]. Mappers Map 1 … Map n each take one sample x_j with Q_i[X_w = x_j] and compute its weight wv. Shuffle and sort aggregates values by key. The reducer uses the weighted samples to update Z and Q_i to Q_i+1.
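A matching single-process simulation of one DWCM iteration follows; the weight function below is a stand-in for the expensive Rao-Blackwellized computation (it hard-codes the exact sums from the toy network above), not the paper's exact formulation:

```python
from collections import defaultdict

samples = [{"C": 0}, {"C": 1}, {"C": 1}, {"C": 0}]   # drawn sequentially
q_c1 = 0.5                                           # current proposal Q(C=1)

def unnormalized_weight(sample):
    # Stand-in for the Rao-Blackwellized weight: exact mass given C,
    # divided by the proposal probability of the sample.
    exact_mass = {0: 200.0, 1: 840.0}
    c = sample["C"]
    return exact_mass[c] / (q_c1 if c else 1 - q_c1)

def mapper(sample):
    # Each "mapper" weights one full sample: parallel weight calculation.
    yield "stats", (sample["C"], unnormalized_weight(sample))

# Shuffle/sort: aggregate the weighted samples under one key.
grouped = defaultdict(list)
for x in samples:
    for key, record in mapper(x):
        grouped[key].append(record)

def reducer(records):
    # Update Z and the proposal from the aggregated weighted samples.
    total = sum(w for _, w in records)
    Z_hat = total / len(records)
    q_next = sum(w for c, w in records if c == 1) / total
    return Z_hat, q_next

Z_hat, q_next = reducer(grouped["stats"])
print(f"Z ≈ {Z_hat:.1f}, next Q(C=1) ≈ {q_next:.2f}")
```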

17 Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

18 Setup
Performance metrics:
- Speedup = T_sq / T_d, where T_sq = execution time of the sequential approach and T_d = execution time of the distributed approach.
- Scaleup = T_s / T_p, where T_s = execution time using a single mapper and T_p = execution time using multiple mappers.
Hadoop version 1.2.1; 8 data nodes, 1 name node; each machine has a 2.2 GHz processor and 4 GB of RAM.

Network       Number of Nodes  Number of Factors
54.wcsp [1]   67               271
29.wcsp [1]   82               462
404.wcsp [1]  100              710

[1] "The probabilistic inference challenge (PIC2011)," http://www.cs.huji.ac.il/project/PASCAL/showNet.php, 2011, last updated on 10.23.2014.
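For concreteness, a tiny calculation of both metrics with hypothetical timings (not the paper's measurements):

```python
# Hypothetical timings in seconds, chosen only to illustrate the formulas.
T_sq, T_d = 21600.0, 3000.0      # sequential vs. distributed run
speedup = T_sq / T_d             # 7.2x
T_s, T_p = 9600.0, 1500.0        # 1 mapper vs. multiple mappers
scaleup = T_s / T_p              # 6.4x
print(f"speedup = {speedup:.1f}x, scaleup = {scaleup:.1f}x")
```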

19 Speedup
(Results figure; not captured in the transcript.)

20 Scaleup
(Results figure; not captured in the transcript.)

21 Discussion
Both approaches achieve substantial speedup and scaleup compared with sequential execution.
DWCM has better speedup and scalability than DSM: weight calculation is computationally more expensive than sample generation, and DWCM parallelizes the weight calculation, so it outperforms DSM.
Both approaches asymptotically show accuracy similar to sequential execution.

22 Questions?

