
1 Graphical Models and Overlay Networks for Reasoning about Large Distributed Systems
Stanislav Funiak. Thesis Defense, Carnegie Mellon.
Thesis committee: Carlos Guestrin (chair), Geoff Gordon, Sanjiv Singh, Joseph Hellerstein (UC Berkeley).

2 Distributed systems are abundant
Examples: peer-to-peer networks, intelligent environments, wireless sensor networks, large-scale modular robots.

3 Distributed systems need to reason under uncertainty
Place wireless cameras around an intelligent environment; we want to determine their locations automatically. Each camera's location is a hidden state, observed only through local, noisy observations.

4 Simultaneous localization and tracking
Localize the cameras by letting them track a moving object: the hidden variables are the camera locations and the object trajectory, observed through noisy measurements.

5 Distributed reasoning under uncertainty
Nodes need to reason under uncertainty in a distributed manner.
Benefits: no centralized point of failure; lower communication cost; lower memory requirements; ability to do distributed control.
Challenges: decentralized data and model; coordination; scalability; robustness to network fluctuations.

6 Distributed reasoning needs to scale
[Figure: memory per node (1 kB to 1 GB) vs. number of nodes (10^1 to 10^6) for sensor networks, modular robots, and peer-to-peer networks.]
Nodes cannot store the entire model in memory, must coordinate a very large number of nodes, and have many variables to reason about.

7 Distributed reasoning needs to be robust
Failure modes: failed nodes, network partitions, varying link quality.

8 A three-layer framework [Paskin et al., IPSN 2005]
Physical network: the communication links.
Overlay network (a distributed data structure over a subset of the communication links): robust to network failures.
Graphical model (local interactions): efficient reasoning.

9 Thesis Statement
In many applications, devices need to integrate uncertain observations from across the network. By designing distributed algorithms that build upon graphical models and overlay networks, one can obtain scalable, robust, and accurate solutions.

10 Thesis overview
Camera networks (Chapters 3, 4): reasoning about a dynamic system; robustness to network partitions.
Localization in modular robots (Chapter 5): scaling to very large systems.
Recommendations in a peer-to-peer network (Chapter 6): robustness to nodes entering or leaving the network.

11 Simultaneous localization and tracking
Localize the cameras by letting them track a moving object: the hidden variables are the camera locations and the object trajectory, observed through noisy measurements.

12 Model: Dynamic Bayesian Network
Variables: the object position M(t) at each time step, the camera locations L1, L2, ..., and the observed images Z_i(t).
The DBN consists of a transition model (from step t-1 to t) and an observation model (each image given its camera's location and the object position).
Filtering: compute the conditional distribution of the latest state given the observation history.

13 Filtering
Each step takes the prior distribution through roll-up and prediction (applying the transition model), then through estimation (conditioning on the new observations), yielding the posterior distribution.
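The roll-up/estimation loop on this slide can be sketched for a discrete state; the two-state transition matrix and observation likelihoods below are hypothetical stand-ins for the DBN's models, not the camera model itself:

```python
import numpy as np

def filter_step(prior, T, obs_likelihood):
    """One filtering step: roll-up/prediction, then estimation."""
    predicted = T.T @ prior                  # prediction through the transition model
    posterior = predicted * obs_likelihood   # estimation: condition on the observation
    return posterior / posterior.sum()       # renormalize

# Hypothetical two-state example.
T = np.array([[0.9, 0.1],                    # T[i, j] = P(next = j | current = i)
              [0.2, 0.8]])
prior = np.array([0.5, 0.5])
posterior = filter_step(prior, T, obs_likelihood=np.array([0.7, 0.1]))
```

Iterating `filter_step` over a sequence of observations gives the filtering recursion the slide describes.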

14 The sparsity of the prior / posterior distribution
The DBN itself is sparse, but is the prior/posterior distribution sparse? Over time, the transition model introduces dependences among distant cameras, so exact inference becomes expensive or intractable.

15 Assumed density filtering [Boyen and Koller 1998]
Intuition: only capture the strong dependences among variables. Represent the distribution with a junction tree whose cliques contain tightly interacting variables: a camera, its neighbor, and the object location at the current step, giving cliques {L1, L2, M(t)}, {L2, L3, M(t)}, ..., {L7, L8, M(t)}. Filtering then proceeds by clique updates.
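The projection at the heart of assumed density filtering can be illustrated on a tiny example: approximate a joint over three binary variables by the clique marginals of a chain junction tree with cliques {X0, X1} and {X1, X2} and separator {X1}. The joint below is an arbitrary example distribution, not the camera model:

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()                      # an arbitrary joint P(X0, X1, X2)

m01 = joint.sum(axis=2)                   # clique marginal over {X0, X1}
m12 = joint.sum(axis=0)                   # clique marginal over {X1, X2}
m1 = joint.sum(axis=(0, 2))               # separator marginal over {X1}

# Assumed density: P(X0, X1) * P(X2 | X1), the junction-tree factorization.
approx = m01[:, :, None] * (m12 / m1[:, None])[None, :, :]
```

The approximation keeps the clique marginals exact while dropping any dependence between X0 and X2 beyond what flows through X1, which is exactly the "only capture strong dependences" intuition.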

16 Distributed filtering: The wrong way
1. Map each clique onto some node of the physical network.
2. Connect the nodes the same way as the cliques.
3. Run the centralized Boyen-Koller algorithm.
The graphical model does not match the physical network: what happens when a connection is interrupted or a node is lost?

17 Distributed filtering: Our approach
Build a distributed data structure:
1. Nodes form a routing tree over strong links.
2. Each node is associated with a set of variables.
3. Ensure the flow of information.
The network junction tree (the overlay network) determines the communication pattern, while the external junction tree (the graphical model) represents the distribution.

18 The three-layer framework revisited
Physical network: ad-hoc wireless network. Overlay network: network junction tree. Graphical model: external junction tree with cliques such as {L1, L2, M(t)}, ..., {L3, L4, M(t)}.

19 Distributed filtering: Algorithm [Funiak et al., NIPS 2009]
At each time step t:
1. Each node starts with a prior over its clique, conditioned on past observations.
2. Nodes make new observations.
3. Nodes condition on each others' observations, an instance of robust distributed inference [Paskin & Guestrin, UAI 2004]; at convergence, each node conditions on all observations in the network.
4. Each node advances its clique marginal to the next time step, analogously to [Boyen & Koller 1998].
With enough communication at each time step, and in the absence of partitions, our solution is the same as the centralized Boyen-Koller filter.
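The "condition on each others' observations" step can be sketched as log-likelihood aggregation along a routing tree; the tree and the per-node likelihood vectors over a shared binary variable below are hypothetical:

```python
import numpy as np

tree = {0: [1, 2], 1: [], 2: [3], 3: []}       # hypothetical rooted routing tree
local = {0: [0.6, 0.4], 1: [0.7, 0.3],          # per-node observation likelihoods
         2: [0.2, 0.8], 3: [0.5, 0.5]}
loglik = {n: np.log(np.array(v)) for n, v in local.items()}

def aggregate(node):
    """Sum log-likelihoods up the tree so the root conditions on all observations."""
    total = loglik[node].copy()
    for child in tree[node]:
        total += aggregate(child)
    return total

posterior = np.exp(aggregate(0))
posterior /= posterior.sum()
```

A second, downward pass would disseminate the aggregated result so that every node ends up conditioning on all observations, as the slide states.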

20 Results: Convergence
[Figure: RMS error (0 to 1) vs. iterations per time step (3, 5, 10, 15, 20); the error decreases toward the centralized solution; lower is better.]

21 Partitions introduce inconsistencies
Before a partition, the marginals that neighboring nodes hold over shared variables agree. After a partition, the marginals held by nodes 1 and 2 over cliques such as {L1, L2, M(t)} and {L2, L3, M(t)} can disagree.

22 Resolving inconsistencies
For consistent marginals there is no ambiguity. But what do we do for inconsistent marginals? We could choose a root clique of the external junction tree (e.g. {L1, L2} or {L2, L3}) and define the distribution as the root marginal multiplied by the conditional distributions of the other cliques.

23 What is the best root?
Choose the root that leads to the most informative distribution: we wish to minimize the entropy of the distribution obtained with different roots r, in order to get the most certain estimates.

24 Computing the best root
Observation: the entropy difference between candidate roots can be computed locally, from the entropies of the inconsistent marginals over the shared variables; if the marginal on j's side has lower entropy, then j is a better root than i.
Optimized conditional alignment: a dynamic programming algorithm with local computation and no global synchronization, which makes it easy to distribute. [Funiak et al., NIPS 2009]
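A minimal sketch of the local criterion, with hypothetical inconsistent marginals over a shared variable held by two neighboring cliques i and j after a partition; the side whose marginal has lower entropy is the better root:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Hypothetical inconsistent marginals over the shared variable.
p_i = [0.5, 0.5]      # less informative side
p_j = [0.9, 0.1]      # more informative side
better_root = "j" if entropy(p_j) < entropy(p_i) else "i"
```

Because only the marginals over the shared variables enter the comparison, neighboring nodes can decide the direction of the better root without any global information.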

25 Results: Partition
[Figure: error vs. number of partition components as the communication graph is progressively partitioned; optimized conditional alignment lies between the omniscient worst and omniscient best baselines. A better alignment is described in the thesis.]

26 Distributed filtering: Summary
Scaling: the transition model introduces dependencies; addressed with junction trees and assumed density filtering.
Coordination: communicate over strong links and preserve the flow of information, using a network junction tree.
Robustness: partitions introduce inconsistencies; addressed with alignment.

27 Thesis overview
Camera networks: reasoning about a dynamic system; robustness to network partitions.
Localization in modular robots: scaling to very large systems.
Recommendations in a peer-to-peer network: robustness to nodes entering or leaving the network.

28 Localization in Large-Scale Modular Robots
Given the connectivity and noisy pairwise observations of relative translation and rotation between adjacent modules, each node recovers its own pose; together the poses form an estimate of the robot's shape, evaluated against the ground truth.

29 Existing methods
Compute locations incrementally using observations along a spanning tree [Reshko 2004; Pillai et al., IROS 2006]. This approach distributes easily, but it only works for perfect sensors or a perfect lattice.

30 Model: Markov network
Probabilistic model: a Markov network over the poses of all modules, with potentials linking the poses of adjacent modules. The observation model relates each measurement to the predicted center of module j in the global coordinate frame, given the pose of module i. Goal: maximize the likelihood of all observations over all poses.

31 Simple solution: gradient ascent
Gradient ascent is easy to implement and distributes naturally. However, with a bad (greedy) initialization, convergence is very slow and the ascent may get stuck in local optima far from the hypothesized optimum.
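A one-dimensional toy version of the gradient ascent, with hypothetical noisy offsets z between neighboring modules standing in for the pose observations; the Gaussian log-likelihood is maximized with the first module anchored at the origin:

```python
import numpy as np

z = np.array([1.1, 0.9, 1.05])       # hypothetical noisy offsets x[i+1] - x[i]
x = np.zeros(4)                       # initial position estimates

for _ in range(500):
    resid = (x[1:] - x[:-1]) - z      # residual on each edge
    grad = np.zeros_like(x)           # gradient of the Gaussian log-likelihood
    grad[:-1] += resid
    grad[1:] -= resid
    x += 0.2 * grad                   # ascent step
    x -= x[0]                         # anchor the global coordinate frame
```

Each module's update depends only on the residuals of its incident edges, which is why the method distributes naturally; the slow convergence and local optima appear once rotations and a bad initialization enter the picture.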

32 A partitioning heuristic for localization [Funiak et al., RSS 2008; Funiak et al., IJRR 2009]
1. Partition the problem.
2. Solve the subproblems.
3. Merge the partial solutions.
Observation: if we treat the partial solutions as two rigid bodies, the likelihood can be maximized exactly.
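The exact merge rests on the classical closed-form rigid alignment (Kabsch/Procrustes): the optimal rotation between two point sets comes from an SVD of their cross-covariance. A 2-D sketch with hypothetical points:

```python
import numpy as np

def rigid_align(A, B):
    """Return R, t minimizing ||R @ A + t - B||^2 for 2 x n point sets."""
    ca = A.mean(axis=1, keepdims=True)
    cb = B.mean(axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd((B - cb) @ (A - ca).T)   # cross-covariance SVD
    d = np.sign(np.linalg.det(U @ Vt))                # guard against reflections
    R = U @ np.diag([1.0, d]) @ Vt
    return R, cb - R @ ca

# Hypothetical points: B is a rotated, translated copy of A (noise-free for clarity).
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
A = np.random.default_rng(1).random((2, 5))
B = R_true @ A + np.array([[1.0], [2.0]])
R, t = rigid_align(A, B)
```

With noisy points the same formula gives the least-squares optimal alignment, which is what makes the merge of two rigid partial solutions exact.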


34 Accuracy
[Figure: RMS error in module radii (0 to 14) for classical MDS, regularized SDP, incremental, tree optimization [Grisetti et al., 2007], and our solution; lower is better.]

35 Distributing the rigid alignment
Observation: the centralized algorithm only requires aggregate statistics of the two components. Aggregate the statistics to a leader node and let it do the rest, which is O(1) work.

36 Distributing the rigid alignment
1. Form spanning trees over pairs of components.
2. Compute partial sums towards the leaders.
3. Disseminate the result and return.
Using multiple overlay networks (one per pair of components, across levels), we are able to perform the alignments in parallel.

37 Communication complexity
[Figure: communication cost comparison; lower is better.]

38 Modular robot localization: Summary
Scaling: slow convergence and local optima; addressed with graph partitioning.
Robustness: we assume stationary modules.
Coordination: aggregate information to a root over parallel spanning trees.

39 Thesis overview
Camera networks: reasoning about a dynamic system; robustness to network partitions.
Localization in modular robots: scaling to very large systems.
Recommendations in a peer-to-peer network: robustness to nodes entering or leaving the network.

40 Distributed collaborative filtering
Goal: suggest items to users based on ratings collected throughout the network.
Challenges: the ratings do not reside in a single location; nodes enter and leave the network; bandwidth is finite.

41 Latent factors
A latent factor is a vector of parameters that describes a movie, e.g. its level of action and its kid-friendliness. IMDB tells you "Action, PG-13"; you want something like "0.57 action, 0.12 kid-friendly".

42 Centralized approach: Matrix factorization
Prediction: the predicted rating of user u for movie i combines systematic tendencies (biases) with the affinity of u for i, the inner product of a latent factor p_u for the user and a latent factor q_i for the movie.
Training: minimize the difference between the predicted and observed ratings r_{u,i}; this is solved iteratively by a sequential algorithm.
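The iterative training can be sketched as stochastic gradient descent on the squared error of the bias + latent-factor model; the toy ratings, dimensions, and hyperparameters below are hypothetical:

```python
import numpy as np

# (user, movie, rating) triples; a hypothetical toy dataset.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
n_users, n_movies, k = 2, 3, 4

rng = np.random.default_rng(0)
mu = np.mean([r for _, _, r in ratings])          # global mean rating
bu = np.zeros(n_users)                            # user biases
bi = np.zeros(n_movies)                           # movie biases
P = 0.1 * rng.standard_normal((n_users, k))       # user latent factors p_u
Q = 0.1 * rng.standard_normal((n_movies, k))      # movie latent factors q_i

lr, reg = 0.05, 0.01
for _ in range(200):                              # sequential SGD sweeps
    for u, i, r in ratings:
        err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
        bu[u] += lr * (err - reg * bu[u])
        bi[i] += lr * (err - reg * bi[i])
        P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                      Q[i] + lr * (err * P[u] - reg * Q[i]))

def predict(u, i):
    return mu + bu[u] + bi[i] + P[u] @ Q[i]
```

The inherently sequential sweep over the ratings is what the next slide's parallel algorithm works around.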

43 Simple parallel algorithm
Partition the users across the nodes; each node carries parameters for its users and for all movies. Iterate:
1. Nodes start with the same parameter estimates.
2. Each node updates its estimates locally, running the centralized algorithm on its local data.
3. Nodes average the movie estimates.

44 Speedup of the parallel algorithm
[Figure: actual speedup compared with the "optimal" speedup achievable under instantaneous communication.]

45 Communication complexity
[Figure: communication vs. number of servers. With only 2 nodes there is little communication but a lot of data per node; with 1 user per node there is little data per node but a lot of communication; the curves differ in whether dense or sparse vectors are communicated.]

46 Distributed collaborative filtering
Approach:
1. Select a small number of super-peers; the remaining nodes are clients.
2. Each client connects to one super-peer and uploads its ratings.
Challenges: how do we assign clients to super-peers? What happens when a super-peer leaves? How do we estimate the parameters in this setting?

47 Distributed hash table (DHT)
An overlay network that looks up which super-peer is associated with a key, e.g. the Chord DHT [Stoica et al., 2001], with super-peers placed at ids such as 16, 50, 100, 128, 192 on a ring.
Properties: the mapping is stable; the mapping is load-balanced; lookups take O(log N) steps; when a super-peer leaves, the mapping is updated.
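The key-to-super-peer mapping can be sketched as consistent hashing on a ring: a key is served by the first super-peer at or after its position, so when a peer leaves only its keys move to the next peer. The ring size and peer ids below are hypothetical, and a real Chord node additionally keeps O(log N) finger-table entries to route lookups:

```python
import bisect

RING = 256                                   # hypothetical id space
peers = sorted([16, 50, 100, 128, 192])      # super-peer positions on the ring

def successor(key):
    """Super-peer responsible for a key: the first peer clockwise from it."""
    idx = bisect.bisect_left(peers, key % RING)
    return peers[idx % len(peers)]           # wrap around the ring
```

Removing a peer from `peers` only reassigns the keys that peer was serving, which is the stability property the slide lists.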

48 The three-layer framework revisited
Physical network: the Internet. Overlay network: distributed hash table. Graphical model: matrix factorization model.

49 Distributed matrix factorization
1. Assign clients to super-peers based on their ids.
2. Clients upload their ratings to the super-peers; each super-peer holds its clients' user parameters and uploaded ratings, along with the movie parameters.
3. Because a spanning tree may not be stable, use a distributed averaging protocol [Kempe et al., 2003; Boyd et al., 2006], with the DHT generating random neighbors.
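The averaging protocol can be sketched as pairwise gossip (in the spirit of Boyd et al., 2006): repeatedly, a random pair of nodes, drawn here the way the DHT would supply random neighbors, replaces both values with their average, and every value converges to the global mean without any spanning tree. The node values below are hypothetical per-node parameter estimates:

```python
import random

random.seed(0)
values = [1.0, 5.0, 3.0, 7.0]                    # hypothetical per-node estimates

for _ in range(2000):
    i, j = random.sample(range(len(values)), 2)  # random pair (as from the DHT)
    avg = (values[i] + values[j]) / 2
    values[i] = values[j] = avg                  # pairwise gossip step
```

Each step preserves the sum of the values, so the common limit is exactly the network-wide mean, and the protocol tolerates nodes joining or leaving between steps.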

50 Experiments
Netflix dataset: about 480,000 users, 17,770 movies, and over 100M ratings; 99% used for training and 1% for testing (but also online experiments).
PlanetLab testbed: over 1,000 nodes worldwide with realistic network conditions; 100 super-peers selected manually.

51 Convergence
[Figure: error over 1 simulated year with 100 super-peers, comparing distributed matrix factorization with distributed Restricted Boltzmann Machines based on [Salakhutdinov et al., 2007]; convergence with under 1 GB of communication over 25 simulated days.]

52 Testing on an unstable network
[Figure: RMS error (0.9 to 1.1) over 400 iterations. We start with 100 nodes and progressively disconnect them (one disconnection was beyond our control); our solution with replication on the clients is compared against the baseline.]

53 Distributed collaborative filtering: Summary
Scaling: the speedup levels off; addressed with the super-peer architecture.
Robustness: nodes may enter and leave the network; addressed with distributed averaging.
Coordination: assign clients to super-peers and sample neighbors for distributed averaging, using a distributed hash table.

54 Thesis summary
Nodes in distributed systems need to integrate uncertain observations from across the network, and they must do so scalably and robustly.
[Figure: memory per node (kB) vs. number of nodes, placing camera networks, modular robots, and the distributed recommender system on the scale.]

55 Thesis Summary
Using graphical models and overlay networks, we designed solutions to one general problem and three applications:
Approximate filtering using junction trees (Chapter 4): converges to the centralized solution; addresses the belief inconsistency problem; applied to camera belief modeling.
Localization of large-scale modular robots (Chapter 5): an effective partitioning heuristic; a scalable algorithm with parallel spanning trees; a concise implementation.
Collaborative filtering in P2P networks: a simple but effective parallel algorithm; a robust super-peer architecture with DHTs.

56 Lessons learned
Using overlay networks, a distributed algorithm can reason about the graphical model even if the model does not match the physical network.
A scalable algorithm can shift part of its computation to a subset of the nodes.
A robust algorithm may need to address the inconsistencies that arise when nodes reason about overlapping sets of variables.

57 Thanks!
Thanks to my coauthors: Mark Paskin, Rahul Sukthankar, Babu Pillai, Michael Ashley-Rollman, Jason Campbell, Seth Copen Goldstein.

