© 2007, Roman Schmidt, Distributed Information Systems Laboratory. Evergrow workshop, Jerusalem, Israel, February 19, 2007. Efficient implementation of BP in P2P networks

Presentation transcript:

Efficient implementation of BP in P2P networks
Roman Schmidt
School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL)
Evergrow Loopy Belief Propagation Algorithm and Applications Workshop
Jerusalem, Israel, February 19-21, 2007

Motivation
Users share (correlated) data in P2P systems
  – currently mainly for retrieval
  – but correlations hold hidden knowledge
Exploit correlations for new services
  – Distributed Knowledge Base (e.g., for software bugs)
  – Structure/cluster data (e.g., for better search results)
  – Recommendation system (e.g., for data annotation)
  – etc.
Distributed Inference System on top of a P2P system
Current focus and contribution
  – Message reduction

Outline
Motivation
Basic Concepts
  – Belief Propagation
  – The P-Grid Overlay
P2P Belief Propagation
  – Inference Architecture
  – The Relaxation Algorithm
Evaluation
Conclusions

Belief Propagation
Inference based on Bayesian networks
  – models dependencies between variables
Iterative message-passing algorithm
  – computes marginal probabilities (“beliefs”)
  – provably efficient on trees, works for arbitrary networks

The BP message-passing algorithm
Sends messages across edges
  – 2 messages per edge and iteration
  – sent only once all messages from the previous iteration have been received
Beliefs are updated per iteration
  – algorithm terminates when beliefs stabilize
Messages are vectors
  – length corresponds to the number of node states
Computational complexity grows exponentially with the number of states of nodes
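The message-passing scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a pairwise model with hypothetical potentials (`unary`, `pairwise`) rather than the Bayesian networks of the talk, and the function name `belief_propagation` is not from the slides. It sends two messages per edge and iteration, each a vector with one entry per state of the receiving variable, and stops when the beliefs stabilize.

```python
def normalize(vec):
    """Scale a non-negative vector so that it sums to 1."""
    total = sum(vec)
    return [x / total for x in vec]


def belief_propagation(unary, pairwise, max_iters=50, tol=1e-9):
    """Synchronous sum-product BP on a pairwise model (illustrative sketch).

    unary:    {node: [phi(x) for each state x]}           (hypothetical input)
    pairwise: {(i, j): psi} with psi[xi][xj] per edge      (hypothetical input)
    Returns (beliefs, iterations_used).
    """
    neighbors = {n: [] for n in unary}
    for i, j in pairwise:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j, xi, xj):
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # Two messages per edge; msgs[(i, j)] is a vector over the states of j.
    msgs = {}
    for i, j in pairwise:
        msgs[(i, j)] = [1.0] * len(unary[j])
        msgs[(j, i)] = [1.0] * len(unary[i])

    def beliefs_from(messages):
        out = {}
        for n in unary:
            b = list(unary[n])
            for k in neighbors[n]:
                b = [bx * mx for bx, mx in zip(b, messages[(k, n)])]
            out[n] = normalize(b)
        return out

    beliefs = beliefs_from(msgs)
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_msgs = {}
        for i, j in msgs:
            vec = []
            for xj in range(len(unary[j])):
                total = 0.0
                for xi in range(len(unary[i])):
                    term = unary[i][xi] * psi(i, j, xi, xj)
                    for k in neighbors[i]:
                        if k != j:  # exclude the message coming from the target
                            term *= msgs[(k, i)][xi]
                    total += term
                vec.append(total)
            new_msgs[(i, j)] = normalize(vec)
        msgs = new_msgs
        updated = beliefs_from(msgs)
        delta = max(abs(a - b)
                    for n in unary
                    for a, b in zip(updated[n], beliefs[n]))
        beliefs = updated
        if delta < tol:  # beliefs have stabilized
            break
    return beliefs, iteration


if __name__ == "__main__":
    unary = {"a": [0.7, 0.3], "b": [0.5, 0.5], "c": [0.2, 0.8]}
    pairwise = {("a", "b"): [[0.9, 0.1], [0.2, 0.8]],
                ("b", "c"): [[0.6, 0.4], [0.3, 0.7]]}
    print(belief_propagation(unary, pairwise))
```

On a tree such as the three-node chain in the demo, the beliefs equal the exact marginals. When neighbouring variables live on different peers, every entry of `msgs` corresponds to one network message per iteration, which is the cost the rest of the talk tries to reduce.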

The P-Grid Overlay
Peers are organized in a binary trie structure
  – one node for every common prefix
  – trie is only virtual (exists only via routing tables)
  – all nodes remain at the leaf level (no hierarchy)
Multiple peers per key space partition
Multiple routing entries (random choice)
  – per routing table level
Logarithmic search complexity
  – even for skewed data distributions

P-Grid routing
[Figure: binary trie with inner partitions 0* and 1* and leaf partitions 00*, 01*, 10*, 11*; peers A, F, B, C, D, E at the leaf level; example query for ‘100’]
Routing tables shown in the figure (one per peer):
  1* : C, D    01* : B    Stores data with key prefix 00
  1* : E       01* : B    Stores data with key prefix 00
  1* : C, D    00* : F    Stores data with key prefix 01
  0* : A, B    11* : E    Stores data with key prefix 10
  0* : A, F    11* : E    Stores data with key prefix 10
  0* : B, F    10* : D    Stores data with key prefix 11
Keys resolved by longest prefix matching
  – Ensures logarithmic search cost for skewed trees
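Prefix routing of this kind can be sketched as follows. The sketch assumes that the six routing tables of the figure belong to peers A, F, B, C, D, E in that order; that assignment, the `PEERS` dictionary layout, and the function names are illustrative assumptions, not part of the slides. At each hop a peer compares the key with its own path and, at the first differing bit, forwards to a randomly chosen reference for the complementary subtree.

```python
import random

# Assumed reading of the figure: peer -> own path and routing references.
PEERS = {
    "A": {"path": "00", "routes": {"1": ["C", "D"], "01": ["B"]}},
    "F": {"path": "00", "routes": {"1": ["E"], "01": ["B"]}},
    "B": {"path": "01", "routes": {"1": ["C", "D"], "00": ["F"]}},
    "C": {"path": "10", "routes": {"0": ["A", "B"], "11": ["E"]}},
    "D": {"path": "10", "routes": {"0": ["A", "F"], "11": ["E"]}},
    "E": {"path": "11", "routes": {"0": ["B", "F"], "10": ["D"]}},
}


def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(key, start, rng=None):
    """Return the list of peers visited while resolving `key` from `start`."""
    rng = rng or random.Random()
    hops = [start]
    current = start
    while True:
        path = PEERS[current]["path"]
        shared = common_prefix_len(key, path)
        if shared >= min(len(path), len(key)):
            return hops  # current peer is responsible for the key
        # First differing bit: jump into the complementary subtree.
        subtree = path[:shared] + key[shared]
        current = rng.choice(PEERS[current]["routes"][subtree])
        hops.append(current)


if __name__ == "__main__":
    print(route("100", "A"))
```

Each hop extends the prefix shared between the key and the current peer's path by at least one bit, so the number of hops is bounded by the path length.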

The Distributed Inference System
P-Grid
  – Bug reports, metadata, tags, etc.
  – Bayesian network
      Variables (spread over P-Grid nodes)
      Dependencies between variables
      Distributed learning
Belief Propagation
  – Distributed inference
      Message-passing algorithm
Identified problem
  – high message cost

Spring Relaxation
Bayesian network as spring network
  – find minimum energy configuration (relax springs)
  – energy is proportional to the distance between P-Grid nodes
  – variables at the same node require no energy
  – optimum: all variables at one node (but this conflicts with load balancing)
Decentralized algorithm
  – nodes try to relax their springs
  – move correlated variables close to each other
  – optimally, at the same node (no physical message)
  – considering load distribution
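The spring analogy can be made concrete with a small energy function. The slides do not define the distance between P-Grid nodes, so the sketch below assumes a trie distance (number of path bits below the common prefix); the names `trie_distance`, `total_energy`, and `remote_edges` are hypothetical. The variables and dependencies are the ones listed on the "Spring relations in P-Grid" slide, placed at the key prefix under which they appear there.

```python
def trie_distance(p, q):
    """Assumed distance: path bits below the common prefix (0 if same node)."""
    shared = 0
    for x, y in zip(p, q):
        if x != y:
            break
        shared += 1
    return max(len(p), len(q)) - shared


def total_energy(edges, placement):
    """Sum of spring energies; a spring between co-located variables costs 0."""
    return sum(trie_distance(placement[u], placement[v]) for u, v in edges)


def remote_edges(edges, placement):
    """Edges whose endpoints sit on different nodes (need physical messages)."""
    return [(u, v) for u, v in edges if placement[u] != placement[v]]


# Dependencies a-h, a-t, f-o, f-r, h-m, m-u, r-t as listed on the slide.
EDGES = [("a", "h"), ("a", "t"), ("f", "o"), ("f", "r"),
         ("h", "m"), ("m", "u"), ("r", "t")]
PLACEMENT = {"a": "00", "f": "00", "h": "01", "m": "01",
             "o": "10", "r": "10", "t": "11", "u": "11"}

if __name__ == "__main__":
    print(total_energy(EDGES, PLACEMENT), len(remote_edges(EDGES, PLACEMENT)))
    relaxed = dict(PLACEMENT, t="10")  # move t next to r
    print(total_energy(EDGES, relaxed), len(remote_edges(EDGES, relaxed)))
```

Since BP sends two messages per edge and iteration, every remote edge removed by a move saves two network messages in each BP iteration.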

Spring relations in P-Grid
[Figure: binary trie with inner partitions 0* and 1* and leaf partitions 00*, 01*, 10*, 11*; peers A, F, B, C, D, E at the leaf level]
Routing tables and stored variable dependencies shown in the figure (one per peer):
  1* : C, D    01* : B    a -> h, t    f -> o, r    …
  1* : E       01* : B    a -> h, t    f -> o, r    …
  1* : C, D    00* : F    h -> a, m    m -> h, u    …
  0* : A, B    11* : E    o -> f       r -> f, t    …
  0* : A, F    11* : E    o -> f       r -> f, t    …
  0* : B, F    10* : D    t -> a, r    u -> m       …

The Relaxation Algorithm (relax variables)
currentLoad = length(localVars);
overload = currentLoad - avgLoad / 2;
IF (overload <= 0)
    return;
ENDIF
unidirVars = variables having a tension only at one level;
WHILE ((overload > 0) AND (length(unidirVars) > 0))
    move variable to a peer from the level with the tension;
    removeFirst(unidirVars);
    overload = overload - 1;
ENDWHILE
…
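A runnable reading of this first phase might look as follows. The data model is an assumption: `tension` maps each local variable to a dictionary of routing-table levels at which it has springs to remote variables, and the function returns the planned moves instead of sending messages. The expression `currentLoad - avgLoad / 2` is taken literally with the usual operator precedence.

```python
def relax_variables(local_vars, avg_load, tension):
    """Plan moves for variables that are pulled towards exactly one level.

    local_vars: list of variable names stored at this peer
    avg_load:   average number of variables per peer
    tension:    {var: {level: strength}} with only non-zero entries (assumed)
    Returns a list of (variable, target_level) moves.
    """
    moves = []
    overload = len(local_vars) - avg_load / 2  # literal reading of the slide
    if overload <= 0:
        return moves

    # Variables whose springs all point to a single routing level.
    unidir_vars = [v for v in local_vars if len(tension.get(v, {})) == 1]

    while overload > 0 and unidir_vars:
        var = unidir_vars.pop(0)
        level = next(iter(tension[var]))  # the only level with a tension
        moves.append((var, level))        # would be sent to a peer at `level`
        overload -= 1
    return moves


if __name__ == "__main__":
    tension = {"a": {1: 1, 2: 1}, "f": {1: 2}, "x": {}}
    print(relax_variables(["a", "f", "x"], avg_load=2, tension=tension))
```

In the demo only `f` qualifies: `a` is pulled in two directions and is left to the load-balancing phase, and `x` has no remote springs at all.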

The Relaxation Algorithm (balance load)
…
multidirVars = vars having tensions at multiple levels;
WHILE ((currentLoad > avgLoad) AND (length(multidirVars) > 0))
    FOR i = routingTable.levels TO 1
        IF (level i is underpopulated)
            cand = vars having a tension at level i;
            FOR j = 1 TO length(cand)
                IF (cand(j).tension(i) >= max(cand(j).tension))
                    move variable to a peer from level i;
                    remove(multidirVars, cand(j));
                    currentLoad = currentLoad - 1;
                    IF (currentLoad <= avgLoad)
                        break;
                    ENDIF;
                ENDIF;
            ENDFOR;
        ENDIF;
    ENDFOR;
ENDWHILE
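The load-balancing phase can be sketched in the same style. Again the inputs are assumptions: `tension` maps variables to per-level strengths, `underpopulated` is the set of routing levels considered underpopulated (how a peer determines this is not stated on the slide), and candidates are taken from the multi-directional variables only. The `moved` guard is an addition of this sketch so that the loop stops when a full pass moves nothing.

```python
def balance_load(local_vars, avg_load, tension, underpopulated, num_levels):
    """Plan moves of multi-directional variables towards underpopulated levels.

    tension:        {var: {level: strength}} with only non-zero entries (assumed)
    underpopulated: set of routing levels regarded as underpopulated (assumed)
    num_levels:     number of levels in the routing table
    Returns a list of (variable, target_level) moves.
    """
    current_load = len(local_vars)
    multidir_vars = [v for v in local_vars if len(tension.get(v, {})) > 1]
    moves = []

    while current_load > avg_load and multidir_vars:
        moved = False
        for level in range(num_levels, 0, -1):  # deepest level first
            if level not in underpopulated:
                continue
            candidates = [v for v in multidir_vars if level in tension[v]]
            for var in candidates:
                # Move only if this level carries the variable's strongest pull.
                if tension[var][level] >= max(tension[var].values()):
                    moves.append((var, level))
                    multidir_vars.remove(var)
                    current_load -= 1
                    moved = True
                    if current_load <= avg_load:
                        break
            if current_load <= avg_load:
                break
        if not moved:  # guard added in this sketch: nothing movable remains
            break
    return moves


if __name__ == "__main__":
    tension = {"a": {1: 3, 2: 1}, "b": {1: 1, 2: 2},
               "c": {1: 1, 2: 1}, "d": {2: 5}}
    print(balance_load(["a", "b", "c", "d"], 2, tension, {2}, num_levels=2))
```

In the demo, `a` stays because its strongest tension points to level 1, which is not underpopulated, while `b` and `c` move to level 2 until the load reaches the average.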

Algorithm execution
Executed at each node
  – Iteratively
  – Independently (evaluated simultaneously)
Termination
  – Max. number of iterations
  – No free or multi-directional variables to move
  – No tension reduction in last two iterations
Effort
  – Variable movements require only 1 message
  – Trade-off against message reduction
  – Dynamic variables require “remote” updates

Evaluation
Matlab implementation
Diverse Bayesian networks
  – random, binary trees, scale-free
  – up to 2048 Bayesian nodes
  – up to 512 P-Grid nodes
10 repetitions
2 main evaluation criteria
  – message reduction
  – load balance

Random network
[Figure: 1024 nodes, average node degree 4]

Binary tree network
[Figure: 1023 nodes]

Scale-free network
[Figure: 1024 nodes, average node degree 4]

Message reduction (random)
[Figure: plot; legend partially recoverable: 64 / 2048 / 4, 128 / 2048 / …, … / 2048 / 4, 512 / 2048 / 4]

Message reduction (binary tree)
[Figure: plot; legend partially recoverable: 64 / …, … / 2047]

Message reduction (scale-free)
[Figure: plot; legend partially recoverable: 64 / …, … / 2048]

Load balancing (random)
[Figure: plot; legend partially recoverable: 64 / 2048 / 4, 128 / 2048 / …, … / 2048 / 4]

Load balancing (binary tree)
[Figure: plot; legend partially recoverable: 64 / …, … / 2047]

Load balancing (scale-free)
[Figure: plot; legend partially recoverable: 64 / …, … / 2048]

Number of iterations
Until termination of the relaxation algorithm
Scale-free network (128 nodes, 1024 vars)
100 runs

Reduction effort (random)
[Figure: plot; legend partially recoverable: 64 / 2048 / 4, 128 / 2048 / …, … / 2048 / 4, 512 / 2048 / 4]

Reduction effort (binary tree)
[Figure: plot; legend partially recoverable: 64 / …, … / 2047]

Reduction effort (scale-free)
[Figure: plot; legend partially recoverable: 64 / …, … / 2048]

Conclusions
Decentralized relaxation algorithm
  – Reduces message cost for Belief Propagation
  – Considers load balance
Several scenarios (Distributed Knowledge Base)
First evaluation looks promising
Intermediate steps are still missing
  – Learning of Bayesian network

Thank you! Questions?