Industrial project 234313
Sergey Semenko & Ivan Nesmeyanov
Under supervision of Eliezer Levy
Erlang routing mesh: overview and implementation details


Node structure. A node is an Erlang virtual machine running user processes. NM – Node Manager, the process in charge of communicating with the mesh and dispatching jobs to Workers. An NM can play two different roles in the mesh, Root or Leaf; more about this in the mesh topology description. W – Worker, a process that executes jobs. A worker is capable of executing one job at a time; when done, it notifies the NM and the client.

Node structure. Supervision tree. S – Supervisor, a system process in charge of restarting crashed processes; it does not implement user logic. NM and W are user processes.
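A supervision tree like the one on this slide can be expressed with a standard OTP supervisor. The sketch below is our own illustration, not the project's actual code; the module names `node_manager` and `worker` and the count of six workers are assumptions taken from the diagrams:

```erlang
-module(node_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: restart only the crashed child,
    %% at most 5 restarts within 10 seconds.
    SupFlags = {one_for_one, 5, 10},
    NodeManager = {node_manager, {node_manager, start_link, []},
                   permanent, 5000, worker, [node_manager]},
    Workers = [{list_to_atom("worker_" ++ integer_to_list(N)),
                {worker, start_link, []},
                permanent, 5000, worker, [worker]}
               || N <- lists:seq(1, 6)],
    {ok, {SupFlags, [NodeManager | Workers]}}.
```

The supervisor itself carries no user logic, matching the slide: it only restarts NM or W processes when they crash.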

Node Manager’s roles. Leaf. A Leaf node manager dispatches received jobs to local workers, gets “done” notifications, and notifies its root. The worker sends the job result directly to the client and notifies the node manager.

Node manager’s roles. Root. Roots are responsible for getting jobs from the web server and forwarding them to the least occupied Leaf they have. Roots are also responsible for accepting mesh-join requests from node managers and assigning roles to them. Roots do not execute jobs on their local workers.

Mesh topology overview. Each HTTP request is forwarded by the web server to a randomly chosen Root. Each Root forwards jobs to the least occupied Leaves registered at it. Results are sent directly to the web server from the workers.

Mesh topology. Join protocol. Each root has at most MaxChildNum leaves. A root that has the maximum number of children is called saturated; a root with fewer children is called hungry. There are always at least MinRootNum roots in an operating mesh; if there are fewer, each new node is assigned the root role. If all roots are saturated, a new node is likewise assigned to be a Root. Otherwise it becomes a hungry root’s leaf.
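The role-assignment decision above can be sketched as a single function over the two pg2 groups described later in the deck. This is our own reading of the protocol, not the project's code; MinRootNum comes from the slide, and the return-value shapes are assumptions:

```erlang
%% Decide the role of a newly joining node. Assumes root_group and
%% hungry_group already exist (pg2:get_members/1 would otherwise
%% return {error, {no_such_group, _}}).
assign_role(MinRootNum) ->
    Roots  = pg2:get_members(root_group),
    Hungry = pg2:get_members(hungry_group),
    if
        length(Roots) < MinRootNum -> root;          %% too few roots in the mesh
        Hungry =:= []              -> root;          %% every root is saturated
        true                       -> {leaf, hd(Hungry)}  %% attach to a hungry root
    end.
```

Note that this decision must run under the global lock mentioned on the process-groups slide, since two nodes joining concurrently could otherwise both observe the same group membership.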

Recovery protocol. When a leaf crashes, its root is notified and the leaf’s jobs are reassigned. When a root crashes, all its leaves perform a join request; if such a leaf gets the root role, it reassigns its pending jobs. When both a leaf and its root crash and no information about the job is left in the mesh, the web server resends the job after a timeout.
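Since each leaf monitors its root (see the monitoring slide), root failure shows up at the leaf as a 'DOWN' message. The clause below is a sketch of how a leaf's gen_server might react; the `#state{}` record and the `mesh:join/1` call are assumed names, not the project's actual interface:

```erlang
%% One handle_info/2 clause of the leaf's node-manager gen_server
%% (sketch). The matching on State#state.root ensures we only react
%% to the death of our own root.
handle_info({'DOWN', _Ref, process, Root, _Reason},
            State = #state{root = Root}) ->
    %% Our root died: re-run the join protocol. If we are promoted
    %% to root, mesh:join/1 is assumed to reassign our pending jobs.
    {ok, NewState} = mesh:join(State),
    {noreply, NewState}.
```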

Sending job protocol. The web server randomly chooses one of the roots and sends the request to it. If that root has no leaves registered, the web server is notified and the request is resent. Otherwise the job is forwarded to the least occupied leaf of that root. If all of the leaf’s workers are occupied, the job is stored in a pending-jobs list; when a worker becomes available, it is assigned one of the pending jobs.
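The web-server side of this protocol can be sketched as a pick-and-retry loop. The `{job, Job}` message shape and the `{error, no_leaves}` reply are assumptions for illustration; only the pg2 group name comes from the slides:

```erlang
%% Send a job to a randomly chosen root; resend if that root has no
%% leaves registered (sketch). Assumes root_group is non-empty.
send_job(Job) ->
    Roots = pg2:get_members(root_group),
    Root  = lists:nth(rand:uniform(length(Roots)), Roots),
    case gen_server:call(Root, {job, Job}) of
        {error, no_leaves} -> send_job(Job);  %% leafless root: retry elsewhere
        ok                 -> ok
    end.
```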

Implementation details. Process groups. There are two registered pg2 groups: root_group and hungry_group. When a node manager becomes a root, it joins root_group and hungry_group. When a root has the maximum number of children, it leaves hungry_group. When a saturated root loses a leaf, it rejoins hungry_group. Access to the groups is synchronized by a global lock to avoid race conditions.
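The group-membership transitions described above map directly onto the pg2 API. A minimal sketch (function names are our own; pg2 groups must be created before first use):

```erlang
%% Create both groups once, e.g. at application start.
init_groups() ->
    ok = pg2:create(root_group),
    ok = pg2:create(hungry_group).

%% A new root starts out hungry, so it joins both groups.
become_root() ->
    ok = pg2:join(root_group, self()),
    ok = pg2:join(hungry_group, self()).

%% Reached MaxChildNum children: stop advertising as hungry.
on_saturated() ->
    ok = pg2:leave(hungry_group, self()).

%% A saturated root lost a leaf: it is hungry again.
on_leaf_lost() ->
    ok = pg2:join(hungry_group, self()).
```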

Locking clarification. The idiomatic Erlang way to synchronize access to a shared resource is to implement a “resource manager” process that receives access requests and executes them. Because of the requirement to have no single point of failure, we decided not to implement such a process to synchronize access to the root groups. Hence we were forced to implement a locking primitive, which is not the idiomatic Erlang way to solve the problem.
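One way to get such a cluster-wide lock without a dedicated manager process is OTP's own `global:trans/2`, which holds a lock across all connected nodes for the duration of a fun. Whether the project used this or a hand-rolled primitive is not stated on the slide; the sketch below only illustrates the idea, and the lock name is an assumption:

```erlang
%% Run Fun while holding a cluster-wide lock on the group state
%% (sketch). global:trans/2 blocks until the lock is acquired and
%% releases it when Fun returns.
with_group_lock(Fun) ->
    LockId = {mesh_group_lock, self()},
    global:trans(LockId, Fun).
```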

Implementation details. Monitoring. Each leaf monitors its root (via the erlang:monitor/2 function), and each root monitors its leaves.
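The monitoring mechanism itself is standard Erlang: `erlang:monitor/2` returns a reference, and a `'DOWN'` message tagged with that reference arrives if the monitored process dies. A minimal shell-style sketch:

```erlang
%% Monitor another process; RootPid is a placeholder for the pid of
%% the root (or leaf) being watched.
Ref = erlang:monitor(process, RootPid),
receive
    {'DOWN', Ref, process, RootPid, Reason} ->
        io:format("monitored process died: ~p~n", [Reason])
end.
```

Unlike links, monitors are unidirectional and deliver a message rather than an exit signal, which is why the node manager can handle a peer's death as ordinary gen_server traffic.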

Implementation details. Node manager module. Implements the gen_server behaviour. The role is preserved in the server state; when the role changes, only the corresponding field in the state changes. The node manager is capable of processing other roles’ messages (this is useful when a leaf turns into a root and might still get job_done messages from its workers).

Implementation details. Worker module. Implements the gen_server behaviour. Its only function is to execute jobs. Job execution itself is not of interest here; it is simulated by sleeping for a certain amount of time.
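A worker of this shape is a very small gen_server. The sketch below follows the slides (one job at a time, sleep to simulate work, result to the client, notification to the node manager), but the exact message shapes are our assumptions:

```erlang
-module(worker).
-behaviour(gen_server).
-export([start_link/0, init/1, handle_cast/2, handle_call/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, idle}.

%% A job carries its id, a simulated duration, the client's pid and
%% the node manager's pid (assumed message shape).
handle_cast({job, JobId, Millis, Client, NodeManager}, idle) ->
    timer:sleep(Millis),                            %% simulate the work
    Client ! {result, JobId, done},                 %% result straight to client
    gen_server:cast(NodeManager, {job_done, JobId, self()}),
    {noreply, idle}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.
```

Because the cast handler blocks in `timer:sleep/1`, the worker naturally processes one job at a time, as the node-structure slide requires.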

Implementation details. Mesh module. Provides the interface for the job-sending protocol and for the mesh-join protocol.