Distributed Systems 2006 Virtual Synchrony II* *With material adapted from Ken Birman, Ben Wang, Bill Burke, Bela Ban.

Plan
We skip Section 18.2
Tracking group membership: we’ll base it on 2PC and 3PC
Fault-tolerant multicast: we’ll use membership
Ordered multicast: we’ll base it on fault-tolerant multicast
Tools for solving practical replication and availability problems: we’ll base them on ordered multicast
Robust Web Services: we’ll build them with these tools
2PC and 3PC: our first “tools” (lowest layer)

Recap: Elements of Virtual Synchrony
Support for process groups
  – Processes may join and leave dynamically
  – Excluded if (thought to have) failed
Various reliable, ordered multicast protocols
  – Strive to replace synchronous, totally ordered, dynamically uniform protocols
View-synchronous delivery
  – Two processes that are members of the same view receive the same set of multicasts during that view
Identical process group views and rankings
  – Identical sequences of group membership lists
Gap-freedom guarantees
  – If, after a failure, m1 has been delivered to a destination, then any message m0 that should be delivered prior to m1 is also delivered
State transfer for joining processes
  – A new process may obtain group state from an existing member
  – (Will develop this shortly)

Distributed Algorithms
Election
Consensus
Consistent snapshot
Replicated data and synchronization
State transfer
Load-balancing
Fault tolerance
  – Primary-backup
  – Coordinator-cohort

State transfer
We need to transfer the current state of a process group to joining members
“State” is application-dependent
  – Creating the state should be done by the application itself
  – E.g., org.jgroups.MessageListener
      byte[] getState()
      void setState(byte[] state)
Simple approach
  – Just make all members transfer state to the joining process
Reasonable approach
  – The joining process pulls state from one existing member
  – Take another member if the first one fails
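The pull-based approach can be sketched roughly as follows. This is a minimal illustration, not JGroups code: `StateProvider` is a hypothetical interface that only mirrors the getState()/setState(byte[]) pair named on the slide, and `Member`, `pullState` and the "crashed" flag are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class StateTransferSketch {
    /** Hypothetical stand-in mirroring getState/setState; not the JGroups interface itself. */
    interface StateProvider {
        byte[] getState();
        void setState(byte[] state);
    }

    /** Toy member whose application state is a single string. */
    static class Member implements StateProvider {
        private String value;
        private final boolean crashed; // simulates a member that fails during transfer

        Member(String value, boolean crashed) {
            this.value = value;
            this.crashed = crashed;
        }

        @Override
        public byte[] getState() {
            if (crashed) {
                throw new IllegalStateException("member unreachable");
            }
            // The application decides how its state is serialized.
            return value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public void setState(byte[] state) {
            value = new String(state, StandardCharsets.UTF_8);
        }

        String value() {
            return value;
        }
    }

    /** Pull state from the first member that answers; returns false if none does. */
    static boolean pullState(StateProvider joiner, List<? extends StateProvider> members) {
        for (StateProvider m : members) {
            try {
                joiner.setState(m.getState());
                return true;
            } catch (IllegalStateException e) {
                // This member failed during the transfer: try the next one.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Member joiner = new Member("", false);
        List<Member> group = Arrays.asList(new Member("x=1", true), new Member("x=1", false));
        boolean ok = pullState(joiner, group);
        System.out.println(ok + " " + joiner.value());
    }
}
```

Pulling from a single member keeps the transfer cost independent of group size; the fallback loop covers the "take another if the first one fails" case.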

Load-Balancing
Coordinate members of a group to share the workload in order to obtain a speed-up from parallelism
Use group communication to implement load-balancing of requests...
Styles of algorithms
  – Group decides who should handle a request
  – Client decides who should handle a request

Group Decides
Multicast request from client to full membership
  – May require expensive transfer to all
Need a deterministic rule for deciding who should handle a request
  – E.g., with abcast, requests may be numbered in a total order
  – A process may handle the i-th request if its rank is i mod n (n = number of members)
With cbcast?
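A minimal sketch of the "rank is i mod n" rule, assuming that every member sees the same request number (as a totally ordered multicast would give) and the same view size. The class and method names are illustrative only.

```java
public class GroupDecides {
    /** Rank of the member that handles the request with total-order number seqNo. */
    static int handlerRank(long seqNo, int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("empty group");
        }
        return (int) Math.floorMod(seqNo, (long) groupSize);
    }

    /** Each member evaluates this locally; exactly one rank answers true per request. */
    static boolean shouldHandle(int myRank, long seqNo, int groupSize) {
        return myRank == handlerRank(seqNo, groupSize);
    }

    public static void main(String[] args) {
        int n = 3;
        for (long i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> rank " + handlerRank(i, n));
        }
    }
}
```

No extra messages are needed to pick the handler: the decision follows from information all members already agree on.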

Client affinity schemes
Group members provide clients with information used to select an appropriate server to send requests to
Best choice depends on data size, fault tolerance needed, queries/updates, ...
E.g.,
  – Static assignment of a client to a specific server (+: caching; -: very active clients)
  – Pick a random server
  – Base the choice on (approximate) load information
Could also be used with the previous approach

Using approximate load information
Assume that processes send out load reports using abcast
  – E.g., 0 for no load, 1 for currently handling 1 request, etc.
Represent the load on a group of n servers as a vector: $[l_0, \ldots, l_{n-1}]$
  – $l_{\max} = \max(l_0, \ldots, l_{n-1}) + 1$
  – $[l'_0, \ldots, l'_{n-1}] = [l_{\max} - l_0, \ldots, l_{\max} - l_{n-1}]$
  – $L' = l'_0 + \cdots + l'_{n-1}$
Map incoming requests to a process
  – Given a request, choose a random number $r$ with $0 \le r < L'$
      – By applying a pseudo-random generator with the same seed at all processes
  – Now choose process $i$ if $l'_0 + \cdots + l'_{i-1} \le r < l'_0 + \cdots + l'_i$ (for $i = 0$ the left-hand sum is empty, i.e. 0)
(Think of $l'_0, \ldots, l'_{n-1}$ as the lengths of consecutive segments on a line of length $L'$; the algorithm selects the segment that $r$ falls within)
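The mapping can be sketched as below: loads are inverted into weights, and a pseudo-random number picks the segment (and hence the process) it falls into. This is only an illustration; seeding `java.util.Random` once per request with a value all members share (for example the request's number in the total order) is an assumption, as are all class and method names.

```java
import java.util.Random;

public class LoadWeightedChoice {
    /** Inverted weights: lMax - load[i] with lMax = max(load) + 1, so every weight is >= 1. */
    static int[] weights(int[] load) {
        int max = 0; // loads are assumed to be non-negative
        for (int l : load) {
            max = Math.max(max, l);
        }
        int lMax = max + 1;
        int[] w = new int[load.length];
        for (int i = 0; i < load.length; i++) {
            w[i] = lMax - load[i];
        }
        return w;
    }

    static int total(int[] w) {
        int sum = 0;
        for (int x : w) {
            sum += x;
        }
        return sum;
    }

    /** Index of the segment containing r, where 0 <= r < total(w). */
    static int select(int[] w, int r) {
        if (r < 0) {
            throw new IllegalArgumentException("r out of range");
        }
        int upper = 0; // running sum w[0] + ... + w[i]
        for (int i = 0; i < w.length; i++) {
            upper += w[i];
            if (r < upper) {
                return i;
            }
        }
        throw new IllegalArgumentException("r out of range");
    }

    /** Members that use the same load vector and seed compute the same choice. */
    static int choose(int[] load, long seed) {
        int[] w = weights(load);
        int r = new Random(seed).nextInt(total(w));
        return select(w, r);
    }

    public static void main(String[] args) {
        int[] load = {0, 1, 3}; // weights become {4, 3, 1}
        System.out.println("chosen server: " + choose(load, 42L));
    }
}
```

With loads {0, 1, 3} the idle server owns half of the line and the busiest one an eighth, so lightly loaded servers are chosen more often without any server being excluded.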

Fault tolerance
We want to offer clients “fault-tolerant request execution”
We can replace a traditional service with a group of members
  – Each request is assigned to a primary (ideally, spread the work around) and a backup
Primary sends a “cc” of the response to the request to the backup
  – Backup keeps a copy of the request and steps in only if the primary crashes before replying
Sometimes called “coordinator/cohort” just to distinguish it from “primary/backup”
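A rough sketch of the backup's side of this scheme. The assignment rule (primary = request id mod n, backup = next rank), the `execute` placeholder and all names are assumptions for illustration; failure detection is assumed to come from the group membership service and is not modelled here.

```java
import java.util.HashMap;
import java.util.Map;

public class CoordinatorCohort {
    /** Hypothetical rule that spreads primaries over the group (needs n >= 2). */
    static int primaryFor(long requestId, int n) {
        return (int) Math.floorMod(requestId, (long) n);
    }

    /** Hypothetical rule: the backup is the member ranked after the primary. */
    static int backupFor(long requestId, int n) {
        return (primaryFor(requestId, n) + 1) % n;
    }

    /** Placeholder for the real request handler. */
    static String execute(String request) {
        return request.toUpperCase();
    }

    static class Backup {
        private final Map<Long, String> pending = new HashMap<>();   // copies of unanswered requests
        private final Map<Long, String> completed = new HashMap<>(); // "cc" copies of responses

        /** The backup keeps a copy of each request it is backing up. */
        void onRequest(long id, String request) {
            pending.put(id, request);
        }

        /** The primary sends a "cc" of its response; the request copy is no longer needed. */
        void onResponseCopy(long id, String response) {
            pending.remove(id);
            completed.put(id, response);
        }

        /** Called when the primary for this request is reported as failed. */
        String onPrimaryFailure(long id) {
            if (completed.containsKey(id)) {
                return completed.get(id); // primary had already produced the response
            }
            String request = pending.remove(id);
            if (request == null) {
                throw new IllegalStateException("unknown request " + id);
            }
            String response = execute(request); // step in and handle the request ourselves
            completed.put(id, response);
            return response;
        }
    }

    public static void main(String[] args) {
        Backup backup = new Backup();
        backup.onRequest(7L, "ping");
        System.out.println(backup.onPrimaryFailure(7L));
    }
}
```

In the normal case the backup does no work beyond storing the request copy, which is what makes this cheaper than having every member execute every request.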

Trade-offs

Trade-offs
Membership
  – Static
      – Fixed membership, changing connectivity and availability
  – Dynamic
      – Changing membership, fixed connectivity and availability
Consistency
  – Internal
      – Defined with respect to members of the group observing messages
  – External
      – Defined with respect to an external observer (e.g., a database)

Toolkits
Isis
Horus
Ensemble
JGroups
Spread

Features of major virtual synchrony platforms
Isis: the first platform, no longer widely used
  – But was perhaps the most successful; has major roles in NYSE, Swiss Exchange, French Air Traffic Control system (two major subsystems of it), US AEGIS Naval warship
  – Pioneered the use of cbcast
  – Was also the first to offer a publish-subscribe interface that mapped topics to groups

Features of major virtual synchrony platforms
Horus, JGroups and Ensemble
  – Successors to Isis
  – These focus on a flexible protocol stack linked directly into the application address space
      – A stack is a pile of micro-protocols
      – Can assemble an optimized solution fitted to the specific needs of the application by plugging together “properties this application requires”, lego-style
      – The system is optimized to reduce the overheads of this compositional style of protocol stack
Use
  – JGroups is very popular
      – Used in, e.g., JBoss, OpenSymphony OSCache, Jetty, Tomcat
  – Ensemble is somewhat popular and supported by a user community
  – Horus works well but is not widely used

Horus/JGroups/Ensemble protocol stacks
[Figure: an application belongs to a process group; the stacks shown are built from the micro-protocol layers comm, nak, frag, mbrshp, fc, parcld, merge and total]

Spread Toolkit
Focused on a sort of “RISC” approach
  – Very simple architecture and system
  – Fairly fast, easy to use, rather popular
Supports one large group within which the user sees many small “lightweight” subgroups that seem to be free-standing
Protocols are implemented by Spread “agents” that relay messages to apps

Case: J2EE/JBoss
Java 2 Enterprise Edition (J2EE)
  – Multi-tiered, distributed application model / reference architecture
      – Tiered = physically layered architecture
  – Technologies to support this reference architecture
Enterprise Java Beans (EJB)
  – Server-side component model
  – Component: “… a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties”
JBoss
  – Open source J2EE Application Server
  – Arguably the most popular Java application server

J2EE Business Context
Powerful workstations (and servers) make distributed computing viable
  – Programming abstractions for distributed computing needed
The Web…
  – Internet-enabled business systems / enterprise information systems
  – Specific requirements for web applications
      – Scalability: support variations in load (e.g., Amazon before Christmas)
      – Availability: very small downtime periods (e.g., eBay, 400 million transactions/day)
      – Security: authenticate and authorize users
      – Usability: different users should have different contents in different forms
      – Performance: reasonable response times needed; requests often arrive in bursts
      – Also, e.g., time-to-market...

Architectural Solution

Tiers
Client tier
  – User interfaces
      – Internet browser
      – Standalone Java clients
      – (COM applications)
Middle tier
  – Web tier
      – Web server for handling requests from browsers
      – Gets request from client tier
      – Forwards to business component tier
      – Renders result
  – Business component tier
      – Core “business logic”
      – E.g., customers, accounts, …, relationships and rules among these in a Web shop
      – Realized by EJBs running in an EJB container
      – “Application server” = middle-tier component server compatible with J2EE
Enterprise information systems tier
  – Databases
  – Backend systems

Example: EHR in Ribe Amt

Specific J2EE-Related Technologies
JavaServer Pages (JSP)
  – Creating dynamic web-based content
Java Servlets
  – Extending functionality of web servers
Java Message Service (JMS)
  – Asynchronous point-to-point and many-to-many messaging
Java Naming and Directory Interface (JNDI)
  – Directory-based retrieval of user-defined objects
J2EE Connector Architecture
  – Standard architecture for integrating external systems
RMI over IIOP
  – RMI using OMG’s Internet Inter-ORB Protocol
Java DataBase Connectivity (JDBC)
  – Uniform interface to relational databases
Enterprise JavaBeans (EJB)

EJB Deployment View
EJB container
  – Manages execution of components
  – Exposes platform services

EJB Code View
Need to enable remote clients to access the bean
  – Remote interface
  – ~ Proxy object
Manage lifecycle of the bean
  – Home interface
  – Possibly functionality to locate specific instances
  – ~ Factory object
Implement functionality
  – Bean class
Clients use generated stubs

Detailed Example

EJB Types
Entity beans
  – Represent business data objects
  – Data members map to data items stored in an associated database
  – Container-managed persistence
      – Container loads and stores data
      – No application code required
  – Bean-managed persistence
      – Bean code is itself responsible for persistence
      – “Handcrafted” JDBC
Session beans
  – Business logic and services
  – “Stateful” (SFSBs)
      – Can keep state on behalf of a client
      – Successive calls go to the same component
      – Container handles life-cycle (passivation, activation)
      – E.g., CommandBean
  – “Stateless” (SLSBs)
      – Do not keep any state on behalf of a client
      – Each successive call is delegated to a stateless session bean as needed
      – Easy scalability and load balancing
      – E.g., PrintHandlerBean
Message-driven beans
  – Stateless
  – Asynchronous listener style of invocation

“Clustering” and J2EE
Scalability
  – I want to handle x times as many concurrent accesses as I have now
High availability
  – Services are accessible with reasonable (and predictable) response times at any time
  – E.g., 99.999% (“five nines” in telco)
Load balancing
  – A way to obtain high availability and better performance by dispatching incoming requests to different servers
  – Session affinity (or stickiness)
  – Checking heartbeat
Failover
  – Processing can continue when it is redirected to a “backup” node because the original one fails
  – What is the policy? Round-robin?
Fault tolerance
  – A service that guarantees strictly correct behavior despite system failure
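As a rough illustration of what five nines means, the permitted downtime over a 365-day year works out as follows (assuming availability is measured over the whole year):

```latex
(1 - 0.99999) \times 365 \times 24 \times 60\ \text{min}
  = 10^{-5} \times 525\,600\ \text{min}
  \approx 5.26\ \text{min per year}
```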

JGroups in JBoss?
Serverless JMS
Clustering
  – Replication of entity beans, SLSBs and SFSBs
  – HA-JNDI
  – Session replication (integrated Tomcat, Jetty)
Cache
  – Replicated transactional clustered cache

The real world is complex...

Default JBoss JGroups Configuration
<UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}" mcast_port="45566"
     ip_ttl="8" ip_mcast="true"
     mcast_send_buf_size="800000" mcast_recv_buf_size="150000"
     ucast_send_buf_size="800000" ucast_recv_buf_size="150000"
     loopback="false"/>
<PING timeout="2000" num_initial_members="3"
      up_thread="true" down_thread="true"/>
<FD shun="true" up_thread="true" down_thread="true"
    timeout="2500" max_tries="5"/>
<VERIFY_SUSPECT timeout="3000" num_msgs="3"
                up_thread="true" down_thread="true"/>
<pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"
               max_xmit_size="8192" up_thread="true" down_thread="true"/>
<UNICAST timeout="300,600,1200,2400,4800" window_size="100"
         min_threshold="10" down_thread="true"/>
<pbcast.STABLE desired_avg_gossip="20000"
               up_thread="true" down_thread="true"/>
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
            shun="true" print_local_addr="true"/>

Serverless JMS
Java Message Service based on JGroups
  – Peer-to-peer architecture rather than client/server
Client publishing to a topic
  – Other clients subscribe to topics
  – The “publish/subscribe” paradigm
Instead of sending the message to a server that then distributes it to multiple clients, the publisher multicasts the message
  – The JMS server is just another member
      – Handles persistent messages (DB)

Serverless JMS
[Figure, two panels: “Cost: 4 unicasts” and “Cost: 1 multicast”]

Serverless JMS
Clients are still able to publish even when the server is down
Caveat
  – Works in scenarios where client and server are in the same multicast-reachable network

Session Replication in Tomcat
Tomcat and Jetty are both Java-based web servers
Servlet sessions are replicated across Tomcat processes
  – A new Tomcat instance gets sessions from existing Tomcat instance(s)
  – Modification of a session (addition, removal of attributes) gets replicated

Session replication in Tomcat
Expiry of a session will expire the session everywhere
Last timestamp update
External load-balancer distributes requests to Tomcat instances
  – Round-robin
  – Sticky, next server on crash
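The "round-robin" and "sticky, next server on crash" policies can be combined roughly as in the sketch below. It is a toy model of an external load-balancer, not Tomcat or JBoss code; the class, its methods and the way crashes are reported (`markCrashed`) are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StickyBalancer {
    private final boolean[] alive;
    private final Map<String, Integer> affinity = new HashMap<>(); // session id -> server index
    private int next = 0; // round-robin cursor used for new sessions

    StickyBalancer(int servers) {
        alive = new boolean[servers];
        Arrays.fill(alive, true);
    }

    /** Hypothetical hook: a heartbeat check would call this when a server stops answering. */
    void markCrashed(int server) {
        alive[server] = false;
    }

    /** Returns the server index for this session; throws if no server is alive. */
    int route(String sessionId) {
        Integer pinned = affinity.get(sessionId);
        int start;
        if (pinned == null) {
            start = next; // new session: plain round-robin
            next = (next + 1) % alive.length;
        } else {
            start = pinned; // known session: stay on the same server if possible
        }
        for (int k = 0; k < alive.length; k++) {
            int candidate = (start + k) % alive.length;
            if (alive[candidate]) {
                affinity.put(sessionId, candidate); // re-pin after a failover
                return candidate;
            }
        }
        throw new IllegalStateException("no server available");
    }

    public static void main(String[] args) {
        StickyBalancer lb = new StickyBalancer(3);
        System.out.println(lb.route("a"));
        lb.markCrashed(0);
        System.out.println(lb.route("a"));
    }
}
```

Moving a session to the next server only works because the session data has been replicated there; without replication the failover target would see an unknown session.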

Summary
We gave further examples of what can be built on top of virtual synchrony
Brief examples of toolkits and uses
  – Isis, Ensemble, Horus, Spread, JGroups
J2EE introduction as a bonus
