Team 6: Slackers. 18-749: Fault-Tolerant Distributed Systems. Team Members: Puneet Aggarwal, Karim Jamal, Steven Lawrance, Hyunwoo Kim, Tanmay Sinha.


Team Members
URL:

Overview
Baseline Application
Baseline Architecture
FT-Baseline Goals
FT-Baseline Architecture
Fail-Over Mechanisms
Fail-Over Measurements
Fault Tolerance Experimentation
Bounded “Real-Time” Fail-Over Measurements
FT-RT-Performance Strategy
Other Features
Conclusions

Baseline Application

Baseline Application: What is Park 'n Park?
A system that manages the information and status of multiple parking lots.
Keeps track of how many spaces are available in the lot and at each level.
Recommends other available lots that are nearby if the current lot is full.
Allows drivers to enter/exit lots and move up/down levels once in a parking lot.

Baseline Application: Why is it interesting?
Easy to implement
Easy to distribute over multiple systems
Potential of having multiple clients
Middle-tier can be made stateless
Hasn't been done before in this class
And most of all… who wants this?

Baseline Application: Development Tools
Java
– Familiarity with language
– Platform independence
CORBA
– Long story (to be discussed later…)
MySQL
– Familiarity with the package
– Free!!!
– Available on ECE cluster
Linux, Windows, and OS X
– No one has the same system nowadays
Eclipse, Matlab, CVS, and PowerPoint
– Powerful tools in their target markets

Baseline Application: High-Level Components
Client
– Provides an interface to interact with the user
– Creates an instance of Client Manager
Server
– Manages Client Manager Factory
– Handles CORBA functions
Client Manager
– Part of middle tier
– Manages various client functions
– Unique for each client
Client Manager Factory
– Part of middle tier
– Factory for Client Manager instances
Database
– Stores the state for each client
– Stores the state of the parking lots (i.e. occupancy of lots and levels, distances to other parking lots)
Naming Service
– Allows client to obtain reference to a server
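The components above map naturally onto a factory-plus-session-object design. The following is a rough Java sketch of how the middle-tier interfaces might look; the operation names and signatures are assumptions inferred from the exceptions listed later in the deck, not the project's actual IDL-generated code.

    // Hypothetical Java view of the middle-tier interfaces; names are assumptions.
    class ServiceUnavailableException extends Exception {}
    class LotFullException extends Exception {}
    class NotInLotException extends Exception {}

    interface ClientManagerFactory {
        // Each client obtains its own ClientManager instance from the factory.
        ClientManager createClientManager(int clientId) throws ServiceUnavailableException;
    }

    interface ClientManager {
        // The middle tier is stateless: every call reads and writes state in the database.
        void enterLot(int clientId, int lotId) throws LotFullException, ServiceUnavailableException;
        void exitLot(int clientId) throws NotInLotException, ServiceUnavailableException;
        void moveUpLevel(int clientId) throws NotInLotException, ServiceUnavailableException;
        void moveDownLevel(int clientId) throws NotInLotException, ServiceUnavailableException;
    }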

Baseline Architecture

Baseline Architecture: High-Level Components (diagram)
Components: Client, Middleware (Client Manager Factory, Client Manager), Server, Naming Service, Database.
Numbered data flow from the diagram: 1. Create instance; 2. Register name; 3. Contact naming service; 4. Create client manager instance; 5. Create instance; 6. Invoke service method; 7. Request data.

FT-Baseline Goals

FT-Baseline Goals: Main Goals
Replicate the entire middle tier in order to make the system fault-tolerant. The middle tier includes
– Client Manager
– Client Manager Factory
– Server
No need to replicate the naming service, replication manager, and database because of added complexity and limited development time
Maintain the stateless nature of the middle tier by storing all state in the database
For the fault-tolerant baseline application
– 3 replicas of the servers on clue, chess, and go
Naming service (boggle), Replication Manager (boggle), and Database (previously on mahjongg, now on girltalk) on the sacred servers
– These have not been replicated and are single points of failure

FT-Baseline Goals: FT Framework
Replication Manager
– Responsible for checking the liveness of servers
– Performs fault detection and recovery of servers
– Can handle an arbitrary number of server replicas
– Can be restarted
Fault Injector
– kill -9
– Script to periodically kill the primary server
– Added in the RT-FT-Baseline implementation

FT-Baseline Architecture

FT-Baseline Architecture: High-Level Components (diagram)
Components: Client, Middleware (Client Manager Factory, Client Manager), Server, Replication Manager, Naming Service, Database.
Numbered data flow from the diagram: 1. Create instance; 2. Register name; 3. Notify of existence (to the Replication Manager); 4. Contact naming service; 5. Create client manager instance; 6. Create instance; 7. Invoke service method; 8. Request data.
The Replication Manager calls poke() on the servers and bind()/unbind() on the Naming Service.

Fail-Over Mechanism

Fail-Over Mechanism: Fault-Tolerant Client Manager
Resides on the client side
Invokes service methods on the Client Manager on behalf of the client
Responsible for fail-over
– Detects faults by catching exceptions
– If an exception is thrown during a service call/invocation, it gets the primary server reference from the naming service and retries the failed operation using the new server reference
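A rough sketch of that retry logic, with the CORBA naming-service lookup abstracted behind a Supplier; the class and method names here are assumptions, not the project's actual code.

    import org.omg.CORBA.COMM_FAILURE;
    import org.omg.CORBA.OBJECT_NOT_EXIST;
    import java.util.function.Supplier;

    // Hypothetical client-side fail-over wrapper around the remote ClientManager.
    class FailoverInvoker<T> {
        interface RemoteCall<T, R> { R apply(T target) throws Exception; }

        private final Supplier<T> resolvePrimary;  // looks up the primary via the naming service
        private T current;

        FailoverInvoker(Supplier<T> resolvePrimary) {
            this.resolvePrimary = resolvePrimary;
            this.current = resolvePrimary.get();
        }

        <R> R invoke(RemoteCall<T, R> call) throws Exception {
            while (true) {
                try {
                    return call.apply(current);
                } catch (COMM_FAILURE | OBJECT_NOT_EXIST dead) {
                    // Primary looks dead: re-resolve the new primary from the naming
                    // service and retry the same operation on the new reference.
                    current = resolvePrimary.get();
                }
            }
        }
    }

The same catch branch would also cover ServiceUnavailableException, which the deck lists a few slides later as a retry-triggering exception.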

Fail-Over Mechanism: Replication Manager
Detects faults using a method called “poke”
Maintains a dynamic list of active servers
Restarts failed/corrupted servers
Performs naming service maintenance
– Unbinds names of crashed servers
– Rebinds name of primary server
Uses the most-recently-active methodology to choose a new primary server in case the primary server experiences a fault
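A condensed sketch of what the replication manager's monitoring loop could look like, with hypothetical interfaces standing in for the CORBA server references and naming-service calls; the restart mechanism is not described in the slides and is left as a stub, and the primary-selection order is simplified.

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.List;

    // Hypothetical sketch of the replication manager's poke/recover loop.
    class ReplicationManager implements Runnable {
        interface ServerHandle {
            void poke() throws Exception;   // liveness and database-connectivity check
            String name();                  // name bound in the naming service
        }
        interface NamingService {
            void rebindPrimary(ServerHandle primary);
            void unbind(String name);
        }

        // Head of the deque is the current primary; the team's actual rule is
        // "most recently active", which is simplified here.
        private final Deque<ServerHandle> active = new ArrayDeque<>();
        private final NamingService naming;
        private final long pollIntervalMs;

        ReplicationManager(NamingService naming, long pollIntervalMs) {
            this.naming = naming;
            this.pollIntervalMs = pollIntervalMs;
        }

        void register(ServerHandle server) {   // servers notify the manager of their existence
            synchronized (active) { active.addLast(server); }
        }

        public void run() {
            while (true) {
                ServerHandle oldPrimary;
                synchronized (active) { oldPrimary = active.peekFirst(); }
                for (ServerHandle s : snapshot()) {
                    try {
                        s.poke();                  // throws if crashed or the DB link is corrupted
                    } catch (Exception fault) {
                        synchronized (active) { active.remove(s); }
                        naming.unbind(s.name());   // naming-service maintenance
                        restart(s);                // bring the failed replica back
                    }
                }
                ServerHandle primary;
                synchronized (active) { primary = active.peekFirst(); }
                if (primary != null && primary != oldPrimary) {
                    naming.rebindPrimary(primary); // promote a surviving replica to primary
                }
                try { Thread.sleep(pollIntervalMs); } catch (InterruptedException e) { return; }
            }
        }

        private List<ServerHandle> snapshot() {
            synchronized (active) { return new ArrayList<>(active); }
        }

        private void restart(ServerHandle crashed) {
            // Elided: the slides only state that failed/corrupted servers are restarted.
        }
    }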

Fail-Over Mechanism: The Poke Method
“Pokes” the server periodically
Checks not only whether the server is alive, but also whether the server's database connectivity is intact or corrupted
Throws exceptions in case of faults (e.g. cannot connect to the database)
The replication manager handles faults accordingly
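On the server side, the poke operation can be as simple as a trivial database round trip; this is a guess at the implementation, since the slides only say that poke checks both liveness and database connectivity.

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Hypothetical server-side poke(): being reachable at all proves liveness, and a
    // cheap query proves the MySQL connection is still intact.
    class PokeSupport {
        private final Connection db;

        PokeSupport(Connection db) { this.db = db; }

        void poke() throws ServiceUnavailableException {
            try (Statement stmt = db.createStatement()) {
                stmt.execute("SELECT 1");          // trivial round trip to the database
            } catch (SQLException corrupted) {
                // The replication manager treats this as a faulty replica and recovers it.
                throw new ServiceUnavailableException();
            }
        }
    }

    class ServiceUnavailableException extends Exception {}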

Fail-Over Mechanism: Exceptions Handled
COMM_FAILURE: CORBA exception
OBJECT_NOT_EXIST: CORBA exception
SystemException: CORBA exception
Exception: Java exception
AlreadyInLotException: Client is already in a lot
AtBottomLevelException: Car cannot move to a lower level because it's on the bottom floor
AtTopLevelException: Car cannot move to a higher level because it's on the top floor
InvalidClientException: ID provided by Client doesn't match the ID stored in the system
LotFullException: System throws exception when the lot is full
LotNotFoundException: Lot number not found in the database
NotInLotException: Client's car is not in the lot
NotOnExitLevelException: Client is not on an exit level in the lot
ServiceUnavailableException: Exception that gets thrown when an unrecoverable database exception or some other error prevents the server from successfully completing a client-requested operation

Fail-Over Mechanism: Response to Exceptions
Get a new server reference and then retry the failed operation when the following exceptions occur
– COMM_FAILURE
– OBJECT_NOT_EXIST
– ServiceUnavailableException
Report the error to the user and prompt for the next command when the following exceptions occur
– AlreadyInLotException
– AtBottomLevelException
– AtTopLevelException
– LotFullException
– LotNotFoundException
– NotInLotException
– NotOnExitLevelException
The client terminates when the following exceptions occur
– InvalidClientException
– SystemException
– Exception
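The policy above amounts to a three-way classification of caught exceptions; a small sketch, with a hypothetical ParkingException standing in as the common parent of the lot/level user exceptions.

    import org.omg.CORBA.COMM_FAILURE;
    import org.omg.CORBA.OBJECT_NOT_EXIST;
    import org.omg.CORBA.SystemException;

    // Hypothetical classification helper mirroring the policy above. The order of the
    // checks matters: COMM_FAILURE and OBJECT_NOT_EXIST are subclasses of SystemException,
    // so they must be recognized before the generic SystemException case.
    enum Reaction { RETRY_ON_NEW_PRIMARY, REPORT_AND_PROMPT, TERMINATE }

    final class ExceptionPolicy {
        static Reaction classify(Throwable t) {
            if (t instanceof COMM_FAILURE || t instanceof OBJECT_NOT_EXIST
                    || t instanceof ServiceUnavailableException) {
                return Reaction.RETRY_ON_NEW_PRIMARY;   // fail over and retry the operation
            }
            if (t instanceof InvalidClientException || t instanceof SystemException) {
                return Reaction.TERMINATE;              // unrecoverable for this client
            }
            if (t instanceof ParkingException) {
                return Reaction.REPORT_AND_PROMPT;      // e.g. LotFullException, NotInLotException
            }
            return Reaction.TERMINATE;                  // any other Exception: give up
        }
        private ExceptionPolicy() {}
    }

    // Stand-ins for the IDL-defined user exceptions (the shared parent is an assumption).
    class ServiceUnavailableException extends Exception {}
    class InvalidClientException extends Exception {}
    class ParkingException extends Exception {}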

Fail-Over Mechanism: Server References
The client obtains the reference to the primary server when
– it is initially started
– it notices that the server has crashed or been corrupted (i.e. COMM_FAILURE, ServiceUnavailableException)
When the client notices that there is no primary server reference in the naming service, it displays an appropriate message and then terminates

RT-FT-Baseline Architecture

RT-FT-Baseline Architecture: High-Level Components (diagram)
Same components and numbered data flow (steps 1-8) as the FT-Baseline architecture, with the Replication Manager calling poke() and bind()/unbind(), plus a Testing Manager connected to the other components by "launches" relationships in the diagram.

Fault Tolerance Experimentation

Fault Tolerance Experimentation: The Fault-Free Run - Graph 1
While the mean latency stayed almost constant, the maximum latency varied

Fault Tolerance Experimentation: The Fault-Free Run - Graph 2
This demonstrates the conformance with the magical 1% theory

Fault Tolerance Experimentation: The Fault-Free Run - Graph 3
Mean latency increases as the reply size increases

Fault Tolerance Experimentation: The Fault-Free Run - Conclusions
Our data conforms to the magical 1% theory, indicating that outliers account for less than 1% of the data points
We hope that this helps with Tudor's research

Bounded “Real-Time” Fail-Over Measurements

Bounded “Real-Time” Fail-Over Measurements: The Fault-Induced Run - Graph
High latency is observed during faults

Bounded “Real-Time” Fail-Over Measurements: The Fault-Induced Run - Pie Chart
Client's fault recovery timeout causes most of the latency

Bounded “Real-Time” Fail-Over Measurements: The Fault-Induced Run - Conclusions
We noticed that there is an observable latency when a fault occurs
Most of the latency was caused by the client's fault recovery timeout
The second-highest contributor was the time that the client has to wait for the client manager to be restored on the new server

FT-RT-Performance Strategy

FT-RT-Performance Strategy: Reducing Fail-Over Time
Implemented strategies
– Adjust client fault recovery timeout
– Use IOGRs and cloning-like strategies
– Pre-create TCP/IP connections to all servers
Other strategies that could potentially be implemented
– Database connection pool
– Load balancing
– Remove client ID consistency check

Measurements after Strategies: Adjusting Waiting Time
The following graphs are for different values of wait time at the client end
This is the time that the client waits in order to give the replication manager sufficient time to update the naming service with the new primary
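A minimal sketch of that knob, assuming it is exposed as a tunable value rather than hard-coded; the property name is made up, and the 4000 ms default reflects the value the later observation slide reports as giving the least jitter.

    // Hypothetical recovery-wait setting used by the client's fail-over path.
    final class RecoveryWait {
        static final long MS = Long.getLong("parknpark.recoveryWaitMs", 4000L);

        // Called after a fault is detected, before re-resolving the primary, so the
        // replication manager has time to rebind the new primary in the naming service.
        static void pauseBeforeReresolve() throws InterruptedException {
            Thread.sleep(MS);
        }

        private RecoveryWait() {}
    }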

Measurements after Strategies: plots for waiting times of 0 ms, 500 ms, 1000 ms, 2000 ms, 2500 ms, 3000 ms, 3500 ms, 4000 ms, and 4500 ms (plots not included in the transcript).

Measurements after Strategies: Observations after Adjusting Wait Times
The best results can be seen with a 4000 ms wait time.
Even though there is a large reduction in fail-over time for lower values, we can observe a significant amount of jitter.
The reason for the jitter is that the client doesn't get the updated primary from the naming service.
Since our primary concern is bounded fail-over, we chose the strategy that has the least jitter, rather than the strategy that has the lowest latencies.
The average recovery time is reduced by a decent amount (from about 5-6 secs to sec for the 4000 ms wait time).

Measurements after Strategies: Implementing IOGR
IOGR: Interoperable Object Group Reference
With this strategy, the client gets the list of all active servers from the naming service
The client refreshes this list only if all the servers in the list have failed
The following graphs were produced after this strategy was implemented
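A sketch of that client-side behaviour, with the naming-service query abstracted as a Supplier that returns all currently bound server references; the names are assumptions.

    import java.util.List;
    import java.util.function.Supplier;

    // Hypothetical IOGR-style cache: the client keeps references to all active servers
    // and only goes back to the naming service once every cached reference has failed.
    class ServerGroup<T> {
        private final Supplier<List<T>> lookupAll;  // fetches all bound server references
        private List<T> cached;
        private int index;

        ServerGroup(Supplier<List<T>> lookupAll) {
            this.lookupAll = lookupAll;
            refresh();
        }

        T current() { return cached.get(index); }

        // Called after a COMM_FAILURE / OBJECT_NOT_EXIST on the current reference.
        void failOver() {
            index++;
            if (index >= cached.size()) {
                refresh();   // every cached server is dead: re-query the naming service
            }
        }

        private void refresh() {
            cached = lookupAll.get();
            index = 0;
        }
    }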

Measurements after Strategies: plots after the IOGR strategy (same axis and different axis) and a pie chart (plots not included in the transcript).

Measurements after Strategies: Observations after the IOGR Strategy
The recovery time is significantly reduced, from between 5-6 seconds to less than half a second
The time to get the new primary from the naming service is eliminated
Most of the time is spent in obtaining a client manager object
The graph plotted on the different axis shows some amount of jitter, since, when all the servers in the client's list are dead, the client has to go back to the naming service

Measurements after Strategies: Implementing Open TCP/IP Connections
This strategy was implemented because, after implementing the IOGR strategy, most of the time was spent in establishing a connection with the next server and getting the client manager
With this strategy, the client maintains open TCP/IP connections to all the servers, so the time to create a connection is saved
The following graphs were produced after the open TCP/IP connections strategy was implemented
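One plausible way to pre-establish those connections is to touch every cached reference once at startup; the sketch below uses the standard CORBA _non_existent() ping for that purpose, which is an assumption since the slides do not say how the connections are opened.

    import java.util.List;

    // Hypothetical warm-up step: force the ORB to open (and cache) a TCP connection to
    // every replica up front, so fail-over does not pay the connection-setup cost.
    final class ConnectionWarmer {
        static void warmUp(List<org.omg.CORBA.Object> servers) {
            for (org.omg.CORBA.Object ref : servers) {
                try {
                    ref._non_existent();   // cheap remote ping over the new connection
                } catch (org.omg.CORBA.SystemException unreachable) {
                    // A replica that is already down will simply be skipped at fail-over time.
                }
            }
        }
        private ConnectionWarmer() {}
    }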

Measurements after Strategies: plots after maintaining open connections (same axis and different axis, 1 client), a pie chart (1 client), plots (same axis and different axis, 10 clients), and a pie chart (10 clients). Plots not included in the transcript.

Measurements after Strategies: Observations after Implementing Open Connections for 1 Client
The recovery time is reduced compared to the cloning strategy
The maximum time is still spent in obtaining a client manager object
There is noticeable jitter when observed on the different-axis plot

Measurements after Strategies: Observations after Implementing Open Connections for 10 Clients
A significant reduction is observed in fail-over time
The maximum time is still spent in obtaining a client manager object
It can also be observed that a significant amount of time is spent waiting to acquire a lock on the thread

Other Features

Other Features: The Long Story - EJB 3.0
It's actually not that long of a story… we tried to use EJB 3.0 and failed miserably. The End.
The main issues with using EJB 3.0 are
– It is a new technology, so documentation on it is very sparse
– It is still evolving and changing, which can cause problems (e.g. JBoss vs 4.0.4)
– The development and deployment are significantly different from EJB 2.1, which introduces a new learning curve
– It is not something that can be learned in one weekend…

Other Features: Bells and Whistles
Replication manager can be restarted
Replication manager can handle an arbitrary number of servers
Any server can be dynamically added and removed due to no hard-coding
Cars can magically teleport in and out of parking lots (for testing robustness)
Clients can manually corrupt the server's database connection (for testing robustness)
Use the Java Reflection API in the client to consolidate fault detection and recovery code (see the sketch after this list)
Prevents Sun's CORBA implementation from spewing exception stack traces to the user
Highly-modularized dependency structure in the code (as proved by Lattix LDM)
Other stuff that we can't remember…
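A plausible shape for the Reflection-based consolidation mentioned in the list above, using java.lang.reflect.Proxy to route every remote call through one fail-over handler; the helper names are assumptions, not the project's actual code.

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.util.function.Supplier;
    import org.omg.CORBA.COMM_FAILURE;
    import org.omg.CORBA.OBJECT_NOT_EXIST;

    // Hypothetical dynamic proxy that funnels all remote calls through a single
    // fault-detection-and-recovery path instead of per-method try/catch blocks.
    final class FailoverProxy {
        @SuppressWarnings("unchecked")
        static <T> T wrap(Class<T> iface, Supplier<T> resolvePrimary) {
            InvocationHandler handler = new InvocationHandler() {
                private T target = resolvePrimary.get();

                public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                    while (true) {
                        try {
                            return method.invoke(target, args);
                        } catch (InvocationTargetException e) {
                            Throwable cause = e.getCause();
                            if (cause instanceof COMM_FAILURE || cause instanceof OBJECT_NOT_EXIST) {
                                target = resolvePrimary.get();   // fail over, then retry the call
                            } else {
                                throw cause;                     // application exceptions pass through
                            }
                        }
                    }
                }
            };
            return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
        }
        private FailoverProxy() {}
    }

A caller would then use something like FailoverProxy.wrap(ClientManager.class, client::resolvePrimary) and invoke the remote interface as if no fail-over were involved.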

Other Features: Lessons Learned
It's difficult to implement real-time behavior, fault tolerance, and high performance, especially if they are not factored into the architecture from the start
Choose an application that will permit you to easily apply the concepts learned in the class
Don't waste time with bells and whistles until you have time to do so
Run your measurements before other teams hog and crash the server
Set up your own database server
Kill the server such that logs are flushed before the server dies
Catch and handle as many exceptions as possible
It's a good thing that we did not use JBoss!
Use the girltalk server because no one else is going to use that one…

Other Features: …Painful Lessons Learned
Most painful lessons learned:
1. The EJB concept takes time to learn and use
2. EJB 3.0 introduces another learning curve
3. JBoss provides many, many configuration options, which makes deploying an application a challenging task
4-10. Don't try to learn the concepts of EJB…
…and EJB 3.0…
…and JBoss…
…all at the same time…
…in one weekend…
…especially when the project is due the following Monday…
…!!!!!!!!!!!!!!!!!!!!

Conclusions

Conclusions: If We Had the Time Turner!!
We would start right away with CORBA (100+ hours were a little too much)
And, as we just found out, we would have counted the number of invocations in the experiments before submitting

Conclusions: …the final word.
Yeayyyyyyyyyyyyyyyyyyyyyyyyyy!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Thank You.