Business Continuity and DR: A Practical Implementation
Mich Talebzadeh, Consultant, Deutsche Bank
Challenges we are facing
Mission-critical businesses, such as the majority of trading systems in the City, face a critical need to ensure the availability and continuous operation of their business systems. We need to make this happen in spite of potential failures ranging from disk crashes and CPU failures to the catastrophic loss of computing facilities or communications networks. While solutions exist to tolerate component failures, catastrophic failures, i.e. failures that leave the production host totally out of action, need to be re-examined in light of new technical possibilities and of recent experience since September 11th.
Challenges we are facing
We know that trading products are inter-linked across the globe, and we need to provide current information to the business on a 24x7 basis. Following recent experience, there is now great demand from business managers (as opposed to IT) that most trading systems across the globe be able to function independently of each other in different geographical locations. In simple terms, this means inter-linked hubs that function globally as a logical unit but can carry on independently (i.e. provide full functionality to the business) if one or more hubs go out of action. This does happen!
Systems we have
Two-tier architectures are fine for small-to-moderate workloads; examples are the majority of third-party packages and existing systems. Most e-business and company-wide systems will be based on a three-tier architecture with this hierarchy:
– The first tier is the Web client.
– The second (or middle) tier includes a Web server and possibly a Web application server.
– The third tier includes the enterprise database and provides transaction management.
Systems we have
A three-tier architecture helps you manage your workload because it allows you to add redundant clients or servers at any of the three tiers. For example, because almost everyone has a Web browser, you already have redundancy at the first tier: you can access your applications from any of these browsers. Redundancy at the second tier has several benefits, including spare capacity for peak traffic and fail-over support that keeps the application available should any one server fail. It also allows you to partition the workload based on incoming traffic. For example, you might want Web requests from your European offices to use your London-based Web servers (rather than your New York servers) to minimize client response time and to allow for third-tier redundancy.
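To make the second-tier idea concrete, here is a minimal sketch of region-based routing with fail-over, assuming a simple in-process table of server pools. The region codes, host names and `pick_web_server` function are hypothetical; in practice this job is usually done by a load balancer or DNS rather than application code.

```python
from typing import Dict, List, Optional, Set

# Hypothetical server pools per client region; host names are illustrative only.
PREFERRED_POOL: Dict[str, List[str]] = {
    "EU": ["web-london-1", "web-london-2"],
    "NA": ["web-newyork-1", "web-newyork-2"],
    "ASIA": ["web-tokyo-1", "web-tokyo-2"],
}


def pick_web_server(region: str, request_id: int, up: Set[str]) -> Optional[str]:
    """Send a request to a healthy server in its home region, if any."""
    # Partition the workload: prefer healthy servers in the client's own region.
    candidates = [s for s in PREFERRED_POOL.get(region, []) if s in up]
    if not candidates:
        # Fail-over: the home pool is down, so use any healthy server elsewhere.
        candidates = sorted(
            s for pool in PREFERRED_POOL.values() for s in pool if s in up
        )
    if not candidates:
        return None  # no second-tier server is available at all
    # Spread requests across the redundant servers in the chosen pool.
    return candidates[request_id % len(candidates)]
```

With every server up, a European request lands on a London server; if both London servers are missing from `up`, the same request is served from another region instead of failing.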
Systems we have
Redundancy at the third tier is usually needed for increased application availability and distributed processing. We can provide redundant database servers for two major reasons:
– Geographic distribution of data: a trade may involve the exchange of securities in multiple geographical regions such as London, New York and Tokyo.
– Fail-over capability: the ability to use another location's database if the local one goes out of action.
Getting there
Achieving all this as a complete solution is a great challenge and will inevitably cover many aspects, from application servers to shared file systems and complete back-end solutions. Setting up redundant Web servers and application servers is not a trivial task, but it generally involves only copying the applications and the network set-up. Setting up more than one third-tier server, however, means ensuring that each server has a copy of the data the applications need. This calls for a truly global data (i.e. information) repository: a system that lets you connect to separate, independent systems (most probably in different geographical locations), update the data on any of them concurrently, and rely on the underlying technology to keep all this data in sync.
Getting there
Very few vendors on the database side can provide the mission-critical tools to make this happen. Sybase can provide this functionality through a very high-performance Data Server, its state-of-the-art Replication Server and tools such as OpenSwitch. In this scenario your trading systems in London, New York and Tokyo share common data through local databases that continuously exchange changes via Sybase peer-to-peer replication. The set-up as a whole is fault tolerant: the loss of one location impacts only that location, and users in the other locations carry on as usual. Affected users are automatically transferred to the fail-over location by OpenSwitch.
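The peer-to-peer exchange described above can be pictured with a toy model. This is a minimal sketch, not how Sybase Replication Server is implemented: the `Hub` class, `mesh` helper and per-peer queues are hypothetical, and conflict resolution for concurrent updates to the same row is deliberately not modelled. It shows two properties the slide relies on: a change made locally reaches every other hub, and changes destined for a hub that is down wait in a queue until it returns, while the remaining hubs carry on.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Change = Tuple[str, float]  # (key, new value), illustrative only


class Hub:
    """Toy model of one regional hub in a fully connected peer-to-peer set-up."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, float] = {}               # local copy of shared data
        self.peers: List["Hub"] = []
        self.outbound: Dict[str, Deque[Change]] = {}   # pending changes per peer
        self.online = True

    def connect(self, other: "Hub") -> None:
        self.peers.append(other)
        self.outbound[other.name] = deque()

    def local_update(self, key: str, value: float) -> None:
        # Local users write to their own hub; the change is queued for every peer.
        self.data[key] = value
        for peer in self.peers:
            self.outbound[peer.name].append((key, value))

    def apply_replicated(self, key: str, value: float) -> None:
        # Replicated changes are applied but not forwarded again,
        # so a change does not loop round the mesh.
        self.data[key] = value

    def deliver(self) -> None:
        # Drain the queue of each peer that is up; a down peer's queue is kept.
        for peer in self.peers:
            queue = self.outbound[peer.name]
            while queue and peer.online:
                peer.apply_replicated(*queue.popleft())


def mesh(*hubs: Hub) -> None:
    """Connect every hub to every other hub."""
    for a in hubs:
        for b in hubs:
            if a is not b:
                a.connect(b)
```

For example, with London, New York and Tokyo meshed and Tokyo offline, an update made in London reaches New York immediately and sits in London's Tokyo queue until Tokyo comes back.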
Getting there
Setting up redundant Web servers and application servers is not a trivial task, but generally only involves copying the applications and network set-up; it doesn't involve copying data. However, setting up more than one third-tier server involves ensuring that each server has a copy of the data needed for the applications. Each copy of the data must also be consistent, correct and current.
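One simple way to check that copies are consistent is to compare an order-independent digest of the same table at each site. This is a minimal sketch under assumptions: rows are fetched as Python tuples, and the `table_digest` and `copies_consistent` helpers are hypothetical. A snapshot comparison like this only says something about consistency when taken at a quiet point (no replication backlog); it cannot by itself prove a copy is current.

```python
import hashlib
from typing import Dict, Iterable, Tuple

Row = Tuple[object, ...]


def table_digest(rows: Iterable[Row]) -> str:
    """Order-independent SHA-256 digest of a table's rows."""
    h = hashlib.sha256()
    # Sort the textual form so the same rows in a different order give the same digest.
    for text in sorted(repr(row) for row in rows):
        h.update(text.encode("utf-8"))
        h.update(b"\n")  # separator so row boundaries cannot be confused
    return h.hexdigest()


def copies_consistent(copies: Dict[str, Iterable[Row]]) -> bool:
    """True if every site's copy of the table has the same digest."""
    digests = {site: table_digest(rows) for site, rows in copies.items()}
    return len(set(digests.values())) <= 1
```

Comparing digests rather than full tables keeps the check cheap over a wide-area link: each site computes its digest locally and only the short hex strings need to be exchanged.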
Getting there
Let us look at a typical set-up for peer-to-peer data replication among global e-commerce/trading systems serving Europe, North America and the Far East. In this scenario London is the European hub, New York the North American hub and Tokyo the Far Eastern hub. These three hubs act as fail-over for each other: OpenSwitch automatically detects failures and redirects clients to the fail-over data server.
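The detect-and-redirect behaviour can be sketched as a heartbeat counter plus an ordered fail-over list per hub. This is a hypothetical model, not the OpenSwitch API: the `FAILOVER` order, `FailoverRouter` class and `max_missed` threshold are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional

# Assumed fail-over order for each hub's clients (illustrative, not prescribed).
FAILOVER: Dict[str, List[str]] = {
    "LONDON": ["LONDON", "NEWYORK", "TOKYO"],
    "NEWYORK": ["NEWYORK", "LONDON", "TOKYO"],
    "TOKYO": ["TOKYO", "LONDON", "NEWYORK"],
}


class FailoverRouter:
    """Marks a data server down after several missed heartbeats and
    routes each client to the first healthy server in its fail-over list."""

    def __init__(self, failover: Dict[str, List[str]], max_missed: int = 3) -> None:
        self.failover = failover
        self.max_missed = max_missed
        self.missed: Dict[str, int] = {server: 0 for server in failover}

    def heartbeat(self, server: str, ok: bool) -> None:
        # A successful probe resets the counter; a failed one increments it.
        self.missed[server] = 0 if ok else self.missed[server] + 1

    def is_up(self, server: str) -> bool:
        return self.missed[server] < self.max_missed

    def route(self, home: str) -> Optional[str]:
        for server in self.failover[home]:
            if self.is_up(server):
                return server
        return None  # every data server is down
```

Requiring several consecutive missed heartbeats before failing over is a design choice that avoids redirecting all users because of a single dropped probe; the trade-off is a slightly longer detection time.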
Peer-to-Peer Data Replication Set-up
What we aim at
– Distributed Processing
– High Availability
– Reliability
– Scalability
What we gain
Distributed Processing
– Reduced load on the network
– Only deltas (changes) are exchanged
– Much faster response if latency (as opposed to bandwidth) is the bottleneck for your application
– Multiple data servers can exist in one geographical location
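To see why exchanging deltas reduces network load, compare shipping a whole table with shipping only the rows that changed. The sketch below diffs two snapshots of a keyed table; this is an illustration only, since Sybase Replication Server picks up changes from the transaction log rather than by comparing snapshots. The `compute_deltas` and `apply_deltas` helpers are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Change = Tuple[str, Any, Any]  # (operation, key, new value or None)


def compute_deltas(old: Dict[Any, Any], new: Dict[Any, Any]) -> List[Change]:
    """List the inserts, updates and deletes that turn `old` into `new`."""
    changes: List[Change] = []
    for key in sorted(new.keys() - old.keys()):
        changes.append(("insert", key, new[key]))
    for key in sorted(old.keys() & new.keys()):
        if old[key] != new[key]:
            changes.append(("update", key, new[key]))
    for key in sorted(old.keys() - new.keys()):
        changes.append(("delete", key, None))
    return changes


def apply_deltas(replica: Dict[Any, Any], changes: List[Change]) -> None:
    """Bring a remote copy up to date using only the changes."""
    for op, key, value in changes:
        if op == "delete":
            replica.pop(key, None)
        else:
            replica[key] = value
```

On a large table where only a handful of rows change, the change list is a small fraction of the full table, which is the saving the slide refers to.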
What we gain
High Availability
– Avoids loss of service by reducing or managing failures
– Allows planned downtime for a hub (hardware upgrades, software changes) while service is still provided to the business
– Tested in practice and proven at Deutsche Bank
What we gain
Reliability
– Systems built this way have proven very reliable, with Sybase Replication Server performing very well and providing facilities to log and resolve failures
– Because Sybase Replication Server operates at the logical level, turning transactions into SQL operations, the risk of hardware-level corruption being copied from one site to another is avoided entirely
What we gain
Reliability (continued)
– That is, if one of your nodes suffers undetected hardware corruption, the other nodes will not be infected and, more importantly, you will be promptly alerted by Replication Server!
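The point about logical-level replication can be illustrated by expressing each change as a SQL statement over rows and columns instead of copying disk blocks. This is a minimal sketch, not Replication Server's internal format: the `to_sql` and `apply_change` helpers and `ReplicationError` are hypothetical, and Python's built-in `sqlite3` stands in for the replicate database. The sketch also checks that each statement touches exactly one row, which is one way a replicate can notice that its copy has diverged.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


class ReplicationError(Exception):
    """Raised when a replicated change does not match the replicate's data."""


def to_sql(op: str, table: str, values: Dict[str, Any],
           key: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Turn a logical change into a parameterized SQL statement.
    Table and column names are assumed to come from trusted definitions."""
    if op == "insert":
        cols = list(values)
        placeholders = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        return sql, [values[c] for c in cols]
    where = " AND ".join(f"{c} = ?" for c in key)
    if op == "update":
        sets = ", ".join(f"{c} = ?" for c in values)
        return (f"UPDATE {table} SET {sets} WHERE {where}",
                list(values.values()) + list(key.values()))
    if op == "delete":
        return f"DELETE FROM {table} WHERE {where}", list(key.values())
    raise ValueError(f"unknown operation: {op}")


def apply_change(conn: sqlite3.Connection, op: str, table: str,
                 values: Dict[str, Any], key: Dict[str, Any]) -> None:
    sql, params = to_sql(op, table, values, key)
    cur = conn.execute(sql, params)
    # A row-level change should affect exactly one row; anything else
    # means this copy no longer matches the source, so raise an alert.
    if cur.rowcount != 1:
        raise ReplicationError(f"{op} on {table} affected {cur.rowcount} rows: {sql}")
```

Because only row values travel between sites, a corrupt block on the source disk is never copied byte-for-byte; at worst a bad value fails to apply cleanly, and the mismatch is reported rather than silently spread.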
What we gain
Scalability
– You can scale up or down according to your requirements and user base
– Create dedicated databases for reporting etc. on the same node
– Replicate only the subset of your data that a specific business needs
– Divide and conquer!
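Replicating only a subset of the data amounts to deciding, for each changed row, which target databases want it. The sketch below is roughly what a filtered subscription does, modelled as plain predicates; the database names, the `desk` column and `route_change` are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical subscriptions: each target database declares which rows it wants.
SUBSCRIPTIONS: Dict[str, Callable[[Row], bool]] = {
    "reporting_db": lambda row: True,                       # full copy for reporting
    "equities_db": lambda row: row["desk"] == "EQUITIES",   # subset for one business
}


def route_change(row: Row,
                 subscriptions: Dict[str, Callable[[Row], bool]] = SUBSCRIPTIONS) -> List[str]:
    """Return the target databases that should receive this changed row."""
    return sorted(name for name, wants in subscriptions.items() if wants(row))
```

A change to an FX trade would then go only to the reporting database, while an equities trade goes to both, so each business carries only the data it needs.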