IMPROVEMENT OF COMPUTER NETWORK SECURITY BY USING FAULT TOLERANT CLUSTERS
Prof. ȘERB AUREL, Ph.D.
Prof. PATRICIU VICTOR-VALERIU, Ph.D.
Military Technical Academy, Bucharest, Romania
FAULT TOLERANT SYSTEMS
• A fault tolerant system is one that can continue to operate reliably, producing acceptable outputs, in spite of occasional component failures.
• The basic principle of fault tolerant design is the use of redundancy.
• A fault tolerant system can be viewed as a nested set of subsystems.
• Fault tolerant architectures package redundant partitions into replaceable units.
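The effect of redundancy on a nested set of subsystems can be illustrated with a small calculation. The sketch below assumes that units fail independently and that one working unit in a redundant group is enough to deliver the service. The function names and the availability figures are hypothetical, chosen only for illustration.

```python
from typing import Iterable


def parallel_availability(a: float, n: int) -> float:
    """Availability of a group of n redundant units.

    Assumes independent failures and that a single working unit is
    sufficient, so the group fails only when all n units fail.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if n < 1:
        raise ValueError("a group needs at least one unit")
    return 1.0 - (1.0 - a) ** n


def series_availability(parts: Iterable[float]) -> float:
    """Availability of a chain in which every part is required."""
    result = 1.0
    for a in parts:
        result *= a
    return result


if __name__ == "__main__":
    # Hypothetical system: nodes, disks and LAN are all required (series),
    # and each of them is duplicated (parallel).
    unit = 0.99
    without_redundancy = series_availability([unit, unit, unit])
    with_redundancy = series_availability(
        [parallel_availability(unit, 2) for _ in range(3)]
    )
    print(f"single units:     {without_redundancy:.6f}")
    print(f"duplicated units: {with_redundancy:.6f}")
```

The nesting of the two functions mirrors the nested view of the system: redundant groups at the inner level, a chain of required subsystems at the outer level.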
CLUSTERS AND FAULT TOLERANT CLUSTERS
• A cluster is a set of computers connected over a local network that function as a single large multicomputer. The cluster software is a layer that runs on top of the local operating system of each computer.
• A fault tolerant cluster is a cluster with external storage devices connected to the nodes on a common input/output bus. Clients are connected over the network to a server application that executes on the nodes.
SINGLE POINTS OF FAILURE OF A CLUSTER
• nodes in the cluster;
• disks used to store applications or data, and the adapters, controllers and cables used to connect the nodes to the disks;
• the network backbones over which users access the cluster nodes, and the network adapters attached to each node;
• power sources;
• applications.
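The list above can be turned into a simple audit: any component class with fewer than two independent units is a single point of failure. The sketch below is only an illustration. The inventory keys and counts are hypothetical and do not describe a real cluster.

```python
from typing import Dict, List

# Hypothetical inventory: component class -> number of independent units.
SAMPLE_INVENTORY: Dict[str, int] = {
    "nodes": 2,
    "disks": 2,
    "disk_adapters": 1,
    "network_backbones": 2,
    "network_adapters_per_node": 2,
    "power_sources": 1,
    "application_instances": 2,
}


def single_points_of_failure(inventory: Dict[str, int]) -> List[str]:
    """Return, in alphabetical order, the classes with fewer than two units."""
    return sorted(name for name, count in inventory.items() if count < 2)


if __name__ == "__main__":
    for name in single_points_of_failure(SAMPLE_INVENTORY):
        print("single point of failure:", name)
```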
A SAMPLE CONFIGURATION FOR A FAULT TOLERANT CLUSTER
ELIMINATING NODES AS SINGLE POINTS OF FAILURE
• When a node providing critical services in a cluster fails, another node in the cluster takes over its resources and provides the same services to the end user, in a process known as failover.
• After the failover, clients can access the second node as easily as the first.
• The failover process is handled by special high availability software running at the top-level cluster operating system layer.
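The failover process described above can be sketched as a heartbeat monitor. This is a toy model under stated assumptions, not the algorithm of any particular high availability product: a node is considered failed when no heartbeat has been seen for longer than a timeout, and the class and method names are hypothetical.

```python
from typing import Dict, Iterable, Optional


class FailoverMonitor:
    """Toy failover logic: a service moves to a standby node when the
    node that owns it has been silent for longer than `timeout` seconds."""

    def __init__(self, nodes: Iterable[str], timeout: float) -> None:
        self.timeout = timeout
        # Time (in seconds) at which each node was last heard from.
        self.last_seen: Dict[str, float] = {name: 0.0 for name in nodes}
        # Service name -> name of the node currently providing it.
        self.owner: Dict[str, str] = {}

    def assign(self, service: str, node: str) -> None:
        self.owner[service] = node

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        return now - self.last_seen[node] <= self.timeout

    def check(self, now: float) -> None:
        """Move every service owned by a silent node to a live node."""
        for service, node in list(self.owner.items()):
            if self.is_alive(node, now):
                continue
            standby = self._pick_standby(now, exclude=node)
            if standby is not None:
                self.owner[service] = standby

    def _pick_standby(self, now: float, exclude: str) -> Optional[str]:
        # Deterministic choice: first live node in alphabetical order.
        for candidate in sorted(self.last_seen):
            if candidate != exclude and self.is_alive(candidate, now):
                return candidate
        return None


if __name__ == "__main__":
    monitor = FailoverMonitor(["node1", "node2"], timeout=3.0)
    monitor.assign("database", "node1")
    monitor.heartbeat("node1", 1.0)
    monitor.heartbeat("node2", 1.0)
    monitor.heartbeat("node2", 5.0)  # node1 has gone silent
    monitor.check(6.0)
    print(monitor.owner)
```

Time is passed in explicitly so that the behaviour is reproducible. A real monitor would read a clock and would also have to release and reacquire the shared resources of the failed node.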
ELIMINATING DISKS AS SINGLE POINTS OF FAILURE
• Disks are physically connected to all nodes, so that applications and data are also accessible by another node in the event of failover.
• There are two methods available for providing disk redundancy:
  – using disk arrays in a RAID configuration;
  – using software mirroring.
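The idea behind software mirroring can be sketched in a few lines: every write goes to all healthy copies, and a read is served from any healthy copy. The model below is a simplified illustration with hypothetical names. It keeps blocks in memory and does not model resynchronisation of a replaced disk.

```python
from typing import Dict, List


class MirroredVolume:
    """Toy software mirror over in-memory 'disks' (block number -> data)."""

    def __init__(self, copies: int = 2) -> None:
        if copies < 2:
            raise ValueError("a mirror needs at least two copies")
        self.disks: List[Dict[int, bytes]] = [{} for _ in range(copies)]
        self.healthy: List[bool] = [True] * copies

    def write(self, block: int, data: bytes) -> None:
        """Write the block to every healthy copy."""
        if not any(self.healthy):
            raise IOError("no healthy copy left")
        for disk, ok in zip(self.disks, self.healthy):
            if ok:
                disk[block] = data

    def read(self, block: int) -> bytes:
        """Read the block from the first healthy copy."""
        for disk, ok in zip(self.disks, self.healthy):
            if ok:
                return disk[block]
        raise IOError("no healthy copy left")

    def fail(self, index: int) -> None:
        """Simulate the loss of one physical disk."""
        self.healthy[index] = False


if __name__ == "__main__":
    volume = MirroredVolume(copies=2)
    volume.write(0, b"payroll")
    volume.fail(0)           # first disk is lost
    print(volume.read(0))    # data is still served from the second copy
```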
ELIMINATING NETWORKS AS SINGLE POINTS OF FAILURE
• To eliminate network failures, fully redundant LAN connections can be provided and local switching of LAN interfaces can be configured.
• To eliminate cable failures, redundant cabling and redundant LAN interface cards can be configured on each node.
• To eliminate the loss of client connectivity, redundant routers, hubs or switches can be configured, through which clients access the services of the cluster.
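Local switching of LAN interfaces can be pictured as moving a node's address from a failed interface to a standby interface on the same node. The sketch below is a toy model of that decision only. The class, the interface names and the address are hypothetical, and no real network configuration is performed.

```python
from typing import Dict, List, Optional


class NodeNetwork:
    """Toy model of local LAN switching on a single node."""

    def __init__(self, address: str, interfaces: List[str]) -> None:
        if len(interfaces) < 2:
            raise ValueError("local switching needs a standby interface")
        self.address = address
        self.order = list(interfaces)
        self.link_up: Dict[str, bool] = {name: True for name in interfaces}
        # The first interface carries the address until it fails.
        self.active: Optional[str] = interfaces[0]

    def link_down(self, name: str) -> None:
        """Record a link failure and switch locally if it hit the active card."""
        self.link_up[name] = False
        if name == self.active:
            self._switch()

    def _switch(self) -> None:
        for name in self.order:
            if self.link_up[name]:
                self.active = name
                return
        # No interface left on this node: only a failover to another
        # node could restore the service at this point.
        self.active = None


if __name__ == "__main__":
    net = NodeNetwork("10.0.0.5", ["lan0", "lan1"])
    net.link_down("lan0")
    print(net.address, "is now on", net.active)
```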
ELIMINATING POWER SOURCES AS SINGLE POINTS OF FAILURE
• The use of multiple power circuits with different circuit breakers reduces the likelihood of a complete power outage.
• An uninterruptible power supply provides standby power in the event of an interruption to the power source.
• Small local uninterruptible power supplies can be used to protect individual system processor units and data disks.
ELIMINATING APPLICATIONS AND DATA AS SINGLE POINTS OF FAILURE
• The cluster management software provides services such as failure detection, recovery, load balancing, and the ability to manage the servers as a single system.
• If a node fails, the cluster reconfigures itself, and the applications that were running on the failed node, together with the data they use, are made available on another node.
• Another approach is to run different instances of the same application on multiple nodes.
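The second approach, running instances of the same application on several nodes, can be sketched as a dispatcher that spreads requests over the instances and skips those on failed nodes. This is a minimal round-robin illustration with hypothetical names, not the load balancing method of any specific cluster product.

```python
from typing import Iterable, List, Set


class InstancePool:
    """Round-robin dispatch over application instances, skipping failed nodes."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes: List[str] = list(nodes)
        self.up: Set[str] = set(self.nodes)
        self._next = 0  # index of the node to try first for the next request

    def mark_down(self, node: str) -> None:
        self.up.discard(node)

    def mark_up(self, node: str) -> None:
        if node in self.nodes:
            self.up.add(node)

    def route(self) -> str:
        """Return the node that should serve the next request."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.up:
                return node
        raise RuntimeError("no application instance available")


if __name__ == "__main__":
    pool = InstancePool(["node1", "node2", "node3"])
    pool.mark_down("node2")
    print([pool.route() for _ in range(4)])
```

Unlike failover, no relocation step is needed here: the surviving instances simply absorb the requests of the failed one.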
INTEROPERABILITY BETWEEN M&S AND C4ISR SYSTEMS
• A key task for the modeling and simulation (M&S) community is to link M&S systems with live or real C4ISR systems.
• Within the C4ISR community there is a similarly pressing need to link C4ISR equipment with simulations.
COMMON KEY CONCEPTS IN M&S SYSTEMS, C4ISR SYSTEMS, AND FAULT TOLERANT CLUSTERS
• open and distributed systems;
• networks;
• high level operating systems;
• segments, federates (federations) and packages;
• hierarchical architecture;
• commercial standards, specifications, and products;
• interoperability and reusability;
• high availability systems.
OPEN AND DISTRIBUTED SYSTEMS
• All modern systems used for modeling and simulation and for C4ISR are open and distributed systems.
• The architecture of all modern fault tolerant systems is that of a cluster, which is a prime example of an open and distributed system.
NETWORKS
• A fault tolerant cluster is a set of independent computers connected over a network, always with external storage devices, containing applications and data, connected to the nodes on a common input/output bus. Clients are connected over the network to a server application that executes on the nodes.
• The basic High Level Architecture (HLA) protocol establishes that the communication path between federates is over the network.
HIGH LEVEL OPERATING SYSTEMS
• In a fault tolerant system the cluster software is a layer that runs on top of the local operating system of each computer.
• The high availability applications in the fault tolerant cluster run on the top-level cluster software.
• In the High Level Architecture the Runtime Infrastructure is a high level distributed operating system for the federation.
SEGMENTS, FEDERATES (FEDERATIONS) AND PACKAGES
• The basic components of the High Level Architecture are the simulations themselves or, more generally, the federates.
• In DII-COE-based systems, all software and data are packaged in self-contained units called segments.
• By using the high-level cluster software, application services and all the resources needed to support the application can be grouped together into special entities called application packages.
HIERARCHICAL ARCHITECTURE
• Every fault tolerant cluster is partitioned at several levels and, in addition, contains redundant components and recovery mechanisms that may be employed in different ways at different levels.
• Simulations that use the HLA are modular in nature, allowing federates to join and resign from the federation as the simulation executes.
• At the top of any fault tolerant cluster, command and control system, or High Level Architecture compliant system there is a distributed operating system that runs on top of the local operating systems of each computer, or on top of the federates and federations.
COMMERCIAL STANDARDS, SPECIFICATIONS, AND PRODUCTS
• The commercial marketplace generally moves at a faster pace than the military marketplace.
• Using already built items lowers production costs.
• The probability of product enhancements is increased because the marketplace is larger.
• The probability of standardization is increased because a larger customer base drives it.
INTEROPERABILITY AND REUSABILITY
• The High Level Architecture can be seen as a “software bus” that allows applications and data to communicate with one another, regardless of who designed them, the platform they are running on, and the language they are written in.
• The fault tolerant cluster can offer a good architecture for the High Level Architecture to work with these federations, or for applications running on C4ISR systems.
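The “software bus” idea can be illustrated with a minimal publish/subscribe sketch, in which participants exchange messages through the bus without knowing about each other. This is only an analogy under stated assumptions: the class and method names are hypothetical and are not the interface of the HLA Runtime Infrastructure.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives the topic name and the message payload.
Handler = Callable[[str, dict], None]


class SoftwareBus:
    """Toy publish/subscribe bus that decouples senders from receivers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a participant's interest in a topic (joining)."""
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        """Withdraw a participant's interest in a topic (resigning)."""
        if handler in self._subscribers[topic]:
            self._subscribers[topic].remove(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber; return how many got it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


if __name__ == "__main__":
    bus = SoftwareBus()

    def display(topic: str, message: dict) -> None:
        print("display received", topic, message)

    bus.subscribe("track_update", display)
    bus.publish("track_update", {"id": 7, "x": 12.5, "y": 3.0})
    bus.unsubscribe("track_update", display)
```

The publisher never refers to the subscriber directly, which is what lets participants written by different parties, on different platforms, be combined or removed while the system is running.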
HIGH AVAILABILITY SYSTEMS
• The military systems used in M&S and in command and control must not succumb to faults and must continue to operate reliably in spite of occasional component failures.
• High availability and security must be designed into the architecture.
• Fault tolerance is the best guarantee that the system will be highly available and that the essential services will be offered in real time to the users of M&S systems or C4ISR systems.