Published by Marian Fox. Modified over 9 years ago.
1
Storage Management
W. Lilakiatsakun
2
Storage Technology
JBOD (Just a Bunch Of Disks)
RAID (Redundant Array of Inexpensive Disks)
ESS (Enterprise Storage System)
SSA (Serial Storage Architecture)
3
JBOD (Just a Bunch Of Disks) (1)
4
JBOD (Just a Bunch Of Disks) (2)
Depending on the Host Bus Adapter (HBA), a JBOD can be used as individual disks or in any RAID configuration supported by the HBA.
Concatenation (SPAN)
–Concatenation, or spanning, of disks is not one of the numbered RAID levels, but it is a popular method for combining multiple physical disk drives into a single virtual disk.
–It provides no data redundancy. As the name implies, disks are merely concatenated, end to beginning, so they appear to be a single large disk.
5
JBOD (Just a Bunch Of Disks) (3)
Because it consists of an array of independent disks, concatenation can be thought of as a distant relative of RAID.
Concatenation is sometimes used to turn several odd-sized drives into one larger useful drive, which RAID 0 cannot do without wasting capacity.
For example, JBOD could combine 3 GB, 15 GB, 5.5 GB, and 12 GB drives into a single 35.5 GB logical drive, which is often more useful than the individual drives separately.
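The capacity arithmetic in the example above can be checked with a minimal sketch. The function names `span_capacity` and `raid0_capacity` are hypothetical, and the RAID 0 rule used here is the usual one: each member contributes only as much space as the smallest disk.

```python
def span_capacity(sizes_gb):
    """Concatenation (SPAN): every disk contributes its full size."""
    return sum(sizes_gb)


def raid0_capacity(sizes_gb):
    """RAID 0: each disk contributes only the size of the smallest member."""
    return min(sizes_gb) * len(sizes_gb)


drives = [3, 15, 5.5, 12]  # the odd-sized drives from the slide, in GB
print(span_capacity(drives))   # 35.5
print(raid0_capacity(drives))  # 12
```

With these four drives, striping would expose only 12 GB, while spanning exposes all 35.5 GB.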
6
Redundant arrays of inexpensive disks (RAID)
A RAID organization distributes the data across multiple smaller disks, offering protection from a crash that could wipe out all data on a single, shared disk.
Benefits of RAID include the following:
–Increased storage capacity per logical disk volume
–High data transfer or I/O rates that improve information throughput
–Lower cost per megabyte of storage
7
RAID0 (stripe set or striped volume)
RAID Level 0 splits data evenly across two or more disks (striping) with no parity information for redundancy.
It is important to note that RAID 0 provides zero data redundancy.
RAID 0 is normally used to increase performance.
A RAID 0 can be created with disks of differing sizes, but the storage space added to the array by each disk is limited to the size of the smallest disk.
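As a rough illustration of striping, the sketch below maps a logical block number to a disk and an offset on that disk. It assumes a stripe unit of exactly one block and simple round-robin placement; real controllers use configurable stripe sizes, and `locate_block` is a hypothetical name.

```python
def locate_block(logical_block, num_disks):
    """Return (disk index, block offset on that disk) for round-robin striping.

    Assumes a stripe unit of one block; this is an illustrative
    simplification, not a description of any particular controller.
    """
    disk = logical_block % num_disks
    offset = logical_block // num_disks
    return disk, offset


# Consecutive logical blocks land on alternating disks in a two-disk stripe set.
for block in range(6):
    print(block, locate_block(block, 2))
```

Because neighbouring blocks sit on different disks, a large sequential transfer keeps all members busy at once, which is where the performance gain comes from.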
8
RAID0 – Summary (1)
RAID 0 uses a very simple design and is easy to implement, with a large performance advantage.
I/O performance is greatly improved by spreading the I/O load across many channels and drives. The best performance is achieved when data is striped across multiple controllers with only one drive per controller.
9
RAID0 – Summary (2)
No parity calculation overhead is involved.
Not a "true" RAID because it is not fault-tolerant. The failure of just one drive will result in all data in the array being lost.
10
RAID1 (mirroring)
A RAID 1 creates an exact copy of a set of data on two or more disks.
This is useful when read performance or reliability is more important than data storage capacity.
Such an array can only be as big as the smallest member disk.
A classic RAID 1 mirrored pair contains two disks, which increases reliability over a single disk.
11
RAID1 – Summary (1)
RAID Level 1 requires a minimum of 2 drives to implement.
For highest performance, the controller must be able to perform two concurrent separate Reads per mirrored pair or two duplicate Writes per mirrored pair.
100% redundancy of data means no rebuild is necessary in case of a disk failure, just a copy to the replacement disk.
Transfer rate per block is equal to that of a single disk.
Simplest RAID storage subsystem design.
12
RAID1 – Summary (2)
Highest disk overhead of all RAID types; inefficient due to the duplication of Write tasks.
Typically the RAID function is done by system software, loading the CPU/server and possibly degrading throughput at high activity levels.
Hardware implementation is strongly recommended.
–May not support hot swap of a failed disk when implemented in "software".
13
RAID 0+1 (A Mirror of Stripes)
RAID Level 0+1 is implemented as a mirrored array whose segments are RAID 0 arrays.
RAID Level 0+1 requires a minimum of 4 drives to implement.
14
RAID 0+1 – Summary (1)
RAID 0+1 provides high data transfer performance.
It also has the same fault tolerance as RAID level 5.
RAID 0+1 has the same overhead for fault tolerance as mirroring alone.
The high I/O rates are achieved thanks to multiple stripe segments.
RAID 0+1 is an excellent solution for sites that need high performance but are not concerned with achieving maximum reliability.
15
RAID 0+1 – Summary (2)
A single drive failure will cause the whole array to become a RAID Level 0 array.
It has a high overhead and is very expensive.
All the drives must move in parallel to the proper track, thereby lowering sustained performance.
It has very limited scalability at a very high inherent cost.
16
RAID 10 (A Stripe of Mirrors)
RAID 10 is implemented as a striped array whose segments are RAID 1 arrays.
RAID Level 10 requires a minimum of 4 drives to implement.
17
RAID 10 – Summary (1)
RAID 10 has the same fault tolerance as RAID level 1 and can achieve the same high I/O rates.
It has the same overhead for fault tolerance as mirroring alone.
It provides an excellent solution for sites that would otherwise have gone with RAID 1 but need some additional performance boost.
Very expensive, with a high overhead.
All drives must move in parallel to the proper track, lowering sustained performance.
It also has very limited scalability at a very high inherent cost.
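The difference between a mirror of stripes (RAID 0+1) and a stripe of mirrors (RAID 10) shows up when a second drive fails. The sketch below is an illustration under stated assumptions: a four-drive array, drives 0 and 1 forming one group and drives 2 and 3 the other, and hypothetical helper names `survives_raid01` and `survives_raid10`. It counts how many of the six possible two-drive failures each layout survives.

```python
from itertools import combinations


def survives_raid01(failed, stripe_sets):
    """Mirror of stripes: data survives if at least one stripe set is fully intact."""
    return any(not (set(s) & failed) for s in stripe_sets)


def survives_raid10(failed, mirror_pairs):
    """Stripe of mirrors: data survives if every mirror keeps at least one working disk."""
    return all(set(p) - failed for p in mirror_pairs)


groups = [(0, 1), (2, 3)]  # assumed grouping of four drives
two_drive_failures = [set(c) for c in combinations(range(4), 2)]

raid01_ok = sum(survives_raid01(f, groups) for f in two_drive_failures)
raid10_ok = sum(survives_raid10(f, groups) for f in two_drive_failures)
print(raid01_ok, "of", len(two_drive_failures))  # 2 of 6
print(raid10_ok, "of", len(two_drive_failures))  # 4 of 6
```

Both layouts are guaranteed to survive any single failure; under these assumptions RAID 10 tolerates more of the double-failure cases with the same number of drives.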
18
RAID3 (Parallel access with a dedicated parity disk)
RAID Level 3 uses byte-level striping with a dedicated parity disk.
As a result, any single block of data is spread across all members of the set and resides in the same location on each.
So, any I/O operation requires activity on every disk.
19
RAID3 – Summary (1)
Level 3 requires only one dedicated disk in the array to hold parity information.
The server's data is then striped across the remaining drives, usually one byte at a time.
The parity drive keeps track of all the information on the striped drives and uses it to restore the data if a drive should fail.
Because of the parity information that is stored and because Write operations take place at the byte level, Read/Write operations often take longer than in other RAID configurations.
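Parity of this kind is commonly computed as a bytewise XOR across the data drives, which is what lets a single lost drive be restored. The sketch below assumes XOR parity and equal-length blocks; `xor_parity` and `rebuild` are hypothetical names, not part of any RAID product.

```python
from functools import reduce


def xor_parity(blocks):
    """Bytewise XOR of equal-length blocks, as stored on the parity drive."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing block by XOR-ing the survivors with the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


data = [b"AB", b"CD", b"EF"]       # contents of three data drives
parity = xor_parity(data)          # contents of the parity drive
lost = data[1]                     # pretend the second drive failed
recovered = rebuild([data[0], data[2]], parity)
print(recovered == lost)           # True
```

The same XOR that builds the parity also rebuilds the missing data, because XOR-ing a value twice cancels it out.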
20
RAID3 – Summary (2)
RAID Level 3 requires a minimum of 3 drives to implement.
Very high Read data transfer rate.
Very high Write data transfer rate.
Disk failure has an insignificant impact on throughput.
Low ratio of ECC (parity) disks to data disks means high efficiency.
21
RAID3 – Summary (3)
Transaction rate equal to that of a single disk drive at best (if spindles are synchronized).
Controller design is fairly complex.
Very difficult and resource-intensive to do as a "software" RAID because of the parity generation and checking.
22
RAID5 (Independent access with distributed parity)
A RAID 5 uses block-level striping with parity data distributed across all member disks.
A minimum of 3 disks is generally required for a complete RAID 5 configuration.
In the example, a read request for block A1 would be serviced by disk 0.
A simultaneous read request for block B1 would have to wait, but a read request for B2 could be serviced concurrently by disk 1.
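The figure this slide refers to is not reproduced in the transcript. The sketch below generates one possible layout, assuming four disks with the parity block rotating from the last disk toward the first; this is an assumption chosen to be consistent with the A1/B1/B2 statements above, and `raid5_layout` is a hypothetical helper.

```python
def raid5_layout(num_disks, num_stripes):
    """Return one row per stripe; each row lists the block held by each disk.

    Parity ("p") rotates from the last disk toward the first. This rotation
    is an illustrative assumption; real implementations offer several layouts.
    """
    rows = []
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks
        letter = chr(ord("A") + stripe)
        row, data_index = [], 1
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(letter + "p")
            else:
                row.append(f"{letter}{data_index}")
                data_index += 1
        rows.append(row)
    return rows


for row in raid5_layout(4, 4):
    print(" ".join(row))
```

In this layout A1 and B1 both sit on disk 0, so they cannot be read at the same moment, while B2 sits on disk 1 and can be read in parallel with A1.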
23
RAID 5 – Summary (1)
Level 5 also relies on parity information to provide redundancy and fault tolerance, using independent data disks with distributed parity blocks.
Each entire data block is written onto a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location, and checked on Reads.
Compared to RAID 3, RAID 5 uses striping to spread parity information across multiple drives.
Requirements: RAID Level 5 requires a minimum of 3 drives to implement.
24
RAID 5 – Summary (2)
It has the highest Read data transaction rate and a medium Write data transaction rate.
A low ratio of ECC (parity) disks to data disks means high efficiency along with a good aggregate transfer rate.
Disk failure has a medium impact on throughput.
It also has the most complex controller design.
It is often difficult to rebuild in the event of a disk failure (as compared to RAID level 1), and the individual block data transfer rate is the same as that of a single disk.
25
SSA (Serial Storage Architecture) (1)
Serial Storage Architecture (SSA) defines a high-performance serial link for the attachment of input/output devices.
It has been optimized for storage applications such as hard disk drives, host adapter cards, and array controllers.
SSA has many advantages over existing parallel interfaces such as the Small Computer Systems Interface (SCSI-2).
It uses compact cables and connectors, and it has better performance, connectivity, and reliability.
However, to facilitate migration, SSA retains much of the SCSI-2 logical protocol.
26
SSA (Serial Storage Architecture) (2)
Current SSA implementations, such as the IBM 7133 Disk Subsystem, provide a peak data rate of 20 MB/s in each direction.
However, a typical loop configuration with one host adapter can provide a total sustained bandwidth of up to 73 MB/s, and higher speeds are becoming available.
The physical medium is usually a copper cable up to 20 meters long, but fiber optics can also be used for longer distances.
27
SSA (Serial Storage Architecture) (3)
28
SSA (Serial Storage Architecture) (4)
Architecture overview
SSA is defined in three layers:
–SSA-PH1 defines the electrical specifications, cables, and connectors.
–SSA-TL1 is a general-purpose transport layer. It defines the transmission protocol, configuration, and error recovery.
–SSA-S2P is a mapping of the SCSI-2 queuing model, command set, status, and sense bytes.
29
Storage Model
31
Storage Area Network
The Storage Networking Industry Association (SNIA) defines the SAN as a network whose primary purpose is the transfer of data between computer systems and storage elements.
A SAN consists of a communication infrastructure, which provides physical connections, and a management layer, which organizes the connections, storage elements, and computer systems so that data transfer is secure and robust.
32
SAN's definition
A SAN is a specialized, high-speed network attaching servers and storage devices.
It is sometimes referred to as “the network behind the servers.”
A SAN introduces the flexibility of networking to enable one server or many heterogeneous servers to share a common storage utility, which may comprise many storage devices, including disk, tape, and optical storage.
34
SAN Components
SAN Connectivity
–The connectivity of storage and server components, typically using Fibre Channel (FC)
SAN Storage
–Tape / RAID / JBOD (Just a Bunch Of Disks) / SSA (Serial Storage Architecture)
SAN Server
–Windows / Unix / Linux, etc.
36
Switched Fabric
An infrastructure specially designed to handle storage communications is called a fabric.
A typical Fibre Channel SAN fabric is made up of a number of Fibre Channel switches.
Today, all major SAN equipment vendors also offer some form of Fibre Channel routing solution, and these bring substantial scalability benefits to the SAN architecture by allowing data to cross between different fabrics without merging them.
38
Fibre Channel protocol
Fibre Channel is a layered protocol. It consists of 5 layers, namely:
–FC0: The physical layer, which includes cables, fiber optics, connectors, pinouts, etc.
–FC1: The data link layer, which implements the 8b/10b encoding and decoding of signals.
–FC2: The network layer, defined by the FC-FS (Framing and Signaling) standard, which is the core of Fibre Channel and defines the main protocols.
–FC3: The common services layer, a thin layer that could eventually implement functions like encryption or RAID.
–FC4: The protocol mapping layer, in which other protocols, such as SCSI, are encapsulated into an information unit for delivery to FC2.
39
IP Storage Networking
FCIP (Fibre Channel over IP)
–A method for tunneling Fibre Channel information through an IP network.
iFCP (Internet Fibre Channel Protocol)
–A mechanism for transmitting data to and from Fibre Channel storage devices in a SAN, or over the Internet, using TCP/IP.
Internet SCSI (iSCSI)
–A transport protocol that carries SCSI commands from an initiator to a target.
40
FCIP (Fibre Channel over IP)
FCIP encapsulates FC frames within TCP/IP, allowing islands of FC SANs to be interconnected over an IP-based network.
TCP/IP is used as the underlying transport to provide congestion control and in-order delivery of FC frames.
All classes of FC frames are treated the same, as datagrams.
End-station addressing, address resolution, message routing, and other elements of the FC network architecture remain unchanged.
42
iFCP
iFCP is a gateway-to-gateway protocol for implementing a Fibre Channel fabric over a TCP/IP network.
Traffic between Fibre Channel devices is routed and switched by the TCP/IP network.
The iFCP layer maps Fibre Channel frames to a predetermined TCP connection for transport.
FC messaging and routing services are terminated at the gateways, so the fabrics are not merged with one another.
44
iSCSI
iSCSI is a SCSI transport protocol for mapping block-oriented storage data over TCP/IP networks.
The iSCSI protocol enables universal access to storage devices and Storage Area Networks (SANs) over standard TCP/IP networks.
49
Storage Management
Monitoring disk use
–A disk monitor agent scans the server volumes to collect disk use information
Hierarchical storage management
–Files are archived according to certain criteria
Prevention against data loss
–To protect against and recover from loss
Outsourcing storage management
50
Monitoring disk use
One or more of the following categories of information can be collected:
–Volumes: date and time data was collected, server name, volumes scanned, capacity, total space used and available
–Directories: date and time data was collected, server, volume, and directory names, creation date and time, file count, directory size (in bytes), owner name, groups of which the owner is a member
–Directory and file owners: date and time data was collected, server and volume names, groups of which the owner is a member, total number of files, total space used
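The volume-level record described above can be collected with the Python standard library alone. This is a minimal sketch of what a disk monitor agent might gather for one volume; the field names and the `volume_report` function are assumptions made for illustration, not the format of any particular monitoring product.

```python
import shutil
import socket
from datetime import datetime


def volume_report(path):
    """Collect volume-level disk use for the file system containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "collected_at": datetime.now().isoformat(timespec="seconds"),
        "server": socket.gethostname(),
        "volume": path,
        "capacity_bytes": usage.total,
        "used_bytes": usage.used,
        "available_bytes": usage.free,
    }


report = volume_report(".")
for field, value in report.items():
    print(f"{field}: {value}")
```

Directory and owner records would need a walk over the file tree (for example with `os.walk`), which is left out here to keep the sketch short.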
51
Hierarchical storage management
When disk space becomes exhausted, data files need to be backed up (as archived files or to backup tape).
With the right tools, users are assured of having enough disk space to accommodate new files.
When a file system reaches a predefined threshold of X percent full:
–Automated procedures are initiated that determine which files are eligible for archiving and are currently backed up
–The file catalog is then updated to indicate that the files have been archived, and they are deleted from the disk file system
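The threshold procedure above can be sketched as follows. The slide does not say which eligible files go first or how far usage should drop, so the least-recently-accessed-first ordering, the `target_pct` parameter, and the `select_for_archive` name are all assumptions made for illustration.

```python
def select_for_archive(files, capacity, threshold_pct, target_pct):
    """Pick files to archive once usage reaches threshold_pct of capacity.

    files: list of dicts with "name", "size", "last_access", "backed_up".
    Only files that are already backed up are eligible. Least recently
    accessed files go first (an assumed policy), until usage falls to
    target_pct of capacity (an assumed stopping rule).
    """
    used = sum(f["size"] for f in files)
    if used * 100 < threshold_pct * capacity:
        return []  # threshold not reached, nothing to do
    chosen = []
    for f in sorted(files, key=lambda f: f["last_access"]):
        if used * 100 <= target_pct * capacity:
            break
        if f["backed_up"]:
            chosen.append(f["name"])
            used -= f["size"]
    return chosen


files = [
    {"name": "a", "size": 50, "last_access": 1, "backed_up": True},
    {"name": "b", "size": 30, "last_access": 2, "backed_up": False},
    {"name": "c", "size": 15, "last_access": 3, "backed_up": True},
]
print(select_for_archive(files, capacity=100, threshold_pct=90, target_pct=60))  # ['a']
```

A real system would then update the file catalog and delete the archived files from disk, as the slide describes.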
52
Prevention against data loss (1/2)
Backups sent off-site at regular intervals
–Include software as well as all data, to facilitate recovery
Create an insurance copy on microfilm or similar and store the records off-site
–Use a remote backup facility if possible to minimize data loss
Storage Area Networks (SANs) over multiple sites make data immediately available without the need to recover or synchronize it
53
Prevention against data loss (2/2)
Surge protectors: to minimize the effect of power surges on delicate electronic equipment
Uninterruptible Power Supply (UPS) and/or backup generator
Fire prevention: more alarms, accessible extinguishers
Anti-virus software and other security measures
54
Techniques and technology
Mirroring
–Disk mirroring: Redundant arrays of inexpensive disks 1 (RAID1)
–Server mirroring: web / ftp / email
RAID: RAID0 – 6 and combinations
On-site data storage
–Backup: tape / optical disk
Off-site data storage (backup site)
–Cold sites
–Warm sites
–Hot sites
55
Mirroring
Mirroring can occur locally or remotely.
–Locally means that a server has a second hard drive that stores data.
–A remote mirror means that a remote server contains an exact duplicate of the data. The second drive is called a mirrored drive.
Data is written to the original drive when a write request is issued and then copied to the mirrored drive, providing a mirror image of the primary drive.
If one of the hard drives fails, all data is protected from loss.
56
Disk mirroring (RAID1)
The replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability, currency, and accuracy.
A mirrored volume is a complete logical representation of separate volume copies.
57
Server mirroring
Mirror sites are most commonly used to provide multiple sources of the same information, and are of particular value as a way of providing reliable access to large downloads.
Web server
–To preserve a website or page, especially when it is closed or is about to be closed
–Load balancing
Email server
–To protect against loss of email information
FTP server
–To allow faster downloads for users at a specific geographical location
–Load balancing
58
Backup site
A backup site is a location where a business can easily relocate following a disaster, such as fire, flood, or terrorist threat. This is an integral part of the disaster recovery plan of a business.
A backup site can be another location operated by the business, or contracted via a company that specializes in disaster recovery services.
In some cases, a business will have an agreement with a second business to operate a joint disaster recovery facility.
59
Cold Sites
A cold site is the least expensive type of backup site for a business to operate.
It provides office space in which to operate.
It does not include backed-up copies of data and information from the original location of the business, nor does it include hardware already set up.
The lack of hardware contributes to the minimal startup costs of the cold site, but it requires additional time following the disaster to get the operation running at a capacity close to that prior to the disaster.
60
Warm Sites
A warm site is a location to which the business can relocate after the disaster. It is already stocked with computer hardware similar to that of the original site, but does not contain backed-up copies of data and information.
61
Hot Sites
A hot site is a duplicate of the original site of the business, with full computer systems as well as near-complete backups of user data.
Ideally, a hot site will be up and running within a matter of hours. This type of backup site is the most expensive to operate.
Hot sites are popular with stock exchanges and other financial institutions that may need to evacuate because of bomb threats and must resume normal operations as soon as possible.
62
How to choose
The choice of site type is driven mainly by a company's cost vs. benefit strategy.
Hot sites are traditionally more expensive than cold sites, since much of the equipment the company needs has already been purchased and the operational costs are therefore higher.
However, if the company loses a substantial amount of revenue for each day it is inactive, a hot site may be worth the cost.
63
The advantage of a cold site is simple: cost. A cold site requires far fewer resources to operate because no equipment is bought before the disaster.
The downside of a cold site is the potential cost that must be incurred to make it effective.
Purchasing equipment on very short notice may cost more, and the disaster itself may make the equipment difficult to obtain.
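The cost vs. benefit trade-off above can be illustrated with a rough calculation. This is only a minimal sketch: the function name, the idea of weighting downtime by a yearly disaster probability, and all figures are assumptions made for illustration, not part of the original slides.

```python
def total_expected_cost(annual_site_cost, recovery_days,
                        revenue_loss_per_day, disasters_per_year):
    """Yearly site cost plus the expected revenue lost while recovering.

    All parameters are hypothetical planning estimates.
    """
    expected_downtime_loss = (disasters_per_year * recovery_days
                              * revenue_loss_per_day)
    return annual_site_cost + expected_downtime_loss


# Hypothetical figures: a hot site is costly to run but recovers in hours,
# a cold site is cheap to run but takes weeks to equip after a disaster.
hot_site = total_expected_cost(500_000, 0.5, 200_000, 0.1)
cold_site = total_expected_cost(50_000, 30, 200_000, 0.1)

cheaper = "hot site" if hot_site < cold_site else "cold site"
print(f"hot: {hot_site:,.0f}  cold: {cold_site:,.0f}  cheaper: {cheaper}")
```

With a lower daily revenue loss the comparison can easily tip the other way, which is the point made on the slide: the answer depends on how much each day of inactivity costs.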
64
Disaster Recovery Planning (DRP) W.lilakiatsakun
65
DRP is the process of regaining access to the data, hardware and software necessary to resume critical business operations after a natural or human-induced disaster.
DRP is part of a larger process known as business continuity planning (BCP).
Disaster recovery is the process by which you resume business after a disruptive event.
66
What is the difference between DRP and BCP? (1/2)
The event might be
–something huge, like an earthquake or the terrorist attacks on the World Trade Center
–something small, like malfunctioning software caused by a computer virus.
Many business executives are prone to ignoring "disaster recovery" because disaster seems an unlikely event.
67
What is the difference between DRP and BCP? (2/2)
"Business continuity planning" suggests a more comprehensive approach to making sure you can keep making money.
Often, the two terms are married under the acronym BC/DR.
DR and/or BC determines how a company will keep functioning after a disruptive event until its normal facilities are restored.
68
What do these plans include (1/2)
All BC/DR plans need to encompass
–How employees will communicate
–Where they will go
–How they will keep doing their jobs.
The details can vary greatly, depending on the size and scope of a company and the way it does business.
69
What do these plans include (2/2)
For example, the plan at one global manufacturing company is to
–restore critical mainframes with vital data at a backup site within four to six days of a disruptive event
–obtain a mobile PBX unit with 3000 telephones within two days
–recover the company's 1000-plus LANs in order of business need
–set up a temporary call center for 100 agents at a nearby training facility.
70
Events that necessitate disaster recovery
Natural disasters
Fire
Power failure
Terrorist attacks
Organized or deliberate disruptions
Theft
System and/or equipment failures
Human error
Computer viruses
Testing
71
Disaster Recovery Planning steps (1)
I. Information Gathering
Step One – Organize the Project
–Appoint a coordinator/project leader, if the leader is not the dean or chairperson.
–Determine the most appropriate plan organization for the unit (e.g., a single plan at college level or individual plans at unit level)
–Set the project timetable
–Draft the project plan, including assignment of task responsibilities
72
Disaster Recovery Planning steps (2)
Step Two – Conduct Business Impact Analysis
In order to complete the business impact analysis, most units will perform the following steps:
–Identify functions, processes and systems
–Interview information systems support personnel
–Interview business unit personnel
–Analyze results to determine critical systems, applications and business processes
–Prepare impact analysis of interruption on critical systems
73
Disaster Recovery Planning steps (3)
Step Three – Conduct Risk Assessment
The risk assessment helps determine the probability of a critical system becoming severely disrupted and documents how acceptable these risks are to a unit.
–Review physical security (e.g., secure office, building access off hours, etc.)
–Review backup systems
–Review data security
74
Disaster Recovery Planning steps (3/1)
–Review policies on personnel termination and transfer
–Identify systems supporting mission critical functions
–Identify vulnerabilities (such as flood, tornado, physical attacks, etc.)
–Assess probability of system failure or disruption
–Prepare risk and security analysis
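The risk assessment steps above end with an estimate of how likely each disruption is and how much it would matter. One common way to turn such estimates into a ranked list is to multiply probability by impact. The slides do not prescribe a scoring method, so the sketch below, including the `Risk` class, the 1 to 5 impact scale, and the sample entries, is an assumption for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    system: str         # system supporting a mission critical function
    threat: str         # identified vulnerability, e.g. flood or power failure
    probability: float  # estimated chance of disruption per year, 0..1
    impact: int         # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> float:
        # Simple probability-times-impact score (an assumed convention).
        return self.probability * self.impact


def rank_risks(risks):
    """Return risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical entries for a unit's risk analysis.
risks = [
    Risk("payroll", "power failure", 0.2, 4),
    Risk("email", "computer virus", 0.5, 2),
    Risk("web server", "flood", 0.01, 5),
]

for r in rank_risks(risks):
    print(f"{r.system:<12} {r.threat:<15} score={r.score:.2f}")
```

A ranking like this is only an input to the written risk and security analysis; the unit still has to document which of the risks it considers acceptable.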
75
Disaster Recovery Planning steps (4/1)
Step Four – Develop Strategic Outline for Recovery
1 Assemble groups as appropriate for:
–Hardware and operating systems
–Communications
–Applications
–Facilities
–Other critical functions and business processes as identified in the Business Impact Analysis
76
Disaster Recovery Planning steps (4/2)
2 For each system/process above, quantify the following processing requirements:
–Light, normal and heavy processing days
–Transaction volumes
–Dollar volume (if any)
–Estimated processing time
–Allowable delay (days, hours, minutes, etc.)
77
Disaster Recovery Planning steps (4/3)
3 Detail all the steps in your workflow for each critical business function (e.g., for student payroll processing, each step that must be completed and the order in which to complete them).
4 Identify systems and applications
–Component name and technical id (if any)
–Type (online, batch process, script)
–Frequency
–Run time
–Allowable delay (days, hours, minutes, etc.)
78
Disaster Recovery Planning steps (4/4)
5 Identify vital records (e.g., libraries, processing schedules, procedures, research, advising records, etc.)
–Name and description
–Type (e.g., backup, original, master, history, etc.)
–Where they are stored
–Source of item or record
–Whether the record can be easily replaced from another source (e.g., reference materials)
79
Disaster Recovery Planning steps (4/5)
–Backup
  –Backup generation frequency
  –Number of backup generations available onsite
  –Number of backup generations available off-site
  –Location of backups
  –Media type
  –Retention period
  –Rotation cycle
  –Who is authorized to retrieve the backups?
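The backup attributes listed above map naturally onto one record per vital item, which also makes simple completeness checks possible. The following is a minimal sketch under stated assumptions: the `BackupRecord` fields mirror the bullets, while the three checks in `backup_gaps` are illustrative rules that do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class BackupRecord:
    name: str
    frequency_days: int          # backup generation frequency
    generations_onsite: int
    generations_offsite: int
    location: str
    media_type: str
    retention_days: int          # retention period
    rotation_cycle: str          # e.g. "weekly"
    authorized_retrievers: list = field(default_factory=list)


def backup_gaps(record):
    """Return a list of problems found in one record (assumed rules)."""
    gaps = []
    if record.generations_offsite < 1:
        gaps.append("no off-site generation")
    if not record.authorized_retrievers:
        gaps.append("no one authorized to retrieve backups")
    if record.retention_days < record.frequency_days:
        gaps.append("retention period shorter than backup interval")
    return gaps


# Hypothetical record for a unit's advising database.
advising = BackupRecord(
    name="advising records",
    frequency_days=1,
    generations_onsite=7,
    generations_offsite=0,
    location="unit server room",
    media_type="tape",
    retention_days=30,
    rotation_cycle="weekly",
)
print(backup_gaps(advising))
```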
80
Disaster Recovery Planning steps (4/6)
6 Identify the minimum requirements/replacement needs to perform the critical function if a severe disruption occurred.
–Type (e.g., server hardware, software, research materials, etc.)
–Item name and description
–Quantity required
–Location of inventory, alternative, or offsite storage
–Vendor/supplier
81
Disaster Recovery Planning steps (4/7)
7 Identify whether alternate methods of processing either exist or could be developed, quantifying where possible the impact on processing. (Include manual processes.)
8 Identify the person(s) who support the system or application
9 Identify the primary person to contact if the system or application cannot function as normal
10 Identify the secondary person to contact if the system or application cannot function as normal
82
Disaster Recovery Planning steps (4/8)
11 Identify all vendors associated with the system or application
12 Document unit strategy during recovery (conceptually, how will the unit function?)
13 Quantify resources required for recovery, by time frame (e.g., 1 PC per day, 3 people per hour, etc.)
14 Develop and document recovery strategy, including:
–Priorities for recovering system/function components
–Recovery schedule
Form: critical system processing requirement for recovery
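Recovery priorities and a recovery schedule can be derived from the allowable delay and estimated processing time gathered earlier in Step Four. The sketch below assumes a single recovery team that restores systems one after another, most urgent first. That sequencing rule, the `SystemRequirement` class, and the sample figures are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SystemRequirement:
    name: str
    allowable_delay_hours: int     # longest tolerable outage
    estimated_recovery_hours: int  # time needed to restore the system


def recovery_schedule(systems):
    """Order systems by allowable delay and check each against its limit.

    Returns (name, hours elapsed when restored, meets allowable delay).
    Assumes systems are recovered strictly one after another.
    """
    schedule = []
    elapsed = 0
    for s in sorted(systems, key=lambda s: s.allowable_delay_hours):
        elapsed += s.estimated_recovery_hours
        schedule.append((s.name, elapsed, elapsed <= s.allowable_delay_hours))
    return schedule


# Hypothetical unit systems.
systems = [
    SystemRequirement("student payroll", 48, 8),
    SystemRequirement("email", 4, 2),
    SystemRequirement("records archive", 168, 24),
    SystemRequirement("web portal", 8, 10),
]

for name, done_at, ok in recovery_schedule(systems):
    status = "ok" if ok else "exceeds allowable delay"
    print(f"{name:<16} restored at hour {done_at:<4} {status}")
```

Any system flagged as exceeding its allowable delay points to a need for more resources in item 13 or an alternate processing method from item 7.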
83
Disaster Recovery Planning steps (5)
Step Five – Review Onsite and Offsite Backup and Recovery Procedures
The planning team as identified in Step 1 Task 3 would normally perform this task.
–Review current records (OS, code, system instructions, documented processes, etc.) requiring protection
–Review current offsite storage facility or arrange for one
–Review backup and offsite storage policy or create one
–Present to unit leader for approval
84
Disaster Recovery Planning steps (6)
Step Six – Select Alternate Facility
ALTERNATE SITE: A location, other than the normal facility, used to process data and/or conduct critical business functions in the event of a disaster.
–Determine resource requirements
–Assess platform uniqueness of unit systems (e.g., Macintosh, IBM Compatible, Oracle database, Windows 3.1, etc.)
–Identify alternative facilities
–Review cost/benefit
–Evaluate and make recommendation
85
Disaster Recovery Planning steps (7/1)
II. Plan Development and Testing
Step Seven – Develop Recovery Plan
This step would ordinarily be completed by the coordinator/project manager working with the planning team.
Sample Plan Outline
86
Disaster Recovery Planning steps (7/2)
1 Objective
2 Plan Assumptions
3 Criteria for invoking the plan
–Document emergency response procedures to occur during and after an emergency
–Document procedures for assessing and declaring a state of emergency
–Document notification procedures for alerting unit and university officials
–Document notification procedures for alerting vendors
–Document notification procedures for alerting unit staff and notifying them of alternate work procedures or locations.
87
Disaster Recovery Planning steps (7/3)
4 Roles, Responsibilities and Authority
–Identify unit personnel
–Recovery team description and charge
–Recovery team staffing
–Transportation schedules for media and teams
88
Disaster Recovery Planning steps (7/4)
5 Procedures for operating in contingency mode
–Process descriptions
–Minimum processing requirements
–Determine categories for vital records
–Identify location of vital records
–Identify forms requirements
–Document critical forms
–Establish equipment descriptions
–Document equipment in the recovery site
–Document equipment in the unit
89
Disaster Recovery Planning steps (7/4)
–Software descriptions
–Software used in recovery
–Software used in production
–Produce logical drawings of communication and data networks in the unit
–Produce logical drawings of communication and data networks during recovery
–Vendor list
–Review vendor restrictions
–Miscellaneous inventory
–Communication needs in production
–Communication needs in the recovery site
90
Disaster Recovery Planning steps (7/5)
6 Resource plan for operating in contingency mode
7 Criteria for returning to normal operating mode
8 Procedures for returning to normal operating mode
9 Procedures for recovering lost or damaged data
10 Testing and Training
–Document testing dates
–Complete disaster/disruption scenarios
–Develop action plans for each scenario
Sample Testing Diagram
91
Disaster Recovery Planning steps (7/6)
11 Plan Maintenance
–Document Maintenance Review Schedule (yearly, quarterly, etc.)
–Maintenance Review action plans
–Maintenance Review recovery teams
–Maintenance Review team activities
–Maintenance Review/revise tasks
–Maintenance Review/revise documentation
92
Disaster Recovery Planning steps (7/7)
12 Appendices for Inclusion
–inventory and report forms
–maintenance forms
–hardware lists and serial numbers
–software lists and license numbers
–contact list for vendors
–contact list for staff with home and work numbers
93
Disaster Recovery Planning steps (7/8)
–contact list for other interfacing departments
–network schematic diagrams
–equipment room floor grid diagrams
–contract and maintenance agreements
–special operating instructions for sensitive equipment
–cellular telephone inventory and agreements
94
Disaster Recovery Planning steps (8)
Step Eight – Test the Plan
1 Develop test strategy
2 Develop test plans
3 Conduct tests
4 Modify the plan as necessary
Samples
–Test Plan Strategy
–Test Plan Scenario
–Test Results/Test Evaluation
95
Disaster Recovery Planning steps (9)
III. Ongoing Maintenance
Step Nine – Maintain the Plan
The Dean/Director/Unit Administrator is responsible for overseeing this.
1 Review changes in the environment, technology, and procedures
2 Develop maintenance triggers and procedures
3 Submit changes for systems development procedures
4 Modify unit change management procedures
5 Produce plan updates and distribute
96
Disaster Recovery Planning steps (10)
Step Ten – Perform Periodic Audit
1 Establish periodic review and update procedures
97
Important factors (1/3)
Communication
–Personnel: notify all key personnel of the problem and assign them tasks focused on the recovery plan.
–Customers: notifying clients about the problem minimizes panic.
Recall backups
–If backup tapes are taken offsite, they need to be recalled. If remote backup services are used, a network connection to the remote backup location (or the Internet) will be required.
98
Important factors (2/3)
Facilities
–Larger companies may maintain backup hot sites or cold sites.
–Mobile recovery facilities are also available from many suppliers.
Prepare your employees
–During a disaster, employees are required to work longer, more stressful hours, and a support system should be in place to alleviate some of the stress.
–Prepare them ahead of time to ensure that work runs smoothly.
99
Important factors (3/3)
Business information
–Backups should be stored in a location completely separate from the company.
Testing the plan
–Provisions, directions, and frequency for testing the plan should be stipulated.