Backup and Restore CPTE 433 John Beckett
Why Back Up?
So you can restore later!
SLA
Restore Policy
Backup Policy
Backup Schedule
Toward a Backup Strategy
Corporate guidelines define terminology and specs for data recovery
The SLA defines specs for a specific site or app
Policy documents the implementation of the SLA
Procedure says how the policy will be implemented
The schedule anchors the policy in real time
Why are we planning a Restore?
Accidental file deletion
Data corruption
–Hardware or system software
–Application error
–Procedural error
Disk failure
Archive
–Snapshot of system: legal or fiduciary reasons
Self-Service Restores?
History: VAX, File Motel, NetApp Filer
Great for document-oriented systems
Lousy for database restores
Uh…what if they overwrite the new one?
JB’s take: Best to have two people involved in a restore, unless one is technical
Store current state before restoring.
Use non-rewriteable media for backups if you can
Restore Requests
Should be logged
Use a Web page to collect these, and log them in a database
–Identify clearly who did the restore, when, and exactly what was restored
–Your backup software should facilitate this – it should not be a manual process
–User should receive clear notification when the process is complete
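A minimal sketch of the logging idea, assuming a SQLite database behind the request page. The table name, column names, and the `open_log`, `log_restore`, and `notification_text` helpers are all hypothetical; real backup software would supply its own records.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) the restore log. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS restore_log (
               id INTEGER PRIMARY KEY,
               requested_by TEXT NOT NULL,
               restored_by TEXT NOT NULL,
               restored_path TEXT NOT NULL,
               source_backup TEXT NOT NULL,
               completed_at TEXT NOT NULL
           )"""
    )
    return conn


def log_restore(conn, requested_by, restored_by, restored_path, source_backup):
    """Record who did the restore, when, and exactly what was restored."""
    completed_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO restore_log "
            "(requested_by, restored_by, restored_path, source_backup, completed_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (requested_by, restored_by, restored_path, source_backup, completed_at),
        )
    return cur.lastrowid


def notification_text(conn, row_id):
    """Build the completion notice for the user from the logged row."""
    row = conn.execute(
        "SELECT requested_by, restored_path, source_backup, completed_at "
        "FROM restore_log WHERE id = ?",
        (row_id,),
    ).fetchone()
    return f"To {row[0]}: {row[1]} was restored from {row[2]} at {row[3]}."
```

Building the notice from the logged row, rather than from the request, means the user is told what was actually recorded.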
What Order For Full Restore?
Right
1. Most recent incremental
–Including complete directory
2. Get updates from incremental
3. Get older files from full backup
Wrong
1. Most recent full backup
2. Most recent incremental
What about files that were purged since the full backup?
–Use the directory on the Incremental
This is a good time to have your terminology standardized!
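The "right" order can be sketched as a planning function. This is an illustration only, assuming one full backup, one incremental taken after it, and that the incremental carries a complete directory of the files that existed at that time. The function and argument names are hypothetical.

```python
def plan_restore(full_files, incremental_files, incremental_directory):
    """Decide where each file should come from.

    full_files: set of paths stored on the full backup
    incremental_files: set of paths stored on the most recent incremental
    incremental_directory: set of every path that existed when the
        incremental was taken (the "complete directory")

    Returns (plan, missing): plan maps path -> "incremental" or "full";
    missing lists paths in the directory that neither backup holds.
    """
    plan = {}
    missing = []
    # The directory, not the full backup, decides what should exist.
    for path in sorted(incremental_directory):
        if path in incremental_files:
            plan[path] = "incremental"   # newest copy wins
        elif path in full_files:
            plan[path] = "full"          # unchanged since the full backup
        else:
            missing.append(path)
    return plan, missing
```

A file that is on the full backup but absent from the incremental's directory was purged in between, so it never enters the plan. Restoring the full backup first would bring it back.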
Where’s the Backup?
On site
–Vulnerable to employee error and misdeeds
Off site or safe/vault
–Not available when you need it
–How do you get it in there in the first place?
–Are you putting your most valuable backups in the safest place, or doing it “later?”
When Do You Back Up?
Backups can take a lot of CPU time
–Compression can sometimes be tuned to mitigate this, at the expense of more media
–Consider the split-RAID and Zap-copy methods
Schedule them at off-hours
–Do you have an employee to load media?
–Maybe you need to buffer the backup for later transfer (buffer drive is cheap)
The Storage Dilemma
Disk space is increasing much faster than backup media space
Use disk space to buffer backups so that you can do them during off-times
Perhaps: Copy to a disk drive, then do your compression on a separate computer
Dangerous Ideas - 1
Differential Backups
–Increases the number of tapes that must be good for you to survive.
Leaning on RAID
–RAID is not a backup solution; it is an uptime improver.
–If the RAID hardware fails, you’re still dead. So buy good RAID hardware and drives!
–Putting multiple drives on a single chain is not a good idea.
–Back up anyhow
Dangerous Ideas - 2
Closed Loop
–Remember to cycle tapes out of your backup group regularly.
–It’s a way to avoid wearing them out.
–It’s a way to give you a backup in case all your rotated tapes are bad (it happens).
–Do it fairly often.
Dangerous Ideas - 3
Capitalizing tapes
–Tapes are expense items, not capital investments, because they wear out.
–The diabolical truth about many IT items is that their longevity falls between the accounting definitions of “expense” and “capital”.
Dangerous Ideas - 4
“The backup went OK”
–Did you read it back to see if it has internal integrity?
–Did you restore a file to see if it came back?
–Idea: Use a cron job (or translate into Gates-ese) to grab (wget) the home page for cnn.com daily. Make sure it goes in a directory near the end of the tape. That way you can check any backup easily to see if you can restore from it.
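A sketch of the daily marker-file idea. It swaps the fetched news page for a locally generated dated file plus a checksum, so it runs without a network. That substitution, the file names, and the two function names are assumptions made for illustration.

```python
import hashlib
from datetime import date
from pathlib import Path


def write_canary(directory, today=None):
    """Write a dated marker file and its SHA-256 into `directory`.

    Run this daily (for example from cron) before the backup starts.
    """
    today = today or date.today()
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    body = f"backup canary {today.isoformat()}\n".encode()
    digest = hashlib.sha256(body).hexdigest()
    (directory / "canary.txt").write_bytes(body)
    (directory / "canary.sha256").write_text(digest + "\n")
    return digest


def check_canary(restored_directory, expected_date):
    """Return True if the restored marker is intact and from the expected day."""
    restored_directory = Path(restored_directory)
    try:
        body = (restored_directory / "canary.txt").read_bytes()
        recorded = (restored_directory / "canary.sha256").read_text().strip()
    except OSError:
        return False  # the marker did not come back at all
    intact = hashlib.sha256(body).hexdigest() == recorded
    text = body.decode(errors="replace").strip()
    right_day = text.endswith(expected_date.isoformat())
    return intact and right_day
```

Restoring just this one directory from a given tape and running `check_canary` on it shows both that the tape is readable near its end and which day's backup it holds.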
Dangerous Ideas - 5
You don’t need to consider network bandwidth in designing your backup scheme.
–At some point, tape drives and computers will be fast enough to move the bottleneck to your network – at which point making things better may get a lot more expensive
–One answer: A separate high-speed network between disk farm and backup device
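A back-of-the-envelope check of whether the network is the bottleneck. The 0.7 default efficiency (the share of raw link speed a backup stream actually achieves) is an assumed placeholder, not a measured figure, and both function names are hypothetical.

```python
def transfer_hours(data_gb, link_mbit_per_s, efficiency=0.7):
    """Hours needed to move `data_gb` (decimal gigabytes) over the link.

    efficiency is the assumed fraction of raw link speed achieved.
    """
    if data_gb < 0 or link_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000            # 1 GB = 8000 megabits
    seconds = megabits / (link_mbit_per_s * efficiency)
    return seconds / 3600


def fits_window(data_gb, link_mbit_per_s, window_hours, efficiency=0.7):
    """True if the transfer finishes inside the off-hours window."""
    return transfer_hours(data_gb, link_mbit_per_s, efficiency) <= window_hours
```

For example, 450 GB over a fully used 100 Mbit/s link takes 10 hours, which misses an 8-hour overnight window. The same data over 1000 Mbit/s takes 1 hour.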
Dangerous Ideas - 6
All backups should be kept off-site.
–Will the most important (recent) backups be kept there?
–Can you get them when you need them?
–On the other hand, a professional off-site company will provide clear records of tape transactions, which will help prevent fraud.
Dangerous Ideas - 7
The drive needs to be replaced
–This happened with DDS drives. The system manufacturer bought them from another company (Sony, in this case) and had no idea how to make them work or control quality. All failures were serviced by replacing drives. Failures occurred often as a result of this lack of understanding.
–For a year, this was the single greatest contributor to downtime in our system.
Does It Have To Be Tape?
Disk drives are faster, probably cheaper, and more reliable.
The only disadvantage of disk drives is that the media are not removable.
Perhaps back up to a disk, then copy that disk to tape.
Also: If your data fits on a DVD or CD, that is probably a better solution anyhow.
–But allow room for data expansion.
Transaction Backups
Many database systems log transactions to a file, which can be located in a safe place (like a separate drive – perhaps located in a different building).
These transaction logs are useful for:
–Rebuilding the database to the moment of failure.
–Analyzing transaction patterns when tuning for performance or function.
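Rebuilding to the moment of failure amounts to restoring the last backup and replaying the log on top of it. The sketch below uses a toy key-value "database" and an invented log format of `(sequence, operation, key, value)` tuples. Real database systems have their own log formats and recovery tools.

```python
def replay(snapshot, log, up_to=None):
    """Rebuild state from a backup snapshot plus a transaction log.

    snapshot: dict holding the state as of the last backup
    log: iterable of (seq, op, key, value), op being "set" or "delete"
    up_to: optional sequence number to stop at (point-in-time recovery)
    """
    state = dict(snapshot)  # never modify the backup copy itself
    for seq, op, key, value in sorted(log, key=lambda entry: entry[0]):
        if up_to is not None and seq > up_to:
            break
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown operation {op!r} at sequence {seq}")
    return state
```

Stopping at `up_to` also covers recovery from a procedural error: replay up to the transaction just before the mistake.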
Expiry Dates
Automatic tape libraries should label tapes according to their retention cycle, and automatically handle re-use.
The same function should “kick out” tapes that are considered worn out.
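One way to picture the retention and wear-out decision. The `Tape` fields, the `classify` function, and the 100-write wear limit are hypothetical; an actual tape library tracks these in its own catalog and uses the media vendor's rated limits.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Tape:
    label: str
    written_on: date
    retention_days: int
    write_count: int

    @property
    def expires_on(self):
        """First day on which the data on this tape may be overwritten."""
        return self.written_on + timedelta(days=self.retention_days)


def classify(tape, today, max_writes=100):
    """Return "keep", "reuse", or "retire" for one tape.

    max_writes is an assumed wear limit, not a vendor figure.
    """
    if today < tape.expires_on:
        return "keep"      # still inside its retention cycle
    if tape.write_count >= max_writes:
        return "retire"    # expired and worn out: kick it out
    return "reuse"         # expired and healthy: back into the pool
```

Checking retention before wear means a worn tape is never discarded while it still holds data someone may need.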
The Backup Process
Quiesce
Stop transactions?
–Put transactions into a buffer?
–Do transactions matter?
Separate
–May be combined with next step
Save
Un-quiesce
Verify
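The five steps can be sketched against a toy in-memory store that buffers writes while quiesced. `ToyStore`, `quiesced`, `digest`, and `backup` are illustrative names only. A real system would quiesce a database or filesystem, and "save" would write to backup media rather than copy a dictionary.

```python
import hashlib
from contextlib import contextmanager


class ToyStore:
    """Stand-in for a live system; writes are buffered while quiesced."""

    def __init__(self, data):
        self.data = dict(data)
        self.accepting_writes = True
        self.pending = []

    def write(self, key, value):
        if self.accepting_writes:
            self.data[key] = value
        else:
            self.pending.append((key, value))  # put transactions into a buffer


@contextmanager
def quiesced(store):
    """Quiesce on entry; un-quiesce and apply buffered writes on exit."""
    store.accepting_writes = False
    try:
        yield store
    finally:
        store.accepting_writes = True
        for key, value in store.pending:
            store.data[key] = value
        store.pending.clear()


def digest(data):
    """Checksum over a dict's contents, independent of key order."""
    h = hashlib.sha256()
    for key in sorted(data):
        h.update(repr((key, data[key])).encode())
    return h.hexdigest()


def backup(store):
    with quiesced(store):                 # 1. quiesce
        frozen = dict(store.data)         # 2. separate a stable copy
        expected = digest(frozen)
        saved = dict(frozen)              # 3. save (stand-in for media write)
    # 4. un-quiesce happened when the with-block exited
    if digest(saved) != expected:         # 5. verify what was saved
        raise RuntimeError("backup failed verification")
    return saved
```

The `finally` clause is the design point: the system is un-quiesced even if the save step fails, so a broken backup does not leave production stopped.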
What Are You Saving?
Simply the files
A disk image
How Do You Restore?
Can it be a disk image?
–Faster
Can it be a single file?
–If not, do you have to take the entire system offline just to pull back a single file?
Ideally: Either