Backup policies
Or: how not to get annoyed when you accidentally delete stuff.
Warning: this does get a little technical.
This is the bottom line
The EGTDC cannot be held responsible for loss of data on Bio-Linux machines, whether from user error, hardware failure or software failure. First of all, users are responsible for their own data. The nominated systems administrator is responsible for ensuring that users back up important data to another location to preserve data integrity.
Looking after your /home
If it fits on a floppy, put it on a floppy! If it doesn't fit on a floppy, put it on a CD! Keep two copies of files locally, just in case you accidentally delete one copy! As a first level of disaster recovery, we suggest that users periodically burn all the data in their home directories onto a CD-R or CD-RW. Many institutions have networked drives where people can store data. Using 'tar' and 'gzip' to compress the data in your home directory and then moving the archive elsewhere (by ftp or scp) is also an efficient backup system.
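For those who would rather script the 'tar' and 'gzip' step, here is a minimal sketch in Python. It assumes Python 3 is available on the machine. The function name archive_directory and the dated file name are illustrative choices, not part of Bio-Linux. The demo at the bottom works on a throwaway temporary directory, so nothing in your real /home is touched.

```python
import tarfile
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def archive_directory(source_dir: Path, dest_dir: Path,
                      today: Optional[date] = None) -> Path:
    """Pack source_dir into a gzip-compressed tar archive inside dest_dir.

    The archive name (<directory>-<YYYY-MM-DD>.tar.gz) is a hypothetical
    convention chosen for this sketch.
    """
    today = today or date.today()
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / f"{source_dir.name}-{today.isoformat()}.tar.gz"
    # Mode "w:gz" writes the tar stream through gzip in one step.
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores paths relative to the directory name, not absolute.
        archive.add(source_dir, arcname=source_dir.name)
    return archive_path


# Demo on a scratch directory standing in for a home directory.
with tempfile.TemporaryDirectory() as scratch:
    demo_home = Path(scratch) / "demo_home"
    demo_home.mkdir()
    (demo_home / "notes.txt").write_text("sequence notes\n")
    created = archive_directory(demo_home, Path(scratch) / "archives")
    print("wrote", created.name)
```

Keep the destination outside the directory being archived, otherwise the archive would try to include itself. Copying the resulting file to another machine (for example with scp) is still a separate step.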
Speaking to your IT staff
If your IT department provides a networked backup solution, we strongly recommend that you integrate with it in order to provide a more regular system of backups. The EGTDC will assist, where possible, in integrating Bio-Linux into an existing backup framework.
Disaster recovery on Bio-Linux
Bio-Linux does have a built-in backup system, but it is intended for disaster recovery of data in the event of hard drive failure. It is not recommended as a way of recovering accidentally deleted files, as it requires good familiarity with Unix to use. Bio-Linux machines have two hard drives. The first (called /dev/hda) contains the operating system and all data stored on the machine. The second (called /dev/hdb) contains backups of the primary hard drive. /dev/hdb has a single partition (/dev/hdb1), which is mounted on /backups.
Disaster recovery (2)
Backups are controlled by a 'cron' job and run daily. Cron is a system daemon (a program that runs all the time and provides a service) that runs certain tasks at certain times (hourly, daily or weekly). The backup script can be found at /etc/cron.daily/backup.sh. Each Sunday a full backup of all files on /dev/hda is taken. On each of the other days of the week, an incremental backup is made of the files that have changed that day. The backup script uses 'dump' to store a compressed version of your primary hard drive on /dev/hdb1.
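The weekly schedule can be written down as a small rule. The sketch below is in Python and is not the contents of /etc/cron.daily/backup.sh, which is not reproduced here. The function name backup_level_for is made up for illustration, and the level numbers 0 (full) and 9 (incremental) are the ones used in the backup file names.

```python
from datetime import date

FULL_LEVEL = 0         # full backup, taken on Sundays
INCREMENTAL_LEVEL = 9  # incremental backup, taken on every other day


def backup_level_for(day: date) -> int:
    """Return the backup level the schedule implies for the given day."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    if day.weekday() == 6:
        return FULL_LEVEL
    return INCREMENTAL_LEVEL


print("level for today:", backup_level_for(date.today()))
```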
Disaster recovery (3)
Backups have the following naming convention: hda[X].bak.[Y]
X is 1, 2, 5 or 6 and refers to the number of the partition that has been backed up. Partitions map to mountpoints as follows:
/dev/hda1 => /
/dev/hda2 => /var
/dev/hda5 => /usr
/dev/hda6 => /home
/dev/hdb1 => /backups
Y refers to the backup level, where 0 = full and 9 = incremental.
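To check which backup file covers which part of the system, the naming convention can be decoded mechanically. The Python sketch below does only that. The names describe_backup and BackupFile are hypothetical, and the lookup tables simply restate the mapping given here. It does not read or restore anything.

```python
import re
from typing import NamedTuple

# Partition number -> mountpoint, as listed in the naming convention.
MOUNTPOINTS = {1: "/", 2: "/var", 5: "/usr", 6: "/home"}
# Backup level -> meaning.
LEVELS = {0: "full", 9: "incremental"}

_NAME = re.compile(r"hda(\d+)\.bak\.(\d+)")


class BackupFile(NamedTuple):
    partition: str   # e.g. "/dev/hda6"
    mountpoint: str  # e.g. "/home"
    kind: str        # "full" or "incremental"


def describe_backup(filename: str) -> BackupFile:
    """Decode a backup file name of the form hda[X].bak.[Y]."""
    match = _NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a backup file name: {filename!r}")
    partition, level = int(match.group(1)), int(match.group(2))
    if partition not in MOUNTPOINTS or level not in LEVELS:
        raise ValueError(f"unknown partition or level in {filename!r}")
    return BackupFile(f"/dev/hda{partition}", MOUNTPOINTS[partition], LEVELS[level])


print(describe_backup("hda6.bak.0"))
```

So a file called hda6.bak.0 is decoded as a full backup of /home, and hda2.bak.9 as an incremental backup of /var.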
Disaster recovery (4)
In order to restore files from these backups, you must use the Unix 'restore' command. This is not for the inexperienced: you must have a good knowledge of the system architecture and know how to repair the system once a file is recovered. You must also be logged in as root; using sudo will not work!
EGTDC and data collection
As mentioned, the EGTDC is not responsible for loss of data, and is not a 'backup service' for your data. However, part of the remit of the EGTDC is to act as a repository for generated data such as EST data, microarray data etc. This enables meta-analysis of the data and opens it up to the EG community. If you are generating large volumes of data and would like to speak to the EGTDC about solutions for data storage and analysis, please do not hesitate to ask. Microarray storage is already provided by GeneSprings GeNet, a MIAME-compliant data repository.