Published by Jasper Hood. Modified over 8 years ago.
1
Funded by GFBio – Education module Preserve Lesson in Data Preservation and how you can contribute
2
Preserve
Preserving is more than a backup: it implies recurring activities.
Performed by curators and data managers in data centres, archives, etc.
A challenge because of the amount of data and the variety and complexity of formats and types.
Aims at the integrity of data through:
– Accessibility: data can be retrieved, displayed and used.
– Authenticity: data have not been manipulated, substituted or faked.
– Longevity: data remain reusable in the long term, independently of software and hardware decay.
3
Preserve
Some definitions
Backup: copy or copies of the original file, made before the original is overwritten
Archive: preservation of the file
Preservation:
– Includes archiving, backups and processes such as data rescue, data reformatting, data conversion and metadata
– Ensures that datasets are in the best shape to be stored, discovered, accessed and reused
4
Preserve
Bit rot / data rot
– Digital objects decay over time
– Errors reduce the ability to read them accurately
– Accessibility and authenticity are threatened
Paper records: can last for centuries or millennia
Digital data (bits/bytes): can deteriorate quickly
The rate of deterioration depends on the storage medium used (magnetic, optical, etc.)
5
Your turn!
Please estimate the lifespan of the media marked with green arrows!
Source: Die Zeit Nr. 42, 10 Oct. 2013
6
Your turn!
The resolution
Source: Die Zeit Nr. 42, 10 Oct. 2013
7
Preserve
Strategies to minimise bit rot
– Refreshment: move data files onto new storage media
– Replication: keep multiple copies of a file in different locations (to reduce the risk of data loss)
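Replication can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `replicate` and the target directories are hypothetical, and in practice the targets would be separate disks, servers or sites rather than folders on one machine.

```python
import filecmp
import shutil
from pathlib import Path


def replicate(source: Path, target_dirs: list[Path]) -> list[Path]:
    """Copy `source` into each target directory and verify every copy.

    `replicate` is a hypothetical helper, not part of any archive's tooling.
    """
    copies = []
    for target_dir in target_dirs:
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the platform allows it.
        copy_path = Path(shutil.copy2(source, target_dir / source.name))
        # shallow=False forces a byte-by-byte comparison of the contents.
        if not filecmp.cmp(source, copy_path, shallow=False):
            raise IOError(f"Copy differs from original: {copy_path}")
        copies.append(copy_path)
    return copies
```

Verifying each copy right after writing it catches a faulty transfer at once, instead of years later when the original may be gone.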
8
Preserve
Juliane Steckel, 29./30.08.2015, GFO Pre-Meeting Workshop DM, GFBio
Software obsolescence
– Existing technologies become redundant
– A new version of a software product is unable to read files produced by superseded versions
– No alternative newer version is capable of running on later operating systems
9
Preserve
Preservation methods to reduce software obsolescence include:
– Migration: the data file is converted to a newer software version or package
– Emulation: the functionality of the obsolete software package is recreated on a new operating system
– Format conversion: proactively select a neutral or non-proprietary format, based on a universal standard, that can be imported into a number of suitable software programs
Icons made by Freepik from www.flaticon.com
10
Preserve
Data rescue
Older files: no usable format
Finished projects or projects no longer funded:
– No responsible data manager
– No usable formats
– Locations not accessible
11
Preserve
(Things you can do during your study in advance.)
1. Data Conversions and Formats
a. Use non-proprietary, standard formats
b. Convert text files from .doc or .xls to .txt, image files to .tiff or .pdf
c. Check files after converting them, to avoid data, metadata, and formatting loss

Type of data       Recommended file formats        Avoid
Tabular data       CSV, TSV                        Excel
Text               Plain text (.txt), HTML, RTF    Word
Structured data    XML, RDF
Image              TIFF, PDF
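The advice to check files after converting them can be illustrated with a round-trip test: write the table to CSV, read it back, and compare it with the records you started from. This is a minimal sketch that assumes the data are already in memory as a list of dictionaries with string values. The helper names `export_to_csv` and `csv_matches` are made up for this example.

```python
import csv
from pathlib import Path


def export_to_csv(rows: list[dict[str, str]], path: Path) -> None:
    """Write the records to a UTF-8 encoded CSV file with a header row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def csv_matches(rows: list[dict[str, str]], path: Path) -> bool:
    """Read the CSV back and report whether it still holds the same records."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle)) == rows
```

A check like this only compares cell values. Formatting, formulas and embedded metadata from a spreadsheet are not covered and still need a manual look.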
12
Preserve
(Things you can do during your study in advance.)
2. Data Migration
a. Check the requirements of your data center/archive
b. Migrate your data to an open file format if necessary
c. Add preservation metadata and document all migration procedures
13
Preserve
(Things you can do during your study in advance.)
2. Data Migration
d. Assure the authenticity of data to prevent information loss during the migration process
– Compare the original and the migrated bit stream
– Compare the checksums
e. 3… 2… 1… backup!
– at least 3 copies of a file
– on at least 2 different media
– with at least 1 off site
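A checksum comparison can be sketched with Python's standard `hashlib` module. SHA-256 is an assumption here, since the slide does not name an algorithm, and the helper names `sha256_of` and `is_intact` are hypothetical. Checksums only match for bit-identical copies, such as backups or files moved to new media. A file converted to another format will always have a different checksum, so its content has to be compared instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(original: Path, copy: Path) -> bool:
    """True if both files have the same checksum, i.e. the same bit stream."""
    return sha256_of(original) == sha256_of(copy)
```

Storing the checksum next to each of the three backup copies makes it possible to re-check them later without access to the original.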
14
Preserve
(Things you can do during your study in advance.)
3. Versioning
a. Include the version number at the end of the file name, e.g. v01
b. Change this number each time the file is saved
c. For the final version, substitute the word FINAL for the version number (especially important if files are shared)
15
Preserve
(Things you can do during your study in advance.)
3. Versioning
d. Turn on versioning/tracking in collaborative works or storage spaces, e.g. wikis, Google Docs, MyWebSpace
e. Use versioning software, e.g. 'Apache Subversion', to automatically track versions of computer code
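For files that are not under version control, the numbering rule can be automated. The sketch below assumes the version is written as `_v` plus digits at the end of the file name, as in `v01`. The function name `next_version` and the underscore separator are choices made for this example only.

```python
import re
from pathlib import Path

# Matches a trailing "_v" followed by digits, e.g. "_v01" in "yield_v01".
VERSION_PATTERN = re.compile(r"_v(\d+)$")


def next_version(path: Path) -> Path:
    """Return the file name with its version number increased by one.

    A file without a version number gets "_v01" appended to its stem.
    """
    match = VERSION_PATTERN.search(path.stem)
    if match is None:
        return path.with_name(f"{path.stem}_v01{path.suffix}")
    width = len(match.group(1))  # keep the zero padding, e.g. 01 -> 02
    number = int(match.group(1)) + 1
    stem = path.stem[: match.start()]
    return path.with_name(f"{stem}_v{number:0{width}d}{path.suffix}")


print(next_version(Path("yield_v01.csv")))  # yield_v02.csv
print(next_version(Path("yield.csv")))      # yield_v01.csv
```

Zero padding keeps the files in the right order when they are sorted by name, at least until the number outgrows its width.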
16
Preserve
(Things you can do during your study in advance.)
4. File Naming
a. Use consistent, descriptive, concise names
b. Rename default file names, e.g. "image.jpg" or "archive.zip"
c. Avoid special characters, e.g. & * % $ £ ] { ! @
d. Use underscores '_' instead of full stops '.' or spaces ' '
17
Preserve
(Things you can do during your study in advance.)
4. File Naming
e. Include descriptive information to assist identification, independent of where the file is stored
f. If including dates, format them consistently, e.g. Year-Month-Day (YYYY-MM-DD), to maintain the chronological order of files
g. Assume that 'YIELD', 'Yield' and 'yield' are the same (some file systems are not case-sensitive), so do not rely on case to tell files apart
h. Use file extensions (often defaults), e.g. '.xls' or '.xlsx' for Excel files, '.txt' for text files, '.R' for R scripts, etc.
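The naming rules can be combined in one small helper. This is a sketch, not a standard: the function name `build_filename`, the order of the name parts and the decision to lowercase the description are all assumptions made for this example.

```python
import re
from datetime import date


def build_filename(day: date, description: str, version: int, extension: str) -> str:
    """Build a name of the form YYYY-MM-DD_description_vNN.ext."""
    # Replace every run of characters other than letters, digits and
    # hyphens (spaces, full stops, & * % ...) with a single underscore.
    slug = re.sub(r"[^A-Za-z0-9-]+", "_", description).strip("_")
    # Lowercase so that names never differ by case alone.
    slug = slug.lower()
    return f"{day.isoformat()}_{slug}_v{version:02d}.{extension.lstrip('.')}"


print(build_filename(date(2010, 8, 11), "Bioassay toxicity", 1, ".sps"))
# 2010-08-11_bioassay_toxicity_v01.sps
```

Putting the ISO date first means that sorting by name also sorts the files chronologically.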
18
Your turn!
What is the best filename?
a) 24 March 2006 Attachment
b) 240306attch
c) 2006-03-24_Attachment
(1) 2010-08-11_bioassay_toxicity_V1.sps
(2) labtox_recent_110810_old version.sps
(3) FFTX_3776438656_old.sps
20
Useful links
http://www.lib.cam.ac.uk/dataman/pages/preservation.html
http://www.dcc.ac.uk/resources/how-guides/license-research-data
http://arrow.monash.edu.au/vital/access/manager/Repository/monash:7533
http://datalib.edina.ac.uk/mantra/sharingpreservationandlicensing/
http://library.stanford.edu/research/data-management-services/data-best-practices/data-versioning
http://datalib.edina.ac.uk/mantra/organisingdata/
http://researchdata.wisc.edu/file-naming-and-versioning/
21
Further Education Modules are downloadable from: http://www.gfbio.org/education-modules
Suggested citation: GFBio Education Module: Preserve - Lesson in Data Preservation and how you can contribute. GFBio. Retrieved Nov 23, 2015, from http://www.gfbio.org/education-modules
Copyright license information: GFBio Education Module: Publish - Lesson in Data Publishing by GFBio is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.