Data Management for Geoinformatics A short course on good data management for taught postgraduate students in geoinformatics and related data sciences.


1 Data Management for Geoinformatics A short course on good data management for taught postgraduate students in geoinformatics and related data sciences. John Murtagh, UEL

2 Data Management

3 What is research data management?
Looking after data throughout the data lifecycle (from conception to destruction)
Good documentation and record-keeping
Transfer of responsibility after project ends
Keeping data safe and possibly confidential
Access, preservation and re-use
Destruction
“It’s just good research”

4 Preparing your data
The following slides are taken from the Research Data MANTRA online course by Data Library and EDINA, University of Edinburgh, which is licensed under a Creative Commons Attribution 2.5 UK: Scotland License.

5 File labelling
Research data files and folders need to be labelled and organised in a systematic way so that they are both identifiable and accessible for current and future users.
The benefits of consistent data file labelling:
Data files are distinguishable from each other within their containing folder
Data file naming prevents confusion when multiple people are working on shared files
Data files are easier to locate and browse
Data files can be retrieved not only by the creator but by other users

6 Data files can be sorted in logical sequence
Data files are not accidentally overwritten or deleted
Different versions of data files can be identified
If data files are moved to another storage platform their names will retain useful context

7 There are three main criteria to consider regarding the naming and labelling of research data files, namely:
1. Organisation - important for future access and retrieval
2. Context - this could include content-specific or descriptive information independent of where the data is stored
3. Consistency - choose a naming convention and ensure that the rules are followed systematically by always including the same information (such as date and time) in the same order (e.g. YYYYMMDD)
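The consistency criterion can be illustrated with a small helper that always assembles the same pieces of information in the same order. This is only a sketch: the function name `build_filename`, the choice of fields (project, description, date, version) and the underscore separator are assumptions made for illustration, not a convention prescribed by the course.

```python
import re
from datetime import date


def build_filename(project, description, when, version, ext):
    """Build a file name as project_description_YYYYMMDD_vNN.ext (hypothetical convention)."""

    def slug(text):
        # Lower-case the text and replace runs of other characters with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    # The date is written as YYYYMMDD so that names sort chronologically
    return f"{slug(project)}_{slug(description)}_{when:%Y%m%d}_v{version:02d}.{extension}"


print(build_filename("GeoSurvey", "Soil samples", date(2013, 2, 14), 2, ".CSV"))
# prints: geosurvey_soil-samples_20130214_v02.csv
```

Because the date is zero-padded and ordered year, month, day, an ordinary alphabetical sort of such names is also a chronological sort.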

8 The following video is from a talk given by Dave Anderson from the National Oceanic and Atmospheric Administration's (NOAA) National Climatic Data Center at the Data Management workshop sponsored by the Earth Science Information Partners (ESIP).  It highlights some of the research data organisation issues such as proprietary formats, cryptic labelling and vague filenames.

9 Windows: Ant Renamer (http://www. antp
If you need to rename data files in bulk there are a number of tools available. Here are some examples for different operating systems:
Windows: Ant Renamer, RenameIT
Mac: Renamer4Mac, Name Changer
Linux: GNOME Commander, GPRename
Unix: the use of the grep command to search for regular expressions
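Bulk renaming can also be scripted. The following is a minimal sketch using only the Python standard library; it is not how any of the tools listed above work internally, and the function names `plan_renames` and `apply_renames` are hypothetical. It first builds a list of proposed renames from a regular expression, then applies them only when `dry_run` is switched off, so the changes can be reviewed before any file is touched.

```python
import re
from pathlib import Path


def plan_renames(folder, pattern, replacement):
    """Return (old_path, new_path) pairs for files whose names the regex would change."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # leave sub-folders alone
        new_name = re.sub(pattern, replacement, path.name)
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan


def apply_renames(plan, dry_run=True):
    """Print each proposed rename; perform it only when dry_run is False."""
    for old, new in plan:
        if new.exists():
            # Never overwrite an existing data file
            raise FileExistsError(f"refusing to overwrite {new}")
        print(f"{old.name} -> {new.name}")
        if not dry_run:
            old.rename(new)
```

For example, `apply_renames(plan_renames("fieldwork", r"\s+", "_"))` would list the files in a (hypothetical) `fieldwork` folder whose names contain spaces, together with the underscore version of each name, without changing anything until it is called again with `dry_run=False`.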

10 Backing up & storing your data

11 Data loss will happen to you
Dropping your laptop
Hard drive failures are updates
Obsolescence/upgrades
Poorly described data (metadata)
Theft of equipment
People move on
Research trends (follow the money consequences)
Overwriting data/versioning
File formats
Media degradation (CD-Rs, memory sticks, SSDs)
Slide from Data Management Planning and Storage for Psychology (DMSPpsych), The University of Sheffield 18/09/2018

12 Research data loss – read this article!
December 2012
The laptop was left by a graduate student in the backseat of a car parked outside a downtown restaurant
Someone broke into the car and stole the computer
The laptop contained a vast amount of a trophic ecologist's experimental data from tracked fish (cost $50,000 CND)
“Unfortunately none of the data had been backed up yet. If we don’t get this laptop back, that data is lost forever.”

13 HOWEVER You can prevent total loss of your data by backing up.
It is recommended that you keep at least 3 copies of your data. For example, original, external (locally), and external (remotely), and have a policy for maintaining regular backups.
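As a rough illustration of the "three copies" rule, the sketch below copies one file to several backup folders and checks that each copy is byte-for-byte identical to the original. The function name `back_up` is hypothetical, and the sketch assumes that both the local external drive and the remote storage are reachable as ordinary folders (for example a mounted USB drive and a mounted network share).

```python
import filecmp
import shutil
from pathlib import Path


def back_up(source, destinations):
    """Copy source into every destination folder and verify each copy."""
    source = Path(source)
    copies = []
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        # shallow=False forces a comparison of the file contents
        if not filecmp.cmp(source, target, shallow=False):
            raise IOError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies
```

Calling `back_up("results.csv", ["/media/usb/backup", "/mnt/university_share/backup"])` (both paths are made up) would leave the original plus one local and one remote copy.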

14 A guide to backing up your data

15 Questions to ask yourself

16 How will I back up my data? How regularly will backups be made?
Will all data, or only changed data, be backed up? (A backup of changed data is known as an "incremental backup", while a backup of all data is known as a "full backup").
How often will full and incremental backups be made?
How long will backups be stored?

17 How much hard drive space or number of Digital Video Discs (DVDs) will I require to maintain this backup schedule?
If the data are sensitive, how will they be secured and (possibly) destroyed?
What backup services are available that meet these needs and, if none, what will be done about it?
Who will be responsible for ensuring backups are available?
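The difference between a full and an incremental backup, as defined in the questions above, can be sketched in a few lines. This is an illustration under simplifying assumptions, not a replacement for a backup service: the function name `run_backup` is hypothetical, "changed" is judged only by each file's modification time, and the backup folder is assumed to sit outside the folder being backed up.

```python
import shutil
from pathlib import Path


def run_backup(source_dir, backup_dir, last_backup_time=None):
    """Copy files from source_dir into backup_dir and return their relative paths.

    last_backup_time=None copies everything (full backup).
    Otherwise only files modified after that time, given in seconds
    since the epoch, are copied (incremental backup).
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        unchanged = last_backup_time is not None and path.stat().st_mtime <= last_backup_time
        if unchanged:
            continue  # skip files not modified since the previous backup
        relative = path.relative_to(source_dir)
        target = backup_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(relative)
    return copied
```

A typical schedule might run a full backup weekly and an incremental one daily, passing the time of the previous run as `last_backup_time`; the length of the returned list gives a first idea of how much storage each run needs.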

18 In the following video Professor Lynn Jamieson from the University of Edinburgh talks about the importance of keeping regular backups of research data. 

19 Storing it in the Cloud

20 “Cloud storage is a model of networked enterprise storage where data is stored not only in the user's computer, but in virtualized pools of storage which are generally hosted by third parties, too.”

21 Cloud services
Fortunately… 26 online backup services have been reviewed

22

23 The University of Hertfordshire has reviewed the most popular cloud storage services….
It has also analysed the pros and cons of their data and security policies as well as their costs and access. You can read it here:

24 Cloud Storage: Advantages and Disadvantages
The following slide is taken from the Research Data MANTRA online course by Data Library and EDINA, University of Edinburgh, which is licensed under a Creative Commons Attribution 2.5 UK: Scotland License.

25 Advantages
No user intervention is required (change tapes, label CDs, perform manual tasks).
Remote backup maintains data offsite.
Most provide versioning and encryption.
They are multi-platform.
Disadvantages
Restoration of data may be slow (dependent upon network bandwidth).
Stored data may not be entirely private (thus pre-encryption).
Service provider may go out of business.
Protracted intellectual property rights/copyright/data protection licences.

26 Access control

27 Data security is the means of ensuring that research data is kept safe from corruption and that access is suitably controlled. It is important to consider the security of your data to prevent:
Accidental or malicious damage/modification to data.
Theft of valuable data.
Breach of confidentiality agreements & privacy laws.
Premature release of data, which can void intellectual property claims.
Release before data have been checked for accuracy and authenticity.
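One simple way to notice corruption or unintended modification is to record a checksum for every file and compare against that record later. The sketch below uses SHA-256 from the Python standard library; the function names `sha256_of`, `make_manifest` and `changed_files` are hypothetical, and the approach only detects changes, it does not prevent them.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(folder, manifest):
    """List files that were added, removed or altered since the manifest was made."""
    current = make_manifest(folder)
    names = manifest.keys() | current.keys()
    return sorted(name for name in names if manifest.get(name) != current.get(name))
```

Storing the manifest separately from the data (for example alongside a backup) makes it possible to tell which copy is the damaged one.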

28 Access control
You need to consider the following questions for securing your research data:
How will you manage access arrangements and data security?
How will you enforce permissions, restrictions and embargoes?
How will you handle other security issues, where relevant, such as sensitive data, off-network storage, storage on mobile devices (laptops, smartphones, flash drives, etc.) and policy on making copies of data?
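On a shared Unix-like system, one basic way to enforce permissions is to restrict a file or folder to its owner. The sketch below assumes POSIX file permissions (on Windows `os.chmod` can only toggle the read-only flag, so it would not have the same effect); the function names `restrict_to_owner` and `is_private` are hypothetical.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(path):
    """Remove all group and other access, leaving owner-only permissions."""
    path = Path(path)
    if path.is_dir():
        mode = stat.S_IRWXU  # owner may read, write and enter the folder (0o700)
    else:
        mode = stat.S_IRUSR | stat.S_IWUSR  # owner may read and write (0o600)
    os.chmod(path, mode)


def is_private(path):
    """Return True if neither group nor other users have any access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Permissions of this kind complement, rather than replace, institutional access controls and encryption of sensitive data.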

29 Encryption
There are a number of ways to encrypt your data where it is stored. Many software programs allow you to do this easily, and many of them are free. See the following Wikipedia page: Comparison of disk encryption software

30

31 Encryption - TrueCrypt
One of the most popular encryption tools is TrueCrypt. You can see why…

32 Other sessions as part of Data Management in Geoinformatics:
Data Collection
Data Integration
Data Sharing
Data Management for Geoinformatics by John Murtagh, as part of the Jisc-funded project TraD (University of East London), is licensed under a Creative Commons Attribution Share Alike Licence

