Published by Nigel Franklin Powers. Modified over 8 years ago.
1
Data Management
2
Data Management Plans: all a scientist never wanted to know but will not be able to avoid
To support research activity
To accomplish scientific goals
To provide necessary data to users
3
Why data management?
Selfish reasons
– Work more efficiently
– Avoid data corruption and loss
Altruistic reasons
– Facilitates data exchange
– Avoids data loss
Altruistic = selfish in the long run
– Treat others as you want to be treated
4
Why conserve data?
Moral obligation
– Price of data collection
– Uniqueness of observations: you can’t measure a 2003 temperature in 2009
Allow peer review and audit of results
– Cf. molecular genetics: requirement to deposit sequences in international databases (GenBank)
5
Tools of the trade
Principles and attitude are more important than hardware
– In principle, dissociated from computer use (cf. the gigantic card indices of some libraries)
– In practice, involves the use of computerised databases, often RDBMS, but not always (GenBank, World Ocean Database)
6
E2EDM: ‘End-to-End Data Management’
Data management starts from day α
– Data management plan should be part of any ‘project’ description
Data management ends on day ω
– As last activity of project
– Submitting final data set to ‘deep archive’
7
Data?
Results of measurements or observations
– Monitoring vs scientific
– ‘Operational’ vs delayed-mode
– Supporting data (e.g. ‘underway data’ collected automatically by research vessels)
Not necessarily numerical (e.g. species identifications)
– Measurement scales: nominal, rank, interval, ratio
– Representation: string, boolean, integer, real
8
Information?
Widely different meanings
– (Supporting data)
– Interpreted data
– Metadata: data about data
– Data about the science rather than about the scientific subject, e.g. bibliographies, directories
9
Different aspects
Documentation and inventories
Recording and logging procedures
Quality control
Exchange, redistribution
Backup
Archive
10
Documentation
Creating information about the dataset: metadata
– What, where, objectives, limitations…
– Make available as widely as possible: avoid duplication, attract partners (scientific!)
Store metadata together with data
11
Documentation
Different types of metadata
– Discovery
– Documentation
– Technical
Serve different purposes, often in different systems
Ideally ‘harvested’ from data
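As a rough illustration of ‘harvesting’ metadata from data, the sketch below derives a few discovery-type fields (extent, time coverage, parameters) directly from the records. The record layout, the field names and the sample values are all assumptions made up for this example; they do not follow any particular metadata standard.

```python
import json

def harvest_metadata(records, title):
    """Derive discovery-type metadata from the data records themselves."""
    lats = [r["lat"] for r in records]
    lons = [r["lon"] for r in records]
    dates = [r["date"] for r in records]  # ISO 8601 strings sort chronologically
    return {
        "title": title,
        "record_count": len(records),
        # Simple min/max extent; this would be wrong for data crossing the antimeridian.
        "bounding_box": {
            "south": min(lats), "north": max(lats),
            "west": min(lons), "east": max(lons),
        },
        "time_coverage": {"start": min(dates), "end": max(dates)},
        "parameters": sorted({r["parameter"] for r in records}),
    }

# Invented sample records, for illustration only.
records = [
    {"date": "2003-06-01", "lat": -4.05, "lon": 39.67, "parameter": "temperature", "value": 27.4},
    {"date": "2003-06-15", "lat": -4.10, "lon": 39.70, "parameter": "salinity", "value": 35.1},
    {"date": "2003-05-20", "lat": -3.95, "lon": 39.60, "parameter": "temperature", "value": 27.9},
]

metadata = harvest_metadata(records, "Example coastal survey")
print(json.dumps(metadata, indent=2))
```

Because the fields are computed rather than typed in by hand, they cannot drift out of step with the data, which is presumably the attraction of harvesting.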
12
Inventorising
Metadata database
– Discovery type information
Document not only what has been measured, but also planned campaigns
– Make inventory searchable
– Facilitate exchange of data and information
– Avoid duplication
13
Recording
Often in systems other than final data management system
– Paper forms: reminder of what information should be recorded
– Spreadsheet: makes quality control possible during first steps
Needs system to control data flow
14
Quality control
Automated
– Range check (impossible values)
– Statistical (improbable values): danger of excluding unexpected phenomena (e.g. hole in ozone layer, El Niño)
Expert
– ‘Manually’, anything that requires knowledge of the subject area
– Often involves creating graphs
Flag, don’t delete
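The two automated checks and the ‘flag, don’t delete’ rule can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the flag codes, the function name `flag_values`, the valid range, the z-score threshold and the sample temperatures are all assumptions chosen for the example.

```python
from statistics import mean, stdev

# Hypothetical flag codes; real programmes define their own flag schemes.
GOOD, IMPROBABLE, IMPOSSIBLE = 1, 3, 4

def flag_values(values, valid_min, valid_max, z_limit=3.0):
    """Return one flag per value; the values are never changed or dropped."""
    in_range = [v for v in values if valid_min <= v <= valid_max]
    # Statistics use in-range values only, so impossible values
    # cannot distort the mean and standard deviation.
    mu = mean(in_range) if in_range else 0.0
    sigma = stdev(in_range) if len(in_range) > 1 else 0.0
    flags = []
    for v in values:
        if not (valid_min <= v <= valid_max):
            flags.append(IMPOSSIBLE)   # range check
        elif sigma > 0 and abs(v - mu) / sigma > z_limit:
            flags.append(IMPROBABLE)   # statistical check
        else:
            flags.append(GOOD)
    return flags

# Invented sample series, for illustration only.
temps = [12.1, 12.3, 11.9, 12.0, 12.2, 12.4, 11.8, 12.1, 19.5, 99.9]
flags = flag_values(temps, valid_min=-2.0, valid_max=40.0, z_limit=2.0)
for value, flag in zip(temps, flags):
    print(value, flag)
```

The improbable value stays in the series with its flag, so an expert can still decide later whether it is an error or an unexpected phenomenon.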
15
Backing up
Needs rigorous procedures
Keeping separate copy of working data sets
– Disaster recovery: needs copy to be kept in separate location
– Wrong manipulation
On larger systems: on specialised hardware (tape drives…), necessitated by large volume
– But the principle is more important!!
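One way to make a backup procedure ‘rigorous’ is to verify every copy rather than assume it succeeded. The sketch below copies a file and compares SHA-256 checksums of source and copy. The helper names `sha256_of` and `back_up` and the `offsite` directory are hypothetical; in this demonstration the ‘separate location’ is only a temporary folder, whereas a real one would be on different hardware or at a different site.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Checksum a file in chunks so that large data sets do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(source, backup_dir):
    """Copy source into backup_dir and fail loudly if the copy differs."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup of {source} failed verification")
    return target

# Demonstration with a throwaway file.
with tempfile.TemporaryDirectory() as work:
    source = Path(work) / "working_dataset.csv"
    source.write_text("station,temperature\nA,12.1\n")
    copy = back_up(source, Path(work) / "offsite")
    verified = sha256_of(source) == sha256_of(copy)
    print(copy.name, verified)
```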
16
Exchanging
Communicating data to others
– To systems: distributed data systems
– To people
Requires data exchange protocols
– Agree on the formats for exchange
Requires data exchange policy
– Agree on what can be done with data by ‘recipient’
17
Archiving
Important to ensure long-term integrity of the data
– On time scales that are typically much longer than a project…
Often will involve specialised organisations
– Data repositories, data centres
Needs careful thinking about storage medium
– Magnetic media are not ideal, certainly not in tropical countries
Documentation, viewer software
18
Role of data centres
Data management tasks
– Inventorising and documenting
– Archiving
Specific tasks
– Redistribution
– Integration
Support
19
Redistribution
Preferably on line
– Fast and efficient
– No marginal costs
Inventory
– Metadatabase as a tool
Data rescue
– Recovering data that are in danger of being lost
Respecting rights of data providers
– Data policy
– Proper use statement
20
Integration
Over different disciplines
Over different institutions
– Implies ‘trust’
– Needs formal arrangements: data policy
Creates possibility of extra quality control
– Checks on consistency
21
New technologies
Technological developments make new types of applications possible
– Internet, bandwidth
– Standard protocols: DiGIR, XML
– Distributed databases
Data centres are forced to rethink their role
– No longer passive archive, but active service centre
22
Data policy
Formal agreement between partners exchanging data
Describes rights and duties of data provider and data user
Considerations
– Data are public property
– Rights of data collector
23
Data policy
Breaking the prisoner’s dilemma
– Defector earns the most in prisoner’s dilemma
Rewards for data providers?
– Co-authorship
– Dataset citation
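The prisoner’s dilemma argument can be made concrete with a small payoff table. The numbers below are the conventional textbook payoffs (5, 3, 1, 0), not values from this presentation, and treating co-authorship or dataset citation as a fixed numeric bonus for sharing is a simplifying assumption made only for illustration.

```python
# PAYOFF[(my_choice, other_choice)] = my payoff; illustrative textbook values.
PAYOFF = {
    ("share", "share"): 3,
    ("share", "withhold"): 0,
    ("withhold", "share"): 5,     # the defector earns the most
    ("withhold", "withhold"): 1,
}

def best_reply(other_choice, bonus_for_sharing=0):
    """Return the choice that maximises my payoff given the other's choice."""
    def payoff(mine):
        bonus = bonus_for_sharing if mine == "share" else 0
        return PAYOFF[(mine, other_choice)] + bonus
    return max(("share", "withhold"), key=payoff)

# Without rewards, withholding is the best reply whatever the other does.
no_reward = [best_reply(other) for other in ("share", "withhold")]

# With a large enough reward for providers, sharing becomes the best reply.
with_reward = [best_reply(other, bonus_for_sharing=3) for other in ("share", "withhold")]

print(no_reward, with_reward)
```

Under these assumed numbers, both parties withholding yields 1 each while both sharing yields 3 each, which is why a data policy that rewards providers can leave everyone better off.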