Data Management. all a scientist never wanted to know but will not be able to avoid Data Management Plans To support research activity to accomplish scientific.


1 Data Management

2 all a scientist never wanted to know but will not be able to avoid Data Management Plans To support research activity to accomplish scientific goals provide necessary data to users

3 Why data management? Selfish reasons –Work more efficiently –Avoid data corruption and loss Altruistic reasons –Facilitates data exchange –Avoid data loss Altruistic = selfish in long run –Treat others like you want to be treated

4 Why conserve data? Moral obligation –Price of data collection –Uniqueness of observations You can’t measure a 2003 temperature in 2009 Allow peer review and audit of results –Cf. molecular genetics: requirement to deposit sequences in international databases (GenBank)

5 Tools of the trade Principles, attitude more important than hardware –in principle, dissociated from computer use (cf. the gigantic card indices of some libraries) –in practice, involves the use of computerised databases, often an RDBMS Not always! (GenBank, World Ocean Database)

6 E2EDM Data management starts from day α –Data management plan should be part of any ‘project’ description Data management ends on day ω –as last activity of project –submitting final data set to ‘deep archive’ ‘End-to-End Data Management’

7 Data? Results of measurements or observations –Monitoring vs scientific –‘Operational’ vs delayed-mode –Supporting data (eg ‘underway data’ collected automatically by research vessels) Not necessarily numerical (eg species identifications) –Measurement scales: nominal, rank, interval, ratio –Representation: string, boolean, integer, real
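The measurement scale of a variable limits which summaries make sense for it. The sketch below is only an illustration: the `Scale` enum and the `central_value` helper are hypothetical names, and the mapping of scale to statistic (mode for nominal, median for rank, mean for interval and ratio) is the conventional one, not something stated on the slide.

```python
from enum import Enum
from statistics import mean, median_low, mode


class Scale(Enum):
    """The four measurement scales named on the slide."""
    NOMINAL = "nominal"    # categories without order, e.g. species names
    RANK = "rank"          # ordered categories, e.g. abundance classes
    INTERVAL = "interval"  # differences are meaningful, e.g. temperature in °C
    RATIO = "ratio"        # true zero, e.g. length or mass


def central_value(values, scale):
    """Return a central value appropriate to the measurement scale."""
    if scale is Scale.NOMINAL:
        # Only counting is meaningful: report the most frequent category.
        return mode(values)
    if scale is Scale.RANK:
        # Order is meaningful: report the (low) median, an observed value.
        return median_low(values)
    # Interval and ratio data support arithmetic means.
    return mean(values)


species = ["cod", "herring", "cod"]
print(central_value(species, Scale.NOMINAL))
```

Recording the scale alongside each variable lets later processing refuse operations that are not meaningful, such as averaging species codes.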

8 Information? Widely different meanings –(supporting data) –Interpreted data –Metadata: data about data –Data about the science rather than about the scientific subject Eg bibliographies, directories

9 Different aspects Documentation and inventories Recording and logging procedures Quality control Exchange, redistribution Back up Archive

10 Documentation Creating information about the dataset: metadata –what, where, objectives, limitations… –make available as widely as possible avoid duplication attract partners (scientific!) Store metadata together with data
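One simple way to store metadata together with data is to bundle both in a single document. The following is a minimal sketch, assuming JSON as the container; the `DatasetMetadata` class, the `package` function and the example values are hypothetical, with field names taken from the "what, where, objectives, limitations" list on the slide.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record for one dataset."""
    what: str
    where: str
    objectives: str
    limitations: str


def package(metadata, records):
    """Bundle the metadata and the data records in one JSON document."""
    return json.dumps({"metadata": asdict(metadata), "data": records}, indent=2)


meta = DatasetMetadata(
    what="Sea surface temperature",
    where="Example transect (illustrative value)",
    objectives="Illustrate metadata kept with the data",
    limitations="Invented values, not real observations",
)
bundle = package(meta, [{"station": "A1", "temp_c": 12.1}])
print(bundle)
```

Because the description travels inside the same file as the values, a copy of the data cannot become separated from its documentation.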

11 Documentation Different types of metadata –Discovery –Documentation –Technical Serve different purposes, often different systems Ideally ‘harvested’ from data

12 Inventorising Metadata database –Discovery type information Document not only what has been measured, but also planned campaigns –Make inventory searchable –Facilitate exchange of data and information –Avoid duplication

13 Recording Often in systems other than final data management system –Paper forms Reminder of what information should be recorded –Spreadsheet Makes quality control possible during first steps Needs system to control data flow

14 Quality control Automated –Range check (impossible values) –Statistical (improbable values) Danger of excluding unexpected phenomena (e.g. hole in ozone layer, El Niño) Expert –‘manually’, anything that requires knowledge of the subject area –Often involves creating graphs Flag, don’t delete
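The automated checks and the "flag, don't delete" rule can be sketched as follows. This is an illustration under stated assumptions, not a standard implementation: the flag codes, the `flag_values` function and the z-score threshold are all hypothetical, and real programmes define their own flag schemes and limits.

```python
from statistics import mean, stdev

# Hypothetical flag codes for this sketch.
GOOD, SUSPECT, BAD = 1, 3, 4


def flag_values(values, valid_min, valid_max, z_limit=3.0):
    """Return one quality flag per value; the values are never changed."""
    # Statistics use in-range values only, so impossible values
    # cannot distort the mean and standard deviation.
    in_range = [v for v in values if valid_min <= v <= valid_max]
    if len(in_range) >= 2:
        mu, sigma = mean(in_range), stdev(in_range)
    else:
        mu, sigma = 0.0, 0.0

    flags = []
    for v in values:
        if not (valid_min <= v <= valid_max):
            flags.append(BAD)      # range check: impossible value
        elif sigma > 0 and abs(v - mu) > z_limit * sigma:
            flags.append(SUSPECT)  # statistical check: improbable value
        else:
            flags.append(GOOD)
    return flags


temperatures = [12.1, 12.3, 99.0, 11.9]
flags = flag_values(temperatures, valid_min=-2.0, valid_max=40.0)
print(list(zip(temperatures, flags)))
```

Improbable values are only marked as suspect and stay in the dataset, so an unexpected but real phenomenon remains available for an expert to review.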

15 Backing up Needs rigorous procedures Keeping separate copy of working data sets –Disaster recovery Needs copy to be kept in separate location –Wrong manipulation On larger systems: on specialised hardware (tape drives…), necessitated by large volume –But the principle is more important!!
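A rigorous backup procedure should confirm that the copy is identical to the working file. The sketch below shows one way to do that with a checksum comparison; the `back_up` and `sha256_of` helpers and the directory names are hypothetical, and the "offsite" folder only stands in for a genuinely separate location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, backup_dir):
    """Copy source into backup_dir and verify the copy by checksum."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} failed verification")
    return target


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "working" / "stations.csv"
    original.parent.mkdir()
    original.write_text("station,temp_c\nA1,12.1\n")
    copy = back_up(original, Path(workdir) / "offsite")
    copy_matches = copy.read_text() == original.read_text()
```

The same checksums can be stored and recomputed later to detect silent corruption of either copy.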

16 Exchanging Communicating data to others –To systems – distributed data systems –To people Requires data exchange protocols –Agree on the formats for exchange Requires data exchange policy –Agree on what can be done with data by ‘recipient’

17 Archiving Important to ensure long-term integrity of the data –On time scales that are typically much longer than a project… Often will involve specialised organisations –Data repositories – data centres Needs careful thinking about storage medium –Magnetic media are not ideal, certainly not in tropical countries Documentation, viewer software

18 Role of data centres Data management tasks –Inventorising and documenting –Archiving Specific tasks –Redistribution –Integration Support

19 Redistribution Preferably on line –Fast and efficient –No marginal costs Inventory –Metadatabase as a tool Data rescue –Recovering data that are in danger of being lost Respecting rights of data providers –Data policy –Proper use statement

20 Integration Over different disciplines Over different institutions –Implies ‘trust’ –Needs formal arrangements Data policy Creates possibility of extra quality control –Checks on consistency

21 New technologies Technological developments make new types of applications possible –Internet, bandwidth –Standard protocols DiGIR, XML –Distributed databases Data centres are forced to rethink their role –No longer passive archive, but active service centre

22 Data policy Formal agreement between partners exchanging data Describes rights and duties of data provider and data user Considerations –Data are public property –Rights of data collector

23 Data policy Breaking the prisoner’s dilemma –Defector earns the most in prisoner’s dilemma Rewards for data providers? –Co-authorship –Dataset citation

