1
Report for Wed. Question 1
In your view/experience, what parts of data life cycle, data citation, and data integration implementations/applications or frameworks are well established (or not) in your disciplines, and what are the common gaps?
2
Understanding Value Streams
Single, continuous stream?
  – Collect, Process, Archive, Discover, Access, Use
Two distinct streams?
  – “Original Intent”: Collect, Process, Archive, Use
  – Secondary Use: Discover, Access, Use
  – Key differences: funding org, different community
  – BUT: can anything be done in the original intent stream to facilitate the secondary use?
Spiral Model?
  – Collect, Process, Archive, Discover, Access, Use (=Process further)
  – Process further, Archive, Discover, Access, Use, …
3
What Part of the Framework Is Working: Standards
WORKING
  – Some standards are useful and widely used
      “Self-describing” formats: SEED, HDF, netCDF
      Climate and Forecast (CF) convention
4
Gaps in Standards
Some standards are underused
  – E.g., ISO metadata: cost, learning curve
Need to consider the human factor
  – Tool availability
Interdisciplinary standards are problematic
  – Discontinuities between disciplines in standards use
  – Observations and Measurements Model may help here for some disciplines
Need standards to support the scientific workflow
  – E.g., when to add metadata and what kind of metadata
Standards churn (changing too fast/often)
5
The Human Factor in Data Lifecycle Management
Incentives
  – Sticks: funding, publication requirements
  – Carrots: wider use of data, citations
6
The problem with citations…
Human and process problems
  – Citations are not being used where they should be
  – Digital data citations are not accepted in some citation indices
  – Data are not often peer-reviewed, and are therefore of uncertain quality and citability
Technical problems
  – Agreement on, and widespread use of, data identifiers
  – Citation granularity (dataset vs. files vs. columns in files)
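The citation granularity item above can be made concrete with a small sketch. This is only an illustration of the three levels named on the slide (dataset, file, column); the class name, field names, the `#` separator, and the identifier `example-dataset-001` are all hypothetical and do not correspond to any identifier scheme discussed by the group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation target; all field names are illustrative."""
    dataset_id: str                   # placeholder identifier for the whole dataset
    file_name: Optional[str] = None   # set to cite one file within the dataset
    column: Optional[str] = None      # set to cite one column within that file

    def __post_init__(self):
        # A column can only be cited inside a specific file.
        if self.column is not None and self.file_name is None:
            raise ValueError("a column citation needs a file_name")

    def granularity(self) -> str:
        """Report the finest level this citation points at."""
        if self.column is not None:
            return "column"
        if self.file_name is not None:
            return "file"
        return "dataset"

    def as_reference(self) -> str:
        """Join the parts that are present into one reference string."""
        parts = [self.dataset_id, self.file_name, self.column]
        return "#".join(p for p in parts if p is not None)


# Three citations into the same (made-up) dataset, one per granularity level.
whole = DataCitation("example-dataset-001")
one_file = DataCitation("example-dataset-001", "station_a.csv")
one_column = DataCitation("example-dataset-001", "station_a.csv", "temperature")
```

The point of the sketch is that a single identifier for the dataset is not enough once users need to cite a subset; whichever scheme is agreed on has to say how finer-grained targets are expressed.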
7
Metadata Capture
We need to capture more metadata at the point of data origin
  – Ideally, built into the collection mechanism
  – Also, following standards
  – Exemplars:
      EXIF standard for cameras
      ArcCatalog
      SEED format from seismometers
      EarthChem
We need to capture more metadata at later processing steps (beyond basic provenance)
  – Gap: handling provenance granularity
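A minimal sketch of the two ideas on this slide, assuming a simple in-memory record: metadata is attached by the collection function itself, and each later processing step appends to a provenance list. The function names, field names, and the instrument identifier `sensor-042` are hypothetical and not drawn from EXIF, SEED, or any other standard named above.

```python
from datetime import datetime, timezone


def collect_reading(value, units, instrument_id):
    """Return a reading bundled with metadata recorded at collection time.

    Field names are illustrative only, not taken from any standard.
    """
    return {
        "value": value,
        "metadata": {
            "units": units,
            "instrument_id": instrument_id,
            "collected_at": datetime.now(timezone.utc).isoformat(),
            "provenance": [],  # later processing steps are logged here
        },
    }


def process(record, step_name, func, units=None):
    """Apply func to the value and log the step, without mutating the input."""
    meta = dict(record["metadata"])
    meta["provenance"] = record["metadata"]["provenance"] + [step_name]
    if units is not None:
        meta["units"] = units
    return {"value": func(record["value"]), "metadata": meta}


# Metadata is captured at origin, then extended by one processing step.
raw = collect_reading(21.5, "degC", "sensor-042")
kelvin = process(raw, "celsius_to_kelvin", lambda c: c + 273.15, units="K")
```

Here provenance is logged once per processing step on the whole record; the gap noted on the slide is exactly the question of whether that level (per record, per file, per value) is the right one.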
8
Where/how to implement robust data management practices
Federal data centers
  – NOAA data centers, NASA DAACs
Federally Funded Research Centers
  – NCAR
University Consortia
  – IRIS DMC
Libraries (could collaborate more with data centers)
Collaborations between scientists and data managers
  – Argonne “catalysts” example of helping scientists leverage computing facilities: apply to data mgmt
Professional Societies
Individual Universities
  – U. of Oklahoma Climate Services Center(?)
Key Gap: Robust Business Model for Long-term Persistence of Data Archive
9
Some Proposals to Involve More People in the Data Lifecycle…
Teach students about data management and require them to make data and metadata available as part of their thesis
  – Partnership with university libraries would be key
Involve 4-yr colleges more (not just graduate programs)
Provide a mechanism for people other than the data provider to add annotations to data
Provide more education on data management to practicing scientists
10
Unresolved Questions
Model output: treat like data or something else?
What to do about identifiers and locators for data?
Discussion assumed the web to be an integral part of the lifecycle. Is this good or bad, considering the overall low reliability of info on the web?
  – Establishing trust for data is clearly important
11
Comments/Questions
Ted:
  – Need to stop talking about how hard metadata is, or people will believe it
  – Hard to make generalized tools
      Maybe make more domain-specific tools?
Did you discuss metrics?
  – JG, maybe use SEI CMM model