Data Provenance and Attribution for Published Datasets The Challenge and the reality check April 9-10, 2009 National Academy of Sciences, Woods Hole, MA.


1 Data Provenance and Attribution for Published Datasets: The Challenge and the reality check
April 9-10, 2009, National Academy of Sciences, Woods Hole, MA
Cyndy Chandler, Biological and Chemical Oceanography Data Management Office, Woods Hole Oceanographic Institution

2 What is the goal?
- to establish best practice guidelines for metadata capture and recording to support data provenance and attribution of published datasets
- this talk will focus on oceanographic data
09 April 2009, Cyndy Chandler ~ Woods Hole Oceanographic Institution, 2 of 18

3 What is the problem?
- Why aren't we doing this already?
- provenance tracking and attribution systems have been in use for a long time
  - works of art
  - works of literature

4 Why aren't we doing this already?
- What is so difficult about associating source data with a journal publication?
- data acquisition, data publication, journal publication

5 Why aren't we doing this already?
What are the challenges?
- Technical
- Cultural
- Usual

6 Why aren't we doing this already?
Technical reasons …
- data are not published
  - what is the definition of a published dataset?
- and if the data are published
  - it's not clear how to cite them
  - they lack sufficient metadata
  - metadata are non-standard
  - or they lack a persistent identifier

7 Why aren't we doing this already?
Technical reasons …
- data sets used to be smaller and were often published on paper (in a journal article or a data report, and they fit in Table 1)
- data were published as a tangible thing
- as data acquisition becomes automated, the rate of acquisition and the volume increase
- but metadata acquisition (data documentation) is not being automated at the same rate

8 Why aren't we doing this already?
Cultural reasons …
- little incentive for researchers to publish their data
- often augmented by the perception that the data are the property of the originating investigator, and might be stolen
Conventional wisdom is still that "publish or perish" applies predominantly to journal publications, not data publication. (Funding agency program managers are beginning to effect change in this area.)

9 Why aren't we doing this already?
Usual reasons …
- lack of resources
  - Funding
  - Expertise
  - Time

10 remember where these data come from …
… this is the office!
"Think I'll go record some metadata."
Who's recording the metadata?

11 Why aren't we doing this already?
- What is so difficult about associating source data with a journal publication?
- data acquisition, data publication, journal publication

12 data acquisition, data publication, journal publication
- a relatively simple case
- Many of the VERTIGO project cruise data sets are available online from BCO-DMO
- and they're tagged with metadata.
- The introductory paper refers to the online data server.
- Source data are available online for this special volume.

13 Why aren't we doing this already?
Let's assume this effort is fully funded ~ so all the usual reasons are no longer an issue ~ funding, expertise, time ~ no longer a challenge!
Combined cultural and technical challenges …
- The simplest system for data publication and attribution involves at least one representative from each of these three communities:
  - Oceanographer (research discipline)
  - Data manager (information science)
  - Editor (publishing community)

14 Why aren't we doing this already?
Combined cultural and technical challenges …
- The successful system for data publication and attribution more likely involves six communities
  - Oceanographer (research discipline)
  - Data manager (information science)
  - Library science
  - Information technology expertise from these fields
  - Social science
  - Editor (publishing community)
and effective communication between those communities

15 Additional Challenges
- What if all the whining from the previous slides could be addressed somehow?
  - Education
  - Cultural changes
  - Standards development and implementation
  - Funding sources
  - Communication
challenges

16 Additional Challenges
- micro attribution: what level is required to support scientific inquiry?
- what are the identifiable entities within a publication that require data attribution?
  - the entire article?
  - each table? each figure?
  - publications often have many source data sets
- who does all that work? The author(s)?
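The two levels of attribution raised on this slide can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a system described in the talk: the `Entity` class, the two helper functions, and the dataset identifiers such as `dataset-A` are all hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """One identifiable part of a publication (hypothetical model)."""
    label: str  # e.g. "Table 1" or "Figure 2"
    dataset_ids: list[str] = field(default_factory=list)  # placeholder identifiers


def datasets_for_article(entities: list[Entity]) -> list[str]:
    """Article-level attribution: each distinct dataset, in first-seen order."""
    seen: dict[str, None] = {}
    for entity in entities:
        for dataset_id in entity.dataset_ids:
            seen.setdefault(dataset_id, None)
    return list(seen)


def entities_using(entities: list[Entity], dataset_id: str) -> list[str]:
    """Micro attribution: which tables or figures draw on a given dataset."""
    return [e.label for e in entities if dataset_id in e.dataset_ids]


# A made-up paper with three attributable entities and three source datasets.
paper = [
    Entity("Table 1", ["dataset-A"]),
    Entity("Figure 2", ["dataset-A", "dataset-B"]),
    Entity("Figure 3", ["dataset-C"]),
]

print(datasets_for_article(paper))
print(entities_using(paper, "dataset-A"))
```

Even in this toy form, someone has to record one list of source data sets per table and figure, which is the workload question the slide ends on.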

17 It is important to figure this out.
- Data are difficult and expensive to collect, and cannot be recollected.
- We want to maximize data reuse.

18 thank you

