1 The Dryad Data Repository: Metadata Workflows and Processes 2nd Data Management Workshop November 28th – 29th 2014 University of Cologne, Germany Jane.

1 The Dryad Data Repository: Metadata Workflows and Processes
2nd Data Management Workshop, November 28th – 29th 2014, University of Cologne, Germany
Jane Greenberg, Professor, College of Computing & Informatics (CCI); Director, Metadata Research Center
Erin Clary, Dryad Curator, CCI/MRC


4 http://datadryad.org/


6 Pre-populated metadata field


8 Elsevier’s ScienceDirect: EXAMPLE: Dryad
Unmack, et al, Phylogeny and biogeography… Molecular Phylogenetics and Evolution. http://dx.doi.org/10.1016/j.ympev.2012.12.019

9 Elsevier’s ScienceDirect: EXAMPLE: Dryad
Unmack, et al, Phylogeny and biogeography… Molecular Phylogenetics and Evolution. http://dx.doi.org/10.1016/j.ympev.2012.12.019


11 Data downloads → reuse → citation
Observations, motivating study of metadata capital
1. Metadata generation costs money, a BIG part
2. Metadata reuse is a BIG part of Dryad’s workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing
Downloaded 10678 times

12 Greenberg J, Swauger S, Feinstein EM (2013) Data from: Metadata capital in a data repository. Proceedings of the International Conference on Dublin Core and Metadata Applications. http://dx.doi.org/10.5061/dryad.8c1p6

13
Journal | Re. Wrkfl | Blackout
AmNtrl | N | N
MBE | N | N
BioRisk | Y | N
BMJ Open | Y | N
…. Y
Type | Total | 30 days
Data packages | 6867 | 198
Data files | 21056 | 977
Journals | 364 | 77
Authors | 24500 | 3492
Downloads | 639314 | 36006
Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals
X >10GB = $15,$10+

14 http://wiki.datadryad.org/Sample_Dryad_Content#Examples_by_file_type

15 Technology
DSpace
DOIs via CDL/DataCite
CC0 ( + data)
Integration with specialized repositories and databases
• Federated searching with TreeBASE and KNB LTER
• TreeBASE submission (OAI-PMH)
• GenBank (currently in development)
Governance “non-profit status, 12 member Board of Directors”
• Sets policy, goals science, journals, societies, OCLC, MS
• 2006 Dryad development – NESCent + Stakeholders: journals, publishers and scientific societies, and researchers.
• 2009-2012: Interim Board
$ PAYMENT: Sept. 1, 2014
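Since OAI-PMH is named above as a channel for metadata exchange, a small sketch may help show what such a harvest involves. It builds a standard `ListRecords` request and reads Dublin Core titles out of a response. The endpoint URL and the sample response are placeholders invented for illustration, not a confirmed Dryad address or real Dryad data; only the OAI-PMH verb, the `oai_dc` prefix, and the namespace URIs come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint: a placeholder, NOT a confirmed repository address.
BASE_URL = "http://repository.example.org/oai/request"

# Namespace URIs defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response, trimmed to the elements this sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Data from: An example study</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [element.text for element in root.iterfind(".//dc:title", NS)]


print(list_records_url(BASE_URL))
print(extract_titles(SAMPLE_RESPONSE))
```

Parsing a canned string keeps the sketch runnable offline; a real harvester would fetch the URL and follow resumption tokens, which is left out here.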

16 Sustainability: Plan Comparison
Payment Plan | Member | Non-member | Minimum purchase
1. Voucher Plan | USD$65 per data package | USD$70 per data package | 25 vouchers
2. Deferred Payment Plan | USD$70 per data package | USD$75 per data package | 1 yr contract
3. Subscription Plan | Annual fee based on USD$25 per published research article | Annual fee based on USD$30 per published research article | 2 yr contract
For individuals: Pay on acceptance | NA | USD$80 per data package, payable by the submitter | 1 data package
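To make the plan comparison concrete, here is a rough Python sketch that prices one year under each organizational plan. The rates and the 25-voucher minimum are copied from the comparison above; everything else is an assumption for illustration: the function names are hypothetical, unused vouchers are treated as paid for but not carried over, and the subscription fee is taken to be exactly the per-article rate times the article count (the slide only says the fee is "based on" that rate).

```python
# Per-data-package prices in USD, copied from the plan comparison.
PER_PACKAGE_USD = {
    "voucher": {"member": 65, "non_member": 70},
    "deferred": {"member": 70, "non_member": 75},
}
# The Subscription Plan is priced per published research article, not per package.
PER_ARTICLE_USD = {"member": 25, "non_member": 30}
VOUCHER_MINIMUM = 25  # minimum purchase for the Voucher Plan


def voucher_cost(packages, membership):
    """Voucher Plan cost; assumes at least the 25-voucher minimum is bought."""
    vouchers = max(packages, VOUCHER_MINIMUM)
    return vouchers * PER_PACKAGE_USD["voucher"][membership]


def deferred_cost(packages, membership):
    """Deferred Payment Plan cost: billed per data package actually deposited."""
    return packages * PER_PACKAGE_USD["deferred"][membership]


def subscription_cost(articles, membership):
    """Subscription Plan cost; assumes fee = rate x published articles exactly."""
    return articles * PER_ARTICLE_USD[membership]


# Illustrative journal (made-up volumes): 40 data packages, 120 articles per year.
for status in ("member", "non_member"):
    print(
        status,
        voucher_cost(40, status),
        deferred_cost(40, status),
        subscription_cost(120, status),
    )
```

Under these assumptions the cheapest plan depends on the ratio of deposited data packages to published articles, which is why the sketch takes both counts as inputs.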

17 More on growth and sustainability
• Membership: http://datadryad.org/pages/membershipOverview
• Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing
• Journal integration: http://datadryad.org/pages/journalIntegration


19 Metadata research & development
1. Curation workflow - cognitive walkthroughs
2. Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008; Greenberg, et al, 2010; Greenberg 2009; 2010)
3. Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation - multi-method study (comprehensions assessment) (Greenberg, RDAP, 2010, UNAM 2012)
5. Name-authority control - exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices - concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
7. Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST) (HIVE)
8. Vocabulary needs (HIVE) - mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory - deductive analysis (Greenberg, 2009)

20 Singapore Framework
Dryad DCAP, ver. 3.0
• bibo (The Bibliographic Ontology)
• dcterms (Dublin Core terms)
• dryad (Dryad)
• DwC (Darwin Core)
Vision
1. Simple: automatic metadata generation; heterogeneous datasets
*Data-package centric
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing
Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.

21 Helping Interdisciplinary Vocabulary Engineering (HIVE)

22 ~~~~Amy DATA publication


25 Package metadata harvested from email
Subj. 177 (gr. 97%, rd. 2%, bl. 1%)
Contr. 101 (gr. 99%, bl. 1%)

26 Modified capital-sigma notation
Reuse → Cost / value
$$R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n$$
R = value of the metadata record
i = number of usages
a = incremental increase in value
n = maximum number of reuses
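The notation above reads as a running total: a record starts at value R and gains a_i with each reuse. A minimal Python sketch of that sum follows; the function name `metadata_capital` is hypothetical, and the numbers are invented for illustration rather than taken from the presentation.

```python
from typing import Sequence


def metadata_capital(record_value: float, increments: Sequence[float]) -> float:
    """Return R + sum(a_i for i = 1..n).

    record_value -- R, the value of the metadata record
    increments   -- a_1 .. a_n, the incremental value added by each reuse
    """
    return record_value + sum(increments)


# Illustrative numbers only (assumed, not Dryad figures):
# a record valued at 10 units that is reused three times.
R = 10.0
a = [2.0, 2.0, 1.5]
print(metadata_capital(R, a))
```

With no reuse the list of increments is empty and the total stays at R, which matches the formula for n = 0.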

27 Author/Submitter | Curator
100 metadata instantiations
8 of 12 metadata properties had reuse at 50% or greater
5 of 8 confirmed reuse at 80% or higher
Basic bib. vs. complex
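Counts such as "8 of 12 properties at 50% or greater" come from tallying reuse per property across instantiations. The sketch below shows one way such a tally could be computed; it is not the study's actual procedure. The input shape (one dict per instantiation, mapping a property name to whether its value was reused), the function names, and the toy data are all assumptions.

```python
from collections import defaultdict


def reuse_rates(instantiations):
    """Map each property to the fraction of instantiations in which it was reused."""
    reused = defaultdict(int)
    seen = defaultdict(int)
    for record in instantiations:
        for prop, was_reused in record.items():
            seen[prop] += 1
            reused[prop] += int(was_reused)
    return {prop: reused[prop] / seen[prop] for prop in seen}


def properties_at_or_above(rates, threshold):
    """Return, sorted by name, the properties whose reuse rate meets the threshold."""
    return sorted(prop for prop, rate in rates.items() if rate >= threshold)


# Toy data (invented): four instantiations, three properties.
sample = [
    {"author": True, "title": True, "subject": False},
    {"author": True, "title": True, "subject": True},
    {"author": True, "title": False, "subject": False},
    {"author": True, "title": True, "subject": False},
]
rates = reuse_rates(sample)
print(properties_at_or_above(rates, 0.5))
print(properties_at_or_above(rates, 0.8))
```

Running the same tally at two thresholds mirrors the two-step reporting on the slide: first the properties at 50% or more, then the subset at 80% or more.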

28 Author Subject Dcterms.spatial ← → DwC.ScientificName

29 Conclusion…other Valuation Approaches
• Market cap of Facebook per user: $40 – $300
• Revenues per record per user: $4 – $7 per year (Facebook, Experian)
• Market prices of personal data:
$0.50 for street address
$2.00 for date of birth
$8 for social security number
$3 for driver’s license number
$35 for military record
SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.

30 Concluding comments
• Success story
• Contribution, have to start somewhere… Good timing, the right discipline
• Confirmed use, reuse
• Machine capabilities
• An educative commons, intellectually engaging

31 http://wiki.datadryad.org/Sample_Dryad_Content

32 Acknowledgments
• Dryad Consortium Board, journal partners, and data authors
• NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)
• **Drexel/UNC: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swauger, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary
• U British Columbia: Michael Whitlock
• NCSU Digital Libraries: Kristin Antelman
• HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts
• Yale/TreeBASE: Youjun Guo, Bill Piel
• DataONE: Rebecca Koskela, Bill Michener, Dave Vieglais, and many others
• British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole
• Oxford University: David Shotton

33 http://datadryad.org
http://blog.datadryad.org
http://datadryad.org/wiki
http://code.google.com/p/dryad
dryad-users@nescent.org
Facebook: Dryad
Twitter: @datadryad
http://ils.unc.edu/mrc/hive/
http://code.google.com/p/hive-mrc/
Metadata Research Center: http://cci.drexel.edu/mrc

