Published by Nelson Fleming. Modified over 9 years ago.
1
The Dryad Data Repository: Metadata Workflows and Processes
2nd Data Management Workshop, November 28th – 29th, 2014, University of Cologne, Germany
Jane Greenberg, Professor, College of Computing & Informatics (CCI); Director, Metadata Research Center (MRC)
Erin Clary, Dryad Curator, CCI/MRC
3
4
http://datadryad.org/
6
Pre-populated metadata field
7
8
Elsevier’s ScienceDirect, example with Dryad data: Unmack, et al., Phylogeny and biogeography…, Molecular Phylogenetics and Evolution. http://dx.doi.org/10.1016/j.ympev.2012.12.019
9
Elsevier’s ScienceDirect, example with Dryad data: Unmack, et al., Phylogeny and biogeography…, Molecular Phylogenetics and Evolution. http://dx.doi.org/10.1016/j.ympev.2012.12.019
11
Data downloads, reuse, citation
Observations motivating the study of metadata capital:
1. Metadata generation costs money, a BIG part
2. Metadata reuse is a BIG part of Dryad’s workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing
Downloaded 10678 times
12
Greenberg J, Swauger S, Feinstein EM (2013) Data from: Metadata capital in a data repository. Proceedings of the International Conference on Dublin Core and Metadata Applications. http://dx.doi.org/10.5061/dryad.8c1p6
13
Journal     Review workflow   Blackout
AmNtrl      N                 N
MBE         N                 N
BioRisk     Y                 N
BMJ Open    Y                 N
….          Y

Type            Total     30 days
Data packages   6867      198
Data files      21056     977
Journals        364       77
Authors         24500     3492
Downloads       639314    36006

Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals
X >10GB = $15,$10+
14
http://wiki.datadryad.org/Sample_Dryad_Content#Examples_by_file_type
15
Technology
  DSpace
  DOIs via CDL/DataCite
  CC0 ( + data)
  Integration with specialized repositories and databases
  Federated searching with TreeBASE and KNB LTER
  TreeBASE submission (OAI-PMH)
  GenBank (currently in development)
Governance
  “non-profit status, 12 member Board of Directors”
  Sets policy, goals
  science, journals, societies, OCLC, MS
  2006: Dryad development – NESCent + Stakeholders: journals, publishers and scientific societies, and researchers
  2009-2012: Interim Board
  $ PAYMENT: Sept. 1, 2014
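The OAI-PMH route listed under Technology works through plain HTTP requests with a small set of standard query parameters. The following is a minimal sketch of building a `ListRecords` request. The endpoint `http://example.org/oai/request` is a placeholder, not Dryad's actual address, and the helper name `list_records_url` is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute the repository's real OAI-PMH base URL.
BASE_URL = "http://example.org/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    verb, metadataPrefix, from, and set are standard OAI-PMH arguments;
    oai_dc (unqualified Dublin Core) is the format every OAI-PMH server must support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # harvest only records changed since this date
    if set_spec:
        params["set"] = set_spec  # restrict harvesting to one set, if the server defines sets
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL, from_date="2014-11-01"))
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned `resumptionToken` until the list is exhausted.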
16
Sustainability: Plan Comparison
1. Voucher Plan
   Member: USD$65 per data package
   Non-member: USD$70 per data package
   Minimum purchase: 25 vouchers
2. Deferred Payment Plan
   Member: USD$70 per data package
   Non-member: USD$75 per data package
   Minimum purchase: 1 yr contract
3. Subscription Plan
   Member: Annual fee based on USD$25 per published research article
   Non-member: Annual fee based on USD$30 per published research article
   Minimum purchase: 2 yr contract
For individuals: Pay on acceptance
   Member: NA
   Non-member: USD$80 per data package, payable by the submitter
   Minimum purchase: 1 data package
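To compare the plans for a given journal, the per-unit prices above can be multiplied out. This is a rough sketch only: the function names are hypothetical, the journal volumes in the example are invented, and it assumes that the 25-voucher minimum means paying for at least 25 vouchers.

```python
# Per-unit prices in USD, copied from the plan comparison.
PER_PACKAGE = {
    "voucher": {"member": 65, "non_member": 70},
    "deferred": {"member": 70, "non_member": 75},
}
PER_ARTICLE = {"member": 25, "non_member": 30}  # subscription plan
VOUCHER_MINIMUM = 25


def per_package_cost(plan, status, packages):
    """Cost of a per-package plan ("voucher" or "deferred")."""
    if plan == "voucher":
        # Assumption: the minimum purchase is billed even if fewer packages are deposited.
        packages = max(packages, VOUCHER_MINIMUM)
    return PER_PACKAGE[plan][status] * packages


def subscription_cost(status, articles):
    """Annual subscription fee, based on published research articles."""
    return PER_ARTICLE[status] * articles


# Hypothetical member journal: 40 data packages and 100 published articles per year.
print(per_package_cost("voucher", "member", 40))   # 2600
print(per_package_cost("deferred", "member", 40))  # 2800
print(subscription_cost("member", 100))            # 2500
```

Which plan is cheapest depends on the share of published articles that come with a data package, since the subscription fee is charged per article rather than per deposit.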
17
More on growth and sustainability
Membership: http://datadryad.org/pages/membershipOverview
Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing
Journal integration: http://datadryad.org/pages/journalIntegration
18
19
Metadata research & development
1. Curation workflow - cognitive walkthroughs
2. Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008; Greenberg, et al, 2010; Greenberg 2009; 2010)
3. Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation - multi-method study (comprehension assessment) (Greenberg, RDAP, 2010, UNAM 2012)
5. Name-authority control - exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices - concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
7. Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST) (HIVE)
8. Vocabulary needs (HIVE) - mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory - deductive analysis (Greenberg, 2009)
20
Singapore Framework
Dryad DCAP, ver. 3.0
  bibo (The Bibliographic Ontology)
  dcterms (Dublin Core terms)
  dryad (Dryad)
  DwC (Darwin Core)
Vision
1. Simple: automatic metadata generation; heterogeneous datasets (*data-package centric)
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing
Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.
21
Helping Interdisciplinary Vocabulary Engineering (HIVE)
22
~~~~Amy DATA publication
25
Package metadata harvested from email
Subj. 177 (gr. 97%, rd. 2%, bl. 1%)
Contr. 101 (gr. 99%, bl. 1%)
26
Modified capital-sigma notation: reuse cost / value

\[
R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n
\]

R = value of the metadata record
i = index of the usage (1 to n)
a = incremental increase in value
n = maximum number of reuses
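As a worked illustration of the notation, the sum can be evaluated directly. The function name and the numbers below are hypothetical and chosen only to show the arithmetic; they are not measurements from Dryad.

```python
from typing import Sequence


def metadata_capital(record_value: float, increments: Sequence[float]) -> float:
    """Return R + a_1 + a_2 + ... + a_n.

    record_value is R, the value of the metadata record;
    increments holds a_1..a_n, the added value from each reuse.
    """
    return record_value + sum(increments)


# Hypothetical example: a record valued at 10 units that gains
# 2 units of value on each of three reuses.
print(metadata_capital(10.0, [2.0, 2.0, 2.0]))  # 16.0
```

With no reuse the list of increments is empty, and the result is simply R.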
27
Author/Submitter | Curator
100 metadata instantiations
8 of 12 metadata properties had reuse at 50% or greater
5 of 8 confirmed reuse at 80% or higher
Basic bib. vs. complex
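Counts such as "8 of 12 properties at 50% or greater" come from tallying, per property, how many instantiations showed reuse. The sketch below shows that tally under stated assumptions: the function name is hypothetical, and the per-property counts are invented for illustration and are not the study's data.

```python
def properties_meeting_threshold(reused_counts, total, threshold):
    """Return the properties whose reuse rate (reused / total) is at or above threshold."""
    return sorted(
        name for name, reused in reused_counts.items() if reused / total >= threshold
    )


# Invented counts out of 100 instantiations (illustration only).
reused = {"author": 95, "subject": 60, "dcterms.spatial": 30}

print(properties_meeting_threshold(reused, total=100, threshold=0.5))
# ['author', 'subject']
print(properties_meeting_threshold(reused, total=100, threshold=0.8))
# ['author']
```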
28
Author Subject Dcterms.spatial DwC.ScientificName
29
Conclusion…other Valuation Approaches
Market cap of Facebook per user: $40 – $300
Revenues per record per user: $4 – $7 per year (Facebook, Experian)
Market prices of personal data:
  $0.50 for street address
  $2.00 for date of birth
  $8 for social security number
  $3 for driver’s license number
  $35 for military record
SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.
30
Concluding comments
  Success story
  Contribution, have to start somewhere…
  Good timing, the right discipline
  Confirmed use, reuse
  Machine capabilities
  An educative commons, intellectually engaging
31
http://wiki.datadryad.org/Sample_Dryad_Content
32
Acknowledgments
Dryad Consortium Board, journal partners, and data authors
NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)
Drexel/UNC: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swauger, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary
U British Columbia: Michael Whitlock
NCSU Digital Libraries: Kristin Antelman
HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts
Yale/TreeBASE: Youjun Guo, Bill Piel
DataONE: Rebecca Koskela, Bill Michener, Dave Vieglais, and many others
British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole
Oxford University: David Shotton
33
http://datadryad.org
http://blog.datadryad.org
http://datadryad.org/wiki
http://code.google.com/p/dryad
dryad-users@nescent.org
Facebook: Dryad
Twitter: @datadryad
http://ils.unc.edu/mrc/hive/
http://code.google.com/p/hive-mrc/
Metadata Research Center: http://cci.drexel.edu/mrc