Published by Clifton Randall. Modified over 9 years ago.
1
11-20-14/Greenberg
Metadata Quality and Capital
Disseminators and Service Providers
November 20, 2014
Jane Greenberg
Professor, College of Computing & Informatics
Director, Metadata Research Center
2
Your data is only as good as your metadata.
Metadata is a first-class object.
3
Toothbrush
4
The topic…
(DRYAD) Good enough is not bad
(CAPITAL) ROI: return on investment
(COMMUNITY) RDA: Research Data Alliance (time permitting)
6
8
Pre-populated metadata field
9
10
Data, downloads, reuse, citation
Observations motivating the study of metadata capital:
1. Metadata generation costs money (a BIG part)
2. Metadata reuse is a BIG part of Dryad's workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing
Downloaded 10,678 times
11
Journal | Re. Wrkfl | Blackout
AmNtrl | N | N
MBE | N | N
BioRisk | Y | N
BMJ Open | Y | N
…. Y

Type | Total | 30 days
Data packages | 6781 | 198
Data files | 20832 | 957
Journals | 361 | 72
Authors | 24166 | 3312
Downloads | 635348 | 37611

Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals
X >10GB = $15, $10+
12
Technology
DSpace
DOIs via CDL/DataCite
CC0 (+ data)
Integration with specialized repositories and databases
Federated searching with TreeBASE and KNB LTER
TreeBASE submission (OAI-PMH)
GenBank (currently in development)
Governance
"non-profit status, 12 member Board of Directors"
Sets policy, goals
science, journals, societies, OCLC, MS
2006: Dryad development, NESCent + stakeholders: journals, publishers and scientific societies, and researchers
2009-2012: Interim Board
$ PAYMENT: Sept. 1, 2014
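The OAI-PMH link to TreeBASE (and the metadata reuse via OAI noted earlier) rests on a simple HTTP request protocol. As a minimal sketch, assuming a hypothetical endpoint (`https://repository.example.org/oai` is not Dryad's or TreeBASE's actual base URL), a harvester builds `ListRecords` requests like this; the function name `list_records_url` is illustrative only.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
    set_spec: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for incremental harvesting."""
    if resumption_token:
        # resumptionToken is an exclusive argument in OAI-PMH 2.0:
        # only the verb may accompany it when fetching the next page.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # oai_dc (unqualified Dublin Core) is the format every
        # OAI-PMH repository must support.
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # only records changed since this date
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used for illustration only.
BASE = "https://repository.example.org/oai"
print(list_records_url(BASE, from_date="2014-11-01"))
print(list_records_url(BASE, resumption_token="token-from-previous-response"))
```

A harvester repeats the request with each returned `resumptionToken` until none is supplied, which is how metadata flows between repositories without re-keying.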
13
14
Singapore Framework: Dryad DCAP (Dublin Core Application Profile), ver. 3.0
Namespaces: bibo (The Bibliographic Ontology), dcterms (Dublin Core terms), dryad (Dryad), DwC (Darwin Core)
Vision
1. Simple: automatic metadata generation; heterogeneous datasets; *data-package centric
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing
Greenberg, et al., 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.
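The "Semantic Web compatible" goal means each prefixed property in the profile resolves to a full URI. A minimal sketch, assuming the standard published namespace URIs for dcterms, Darwin Core, and bibo (the `dryad` namespace URI is not given on the slide, so it is left out), and using a made-up record; `expand` and `to_pairs` are hypothetical helper names.

```python
from typing import Dict, List, Tuple, Union

# Namespace URIs for three of the vocabularies named in the Dryad DCAP.
# The "dryad" namespace is omitted because its URI is not given here.
NAMESPACES: Dict[str, str] = {
    "dcterms": "http://purl.org/dc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
}

Value = Union[str, List[str]]


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dcterms:title' into a full property URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return NAMESPACES[prefix] + local


def to_pairs(record: Dict[str, Value]) -> List[Tuple[str, str]]:
    """Flatten a record into (property URI, literal) pairs, one per value."""
    pairs: List[Tuple[str, str]] = []
    for curie, value in record.items():
        values = value if isinstance(value, list) else [value]
        pairs.extend((expand(curie), v) for v in values)
    return pairs


# Hypothetical data-package record; all values are illustrative only.
record: Dict[str, Value] = {
    "dcterms:title": "Data from: an example article",
    "dcterms:creator": ["Doe, Jane", "Roe, Richard"],
    "dcterms:subject": ["seed size", "growth rate"],
    "dcterms:spatial": "North Carolina",
    "dwc:scientificName": "Quercus alba",
    "bibo:doi": "10.0000/example",
}

for prop, literal in to_pairs(record):
    print(prop, "->", literal)
```

Mixing vocabularies this way is what an application profile permits: general bibliographic properties come from dcterms and bibo, while domain-specific ones (here, the taxon name) come from Darwin Core.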
15
Metadata research & development
1. Curation workflow: cognitive walkthroughs
2. Dryad metadata scheme development: crosswalk analyses (Dube, et al., 2007; Carrier, et al., 2007; White et al., 2008; Greenberg, et al., 2010; Greenberg 2009; 2010)
3. Metadata reuse: content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation: multi-method study (comprehension assessment) (Greenberg, RDAP, 2010; UNAM 2012)
5. Name-authority control: exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices: concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST; 2010, JLM)
7. Metadata functions: quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA; 2012, JASIST)
8. Vocabulary needs (HIVE): mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory: deductive analysis (Greenberg, 2009)
16
Interoperability slope: Dublin Core application profile; OAI-PMH; DOI; DataCite; DataONE; TR: Data Citation Index; Elsevier, ScienceDirect; semantic ontologies; researcher names; agency/institution
17
19
Package metadata harvested from email
Subj. 177 (gr. 97%, rd. 2%, bl. 1%)
Contr. 101 (gr. 99%, bl. 1%)
20
The leap: capital to metadata capital
An economic concept (Weber, 1905; Smith, 1776)
Business and operations (net gains or losses)
Finances, goods and services, and public needs
Intellectual capital, social capital
A tangible result, a value increase
Metadata as an asset, a product
Reuse of good-quality metadata increases the value of the initial investment
Poor quality may reduce metadata capital?
Metadata reuse prevalence: cooperative cataloging, CIP, ISBD, MARC, FRBR, LCC, VIAF, OAI-PMH, CrossRef, PubMed, Zotero, BibTeX, DataCite, linked data/Semantic Web, PIDs, etc.
21
Modified capital-sigma notation (reuse; cost/value)
$$ R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n $$
R = value of the metadata record
i = index of each usage (reuse)
a_i = incremental increase in value from the i-th usage
n = maximum number of reuses
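The capital-sigma formula can be evaluated directly. This is a minimal sketch, assuming value is expressed in arbitrary units and that an increment may be negative (one reading of "poor quality may reduce metadata capital"); the numbers and the helper names `metadata_capital` and `capital_after_each_reuse` are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def metadata_capital(record_value: float, increments: Sequence[float]) -> float:
    """R + sum_{i=1}^{n} a_i, with n = len(increments)."""
    return record_value + sum(increments)


def capital_after_each_reuse(record_value: float, increments: Sequence[float]) -> List[float]:
    """Running value [R, R + a_1, R + a_1 + a_2, ...] after each successive reuse."""
    return list(accumulate(increments, initial=record_value))


# Hypothetical numbers: R = 10 units of value; four reuses, the last
# (assumed to involve poor-quality metadata) subtracting value.
R = 10.0
a = [2.0, 1.5, 1.0, -0.5]
print(metadata_capital(R, a))          # 14.0
print(capital_after_each_reuse(R, a))  # [10.0, 12.0, 13.5, 14.5, 14.0]
```

The running total makes the return-on-investment argument visible: the initial cost of creating R is fixed, while each reuse adds (or, with poor quality, subtracts) value.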
22
Author/Submitter | Curator
100 metadata instantiations
8 of 12 metadata properties had reuse at 50% or greater
5 of those 8 confirmed reuse at 80% or higher
Basic bibliographic vs. complex
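The per-property reuse figures above compare what the author/submitter entered with what the curator kept. As a minimal sketch, assuming "reuse" means the curator left the author's value unchanged (after trimming whitespace), the thresholds can be counted like this; the toy instantiations are made up and are not data from the study, and `reuse_rate`/`summarize` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (value supplied by author/submitter, value after curator review)
Pair = Tuple[str, str]


def reuse_rate(pairs: List[Pair]) -> float:
    """Fraction of instantiations whose author-supplied value the curator kept."""
    if not pairs:
        return 0.0
    kept = sum(1 for author, curator in pairs if author.strip() == curator.strip())
    return kept / len(pairs)


def summarize(properties: Dict[str, List[Pair]], low: float = 0.5, high: float = 0.8):
    """Per-property reuse rates, plus properties at or above each threshold."""
    rates = {name: reuse_rate(pairs) for name, pairs in properties.items()}
    at_low = [name for name, rate in rates.items() if rate >= low]
    at_high = [name for name in at_low if rates[name] >= high]
    return rates, at_low, at_high


# Toy, made-up instantiations; not data from the study.
sample = {
    "Author": [("Doe, J.", "Doe, J."), ("Roe, R.", "Roe, R.")],
    "Subject": [("growth", "growth"), ("seeds", "seed size")],
    "dcterms.spatial": [("NC", "North Carolina")],
}
rates, at_low, at_high = summarize(sample)
print(rates)    # {'Author': 1.0, 'Subject': 0.5, 'dcterms.spatial': 0.0}
print(at_low)   # ['Author', 'Subject']
print(at_high)  # ['Author']
```

Exact matching is a deliberately strict operationalization; a study could instead count partial edits (for example, a curator expanding "NC" to "North Carolina") as partial reuse.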
23
Author, Subject, dcterms.spatial, DwC.ScientificName
24
Modified capital-sigma notation for linked data (cost/value)
Reuse of a linked data concept/URI
P = determined by the number of terms in an ontology, labor hours to generate, integrate, etc.
25
HIVE: Helping Interdisciplinary Vocabulary Engineering
CV (controlled vocabulary) cost, interoperability, and usability constraints
Linked Open Vocabulary initiative, to support inter/transdisciplinary….
SKOS (a little dumb)
AMG (automatic metadata generation) + machine learning approach for integrating discipline terminologies
27
Meet Amy Zanne. She is a botanist. Like every good scientist, she publishes, and she deposits data in Dryad.
Amy's data
28
29
Successive growth rates
$$ \sum_{i=1}^{n} i^{c} = \Theta\left(n^{c+1}\right) $$
Cycles… What about a successive growth rate tied to a concept? A concept can be in; move from vernacular to canonical; fall by the wayside and become less popular; and finally be out (deprecated).
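The growth bound on this slide is the standard power-sum estimate. A quick numeric check, assuming nothing beyond the formula itself: dividing the exact sum by n^(c+1) should approach the constant 1/(c+1) as n grows, which is what the Θ(n^(c+1)) bound asserts. The helper names are hypothetical.

```python
def power_sum(n: int, c: int) -> int:
    """Exact value of sum_{i=1}^{n} i**c (integer arithmetic, no rounding)."""
    return sum(i ** c for i in range(1, n + 1))


def normalized(n: int, c: int) -> float:
    """power_sum(n, c) / n**(c + 1); tends to 1 / (c + 1) as n grows."""
    return power_sum(n, c) / n ** (c + 1)


for c in (0, 1, 2, 3):
    ratios = [round(normalized(n, c), 4) for n in (10, 100, 1000)]
    print(f"c={c}: {ratios}  limit 1/(c+1) = {1 / (c + 1):.4f}")
```

Because the ratio settles to a positive constant rather than drifting to zero or infinity, the sum grows at exactly the order n^(c+1).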
30
Conclusion… other valuation approaches
Market cap of Facebook per user: $40–$300
Revenues per record per user: $4–$7 per year
(Facebook, Experian)
Market prices of personal data:
$0.50 for street address
$2.00 for date of birth
$8 for social security number
$3 for driver's license number
$35 for military record
SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. OECD Publishing, 2013.
31
Concluding remarks
Interest… traction
Limitations: bad data, cost/value
We should care about cost
Metadata capital can contextualize
Generic formula for further research
32
Metadata Standards Directory Working Group… Jane Greenberg, Alex Ball, Keith Jeffery, Rebecca Koskela
33
“…develop a collaborative, open directory of metadata standards applicable to scientific data”
Stakeholders: researchers, data managers, data scientists, tool developers, repositories, agencies, societies (RDA's growing community)
Goals and workplan
DCC Disciplinary Directory: http://www.dcc.ac.uk/resources/metadata-standards
34
Acknowledgments
Dryad Consortium Board, journal partners, and data authors
NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)
Drexel/UNC: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swuager, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary
U British Columbia: Michael Whitlock
NCSU Digital Libraries: Kristin Antelman
HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts
Yale/TreeBASE: Youjun Guo, Bill Piel
DataONE: Rebecca Koskela, Bill Michener, Dave Vieglais, and many others
British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole
Oxford University: David Shotton
35
http://datadryad.org
http://blog.datadryad.org
http://datadryad.org/wiki
http://code.google.com/p/dryad
dryad-users@nescent.org
Facebook: Dryad
Twitter: @datadryad
http://ils.unc.edu/mrc/hive/
http://code.google.com/p/hive-mrc/
Metadata Research Center: http://cci.drexel.edu/mrc
36
Sustainability: plan comparison
Payment plan | Member | Non-member | Minimum purchase
1. Voucher Plan | USD$65 per data package | USD$70 per data package | 25 vouchers
2. Deferred Payment Plan | USD$70 per data package | USD$75 per data package | 1 yr contract
3. Subscription Plan | Annual fee based on USD$25 per published research article | Annual fee based on USD$30 per published research article | 2 yr contract
For individuals: Pay on acceptance | NA | USD$80 per data package, payable by the submitter | 1 data package
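To compare the plans for a given volume, the per-unit prices in the plan-comparison table can be turned into a rough calculator. This is a sketch under simplifying assumptions that the table does not state: unused vouchers are still paid for (so at least 25 are bought), contract lengths are ignored, and the journal's counts are hypothetical. `plan_costs` is an illustrative name, not a Dryad tool.

```python
from typing import Dict

# USD prices transcribed from the plan-comparison table.
PRICES = {
    "voucher": {"member": 65, "non_member": 70},       # per data package
    "deferred": {"member": 70, "non_member": 75},      # per data package
    "subscription": {"member": 25, "non_member": 30},  # per published research article
}
INDIVIDUAL_PER_PACKAGE = 80  # pay-on-acceptance rate for individual submitters
VOUCHER_MINIMUM = 25         # minimum voucher purchase


def plan_costs(packages: int, articles: int, member: bool) -> Dict[str, int]:
    """Rough cost of each plan for a given volume (simplifying assumptions)."""
    if packages < 0 or articles < 0:
        raise ValueError("counts must be non-negative")
    tier = "member" if member else "non_member"
    costs = {
        # Assumption: unused vouchers are still paid for, so at least 25 are bought.
        "voucher": max(packages, VOUCHER_MINIMUM) * PRICES["voucher"][tier],
        "deferred": packages * PRICES["deferred"][tier],
        # Subscription fee scales with published articles, not with deposits.
        "subscription": articles * PRICES["subscription"][tier],
    }
    if not member:
        # The individual pay-on-acceptance option has no member price ("NA").
        costs["individual"] = packages * INDIVIDUAL_PER_PACKAGE
    return costs


# Hypothetical journal: 10 data packages deposited, 40 research articles published.
print(plan_costs(10, 40, member=True))
# {'voucher': 1625, 'deferred': 700, 'subscription': 1000}
print(plan_costs(10, 40, member=False))
# {'voucher': 1750, 'deferred': 750, 'subscription': 1200, 'individual': 800}
```

The design point the table encodes is visible in the output: per-package plans favor low deposit rates, while the subscription plan favors journals where most articles come with data.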
37
More on growth and sustainability
Membership: http://datadryad.org/pages/membershipOverview
Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing
Journal integration: http://datadryad.org/pages/journalIntegration