Metadata Quality and Capital. Disseminators and Service Providers. November 20, 2014. Jane Greenberg, Professor, College of Computing.

Presentation transcript:

Metadata Quality and Capital
Disseminators and Service Providers
November 20, 2014
Jane Greenberg
Professor, College of Computing & Informatics
Director, Metadata Research Center

Your data is only as good as your metadata
Metadata is a first class object

Toothbrush

The topic…
- (DRYAD) Good enough is not bad
- (CAPITAL) ROI – return on investment
- (COMMUNITY) RDA – Research Data Alliance …. time permitting

Pre-populated metadata field

Data downloads → reuse → citation
Observations motivating the study of metadata capital:
1. Metadata generation costs money
2. Metadata reuse is a BIG part of Dryad’s workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing
Download times

Journal     Re. Wrkfl   Blackout
AmNtrl      N           N
MBE         N           N
BioRisk     Y           N
BMJ Open    Y           N
….          Y

Type: Total / 30 days
Data packages
Data files
Journals 36172
Authors
Downloads

Journals (80+…PLOS): ntegratedJournals
X >10GB = $15,$10+

Technology
- DSpace
- DOIs via CDL/DataCite
- CC0 ( + data)
- Integration with specialized repositories and databases
  - Federated searching with TreeBASE and KNB LTER
  - TreeBASE submission (OAI-PMH)
  - GenBank (currently in development)

Governance
- “non-profit status, 12 member Board of Directors”
  - Sets policy, goals science, journals, societies, OCLC, MS
  - 2006 Dryad development – NESCent + Stakeholders: journals, publishers and scientific societies, and researchers.
  - : Interim Board

$ PAYMENT-Sept. 1, 2014

Singapore Framework
Dryad DCAP, ver. 3.0
- bibo (The Bibliographic Ontology)
- dcterms (Dublin Core terms)
- dryad (Dryad)
- DwC (Darwin Core)

Vision
1. Simple: automatic metadata gen; heterogeneous datasets (*Data-package centric)
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing

Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI: /
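As a rough illustration of how an application profile mixes properties from several namespaces, the sketch below builds one data-package record as a Python dictionary. The property selection, all values, the DOI, and the `dryad` namespace URI are hypothetical placeholders, not the published Dryad profile; only the bibo, dcterms, and Darwin Core namespace URIs are the ones those vocabularies publish.

```python
# Namespace URIs for the vocabularies named on the slide.
NAMESPACES = {
    "bibo": "http://purl.org/ontology/bibo/",
    "dcterms": "http://purl.org/dc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "dryad": "http://example.org/dryad/terms/",  # hypothetical placeholder URI
}

# A hypothetical data-package record; every value here is invented.
record = {
    "dcterms:title": "Data from: An example study",
    "dcterms:creator": ["Doe, Jane"],
    "dcterms:spatial": "North Carolina",
    "dwc:scientificName": ["Quercus alba"],
    "bibo:doi": "10.5555/example",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'dcterms:title' into a full URI."""
    prefix, local_name = qname.split(":", 1)
    return NAMESPACES[prefix] + local_name


for qname in record:
    print(qname, "->", expand(qname))
```

Keeping the prefix-to-URI mapping in one place is what lets a harvester treat `dcterms:title` from this record and from any other Dublin Core source as the same property.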

Metadata research & development
1. Curation workflow - cognitive walkthroughs
2. Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008; Greenberg, et al, 2010; Greenberg 2009; 2010)
3. Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation - multi-method study (comprehension assessment) (Greenberg, RDAP, 2010, UNAM 2012)
5. Name-authority control - exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices - concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
7. Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST)
8. Vocabulary needs (HIVE) - mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory - deductive analysis (Greenberg, 2009)

Interoperability slope
- Dublin Core application profile
- OAI-PMH
- DOI
- DataCite
- DataONE
- TR: Data Citation Index
- Elsevier, Science Direct
- Semantic ontologies
- Researcher names
- Agency/institution
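To make the OAI-PMH step on the slope concrete, here is a minimal sketch of a harvesting request and of reading Dublin Core titles out of a response. The endpoint URL and the sample XML are invented for illustration; `ListRecords` and `oai_dc` are the standard protocol verb and metadata prefix. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint
DC_NS = "{http://purl.org/dc/elements/1.1/}"     # Dublin Core element namespace


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Invented sample response, trimmed to the parts the parser touches.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data package</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def harvested_titles(xml_text):
    """Return every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(DC_NS + "title")]


print(list_records_url(BASE_URL))
print(harvested_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow `resumptionToken` elements to page through large result sets; that is omitted here.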

Package metadata harvested from
Subj. 177 (gr. 97%, rd. 2%, bl. 1%)
Contr. 101 (gr. 99%, bl. 1%)

The leap - capital to metadata capital
- An economic concept (Weber, 1905; Smith, 1776)
  - Business and operations (net gains or losses)
  - Finances, goods and services, and public needs
  - Intellectual capital, social capital
  - A tangible result, value increase
- Metadata as an asset, a product
  - Reuse of good-quality metadata increases the value of the initial investment
  - Poor quality may reduce metadata capital?
- Metadata reuse prevalence
  - Cooperative cataloging, CIP, ISBD, MARC, FRBR, LCC, VIAF, OAI-PMH, CrossRef, PubMed, Zotero, BibTeX, DataCite, linked data/Semantic Web, PIDs, etc.

Modified capital-sigma notation
Reuse → Cost / value

$$R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n$$

R = value of the metadata record
i = number of usages
a = incremental increase in value
n = maximum number of reuse
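The capital-sigma expression can be evaluated directly. The sketch below is one possible reading of it, with `initial_value` standing for R and `increments` for the per-reuse values a_1 … a_n; the function name and the example numbers are hypothetical.

```python
from typing import Sequence


def metadata_capital(initial_value: float, increments: Sequence[float]) -> float:
    """Return R + sum(a_i): the record's initial value plus one increment per reuse."""
    return initial_value + sum(increments)


# Hypothetical numbers: a record valued at 10 units that is reused three times.
print(metadata_capital(10.0, [2.0, 2.0, 1.5]))  # → 15.5
```

Passing the increments as a list rather than a single constant leaves room for reuse events that add different amounts of value, which is what the subscripted a_i in the formula allows.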

Author/Submitter | Curator
- 100 metadata instantiations
- 8 of 12 metadata properties had reuse of 50% or greater
- 5 of 8 confirmed reuse at 80% or higher
- Basic bib. vs. complex

Author
Subject
dcterms.spatial
DwC.ScientificName

Modified capital-sigma notation for linked data
Cost / value
Reuse of linked data concept/URI
P = determined by the number of terms in an ontology, labor hours to generate, integrate, etc.

Helping Interdisciplinary Vocabulary Engineering (HIVE)
- CV cost, interoperability, and usability constraints
- Linked Open Vocabulary initiative, to support inter/transdisciplinary….
- SKOS (a little dumb)
- AMG + machine learning approach for integrating discipline terminologies
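A minimal sketch of the idea behind suggesting vocabulary concepts for a piece of text, assuming SKOS-style concepts with a preferred label and alternative labels. It uses plain substring matching, not the machine learning approach the slide mentions, and the vocabulary, URIs, and sample text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Concept:
    """A SKOS-like concept: one URI, one preferred label, optional alternative labels."""
    uri: str
    pref_label: str
    alt_labels: Tuple[str, ...] = ()


# Hypothetical three-concept vocabulary.
VOCABULARY = [
    Concept("http://example.org/vocab/plants", "Plants", ("flora",)),
    Concept("http://example.org/vocab/wood-density", "Wood density"),
    Concept("http://example.org/vocab/birds", "Birds", ("Aves",)),
]


def suggest_concepts(text: str, vocabulary: Iterable[Concept]) -> List[str]:
    """Return URIs of concepts whose preferred or alternative label occurs in the text."""
    haystack = text.lower()
    matches = []
    for concept in vocabulary:
        labels = (concept.pref_label,) + concept.alt_labels
        if any(label.lower() in haystack for label in labels):
            matches.append(concept.uri)
    return matches


print(suggest_concepts(
    "Measurements of wood density across the flora of the tropics", VOCABULARY
))
```

Returning URIs rather than labels means a record tagged through the alternative label "flora" and one tagged through "Plants" end up pointing at the same concept.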

Amy
- Meet Amy Zanne. She is a botanist.
- Like every good scientist, she publishes, and she deposits data in Dryad.
Amy’s data

Successive growth rates

$$\sum_{i=1}^{n} i^{c} = \Theta(n^{c+1})$$

Cycles…
What about successive growth rate tied to a concept? A concept can be:
- in ~ vernacular to canonical
- fall by the wayside, less popular
- out (deprecated)
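The growth-rate identity can be checked numerically. The sketch below computes the sum of i^c for i = 1 … n and divides it by n^(c+1); for a fixed exponent c the ratio settles near 1/(c+1), which is what the Θ bound expresses. The function names and the chosen values of n and c are illustrative only.

```python
def power_sum(n: int, c: int) -> int:
    """Return 1**c + 2**c + ... + n**c."""
    return sum(i ** c for i in range(1, n + 1))


def ratio_to_bound(n: int, c: int) -> float:
    """Return the power sum divided by n**(c + 1)."""
    return power_sum(n, c) / n ** (c + 1)


# For c = 2 the ratio approaches 1/3 as n grows.
for n in (10, 100, 1000):
    print(n, ratio_to_bound(n, 2))
```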

Conclusion…other valuation approaches
- Market cap of Facebook per user: $40 – $300
- Revenues per record per user: $4 – $7 per year (Facebook, Experian)
- Market prices of personal data:
  - $0.50 for street address
  - $2.00 for date of birth
  - $8 for social security number
  - $3 for driver’s license number
  - $35 for military record
SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.

Concluding remarks
- Interest….traction
- Limitations: bad data, cost/value
- We should care about cost
- Metadata capital can contextualize
- Generic formula for further research

Metadata Standards Directory Working Group….
Jane Greenberg, Alex Ball, Keith Jeffery, Rebecca Koskela

“…develop a collaborative, open directory of metadata standards applicable to scientific data”
Stakeholders: Researchers, data managers, data scientists, tool developers, repositories, agencies, societies (RDA’s growing community)
Goals and workplan - DCC Disciplinary Directory: standards

Acknowledgments
- Dryad Consortium Board, journal partners, and data authors
- NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)
- Drexel/UNC: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swauger, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary
- U British Columbia: Michael Whitlock
- NCSU Digital Libraries: Kristin Antelman
- HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts
- Yale/TreeBASE: Youjun Guo, Bill Piel
- DataONE: Rebecca Koskela, Bill Michener, Dave Vieglais, and many others
- British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole
- Oxford University: David Shotton

Facebook: Dryad
Metadata Research Center:

Sustainability: Plan Comparison

Payment Plan | Member | Non-member | Minimum purchase
1. Voucher Plan | USD$65 per data package | USD$70 per data package | 25 vouchers
2. Deferred Payment Plan | USD$70 per data package | USD$75 per data package | 1 yr contract
3. Subscription Plan | Annual fee based on USD$25 per published research article | Annual fee based on USD$30 per published research article | 2 yr contract
For individuals: Pay on acceptance | NA | USD$80 per data package, payable by the submitter | 1 data package
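The plan comparison lends itself to a small cost calculation. The sketch below transcribes the fees from the slide and compares plans for a given volume. Two things are assumptions rather than stated policy: that the 25-voucher minimum means paying for at least 25 packages, and that contract lengths can be ignored for a single-year estimate. All function and variable names are hypothetical.

```python
# Per-data-package fees in USD, transcribed from the plan comparison.
# None marks the "NA" cell: individuals have no member rate.
PER_PACKAGE_FEES = {
    "voucher": {"member": 65, "non_member": 70},
    "deferred": {"member": 70, "non_member": 75},
    "individual": {"member": None, "non_member": 80},
}

# The subscription plan is priced per published research article instead.
SUBSCRIPTION_FEE_PER_ARTICLE = {"member": 25, "non_member": 30}

VOUCHER_MINIMUM = 25  # minimum purchase listed for the voucher plan


def per_package_cost(plan: str, status: str, packages: int) -> int:
    """Cost in USD of depositing `packages` data packages under a per-package plan."""
    fee = PER_PACKAGE_FEES[plan][status]
    if fee is None:
        raise ValueError(f"plan {plan!r} has no {status!r} rate")
    if plan == "voucher":
        # Assumption: fewer than 25 packages still costs 25 vouchers.
        packages = max(packages, VOUCHER_MINIMUM)
    return fee * packages


def subscription_cost(status: str, articles: int) -> int:
    """Annual subscription fee in USD for `articles` published research articles."""
    return SUBSCRIPTION_FEE_PER_ARTICLE[status] * articles


# Hypothetical member journal: 40 articles per year, 30 of them with data packages.
print(per_package_cost("voucher", "member", 30))   # → 1950
print(per_package_cost("deferred", "member", 30))  # → 2100
print(subscription_cost("member", 40))             # → 1000
```

Because the subscription fee scales with articles rather than data packages, it becomes the cheaper option once a large enough share of a journal's articles carry deposited data.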

More on growth and sustainability
- Membership: http://datadryad.org/pages/membershipOverview
- Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing
- Journal integration: tion