“Classifying Scientific Data Objects with Bibliographic Relationship

Slides:



Advertisements
Similar presentations
Current State of Play in Digital Preservation Peter B. Hirtle Cornell University Library Society of American Archivists.
Advertisements

Putting the Pieces Together Grace Agnew Slide User Description Rights Holder Authentication Rights Video Object Permission Administration.
Dublin Core for Digital Video: Overview of the ViDe Application Profile.
OLAC Metadata Steven Bird University of Melbourne / University of Pennsylvania OLAC Workshop 10 December 2002.
OLAC Process and OLAC Protocol: A Guided Tour Gary F. Simons SIL International ___________________________ OLAC Workshop 10 Dec 2002, Philadelphia.
Introduction The field of evolutionary biology draws from ecology, paleontology, population genetics, physiology, systematics, and new biological sub disciplines.
Theories of Evolution and Cultural Diffusion: The Dryad Repository Case Study for Understanding Changes in Organizing Information Practices ~~~~~~ ~~~~~~
Jane Greenberg SILS/Metadata Research Center School of Info. & Library Science Univ. of North Carolina at Chapel Hill The DRYAD Repository.
Building Support for a Discipline-Based Data Repository Ryan Scherle 1, Sarah Carrier 2, Jane Greenberg 2, Hilmar Lapp 1, Abbey Thompson 2, Todd Vision.
Ryan Scherle and Jane Greenberg. A Repository of Data Underlying Journal Articles.
Toward a Data Repository for Evolutionary Biology: Toward a Data Repository for Evolutionary Biology: Jane Greenberg, Associate Professor, Director SILS/Metadata.
Evolutionary biology Population genetics Systematics Paleontology Botany and Zoology Genomics Ecology Medicine Agriculture Anthropology Bioinformatics.
The Dryad Data Repository Ryan Scherle 1, Hilmar Lapp 1, Amol Bapat 2, Sarah Carrier 2, Jane Greenberg 2, Peggy Schaeffer 1, Todd Vision 1,3, Hollie White.
DLI Training Nesstar Workshop
DRIVER Long Term Preservation for Enhanced Publications in the DRIVER Infrastructure 1 WePreserve Workshop, October 2008 Dale Peters, Scientific Technical.
1 Geneva, October 2008 YUN Young-Woo IP INFORMATION & WIPO STANDARDS.
Metadata workshop, June The Workshop Workshop Timetable introduction to the Go-Geo! project metadata overview Go-Geo! portal hands on session.
Issues and approaches to preservation metadata Michael Day UKOLN: UK Office for Library and Information Networking University of Bath
Distributed Service Registries Workshop, July 2005 Slide 1 NISO Metasearch Initiative Registries Robert Sanderson Dept. of Computer Science University.
Metadata vocabularies and ontologies Dr. Manjula Patel Technical Research and Development
Community, Consensus, and the Trajectory of Progress Reflections on the Dublin Core experience and what it tells us about the future Stuart Weibel OCLC.
RESEARCH LIBRARY Content Packaging for Complex Objects MPEG – 21 1 February 2007 Frances Knudson Repository Team Los Alamos National Laboratory Research.
UKOLN, University of Bath
An overview of collection-level metadata Applications of Metadata BCS Electronic Publishing Specialist Group, Ismaili Centre, London, 29 May 2002 Pete.
A centre of expertise in digital information management UKOLN is supported by: British Academy e-Resources Policy Review: UKOLN Report.
Pure Silver Reusing and Repurposing Bibliographic Data in a Current Research Information System and Institutional Repository 15 September.
Andy Powell, Eduserv Foundation Feb 2007 The Dublin Core Abstract Model – a packaging standard?
February Harvesting RDF metadata Building digital library portals with harvested metadata workshop EU-DL All Projects concertation meeting DELOS.
A centre of expertise in data curation and preservation DigCCur2007 Symposium, Chapel Hill, N.C., April 18-20, 2007 Co-operation for digital preservation.
From content standards to RDF Gordon Dunsire Presented at AKM 15, Porec, 2011.
Jane Greenberg, Professor and Director, Metadata Research Center School of Information And Library Science University of North Carolina at Chapel Hill.
Collections and services in the information environment JISC Collection/Service Description Workshop, London, 11 July 2002 Pete Johnston UKOLN, University.
1 A Case Study in E- Science: Building Ecological Informatics Solutions for Multi-Decadal Research ARL/CNI 2008 Conference Washington, DC 16 October 2008.
A centre of expertise in digital information management UKOLN is supported by: The Dublin Core Application Profile for Scholarly Works.
October 28, 2003Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
Facebook for scientists Titus Schleyer et al. 1 of 38 Digital Vita : Leveraging Personal Information Management Practices to Facilitate Research Collaborations.
Data archiving in evolutionary biology Michael Whitlock.
An Introduction to Metadata by Wendy Duff ECURE 2000 October 6, 2000.
Metadata: An Introduction By Wendy Duff October 13, 2001 ECURE.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
Metadata (for the data users downstream) RFC GIS Workshop July 2007 NOAA/NESDIS/NGDC Documentation.
ORGANIZING AND STRUCTURING DATA FOR DIGITAL PROJECTS Suzanne Huffman Digital Resources Librarian Simpson Library.
Teaching Metadata and Networked Information Organization & Retrieval The UNT SLIS Experience William E. Moen School of Library and Information Sciences.
1 © Netskills Quality Internet Training, University of Newcastle Metadata Explained © Netskills, Quality Internet Training.
Metadata: An Overview Katie Dunn Technology & Metadata Librarian
Research Data Management Services Katherine McNeill Social Sciences Librarians Boot Camp June 1, 2012.
The Metadata Object Description Schema (MODS) NISO Metadata Workshop May 20, 2004 Rebecca Guenther Network Development and MARC Standards Office Library.
Ms. Irene Onyancha ISTD/Library & Information Management Services United Nations Economic Commission for Africa The Second Session of the Committee on.
A Metadata Application Profile for the DRIADE Project Sarah Carrier, Jed Dube, Jane Greenberg March 13, 2007 _____________________.
Supporting scientific communities by publishing data Dryad Digital Repository Peggy Schaeffer OpenAIRE/LIBER Workshop May 28, 2013 Ghent, Belgium.
Standards and tools for publishing biodiversity data Yu-Huang Wang June 25, 2012.
Lifecycle Metadata for Digital Objects (INF 389K) September 18, 2006 The Big Metadata Picture, Web Access, and the W3C Context.
HIVE: Enabling Common Language and Interdisciplinarity EPA-NIEHS Advancing Environmental Health Data Sharing and Analysis: Finding a Common Language June.
Archival Information Packages for NASA HDF-EOS Data R. Duerr, Kent Yang, Azhar Sikander.
Data archiving and curation Ryan Scherle Data Repository Architect Dryad Digital Repository CurateGear January 8, 2014 You may reuse any of the original.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
Dryad A digital data repository A typical published data package Contains data that belongs in a specialized repository (e.g. Genbank, Treebase, Morphbank,
Evidence from Metadata INST 734 Doug Oard Module 8.
Jane Greenberg, Associative Professor and Director, SILS Metadata Research Center, School of Information and Library Science, University of North Carolina.
/Greenberg/NDS DataDryad.org and the interoperability continuum. Repositories and Interoperability 2nd National Data Service Consortium Workshop.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
Metadata Issues Underlying the Development of a Data Repository for Evolutionary Biology Sarah Carrier, SILS, Master’s Student Jackson Dube, Visiting Scholar,
Jane Greenberg & the Dryad Team The DRYAD Repository ~~~~~~ INLS 720 visit to NESCent November 17, 2008.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
Ecological Society of America Data Sharing Workshops Bruce Wilson, on behalf of Cliff Duke.
Slides Template for Module 3 Contextual details needed to make data meaningful to others CC BY-NC.
Document, Index, Discover, Access
Introduction to Metadata
Data Management: Documentation & Metadata
Metadata in Digital Preservation: Setting the Scene
Presentation transcript:

“Classifying Scientific Data Objects with Bibliographic Relationship Taxonomies” ~~~~~~ ASIS&T 2007 Annual Meeting Sunday, October 21, 2007 Jane Greenberg, Frances McColl Professor; Director, SILS/Metadata Research Center School of Information and Library Science Univ. of North Carolina at Chapel Hill janeg@ils.unc.edu

Overview DRIADE – (Digital Repository of Information and Data for Evolution) Accomplishments: basic metadata challenges Functional requirements, underlying metadata Instantiation Motivation Research design Preliminary results Conclusions http://www.caffedriade.com/ -adaptation -natural selection

Small science, open access Small science repositories (SSR) Knowledge Network for Biocomplexity (KNB), Marine Metadata Initiative (MMI) Evolutionary biology Publication process Supplementary data (Molecular Biology and Evolution, American Naturalists) “Author,” “deposition date,” not “subject” “species,” ”geo. locator” Data deposition (Genbank, TreeBase, Morphbank) NESCent and SILS/Metadata Research Center ecology, paleontology, population genetics, physiology, systematics + genomics

DRIADE’s Goals DRIADE Team NESCent PI: Todd Vision, Director of Informatics and Associate Professor, Biology, UNC Hilmar Lapp, Assistant Director of Informatics Ryan Scherle, Data Repository Architect UNC/SILS/MRC PI: Jane Greenberg, Associate Professor, SILS and MRC Sarah Carrier, Research Assistant Abbey Thompson, DRIADE R.A./SILS Masters Student Hollie White, Doctoral Fellow Amy Bouck, Biology, Ph.d.student DRIADE’s Goals One-stop deposition and shopping for data objects Support the acquisition, preservation, resource discovery, and reuse of heterogeneous digital datasets Balance a need for low barriers, with higher-level … data synthesis

One stop shopping— an option DRIADE (Dryad) Depositor/s One stop deposition Specialized Repositories Genbank TreeBase Morphbank PaleoDB DRIADE -Data objects supporting published research Journals & journal repositories One stop shopping— an option Researcher/s

Accomplishments: basic metadata challenges

Accomplishments: addressing basic metadata challenges Functional requirements Consensus building (Dec. 06) + SSR workshops (May ‘07) First class object, **SHARING & REUSE** Analysis metadata functions in existing repositories (JCDL, 2007) Functional model OAIS (Open Archival Information System), DSpace Level one application profile Namespace schemas: Dublin Core Data Documentation Initiative (DDI) Ecological Metadata Language (EML) PREMIS Darwin Core Modular scheme: Journal citation Data objects (Carrier, et al., 2007)

Functional requirements Project Goals/priorities GBIF KNB NSDL ICPSR MMI Heterogeneous digital datasets ▪ Long-term data stewardship Tools and incentives to researchers Minimize technical expertise and time required Intellectual property rights Datasets coupled w/published research

<DRIADE application profile> Bibliographic Citation Module dcterms:bibliographicCitation/Citation information DOI Data Object Module dc:creator/Name* dc:title/Data Set # dc:identifier/Data Set Identifier PREMIS:fixity/(hidden) dc:relation/DOI of Published Article* DDI:<depositr>/Depositor DDI:<contact>/Contact Information dc:rights/Rights Statement dc:description/Description # dc:subject/Keywords dc:coverage / Locality Required* dc:coverage/Date Range Required* dc:software/Software* dc:format/File Format dc:format/File Size dc:date/(Hidden) Required dc:date/Date Modified* Darwin Core: species/ Species, or Scientific* Key * = semi-automatic # = manual Everything else is automatic

DRIADE metadata application profile….organic… Level 1 – initial repository implementation Application profile: Preservation, access/resource discovery, (limited use of CVs) Level 2 – full repository implementation Level 1++ expanded usage, interoperability, preservation; administration; greater use of CV and authority control; data sharing and reuse Level 3 – “next generation” implementation Considering Web 2.0 functionalities, Semantic Web

Instantiation (Classifying Scientific Data Objects with Bibliographic Relationships Taxonomies)

Motivation DRIADE goals: Data objects = first class objects Data preservation, sharing, use/re-use, validation, repeatability Is it important to know history of a data object? How can we support accurate and effective tracking of the life-cycle of data object? Data objects = first class objects Data structures as works (Coleman, 2002) Units of analysis, intellectual products of activity Work = “propositions expressed (ideational content)”, and “expressions of the propositions” (Smiraglia, 2001) Data objects are content carriers (Greenberg, 2007)

Motivation Research and…to explain a“work”; bibliographic families; and instantiation Metadata and Bibliographic Control Models FRBR (Functional Requirements for Bibliographic Records) DCAM (Dublin Core Abstract Model) RDF (Resource Description Framework) Coleman (2002) Cutter (1904) Leaser (1999) Ranganathan: embodied/expressed Smiraglia (1999,2001, 2002,etc.) Tillet (1991, 1992) Vellucci (1995) Wilson (1983) Yee (1995)

Research design Research questions Method Objective: Build an open access repository that accurately reflects the “disciplinary knowledge structures” (relationship among data objects”) Research questions Do scientists (developing scientists) view data objects as works? Do they understand different “instantiations”? Do they think instantiation tracking is important? Method Instantiation identification test and survey Participants: Scientists, research and publication

Procedures, etc. Verify participant suitability, contextual introduction Read instantiation definitions and view instantiation diagram (next slide) Example Instantiation scenario Scenario: Sherry collects data on the survival and growth of the plant Borrichia frutescens (the bushy seaside tansy)… Question: What is the relationship between Sherry’s paper data sheet and her excel spreadsheet? Answer: Equivalent | Derivative | Whole-part | Sequential (circle one) 8 scenarios (randomized); survey

Data object relationships Equivalence Derivative Whole-part Sequential B (=data set A annotated) A (=data set) A (=data set in Excel) A (=same data set in SAS) A (=same data set on paper) C (=data set A revised) A1 (=part 1 of a data set) A2 (=part 2 of a data set) A (=data set) A1 (=a subset of A) Smiraglia, Tillet

Results 7 participants (Biochemistry, Biology, Botany/Plant Ecology) 3 Faculty/research associates; 21+ yrs. research/pub. 4 Research assistants; 3 (1-3 yrs.), 1 (5 yrs.) Instant./ rel. A B Failing A Failing B Equivalent 100% n/a Derivative 6 of 7 (86%) 1 BS 1 MS Whole/part 1 PhD Sequential 3 of 7 (43%) 2 BS,

Results Use of data by others? ht=halftime; s=seldom; na=no answer Instant./ rel. Created X data before Apply to Work “freq.” Other app. Support last publ. y/n Equivalent 7 6 (86%) 1 (h/t) 6/1 Derivative 5 (71%) 2 (h/t) Whole/part 2 (s) 4/2 (1 na) Sequential 5 3 (43%) 1 (h/t),1(s) 2/3 (2 na) Use of data by others? 1-5, 1 person 6-20, 3 people 21-99, 1 person 100+, 2 people (PhD) How important is it to track data life-cycle (relationships)? 0, not 3 very 4 extremely (2Ph.D, 1 MS, 1 BS)

Results (participant comments) Validity: “Whenever anything changes, there’s the ability to make mistakes” Sheer quantity of data: “We have 30 years worth of data, tracking changes is important” The impact of changes in scientific nomenclature: “As time goes on, datasets must be maintained to reflect current understanding of taxonomy and nomenclature, to allow connection of old data and attributes of that data to be associated correctly with new data and attributes of that data. This is a giant problem.”

Conclusions and next steps Use of data created by others increases over time In general, more seasoned scientists better grasp Sequential data presented the most difficulty (less seasoned sci.) Unanimous support, “very  extremely important” Bibliographic control relationship are applicable to describing the relationships between data objects Next steps: Continue to collect data (30-50) Study additional types of instantiations Develop applications that effectively track data object life-cycle How do we encode these relationships? Manually/automatically? Who should create them? Curator, scientists, collaboratively? Should we maintain a vocabulary? Consider implications for other disciplines

Thank you!! Jane Greenberg, Associate Professor, and Director, SILS Metadata Research Center janeg@ils.unc.edu