Chelcie Rowell Jane Greenberg Metadata Research Center UNC-Chapel Hill CONTROLLED VOCABULARY STATUS & POTENTIAL IN DATA REPOSITORIES Authority Control.

Presentation transcript:

CONTROLLED VOCABULARY STATUS & POTENTIAL IN DATA REPOSITORIES
Chelcie Rowell, Jane Greenberg
Metadata Research Center, UNC-Chapel Hill
Authority Control Interest Group, ALA Annual 2013

OVERVIEW OF NSF-SPONSORED RESEARCH STUDY
• Research Impetus
• Research Goals
• Methodology
• Results
• Conclusions

1 RESEARCH IMPETUS

CONTROLLED VOCABULARY CHALLENGES FOR DATA REPOSITORIES
• COST: Vocabularies are expensive to create and maintain
• INTEROPERABILITY: Vocabularies sometimes use standards (Z39.19, SKOS) but are often developed independently
• USABILITY: Vocabularies are difficult to use for both information professionals and content creators
• INTERDISCIPLINARITY: Collections are increasingly interdisciplinary

VOCABULARY NEEDS OF DRYAD, ONE DATA REPOSITORY
• A curated general-purpose repository for data underlying journal articles; member node of DataONE
• Staff performed a vocabulary analysis, mapping keywords from journals to 10 controlled vocabularies
• A low percentage of keywords had an exact match within each controlled vocabulary
• So what? Dryad requires multiple controlled vocabularies within the subject field alone
• LCSH, MeSH, NBII, ERIC, ITIS
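The keyword-to-vocabulary analysis described on this slide can be sketched roughly as follows. This is a minimal illustration, not the Dryad team's actual procedure: the normalization rule, the function names, and the sample keywords and vocabulary contents are all assumptions invented for the example.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def exact_match_rates(keywords, vocabularies):
    """Return, per vocabulary, the fraction of keywords that have an exact match."""
    normalized_keywords = {normalize(k) for k in keywords}
    rates = {}
    for name, terms in vocabularies.items():
        normalized_terms = {normalize(t) for t in terms}
        matched = normalized_keywords & normalized_terms
        # Guard against an empty keyword list to avoid dividing by zero.
        rates[name] = len(matched) / len(normalized_keywords) if normalized_keywords else 0.0
    return rates


# Hypothetical sample data, invented for illustration only.
keywords = ["Phylogenetics", "gene flow", "Adaptive radiation", "speciation"]
vocabularies = {
    "VOCAB_A": ["Phylogenetics", "Speciation", "Ecology"],
    "VOCAB_B": ["Gene flow"],
}
rates = exact_match_rates(keywords, vocabularies)
```

A low rate for every individual vocabulary, as the slide reports, is what motivates drawing terms from several vocabularies at once.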

HELPING INTERDISCIPLINARY VOCABULARY ENGINEERING (HIVE) MODEL

HELPING INTERDISCIPLINARY VOCABULARY ENGINEERING (HIVE) CONCEPT BROWSER

HELPING INTERDISCIPLINARY VOCABULARY ENGINEERING (HIVE) INDEXER

2 RESEARCH GOALS

RESEARCH GOALS
1. Identify controlled vocabularies currently in use by different data repositories
2. Examine potential facilitators and inhibitors of controlled vocabulary use by different repository stakeholders
3. Explore infrastructure for using controlled vocabularies in place at different data repositories
4. Develop framework for studying controlled vocabulary use across different roles associated with data repositories

3 METHODOLOGY

RESEARCH PROCESS
1. DRAFT SURVEY INSTRUMENT
2. PERFORM PILOT TESTING
3. REVISE SURVEY INSTRUMENT
4. 1ST DISTRIBUTION
5. PRELIMINARY DATA ANALYSIS
6. 2ND DISTRIBUTION
7. FINAL DATA ANALYSIS

WEB SURVEY DISTRIBUTED TO DATANET & DATA REPOSITORY STAKEHOLDERS
• CODATA
• DARTG
• DC-SAM
• EPA
• JE
• JISC Research Data Mgmt
• PAMWG
• RDA
• RDAP
• SIG-CR
• SIG-STI
• SE
• STS-L
• USGS

ROLE WITHIN DATA REPOSITORY DETERMINES QUESTION PATH
• Data Contributor: Q3
• Data Curator: Q13
• Developer: Q13
• DataNet Administrator: Q22
• Other: Q22
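The role-based branching on this slide amounts to a lookup from role to starting question. A minimal sketch, assuming each label such as Q3 marks the first question shown to that role; the fallback to the "Other" path for unrecognized roles is an assumption, not something the slide states.

```python
# Starting question for each role, as read from the slide.
QUESTION_PATH = {
    "Data Contributor": "Q3",
    "Data Curator": "Q13",
    "Developer": "Q13",
    "DataNet Administrator": "Q22",
    "Other": "Q22",
}


def first_question(role: str) -> str:
    """Return the question a respondent is routed to.

    Unrecognized roles follow the 'Other' path (an assumption for this sketch).
    """
    return QUESTION_PATH.get(role, QUESTION_PATH["Other"])
```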


4 RESULTS

PARTICIPANT POPULATION

CONTROLLED VOCABULARY USE: CHOICES SUPPLIED BY SURVEY
Choices: None of the above, LCSH, MeSH, TGN, ITIS, NBII, EnvThes/LTER, GO, UAT, AGROVOC, ERIC, NALT
Counts as extracted (column alignment lost): 8733; 3110
TOTAL = 93 participants

CONTROLLED VOCABULARIES: SUPPLIED BY PARTICIPANTS

OF THE DATA CONTRIBUTORS WHO HAD NOT PERFORMED ONE OF THE ACTIONS BELOW, HOW MANY WOULD MAKE USE OF THAT FUNCTION IN THE NEXT 12 MONTHS?
Columns: Yes | No | Don’t Know | TOTAL
Rows:
• Select from multiple controlled vocabularies when describing a single dataset
• Use software to generate suggested subject terms selected from a controlled vocabulary

OF THE DATA CURATORS WHOSE REPOSITORY DOES NOT SUPPORT ONE OF THE ACTIONS BELOW, HOW MANY WOULD SUPPORT THAT FUNCTION IN THE NEXT 12 MONTHS?
Columns: Yes | No | Don’t Know | TOTAL
Rows:
• Select from multiple controlled vocabularies when describing a single dataset
• Use software to generate suggested subject terms selected from a controlled vocabulary

IF A TOOL WERE BUILT THAT SUPPORTED THE USE OF CONTROLLED VOCABULARIES WITHIN & ACROSS DATA REPOSITORIES, WHAT FEATURES WOULD THIS TOOL NEED?

Participant responses:

• “We would be more likely to use the tool if it was offered in the form of a web services API as opposed to a web site or a desktop application. Web services would make the tool platform-independent and easier to embed within our current suite of software applications.”

• “Ease of use, ease of ‘plugging’ into different services and software.”

• “I would use such a tool to add preferred terms to records while keeping free-text tags in place.”

• “[S]cience researchers are not familiar with the jargon of ‘controlled vocabularies’ and ‘ontologies.’ They need a tool that helps them connect the correct subject headings or keywords to their work, regardless of what scheme it is. They mostly don't care if it's LCSH or NBII – they just want the correct terms attached to their dataset.”

• “My ‘wish list’ includes:
  • selection of specific vocabularies to be used in specific contexts
  • web services that support identification of candidate terms based on metadata content
  • tools for addressing shared terms in different vocabularies”
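Several of the requested features (choosing which vocabularies apply, identifying candidate terms from metadata content, and handling terms shared across vocabularies) can be illustrated with a small sketch. This is not the HIVE API or any existing tool; the function names, the whole-phrase matching rule, and the sample vocabularies are hypothetical and chosen only to make the idea concrete.

```python
import re
from collections import defaultdict


def shared_terms(vocabularies):
    """Map each lowercased term to the vocabularies containing it, keeping terms found in two or more."""
    index = defaultdict(set)
    for name, terms in vocabularies.items():
        for term in terms:
            index[term.lower()].add(name)
    return {term: sorted(names) for term, names in index.items() if len(names) > 1}


def suggest_terms(text, vocabularies, selected):
    """Suggest (vocabulary, term) pairs whose term occurs as a whole phrase in the text.

    Only vocabularies named in `selected` are consulted.
    """
    lowered = text.lower()
    suggestions = []
    for name in selected:
        for term in vocabularies.get(name, []):
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            if re.search(pattern, lowered):
                suggestions.append((name, term))
    return suggestions


# Hypothetical vocabularies and metadata text, invented for illustration only.
vocabularies = {
    "VOCAB_A": ["Gene flow", "Speciation"],
    "VOCAB_B": ["speciation", "Islands"],
}
text = "Gene flow and speciation on oceanic islands"

overlap = shared_terms(vocabularies)
candidates = suggest_terms(text, vocabularies, ["VOCAB_B"])
```

Exposing functions like these behind a web service, rather than a desktop application, would be one way to address the respondents' preference for something easy to plug into existing software.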

LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
Columns: 1 | 2 | 3 | 4 | 5 | MEAN
Rows:
• Availability on WWW
• Openness to term suggestions
• Generation of suggested terms from selected controlled vocab
• Data storage
• Inter/national governance
• Update frequency
• Availability of terms as URIs
• In-house governance
TOTAL: 81
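The MEAN column in a table like this is presumably the count-weighted average of the 1 to 5 ratings for each aspect. A minimal sketch of that arithmetic follows; the response counts are hypothetical, since the per-cell values did not survive in this transcript.

```python
def mean_rating(counts):
    """Return the weighted mean of a rating scale, given {score: number_of_responses}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to average")
    return sum(score * n for score, n in counts.items()) / total


# Hypothetical counts for one aspect (not the study's data).
hypothetical_counts = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}
mean = mean_rating(hypothetical_counts)
```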

LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
On a five-point scale, with 1 being least important and 5 being most important, please rate how the following aspects FACILITATE your use of controlled vocabularies to describe scientific research data.
Scale: 1 Low importance | 2 Slightly important | 3 Neutral | 4 Moderately important | 5 Very important
• Availability on the WWW
• High update frequency

LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
On a five-point scale, with 1 being least important and 5 being most important, please rate how the following aspects IMPEDE your use of controlled vocabularies to describe scientific research data.
Scale: 1 Low importance | 2 Slightly important | 3 Neutral | 4 Moderately important | 5 Very important
• Unavailability on the WWW
• Low update frequency


LIMITATIONS: WILD WILD WEST
• No research designs on which to model ours
• Population and sample difficult to define

5 CONCLUSIONS

CONCLUSIONS
• Multiple roles associated with data repositories would make use of the following functions:
  • Access to multiple vocabularies at the time of indexing
  • Automatic generation of suggested terms
• Diversity of understanding regarding what defines a “controlled vocabulary”
• Long tail of controlled vocabularies actively in use
• Clear ideas on how to design research about controlled vocabulary use by different data repository stakeholders

PARTICIPATE! The survey will remain open until July 15.

KEEP IN TOUCH!
Chelcie Juliet Rowell, Digital Initiatives Librarian, Z. Smith Reynolds Library, Wake Forest University
Jane Greenberg, Professor, School of Information & Library Science; Director, Metadata Research Center, University of North Carolina at Chapel Hill

ACKNOWLEDGEMENTS
This study was supported by the U.S. National Science Foundation Grant No. ACI
We would like to express our gratitude to the people who helped and supported us throughout the design and implementation of this research study, especially Rebecca Koskela, Laura Moyers, and Amber Budden of DataONE and Mary Whitton of RENCI, who were instrumental in helping us to disseminate the survey. We would also like to thank the pilot testers of the first draft of our survey instrument as well as all survey participants.
