Published by Erica Flowers. Modified over 8 years ago.
1
CONTROLLED VOCABULARY STATUS & POTENTIAL IN DATA REPOSITORIES
Chelcie Rowell, Jane Greenberg
Metadata Research Center, UNC-Chapel Hill
Authority Control Interest Group, ALA Annual 2013
2
OVERVIEW OF NSF-SPONSORED RESEARCH STUDY
1. Research Impetus
2. Research Goals
3. Methodology
4. Results
5. Conclusions
3
1 RESEARCH IMPETUS
4
CONTROLLED VOCABULARY CHALLENGES FOR DATA REPOSITORIES
COST: Vocabularies are expensive to create and maintain
INTEROPERABILITY: Vocabularies sometimes use standards (Z39.19, SKOS) but are often developed independently
USABILITY: Vocabularies are difficult to use for both information professionals and content creators
INTERDISCIPLINARITY: Collections are increasingly interdisciplinary
5
VOCABULARY NEEDS OF DRYAD, ONE DATA REPOSITORY
A curated general-purpose repository for data underlying journal articles; member node of DataONE
Staff performed a vocabulary analysis, mapping keywords from journals to 10 controlled vocabularies
A low percentage of keywords had an exact match within each controlled vocabulary
So what? Dryad requires multiple controlled vocabularies within the subject field alone (LCSH, MeSH, NBII, ERIC, ITIS)
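The keyword-to-vocabulary mapping described on this slide can be sketched as an exact-match rate computed per vocabulary. This is a minimal illustration, not the procedure the Dryad staff used: the keywords, the vocabulary names (VOCAB_A, VOCAB_B) and their terms are hypothetical, and "exact match" is assumed to ignore only case and extra whitespace.

```python
def normalize(term):
    """Lowercase a term and collapse runs of whitespace (assumed matching rule)."""
    return " ".join(term.lower().split())


def exact_match_rate(keywords, vocabulary):
    """Fraction of keywords that exactly match a term in one vocabulary."""
    if not keywords:
        return 0.0
    terms = {normalize(t) for t in vocabulary}
    hits = sum(1 for k in keywords if normalize(k) in terms)
    return hits / len(keywords)


# Hypothetical journal keywords and vocabularies, for illustration only.
keywords = ["Phylogenetics", "gene flow", "Drosophila melanogaster", "range shifts"]
vocabularies = {
    "VOCAB_A": ["Phylogenetics", "Gene flow", "Speciation"],
    "VOCAB_B": ["Drosophila melanogaster", "Homo sapiens"],
}

rates = {name: exact_match_rate(keywords, terms) for name, terms in vocabularies.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of keywords matched exactly")
```

In this toy data no single vocabulary covers all of the keywords, which is the kind of pattern that would motivate drawing on several vocabularies at once.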
6
HELPING INTERDISCIPLINARY VOCABULARY ENGINEERING (HIVE) MODEL
8
HELPING INTERDISCIPLINARY VOCABULARY ENGINEERING (HIVE) CONCEPT BROWSER
9
HELPING INTERDISCIPLINARY VOCABULARY ENGINEERING (HIVE) INDEXER
10
2 RESEARCH GOALS
11
RESEARCH GOALS
1. Identify controlled vocabularies currently in use by different data repositories
2. Examine potential facilitators and inhibitors of controlled vocabulary use by different repository stakeholders
3. Explore infrastructure for using controlled vocabularies in place at different data repositories
4. Develop framework for studying controlled vocabulary use across different roles associated with data repositories
12
3 METHODOLOGY
13
RESEARCH PROCESS
1. DRAFT SURVEY INSTRUMENT
2. PERFORM PILOT TESTING
3. REVISE SURVEY INSTRUMENT
4. 1ST DISTRIBUTION
5. PRELIMINARY DATA ANALYSIS
6. 2ND DISTRIBUTION
7. FINAL DATA ANALYSIS
14
WEB SURVEY DISTRIBUTED TO DATANET & DATA REPOSITORY STAKEHOLDERS
CODATA, DARTG, DC-SAM, EPA, JE, JISC Research Data Mgmt, PAMWG, RDA, RDAP, SIG-CR, SIG-STI, SE, STS-L, USGS
15
ROLE WITHIN DATA REPOSITORY DETERMINES QUESTION PATH
Data Contributor: Q3
Data Curator: Q13
Developer: Q13
DataNet Administrator: Q22
Other: Q22
18
4 RESULTS
19
PARTICIPANT POPULATION 84 112 54 25 54
20
CONTROLLED VOCABULARY USE: CHOICES SUPPLIED BY SURVEY
Vocabulary           Participants
None of the above    50
LCSH                 14
MeSH                 10
TGN                   9
ITIS                  8
NBII                  7
EnvThes/LTER          3
GO                    3
UAT                   3
AGROVOC               1
ERIC                  1
NALT                  0
TOTAL = 93 participants
21
CONTROLLED VOCABULARIES: SUPPLIED BY PARTICIPANTS
22
OF THE DATA CONTRIBUTORS WHO HAD NOT PERFORMED ONE OF THE ACTIONS BELOW, HOW MANY WOULD MAKE USE OF THAT FUNCTION IN THE NEXT 12 MONTHS?
Action                                                                                  Yes  No  Don’t Know  TOTAL
Select from multiple controlled vocabularies when describing a single dataset           14   4   1           19
Use software to generate suggested subject terms selected from a controlled vocabulary  22   4   4           30
23
OF THE DATA CURATORS WHOSE REPOSITORY DOES NOT SUPPORT ONE OF THE ACTIONS BELOW, HOW MANY WOULD SUPPORT THAT FUNCTION IN THE NEXT 12 MONTHS?
Action                                                                                  Yes  No  Don’t Know  TOTAL
Select from multiple controlled vocabularies when describing a single dataset           22   8   11          41
Use software to generate suggested subject terms selected from a controlled vocabulary  33   7   22          62
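To compare the contributor and curator responses, the counts on this slide and the previous one can be turned into shares of "Yes" answers. The sketch below is only arithmetic on the reported counts; the dictionary keys are labels chosen here for illustration, and the share is assumed to be taken over all respondents in the row, including "Don’t Know".

```python
# (yes, no, don't know) counts as reported on the two slides.
responses = {
    ("contributor", "multiple vocabularies"): (14, 4, 1),
    ("contributor", "suggested terms"): (22, 4, 4),
    ("curator", "multiple vocabularies"): (22, 8, 11),
    ("curator", "suggested terms"): (33, 7, 22),
}


def yes_share(yes, no, dont_know):
    """Share of 'Yes' answers among all respondents in a row."""
    return yes / (yes + no + dont_know)


shares = {key: yes_share(*counts) for key, counts in responses.items()}
for (role, function), share in shares.items():
    print(f"{role:12s} {function:22s} {share:.0%}")
```

Under this reading, roughly 74% and 73% of the contributors asked answered yes, compared with roughly 54% and 53% of the curators asked; the curators' lower share goes along with a larger number of "Don’t Know" answers.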
24
IF A TOOL WERE BUILT THAT SUPPORTED THE USE OF CONTROLLED VOCABULARIES WITHIN & ACROSS DATA REPOSITORIES, WHAT FEATURES WOULD THIS TOOL NEED?
We would be more likely to use the tool if it was offered in the form of a web services API as opposed to a web site or a desktop application. Web services would make the tool platform-independent and easier to embed within our current suite of software applications.
25
Ease of use, ease of ‘plugging’ into different services and software.
26
I would use such a tool to add preferred terms to records while keeping free-text tags in place.
27
[S]cience researchers are not familiar with the jargon of ‘controlled vocabularies’ and ‘ontologies.’ They need a tool that helps them connect the correct subject headings or keywords to their work, regardless of what scheme it is. They mostly don't care if it's LCSH or NBII – they just want the correct terms attached to their dataset.
28
My ‘wish list’ includes:
  selection of specific vocabularies to be used in specific contexts
  web services that support identification of candidate terms based on metadata content
  tools for addressing shared terms in different vocabularies
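Two of the wish-list items, candidate terms identified from metadata content and handling of terms shared between vocabularies, can be illustrated with a small sketch. This is an assumption-laden toy, not how HIVE or any existing service works: the function names, the vocabulary names (VOCAB_A, VOCAB_B), their terms and the sample description are all hypothetical, and candidates are found by plain whole-phrase matching rather than by any trained indexing algorithm.

```python
import re


def normalize(text):
    """Lowercase text and reduce it to space-separated alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def candidate_terms(text, vocabularies):
    """Map each vocabulary name to its terms that occur as whole phrases in text."""
    padded = f" {normalize(text)} "
    candidates = {}
    for name, terms in vocabularies.items():
        hits = [t for t in terms if normalize(t) and f" {normalize(t)} " in padded]
        if hits:
            candidates[name] = hits
    return candidates


def shared_terms(vocabularies):
    """Map each normalized term found in more than one vocabulary to those vocabularies."""
    owners = {}
    for name, terms in vocabularies.items():
        for term in terms:
            owners.setdefault(normalize(term), set()).add(name)
    return {term: sorted(names) for term, names in owners.items() if len(names) > 1}


# Hypothetical vocabularies and dataset description, for illustration only.
vocabularies = {
    "VOCAB_A": ["Gene flow", "Speciation", "Phylogenetics"],
    "VOCAB_B": ["gene flow", "Islands"],
}
description = "Gene flow and speciation on oceanic islands"

print(candidate_terms(description, vocabularies))
print(shared_terms(vocabularies))
```

Restricting the `vocabularies` argument to a subset would correspond to the first wish-list item, selecting specific vocabularies for a specific context.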
29
LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
Aspect                                                         1   2   3   4   5   MEAN
Availability on WWW                                            1   2  15  25  38   4.20
Openness to term suggestions                                   1   5  20  32  23   3.88
Generation of suggested terms from selected controlled vocab   2   2  29  28  20   3.77
Data storage                                                   2   5  32  24  18   3.63
Inter/national governance                                      4   6  24  30  17   3.62
Update frequency                                               2   6  30  26  17   3.62
Availability of terms as URIs                                  2   2  37  26  14   3.59
In-house governance                                            5  14  30  18  14   3.27
TOTAL: 81
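The MEAN column can be checked against the response counts: each mean is the count-weighted average of the ratings 1 to 5. The sketch below uses the counts as read from the slide's table, where every row sums to the stated total of 81; the variable and function names are chosen here for illustration.

```python
# Counts of respondents choosing ratings 1..5, as read from the slide's table.
counts = {
    "Availability on WWW": [1, 2, 15, 25, 38],
    "Openness to term suggestions": [1, 5, 20, 32, 23],
    "Generation of suggested terms from selected controlled vocab": [2, 2, 29, 28, 20],
    "Data storage": [2, 5, 32, 24, 18],
    "Inter/national governance": [4, 6, 24, 30, 17],
    "Update frequency": [2, 6, 30, 26, 17],
    "Availability of terms as URIs": [2, 2, 37, 26, 14],
    "In-house governance": [5, 14, 30, 18, 14],
}


def likert_mean(row):
    """Count-weighted mean rating for one row of 1-5 response counts."""
    total = sum(row)
    weighted = sum(rating * n for rating, n in enumerate(row, start=1))
    return weighted / total


for aspect, row in counts.items():
    print(f"{aspect}: n={sum(row)}, mean={likert_mean(row):.2f}")
```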
31
LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
On a five-point scale, with 1 being least important and 5 being most important, please rate how the following aspects FACILITATE your use of controlled vocabularies to describe scientific research data.
1 Low importance; 2 Slightly important; 3 Neutral; 4 Moderately important; 5 Very important
Availability on the WWW
High update frequency
32
LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
On a five-point scale, with 1 being least important and 5 being most important, please rate how the following aspects IMPEDE your use of controlled vocabularies to describe scientific research data.
1 Low importance; 2 Slightly important; 3 Neutral; 4 Moderately important; 5 Very important
Unavailability on the WWW
Low update frequency
35
LIMITATIONS: WILD WILD WEST
No research designs on which to model ours
Population and sample difficult to define
36
5 CONCLUSIONS
37
CONCLUSIONS
Multiple roles associated with data repositories would make use of the following functions:
  Access to multiple vocabularies at the time of indexing
  Automatic generation of suggested terms
Diversity of understanding regarding what defines a “controlled vocabulary”
Long tail of controlled vocabularies actively in use
Clear ideas on how to design research about controlled vocabulary use by different data repository stakeholders
38
PARTICIPATE!
http://tinyurl.com/controlledvocabsurvey
The survey will remain open until July 15.
39
KEEP IN TOUCH!
Chelcie Juliet Rowell, Digital Initiatives Librarian, Z. Smith Reynolds Library, Wake Forest University, rowellcj@wfu.edu
Jane Greenberg, Professor, School of Information & Library Science; Director, Metadata Research Center, University of North Carolina at Chapel Hill, janeg@email.unc.edu
40
ACKNOWLEDGEMENTS
This study was supported by the U.S. National Science Foundation Grant No. ACI-0830944. We would like to express our gratitude to the people who helped and supported us throughout the design and implementation of this research study, especially Rebecca Koskela, Laura Moyers, and Amber Budden of DataONE and Mary Whitton of RENCI, who were instrumental in helping us to disseminate the survey. We would also like to thank pilot testers of the first draft of our survey instrument as well as all survey participants.