1 CONTROLLED VOCABULARY STATUS & POTENTIAL IN DATA REPOSITORIES
Chelcie Rowell, Jane Greenberg
Metadata Research Center, UNC-Chapel Hill
Authority Control Interest Group, ALA Annual 2013

2 OVERVIEW OF NSF-SPONSORED RESEARCH STUDY
• Research Impetus
• Research Goals
• Methodology
• Results
• Conclusions

3 1 RESEARCH IMPETUS

4 CONTROLLED VOCABULARY CHALLENGES FOR DATA REPOSITORIES
• COST: Vocabularies are expensive to create and maintain
• INTEROPERABILITY: Vocabularies sometimes use standards (Z39.19, SKOS) but are often developed independently
• USABILITY: Vocabularies are difficult to use for both information professionals and content creators
• INTERDISCIPLINARITY: Collections are increasingly interdisciplinary

5 VOCABULARY NEEDS OF DRYAD, ONE DATA REPOSITORY
• A curated general-purpose repository for data underlying journal articles; member node of DataONE
• Staff performed a vocabulary analysis, mapping keywords from journals to 10 controlled vocabularies
• A low percentage of keywords had an exact match within each controlled vocabulary
• So what? Dryad requires multiple controlled vocabularies within the subject field alone
• LCSH, MeSH, NBII, ERIC, ITIS
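The kind of exact-match analysis described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not give Dryad's matching rule, so "exact match" is taken here to mean equality after case and whitespace normalization, and the keywords, vocabulary names, and terms are hypothetical sample data, not figures from the study.

```python
def normalize(term):
    """Lowercase and collapse whitespace (an assumed normalization rule)."""
    return " ".join(term.lower().split())


def exact_match_rate(keywords, vocabulary_terms):
    """Share of keywords that exactly match a vocabulary term after normalization."""
    if not keywords:
        return 0.0
    terms = {normalize(t) for t in vocabulary_terms}
    hits = sum(1 for k in keywords if normalize(k) in terms)
    return hits / len(keywords)


# Hypothetical sample data, not taken from the Dryad analysis.
keywords = ["Phylogenetics", "gene flow", "adaptive radiation", "Drosophila melanogaster"]
vocabularies = {
    "VOCAB_A": ["Phylogenetics", "Gene flow"],
    "VOCAB_B": ["Drosophila melanogaster"],
}

for name, terms in vocabularies.items():
    print(f"{name}: {exact_match_rate(keywords, terms):.0%} of keywords matched exactly")
```

With these sample lists no single vocabulary covers every keyword, which is the situation the slide gives as the reason Dryad needs several vocabularies at once.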

6 HELPING INTERDISCIPLINARY VOCABULARY ENGINEERING (HIVE) MODEL

7

8 HELPING INTERDISCIPLINARY VOCABULARY ENGINEERING (HIVE) CONCEPT BROWSER

9 HELPING INTERDISCIPLINARY VOCABULARY ENGINEERING (HIVE) INDEXER

10 2 RESEARCH GOALS

11 RESEARCH GOALS
1. Identify controlled vocabularies currently in use by different data repositories
2. Examine potential facilitators and inhibitors of controlled vocabulary use by different repository stakeholders
3. Explore infrastructure for using controlled vocabularies in place at different data repositories
4. Develop framework for studying controlled vocabulary use across different roles associated with data repositories

12 3 METHODOLOGY

13 RESEARCH PROCESS
1. DRAFT SURVEY INSTRUMENT
2. PERFORM PILOT TESTING
3. REVISE SURVEY INSTRUMENT
4. 1ST DISTRIBUTION
5. PRELIMINARY DATA ANALYSIS
6. 2ND DISTRIBUTION
7. FINAL DATA ANALYSIS

14 WEB SURVEY DISTRIBUTED TO DATANET & DATA REPOSITORY STAKEHOLDERS
• CODATA
• DARTG
• DC-SAM
• EPA
• JE
• JISC Research Data Mgmt
• PAMWG
• RDA
• RDAP
• SIG-CR
• SIG-STI
• SE
• STS-L
• USGS

15 ROLE WITHIN DATA REPOSITORY DETERMINES QUESTION PATH
Data Contributor: Q3
Data Curator: Q13
Developer: Q13
DataNet Administrator: Q22
Other: Q22

16 ROLE WITHIN DATA REPOSITORY DETERMINES QUESTION PATH
Data Contributor: Q3
Data Curator: Q13
Developer: Q13
DataNet Administrator: Q22
Other: Q22
✔

17 ROLE WITHIN DATA REPOSITORY DETERMINES QUESTION PATH
Data Contributor: Q3
Data Curator: Q13
Developer: Q13
DataNet Administrator: Q22
Other: Q22
✔✔
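The role-based branching on slides 15 to 17 amounts to a lookup from a respondent's role to the question where that role's path begins. A minimal Python sketch follows; the function name and the fallback to the "Other" path for unlisted roles are assumptions for illustration, and the slides do not describe how the survey software implemented the branching.

```python
# Role-to-question mapping as shown on the slide.
QUESTION_PATH = {
    "Data Contributor": "Q3",
    "Data Curator": "Q13",
    "Developer": "Q13",
    "DataNet Administrator": "Q22",
    "Other": "Q22",
}


def first_question(role):
    """Return the question a respondent is routed to.

    Falling back to the "Other" path for unknown roles is an assumption.
    """
    return QUESTION_PATH.get(role, QUESTION_PATH["Other"])


print(first_question("Data Curator"))
```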

18 4 RESULTS

19 PARTICIPANT POPULATION 84 112 54 25 54

20 CONTROLLED VOCABULARY USE: CHOICES SUPPLIED BY SURVEY
None of the above: 50
LCSH: 14
MeSH: 10
TGN: 9
ITIS: 8
NBII: 7
EnvThes/LTER: 3
GO: 3
UAT: 3
AGROVOC: 1
ERIC: 1
NALT: 0
TOTAL = 93 participants

21 CONTROLLED VOCABULARIES: SUPPLIED BY PARTICIPANTS

22 OF THE DATA CONTRIBUTORS WHO HAD NOT PERFORMED ONE OF THE ACTIONS BELOW, HOW MANY WOULD MAKE USE OF THAT FUNCTION IN THE NEXT 12 MONTHS?
Select from multiple controlled vocabularies when describing a single dataset: Yes 14, No 4, Don’t Know 1, TOTAL 19
Use software to generate suggested subject terms selected from a controlled vocabulary: Yes 22, No 4, Don’t Know 4, TOTAL 30

23 OF THE DATA CURATORS WHOSE REPOSITORY DOES NOT SUPPORT ONE OF THE ACTIONS BELOW, HOW MANY WOULD SUPPORT THAT FUNCTION IN THE NEXT 12 MONTHS?
Select from multiple controlled vocabularies when describing a single dataset: Yes 22, No 8, Don’t Know 11, TOTAL 41
Use software to generate suggested subject terms selected from a controlled vocabulary: Yes 33, No 7, Don’t Know 22, TOTAL 62

24 IF A TOOL WERE BUILT THAT SUPPORTED THE USE OF CONTROLLED VOCABULARIES WITHIN & ACROSS DATA REPOSITORIES, WHAT FEATURES WOULD THIS TOOL NEED?
We would be more likely to use the tool if it was offered in the form of a web services API as opposed to a web site or a desktop application. Web services would make the tool platform-independent and easier to embed within our current suite of software applications.

25 IF A TOOL WERE BUILT THAT SUPPORTED THE USE OF CONTROLLED VOCABULARIES WITHIN & ACROSS DATA REPOSITORIES, WHAT FEATURES WOULD THIS TOOL NEED? Ease of use, ease of ‘plugging’ into different services and software.

26 IF A TOOL WERE BUILT THAT SUPPORTED THE USE OF CONTROLLED VOCABULARIES WITHIN & ACROSS DATA REPOSITORIES, WHAT FEATURES WOULD THIS TOOL NEED? I would use such a tool to add preferred terms to records while keeping free-text tags in place.

27 IF A TOOL WERE BUILT THAT SUPPORTED THE USE OF CONTROLLED VOCABULARIES WITHIN & ACROSS DATA REPOSITORIES, WHAT FEATURES WOULD THIS TOOL NEED? [S]cience researchers are not familiar with the jargon of ‘controlled vocabularies’ and ‘ontologies.’ They need a tool that helps them connect the correct subject headings or keywords to their work, regardless of what scheme it is. They mostly don't care if it's LCSH or NBII – they just want the correct terms attached to their dataset.

28 IF A TOOL WERE BUILT THAT SUPPORTED THE USE OF CONTROLLED VOCABULARIES WITHIN & ACROSS DATA REPOSITORIES, WHAT FEATURES WOULD THIS TOOL NEED?
My ‘wish list’ includes:
• selection of specific vocabularies to be used in specific contexts
• web services that support identification of candidate terms based on metadata content
• tools for addressing shared terms in different vocabularies

29 LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
Number of responses at each rating (1 to 5) and mean rating:
Aspect                                                          1   2   3   4   5   MEAN
Availability on WWW                                             1   2  15  25  38   4.20
Openness to term suggestions                                    1   5  20  32  23   3.88
Generation of suggested terms from selected controlled vocab    2   2  29  28  20   3.77
Data storage                                                    2   5  32  24  18   3.63
Inter/national governance                                       4   6  24  30  17   3.62
Update frequency                                                2   6  30  26  17   3.62
Availability of terms as URIs                                   2   2  37  26  14   3.59
In-house governance                                             5  14  30  18  14   3.27
TOTAL 81
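The MEAN column on slide 29 can be rechecked from the response counts as a weighted average of the ratings 1 to 5. The Python sketch below assumes each row's run of digits splits into five counts that sum to the stated total of 81; the function and variable names are illustrative and do not come from the study.

```python
# Counts of responses at ratings 1..5, followed by the mean reported on the slide.
ROWS = {
    "Availability on WWW": ([1, 2, 15, 25, 38], 4.20),
    "Openness to term suggestions": ([1, 5, 20, 32, 23], 3.88),
    "Generation of suggested terms": ([2, 2, 29, 28, 20], 3.77),
    "Data storage": ([2, 5, 32, 24, 18], 3.63),
    "Inter/national governance": ([4, 6, 24, 30, 17], 3.62),
    "Update frequency": ([2, 6, 30, 26, 17], 3.62),
    "Availability of terms as URIs": ([2, 2, 37, 26, 14], 3.59),
    "In-house governance": ([5, 14, 30, 18, 14], 3.27),
}


def likert_mean(counts):
    """Weighted mean rating, where counts[i] is the number of responses at rating i + 1."""
    total = sum(counts)
    return sum(rating * n for rating, n in enumerate(counts, start=1)) / total


for aspect, (counts, reported) in ROWS.items():
    print(f"{aspect}: n={sum(counts)}, mean={likert_mean(counts):.2f}, reported={reported:.2f}")
```

Under this split every row sums to 81 and the recomputed means agree with the reported ones to two decimal places, which supports reading the flattened digits this way.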

30

31 LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
On a five point scale, with 1 being least important and 5 being most important, please rate how the following aspects FACILITATE your use of controlled vocabularies to describe scientific research data.
Scale: 1 Low importance, 2 Slightly important, 3 Neutral, 4 Moderately important, 5 Very important
Aspects: Availability on the WWW; High update frequency

32 LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
On a five point scale, with 1 being least important and 5 being most important, please rate how the following aspects IMPEDE your use of controlled vocabularies to describe scientific research data.
Scale: 1 Low importance, 2 Slightly important, 3 Neutral, 4 Moderately important, 5 Very important
Aspects: Unavailability on the WWW; Low update frequency

33 LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
On a five point scale, with 1 being least important and 5 being most important, please rate how the following aspects FACILITATE your use of controlled vocabularies to describe scientific research data.
Scale: 1 Low importance, 2 Slightly important, 3 Neutral, 4 Moderately important, 5 Very important
Aspects: Availability on the WWW; High update frequency
✔ ✔

34 LIMITATIONS: FACILITATORS & INHIBITORS OF CONTROLLED VOCABULARY USE
On a five point scale, with 1 being least important and 5 being most important, please rate how the following aspects IMPEDE your use of controlled vocabularies to describe scientific research data.
Scale: 1 Low importance, 2 Slightly important, 3 Neutral, 4 Moderately important, 5 Very important
Aspects: Unavailability on the WWW; Low update frequency
✔ ✔

35 LIMITATIONS: WILD WILD WEST
• No research designs on which to model ours
• Population and sample difficult to define

36 5 CONCLUSIONS

37 CONCLUSIONS
• Multiple roles associated with data repositories would make use of the following functions:
  • Access to multiple vocabularies at the time of indexing
  • Automatic generation of suggested terms
• Diversity of understanding regarding what defines a “controlled vocabulary”
• Long tail of controlled vocabularies actively in use
• Clear ideas on how to design research about controlled vocabulary use by different data repository stakeholders

38 PARTICIPATE! http://tinyurl.com/controlledvocabsurvey The survey will remain open until July 15.

39 KEEP IN TOUCH!
Chelcie Juliet Rowell
Digital Initiatives Librarian, Z. Smith Reynolds Library, Wake Forest University
rowellcj@wfu.edu
Jane Greenberg
Professor, School of Information & Library Science; Director, Metadata Research Center, University of North Carolina at Chapel Hill
janeg@email.unc.edu

40 ACKNOWLEDGEMENTS This study was supported by the U.S. National Science Foundation Grant No. ACI-0830944. We would like to express our gratitude to the people who helped and supported us throughout the design and implementation of this research study, especially Rebecca Koskela, Laura Moyers, and Amber Budden of DataONE and Mary Whitton of RENCI, who were instrumental in helping us to disseminate the survey. We would also like to thank pilot testers of the first draft of our survey instrument as well as all survey participants.

41 REFERENCES
• Greenberg, J. (2009). Theoretical Considerations of Lifecycle Modeling: An Analysis of the Dryad Repository Demonstrating Automatic Metadata Propagation, Inheritance, and Value System Adoption. Cataloging & Classification Quarterly, 47(3): 380–402.
• Greenberg, J. et al. (2011). HIVE: Helping Interdisciplinary Vocabulary Engineering. Bulletin of the American Society for Information Science and Technology, 37(4). http://www.asis.org/BulletinApr-11AprMay11_Greenberg_etAl.html
• Helping Interdisciplinary Vocabulary Engineering (HIVE) Demonstration System. http://hive.nescent.org/
• Helping Interdisciplinary Vocabulary Engineering (HIVE) Wiki. https://www.nescent.org/sites/hive/Main_Page

42 REFERENCES
• Tenopir, C. et al. (2011). Data Sharing by Scientists: Practices and Perceptions. PLoS ONE, 6(6): 1–21.
• Willis, C. et al. (2012). Analysis and Synthesis of Metadata Goals for Scientific Data. Journal of the American Society for Information Science and Technology, 63(8): 1505–1520.

