SKOS-2-HIVE UNT workshop

1 SKOS-2-HIVE UNT workshop

2 Morning Session Schedule: Introductions and Exploring HIVE; Section 1: Knowledge Organization and Vocabulary Control; Section 2: From Thesauri to SKOS; BREAK; Section 3: From SKOS to HIVE; Section 4: Evaluating HIVE

3 Introductions Hollie White hcwhite1@email.unc.edu

4 Exploring HIVE http://hive.nescent.org

5 Section 1: Knowledge Organization and Vocabulary Control

6 Classical view of ILS languages (Greenberg’s Ontology Continuum), from simple/low level to full/intricate: Key word lists; CV; thesauri/deeper taxonomies; ontologies (WordNet); ontologies (OWL)

7 Types of Vocabulary Control (from least to most structure). Term lists: controlled but semi-unstructured list. Example: ASU portal -- http://library.lib.asu.edu/search/y Controlled vocabulary: less structured thesauri, also referred to as subject heading lists. Example: MeSH -- http://www.nlm.nih.gov/mesh/MBrowser.html Thesauri: composed of indexing terms/descriptors. Example: NASA -- http://www.sti.nasa.gov/thesfrm1.htm

8 Types of Vocabulary Control continued Taxonomy A subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy (Garshol 2004) Example: ITIS--http://www.itis.gov/ (search Abutilon menziesii) Ontology A way to convey or represent a class (or classes) of things, and relationships among the classes. Example: Gene Ontology--http://www.geneontology.org/

9 KOS used in Digital Libraries. Looked at 269 online digital libraries and collections. KOS used: Locally developed taxonomy (113), LCSH (78), Author list (34), Thesauri (26), Alphabetical listing (20), Geographic arrangement (16). Shiri, A. and Chase-Kruszewski, S. (2009) Knowledge organization systems in North American digital library collections. Program: electronic library and information systems, 43(2), pp. 121-139.

10 Discussion: Think about your own organization. What type of controlled vocabularies, thesauri, and ontologies does your organization use for everyday work? How do these vocabulary choices help you meet the goals of your institution? See activity page

11 Section 2: From Thesauri to SKOS

12 SKOS Knowledge Organization Structure Technical Infrastructure

13 SKOS Knowledge Organization Structure Technical Infrastructure

14 Simple Knowledge Organization Systems (SKOS). Classical view of ILS languages, from simple/low level to full/intricate: Key word lists; CV; thesauri/deeper taxonomies; ontologies (e.g. WordNet); ontologies (e.g. OWL)

15 Common thesaural identifiers: SN Scope Note (instruction, e.g. don’t invert phrases); USE Use (another term in preference to this one); UF Used For; BT Broader Term; NT Narrower Term; RT Related Term

16 Syndetic Relationships Syndetic relationships are the conceptual connections between terms. Three types of syndetic relationships Hierarchical Equivalent Associative

17 Hierarchical Level of generality – both preferred terms BT (broader term) Birthday cakes BT Cakes NT (narrower term) Cakes NT Birthday cakes …remember inheritance

18 Equivalent When two or more terms represent the same concept One is the preferred term ( descriptor ), where all the information is collected The other is the non-preferred and helps the user to find the appropriate term

19 Equivalent. Non-preferred term USE Preferred term – Biological diversification USE Biodiversity. Preferred term UF (used for) Non-preferred term – Biodiversity UF Biological diversification

20 Associative One preferred term is related to another preferred term Non-hierarchical “See also” function In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy

21 Associative Related Terms ( RT ) can be used to show these links within the thesaurus – Bed RT Bedding – Paint Brushes RT Painting – Vandalism RT Hostility – Programming RT Software
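
The three syndetic relationships are reciprocal: a BT entry on one term implies an NT entry on the other, USE pairs with UF, and RT points both ways. A minimal Python sketch of that bookkeeping, using the slides' own example terms; the `INVERSE` table, the `add_relation` helper, and the dictionary layout are illustrative assumptions, not part of any thesaurus software mentioned in the workshop.

```python
# Reciprocal pairs: recording one side of a relationship implies the other.
INVERSE = {"BT": "NT", "NT": "BT", "USE": "UF", "UF": "USE", "RT": "RT"}


def add_relation(thesaurus, term, relation, target):
    """Record `term relation target` together with its reciprocal entry."""
    thesaurus.setdefault(term, set()).add((relation, target))
    thesaurus.setdefault(target, set()).add((INVERSE[relation], term))


thesaurus = {}
add_relation(thesaurus, "Birthday cakes", "BT", "Cakes")                      # hierarchical
add_relation(thesaurus, "Biological diversification", "USE", "Biodiversity")  # equivalent
add_relation(thesaurus, "Bed", "RT", "Bedding")                               # associative

# Print every entry, including the reciprocal ones that were added implicitly.
for term in sorted(thesaurus):
    for relation, target in sorted(thesaurus[term]):
        print(f"{term} {relation} {target}")
```

Keeping both directions in step is what lets a user who lands on the non-preferred or broader term still reach the right descriptor.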

22 Identifiers to SKOS code: SN Scope Note = skos:scopeNote; USE Use = skos:prefLabel; UF Used For = skos:altLabel; BT Broader Term = skos:broader; NT Narrower Term = skos:narrower; RT Related Term = skos:related. Each entry term has a skos:Concept
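
The identifier-to-SKOS mapping on this slide can be written as a simple lookup table. The sketch below assumes thesaurus lines of the form "CODE value"; the `SKOS_PROPERTY` name and the `to_skos` helper are hypothetical and only illustrate the mapping.

```python
# Thesaurus identifier -> SKOS property, as listed on the slide.
SKOS_PROPERTY = {
    "SN": "skos:scopeNote",
    "USE": "skos:prefLabel",
    "UF": "skos:altLabel",
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}


def to_skos(line):
    """Translate a 'CODE value' thesaurus line into a (property, value) pair."""
    code, value = line.split(maxsplit=1)
    return SKOS_PROPERTY[code], value


print(to_skos("BT Cakes"))
print(to_skos("UF Biological diversification"))
```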

23 Terms vs. Concepts? Example: Table Lexical level : Table Conceptual level :

24 What is a SKOS Concept? Zygotes: BT Ova; NT Oocysts; RT Hemizygosity; RT Reproduction; RT Zygosity; UF Ookinetes. All these relationships make up a SKOS concept
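
As a rough illustration, the Zygotes entry can be written out as one SKOS concept in Turtle syntax. This is a sketch under stated assumptions: the `http://example.org/thesaurus/` namespace, the `@en` language tags, and the `to_turtle` helper are invented for the example and are not the identifiers used by NBII or HIVE.

```python
BASE = "http://example.org/thesaurus/"  # hypothetical namespace, for illustration only

# The thesaurus entry from the slide.
record = {
    "term": "Zygotes",
    "BT": ["Ova"],
    "NT": ["Oocysts"],
    "RT": ["Hemizygosity", "Reproduction", "Zygosity"],
    "UF": ["Ookinetes"],
}


def uri(label):
    """Build a concept URI from a label (assumed naming scheme)."""
    return f"<{BASE}{label.replace(' ', '_')}>"


def to_turtle(rec):
    """Serialize one thesaurus entry as a skos:Concept in Turtle."""
    term = rec["term"]
    statements = ["a skos:Concept", f'skos:prefLabel "{term}"@en']
    # Non-preferred terms become alternative labels (literals).
    statements += [f'skos:altLabel "{label}"@en' for label in rec.get("UF", [])]
    # Hierarchical and associative links point at other concepts (URIs).
    for code, prop in (("BT", "skos:broader"), ("NT", "skos:narrower"), ("RT", "skos:related")):
        statements += [f"{prop} {uri(label)}" for label in rec.get(code, [])]
    body = " ;\n    ".join(statements)
    prefix = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
    return f"{prefix}\n\n{uri(term)}\n    {body} .\n"


turtle = to_turtle(record)
print(turtle)
```

Labels are written as literals while BT, NT, and RT link to other concepts by URI, which is the practical difference between the lexical level and the conceptual level.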

25 Conceptualizing SKOS See activity in packet

26 SKOS Knowledge Organization Structure Technical Infrastructure

27

28 Example 1: Web view of NBII entry

29 XML Extensible Markup Language --Created by the World Wide Web Consortium (W3C). --Used to mark up documents on the internet or electronic documents. --Users get to describe the tags that are used and define how they are used.

30 XML encoding

31 NBII in XML Desert plants Desert organisms Plants Succulents ORIG Original Approved Descriptor 2007-08-14

32 Creating SKOS/XML See activity online

33 RDF Resource Description Framework “is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats” --from Wikipedia

34 RDF data model is similar to Entity-Relationship or Class diagrams; statements about resources in subject-predicate-object expressions called “triples”. subject = resource; predicate = traits or aspects of the resource, and expresses a relationship between the subject and the object. http://www.w3.org/TR/rdf-concepts/

35 The sky has the color blue. RDF triple: a subject denoting "the sky", a predicate denoting "has the color", an object denoting "blue"
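
The sky example can be sketched as a three-part data structure and printed in the line-based N-Triples syntax. The `http://example.org/` URIs and the `Triple` and `to_ntriples` names are assumptions made for this illustration; the slide does not give identifiers for "the sky" or "has the color".

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str    # URI of the resource being described
    predicate: str  # URI of the trait or relationship
    object: str     # here a plain literal value


EX = "http://example.org/"  # hypothetical namespace

sky_is_blue = Triple(EX + "sky", EX + "hasColor", "blue")


def to_ntriples(t):
    """Write a triple with a literal object as one N-Triples line."""
    return f'<{t.subject}> <{t.predicate}> "{t.object}" .'


print(to_ntriples(sky_is_blue))
```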

36 Things to know about RDF: Everything can be identified by URIs; Resources and links can have types; Partial information is tolerated; There is no need for absolute truth; Evolution is supported; Minimalist design. http://www.w3.org/2001/12/semweb-fin/w3csw

37 Example of RDF <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"> HIVE Web Interface
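
The inner markup of this slide did not survive extraction, so the following is only a plausible reading, not the original example. It assumes the record describes the resource `http://hive.nescent.org` and that "HIVE Web Interface" is its Dublin Core title, and builds the corresponding RDF/XML with Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes instead of auto-generated ones.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

root = ET.Element(f"{{{RDF}}}RDF")
# Assumption: the described resource is the HIVE site itself.
description = ET.SubElement(
    root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": "http://hive.nescent.org"}
)
# Assumption: the surviving text was the value of a dc:title element.
title = ET.SubElement(description, f"{{{DC}}}title")
title.text = "HIVE Web Interface"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Read as a triple, the record says: the resource (subject) has a title (predicate) of "HIVE Web Interface" (object).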

38 NBII in SKOS/RDF Desert plants ORIG Original

39 Deconstructing SKOS/RDF

40 Desert plants ORIG Original

48 Deconstructing SKOS/RDF For more examples of deconstruction see packet

49 Constructing SKOS See activities online

50 Section 3: From SKOS to HIVE

51 Examples of Projects/Communities Using SKOS: W3C’s List of SKOS/Datasets http://www.w3.org/2001/sw/wiki/SKOS/Datasets Library of Congress http://id.loc.gov/authorities/search/ Europeana http://www.europeana.eu/portal/ HIVE http://ils.unc.edu/mrc/hive/

52 Overview HIVE—Helping Interdisciplinary Vocabulary Engineering Motivation—Dryad repository HIVE—Goals, status, and design A scenario Usability Conclusion and questions

53 HIVE model: approach for integrating discipline CVs. Model addressing CV cost, interoperability, and usability constraints (interdisciplinary environment)

54

55 Partner Journals. American Society of Naturalists: American Naturalist. Ecological Society of America: Ecology, Ecological Letters, Ecological Monographs, etc. European Society for Evolutionary Biology: Journal of Evolutionary Biology. Society for Integrative and Comparative Biology: Integrative and Comparative Biology. Society for Molecular Biology and Evolution: Molecular Biology and Evolution. Society for the Study of Evolution: Evolution. Society for Systematic Biology: Systematic Biology. Commercial journals: Molecular Ecology, Molecular Phylogenetics and Evolution.

56 Dryad’s workflow ~ low burden submission

57 Vocabulary needs for Dryad. Vocabulary analysis: 600 keywords, Dryad partner journals. Vocabularies: NBII Thesaurus, LCSH, the Getty’s TGN, ERIC Thesaurus, Gene Ontology, ITIS (10 vocabularies). Facets: taxon, geographic name, time period, topic, research method, genotype, phenotype… Results: 431 topical terms, exact matches: NBII Thesaurus, 25%; MeSH, 18%. 531 terms (research method and taxon): LCSH, 22% found exact matches, 25% partial. Conclusion: Need multiple vocabularies

58 HIVE... as a solution Address CV (controlled vocabulary) cost, interoperability, and usability constraints COST: Expensive to create, maintain, and use INTEROPERABILITY: Developed in silos (structurally and intellectually) USABILITY: Interface design and functionality limitations have been well documented

59 HIVE Goals: Automatic metadata generation approach that dynamically integrates discipline-specific controlled vocabularies encoded with the Simple Knowledge Organisation System (SKOS). Provide efficient, affordable, interoperable, and user friendly access to multiple vocabularies during metadata creation activities. A model that can be replicated -> model and service. Three phases of HIVE: 1. Building HIVE - Vocabulary preparation - Server development - Primate Life Histories Working Group - Wood Anatomy and Wood Density Working Group 2. Sharing HIVE - Continuing education (empowering information professionals) 3. Evaluating HIVE - Examining HIVE in Dryad

60 HIVE Partners Vocabulary Partners Library of Congress: LCSH the Getty Research Institute (GRI): TGN (Thesaurus of Geographic Names ) United States Geological Survey (USGS): NBII Thesaurus, Integrated Taxonomic Information System (ITIS) Agrovoc Thesaurus Advisory Board Jim Balhoff, NESCent Libby Dechman, LCSH Mike Frame, USGS Alistair Miles, Oxford, UK William Moen, University of North Texas Eva Méndez Rodríguez, University Carlos III of Madrid Joseph Shubitowski, Getty Research Institute Ed Summers, LCSH Barbara Tillett, Library of Congress Kathy Wisser, Simmons Lisa Zolly, USGS WORKSHOPS HOSTS: Columbia Univ.; Univ. of California, San Diego; George Washington University; Univ. of North Texas; Universidad Carlos III de Madrid, Madrid, Spain

61 HIVE Construction. HIVE stores millions of concepts from different vocabularies, and makes them available on the Web by a simple HTTP – Vocabularies are imported into HIVE using SKOS/RDF format. HIVE is divided into two different modules: 1. HIVE Core – SKOS/RDF storage and management (SESAME/Elmo) – SMART HIVE: Automatic Metadata Extraction and Topic Detection (KEA++) – Concept Retrieval (Lucene) 2. HIVE Web – Web user interface (GWT, Google Web Toolkit) – Machine oriented interface (SOAP and REST)

62

63

64

65 A scenario HIVE for scientists, depositors HIVE for information professionals: curators, professional librarians, archivists, museum catalogers

66 Meet Amy Amy Zanne is a botanist. Like every good scientist, she publishes.

67

68 Meet Amy Amy Zanne is a botanist. Like every good scientist, she publishes. She deposits data in Dryad.

69 Dryad’s workflow ~ low burden submission

70

71

72

73

74 Usability LS and IS students (32 students) - Understanding HIVE: 3.8 on 5 pt. scale - Ease of navigation: 4.5 - Concept cloud a good idea: 3.3 - Represent document accurately: 2.0 (simple HIVE), 3.3 ( smart HIVE) Advisory board (10 members) - Systems/technical folks want integration w/systems, Getty—EAD - Librarians/KO folks, want to see term relationships - Like tag cloud, want relevance percentages - Color, placement of box, labels.. White 2009-2010; HIVE Team 2009-2010

75 Usability. Formal usability study: 4 biologists, 5 information professionals ~ Tasks, usability ratings, satisfaction ranking. Average time to search a concept: Librarians: 6.53 minutes; Scientists: 3.82 minutes ~ consistent w/research at NIEHS, 2 times as long. Average time for automatic indexing sequence: Librarians: 1.91 minutes; Scientists: 2.1 minutes. Huang, 2010

76 System usability and flow metrics Huang, 2010

77 Challenges Building vs. doing/analysis Source for HIVE generation, beyond abstracts Combining many vocabularies during the indexing/term matching phase is difficult, time consuming, inefficient. NLP and machine learning offer promise Interoperability = dumbing down ontologies Proof-of-concept/ illustrate the differences between HIVE and other vocabulary registries (NCBO and OBO Foundry) General large team logistics, and having people from multiple disciplines (also the ++)

78 Summary and next steps Open source, customizable, SKOS, + hybrid metadata generation Research and evaluation Team project relating to Dryad Hollie White--dissertation Lesley Skalla--master’s paper Craig Willis--MeSH/SKOS conversion Curator interface design Workshop evaluation User’s and developer’s groups on “Google Groups” Long Term Ecological Research (LTER) Network (http://www.lternet.edu/)

79 Section 4: Evaluating HIVE Comparing manual and automatic classification of science abstracts

80 Join Us @ HIVE Community http://groups.google.com/group/hive-community Google Code page (to get your own HIVE) http://code.google.com/p/hive-mrc/

81 Questions /Comments Hollie White hcwhite1@email.unc.edu Ryan Scherle ryan@scherle.org Jane Greenberg janeg@email.unc.edu Craig Willis willisca@email.unc.edu

