Published by Jason Harmon. Modified over 9 years ago.
1
SKOS-2-HIVE Interactive Seminar
2
Introductions Hollie White hcwhite1@email.unc.edu Jane Greenberg janeg@email.unc.edu
3
Morning Session Schedule
8:30-8:45 Introductions
8:45-9:30 Section 1: Characterizing Knowledge Organization Structures
9:30-10:15 Section 2: Thesauri and What They Represent
10:15-10:30 BREAK
10:30-11:15 Section 3: From Thesauri to SKOS
11:15-11:45 Section 4: From SKOS to HIVE
11:45-12:30 Exploring HIVE
4
Section 1: Characterizing knowledge organization structures
5
Types of knowledge organization structures
From least to most structure:
Term lists
Controlled vocabularies
Thesauri
Taxonomy
Ontology
6
Languages for aboutness
Indexing languages: Terminological tools
Thesauri (CV – controlled vocabulary)
Subject headings lists
Authority files for named entities (people, places, structures, organizations)
Classification / Classificatory systems
Keyword lists
Natural language systems (broad interpretation)
7
Term lists Controlled but semi-unstructured list Term List in practice http://library.lib.asu.edu/search/y
8
Authority files
Standardization of names, subjects, and titles for easier identification and interoperability of information
Authority Files: http://authorities.loc.gov/
9
Thesauri Less-structured and structured thesauri Lexical semantic relationships Composed of indexing terms/descriptors Descriptors - representations of concepts Concepts - Units of meaning
10
Thesaurus basics Preferred terms vs. non-preferred terms --ex. dress vs. clothing Semantic relations between terms --broader, narrower, related How to apply terms (guidelines, rules) Scope notes
11
Common thesaural identifiers
SN Scope Note: instruction, e.g. don’t invert phrases
USE Use (another term in preference to this one)
UF Used For
BT Broader Term
NT Narrower Term
RT Related Term
12
Controlled Vocabularies (less structured thesauri also referred to as subject heading lists) Library of Congress Subject Headings (LCSH) Sears Subject Headings Medical Subject Headings (MeSH) http://www.nlm.nih.gov/mesh/MBrowser.html
13
Thesauri Thesaurus in practice ERIC NBII http://thesaurus.nbii.gov/portal/server.pt NASA thesaurus http://www.sti.nasa.gov/thesfrm1.htm
14
Taxonomy
First used by Carl von Linné (Linnaeus) to classify zoology.
A grouping of terms representing topics or subject categories.
A taxonomy is typically structured so that its terms exhibit hierarchical relationships to one another, between broader and narrower concepts.
taxonomy == a subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy (Garshol 2004)
15
Ontology
In general (in the LIS domain):
a tool to help organize knowledge
a way to convey or represent a class (or classes) of things, and relationships among the class/es.
No exact definition; the definition depends on the community you come from
16
KOS used in Digital Libraries
Looked at 269 online digital libraries and collections
KOS used: Locally developed taxonomy (113), LCSH (78), Author list (34), Thesauri (26), Alphabetical listing (20), Geographic arrangement (16)
Shiri, A. and Chase-Kruszewski, S. (2009). Knowledge organization systems in North American digital library collections. Program: electronic library and information systems, 43(2), pp. 121-139.
17
Discussion: Think about your own organization. What type of controlled vocabularies, thesauri, and ontologies does your organization use for everyday work? How do these vocabulary choices help you meet the goals of your institution?
18
Organizing Knowledge Organization Structures
19
Hodge’s Types of Knowledge Organization Systems
Term Lists: Authority Files, Glossaries, Gazetteers, Dictionaries
Classifications and Categories: Subject Headings, Classification Schemes, Taxonomies, and Categorization Schemes
Relationship Lists: Thesauri, Semantic Networks, Ontologies
Hodge, G. (2000). Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files. http://www.clir.org/pubs/abstract/pub91abst.html
20
McGuinness, D. L. (2003). Ontologies Come of Age. In Fensel et al., Spinning the Semantic Web. Cambridge: MIT Press, p. 175 (see also pp. 181 and 189).
21
Greenberg’s Ontology Continuum (figure): classical view of ILS languages, from key word lists and CV through thesauri and taxonomies (WordNet) to low level and full/intricate ontologies (OWL)
22
http://jodi.tamu.edu/Articles/v04/i04/Smith/#section12
23
http://www.semantic-conference.com
24
Section 2: Thesauri and what they represent
25
Examples of different types of “thesauri”
Cook’s Thesaurus http://www.foodsubs.com/
BZZURKK! Thesaurus of Champions http://epe.lac-bac.gc.ca/100/200/300/ktaylor/kaboom/bzzurkk.htm
General Multilingual Environmental Thesaurus http://www.eionet.europa.eu/gemet
26
Common thesaural identifiers
SN Scope Note: instruction, e.g. don’t invert phrases
USE Use (another term in preference to this one)
UF Used For
BT Broader Term
NT Narrower Term
RT Related Term
27
Syndetic Relationships Hierarchical Equivalent Associative
28
Hierarchical
Level of generality – both preferred terms
BT (broader term): Robins BT Birds
NT (narrower term): Birds NT Robins
…remember inheritance
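The BT/NT pair above is reciprocal, so one direction can be derived from the other. The following is a minimal Python sketch of that idea, not part of the original slides. It assumes each term has a single broader term, and the "Birds BT Animals" level is invented purely to show inheritance through more than one step.

```python
from collections import defaultdict

# BT statements: term -> broader term.
# Only "Robins BT Birds" comes from the slide; "Birds BT Animals" is an
# assumed extra level added for illustration.
BROADER = {
    "Robins": "Birds",
    "Birds": "Animals",
}


def narrower_index(broader):
    """Derive the reciprocal NT links from the BT links."""
    index = defaultdict(list)
    for term, parent in broader.items():
        index[parent].append(term)
    return dict(index)


def ancestors(term, broader):
    """Walk BT links upward; a term inherits from every term in this chain."""
    chain = []
    while term in broader:
        term = broader[term]
        chain.append(term)
    return chain


NARROWER = narrower_index(BROADER)
print(NARROWER["Birds"])              # ['Robins']
print(ancestors("Robins", BROADER))   # ['Birds', 'Animals']
```

Storing only BT and deriving NT keeps the two directions from drifting apart; a thesaurus that allows several broader terms per entry would need a list of parents instead of a single value.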
29
Equivalent
When two or more terms represent the same concept
One is the preferred term (descriptor), where all the information is collected
The other is the non-preferred term and helps the user to find the appropriate term
30
Equivalent
Non-preferred term USE Preferred term
– Nuclear Power USE Nuclear Energy
– Periodicals USE Serials
Preferred term UF (used for) Non-preferred term
– Nuclear Energy UF Nuclear Power
– Serials UF Periodicals
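The USE and UF entries above are two views of the same equivalence relationship. As a rough Python sketch (the function names are illustrative assumptions, not from the slides), the UF entries can be derived from the USE references, and a lookup can lead a user from any entry term to its descriptor:

```python
# USE references from the slide: non-preferred term -> preferred term.
USE = {
    "Nuclear Power": "Nuclear Energy",
    "Periodicals": "Serials",
}


def used_for(use):
    """Derive the reciprocal UF entries (preferred -> non-preferred terms)."""
    uf = {}
    for non_preferred, preferred in use.items():
        uf.setdefault(preferred, []).append(non_preferred)
    return uf


def resolve(term, use):
    """Return the descriptor for a term; a preferred term maps to itself."""
    return use.get(term, term)


UF = used_for(USE)
print(resolve("Periodicals", USE))  # Serials
print(UF["Nuclear Energy"])         # ['Nuclear Power']
```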
31
Associative One preferred term is related to another preferred term Non-hierarchical “See also” function In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy
32
Associative
Related Terms (RT) can be used to show these links within the thesaurus
– Bed RT Bedding
– Paint Brushes RT Painting
– Vandalism RT Hostility
– Programming RT Software
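In standard thesaurus practice RT links are reciprocal: if Bed RT Bedding, then Bedding RT Bed. A small Python sketch of a symmetric "see also" index built from the pairs above (the function name is an illustrative assumption):

```python
# RT pairs taken from the slide.
RT_PAIRS = [
    ("Bed", "Bedding"),
    ("Paint Brushes", "Painting"),
    ("Vandalism", "Hostility"),
    ("Programming", "Software"),
]


def related_index(pairs):
    """Record each RT link in both directions, since RT is symmetric."""
    index = {}
    for a, b in pairs:
        index.setdefault(a, set()).add(b)
        index.setdefault(b, set()).add(a)
    return index


RELATED = related_index(RT_PAIRS)
print(RELATED["Bedding"])   # {'Bed'}
print(RELATED["Software"])  # {'Programming'}
```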
33
Exercise: Thesauri Building Montages Digital photographs Illustrations Pictures Photographic prints Drawings Photographs Daguerreotypes Negatives
34
Where to start:
Look at the overall offering
Determine the aboutness
Identify the “root” element or broadest term
Identify groups/categories of information
Start structuring based on the syndetic relations you know
Create hierarchies based on the semantic relations
Use the appropriate identifiers to show the relationships
35
Section 3: From Thesauri to SKOS
36
Simple Knowledge Organization Systems (figure): classical view of ILS languages, from key word lists and CV through thesauri and taxonomies (e.g. WordNet) to low level and full/intricate ontologies (e.g. OWL), with SKOS marked on the continuum
37
Example 1: web view of NBII entry
39
Descriptive Markup
“the markup is used to label parts of the document rather than to provide specific instructions as to how they should be processed. The objective is to decouple the inherent structure of the document from any particular treatment or rendition of it. Such markup is often described as "semantic".”
--from Wikipedia
40
Markup Languages
“is a system for annotating a text in a way which is syntactically distinguishable from that text.”
Using tags: content to be rendered
Or a keyword in brackets to distinguish texts
--from Wikipedia
41
HTML Hypertext Markup Language --language used to mark up webpages --both descriptive and processing.
42
HTML encoding
<!doctype html>
<html>
<head><title>Hello HTML</title></head>
<body><p>Hello World!</p></body>
</html>
43
NBII in HTML
Biocomplexity Thesaurus
if(!document.getElementById('PTIncluder-js')){document.write(' ');}
PTIncluder.imageServerURL = 'http://www.nbii.gov/imageserver/';
PTIncluder.basePath = 'plumtree/common/private/js/';
PTIncluder.lang = 'en';
PTIncluder.country = 'US';
PTIncluder.debug = false;
PTIncluder.loadComponent('jsportlet');
// Define PTPortalContext for CSAPI
PTPortalContext = new Object();
PTPortalContext.GET_SESSION_PREFS_URL = 'http://www.nbii.gov/portal/server.pt?space=SessionPrefs&control=SessionPrefs&action=getprefs';
PTPortalContext.SET_SESSION_PREFS_URL = 'http://www.nbii.gov/portal/server.pt?space=SessionPrefs&control=SessionPrefs&action=setprefs';
PTPortalContext.USER_LOCALE = 'en-us';
PTPortalContext.USER_LOGIN_NAME = 'Guest';
44
XML Extensible Markup Language --Created by the World Wide Web Consortium (W3C). --Used to mark up documents on the internet or electronic documents. --Users get to describe the tags that are used and define how they are used.
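To illustrate the point that XML users define their own tags, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The tag names `concept`, `descriptor`, and `broaderTerm` are hypothetical, chosen only for this example; the Robins/Birds pair is borrowed from the earlier thesaurus slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical, user-defined tag names: XML itself does not prescribe them.
concept = ET.Element("concept")
ET.SubElement(concept, "descriptor").text = "Robins"
ET.SubElement(concept, "broaderTerm").text = "Birds"

# Serialize to text, then parse it back to show the round trip.
xml_text = ET.tostring(concept, encoding="unicode")
print(xml_text)

parsed = ET.fromstring(xml_text)
print(parsed.find("broaderTerm").text)  # Birds
```

Because the tags carry no built-in meaning, any program reading this file has to be told what `broaderTerm` stands for; shared vocabularies such as SKOS address that by agreeing on the names in advance.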
45
XML encoding
46
NBII in XML CONCEPT> Zygotes Ookinetes Ova Oocysts Hemizygosity Reproduction Zygosity ASF Aquatic Sciences and Fisheries LSC Life Sciences Approved Descriptor 2007-08-14
47
RDF
Resource Description Framework
“is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats”
--from Wikipedia
48
RDF data model
Similar to Entity-Relationship or Class diagrams
Makes statements about resources in subject-predicate-object expressions called “triples”
subject = the resource
predicate = traits or aspects of the resource; expresses a relationship between the subject and the object
49
The sky has the color blue
RDF triple:
a subject denoting "the sky"
a predicate denoting "has the color"
an object denoting "blue"
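The triple above can be written down as plain data. This is a minimal Python sketch, assuming made-up `http://example.org/` identifiers for the subject and predicate; it serializes the statement as one N-Triples line with the object as a plain literal. It does not escape special characters, so it is only suitable for simple values like this one.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the trait or relationship
    obj: str        # the value, here a plain text literal


# The example.org URIs are placeholders, not real identifiers.
sky_is_blue = Triple(
    subject="http://example.org/sky",
    predicate="http://example.org/hasColor",
    obj="blue",
)


def to_ntriples(triple):
    """Serialize one triple as an N-Triples line with a literal object."""
    return f'<{triple.subject}> <{triple.predicate}> "{triple.obj}" .'


print(to_ntriples(sky_is_blue))
```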
50
OWL Web Ontology Language --knowledge representation language for expressing ontologies, grounded in formal logic
51
SKOS Simple Knowledge Organization System --family of languages used to describe thesauri, controlled vocabularies, subject headings, and taxonomies.
52
NBII in SKOS/RDF Ookinetes Zygotes ASF Aquatic Sciences and Fisheries LSC Life Sciences
53
Basic SKOS Tags
skos:Concept
skos:prefLabel
skos:altLabel
skos:broader
skos:narrower
skos:related
54
Tags vs. Concepts? 2 levels: Lexical level Conceptual level
55
SKOS tags
SN Scope Note = skos:scopeNote
USE Use = skos:prefLabel
UF Used For = skos:altLabel
BT Broader Term = skos:broader
NT Narrower Term = skos:narrower
RT Related Term = skos:related
Each entry term has a skos:Concept
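The mapping above can be applied mechanically to a thesaurus record. The following Python sketch is one possible way to do it, not HIVE's actual conversion code. It assumes records are filed under their preferred term (so USE is covered by making the descriptor the skos:prefLabel), and it invents an `ex:` identifier scheme for concepts, since the slides do not specify one.

```python
# Identifier-to-property table, taken from the slide above.
IDENTIFIER_TO_SKOS = {
    "SN": "skos:scopeNote",
    "UF": "skos:altLabel",
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}

# Identifiers whose value names another concept rather than a text label.
SEMANTIC_RELATIONS = {"BT", "NT", "RT"}


def concept_id(label):
    """Build a hypothetical concept identifier from a label (assumption)."""
    return "ex:" + label.replace(" ", "_")


def record_to_statements(descriptor, relations):
    """Turn one record into (subject, property, value) statements.

    `relations` is a list of (identifier, value) pairs, e.g. ("BT", "Birds").
    """
    subject = concept_id(descriptor)
    statements = [
        (subject, "rdf:type", "skos:Concept"),
        (subject, "skos:prefLabel", descriptor),
    ]
    for identifier, value in relations:
        target = concept_id(value) if identifier in SEMANTIC_RELATIONS else value
        statements.append((subject, IDENTIFIER_TO_SKOS[identifier], target))
    return statements


for statement in record_to_statements("Nuclear Energy", [("UF", "Nuclear Power")]):
    print(statement)
for statement in record_to_statements("Robins", [("BT", "Birds")]):
    print(statement)
```

Labels (prefLabel, altLabel, scopeNote) stay as text, while BT, NT, and RT point at other concepts; that split mirrors the lexical level versus conceptual level distinction.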
56
Projects Using SKOS: Library of Congress http://id.loc.gov/authorities/search/ Europeana http://www.europeana.eu/portal/ HIVE http://ils.unc.edu/mrc/hive/
57
EXPERIMENTING WITH SKOS Instructions: SKOS tags can easily be mapped to identifiers found in traditional thesauri. For this activity try mapping basic SKOS tags to a thesaurus excerpt.
58
Section 4: From SKOS to HIVE
59
Overview HIVE—Helping Interdisciplinary Vocabulary Engineering Motivation—Dryad repository HIVE—Goals, status, and design A scenario Usability Conclusion and questions
60
HIVE model approach for integrating discipline CVs
Model addressing CV cost, interoperability, and usability constraints (interdisciplinary environment)
61
Motivation
62
~ Evolutionary biologists use published data more frequently than they are depositing it themselves!
~ Survey of 400 evolutionary biologists: 48% use other data; 78% had not deposited
Ecology, Paleontology, Physiology, Systematics, Genomics, Population genetics….
63
Partner Journals
American Society of Naturalists: American Naturalist
Ecological Society of America: Ecology, Ecological Letters, Ecological Monographs, etc.
European Society for Evolutionary Biology: Journal of Evolutionary Biology
Society for Integrative and Comparative Biology: Integrative and Comparative Biology
Society for Molecular Biology and Evolution: Molecular Biology and Evolution
Society for the Study of Evolution: Evolution
Society for Systematic Biology: Systematic Biology
Commercial journals: Molecular Ecology, Molecular Phylogenetics and Evolution
64
Dryad’s workflow ~ low burden submission
65
Vocabulary needs for Dryad
Vocabulary analysis – 600 keywords, Dryad partner journals
Vocabularies: NBII Thesaurus, LCSH, the Getty’s TGN, ERIC Thesaurus, Gene Ontology, ITIS (10 vocabularies)
Facets: taxon, geographic name, time period, topic, research method, genotype, phenotype…
Results
- 431 topical terms, exact matches – NBII Thesaurus, 25%; MeSH, 18%
- 531 terms (research method and taxon) – LCSH, 22% found exact matches, 25% partial
Conclusion: Need multiple vocabularies
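The kind of analysis summarized above can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the Dryad study: the slides do not define "partial match", so the sketch assumes case-insensitive substring containment, and the keyword and vocabulary lists are invented sample data.

```python
def match_rates(keywords, vocabulary):
    """Return (exact share, partial share) of keywords found in a vocabulary.

    Assumption: a partial match means the keyword contains a vocabulary
    term or is contained in one, ignoring case.
    """
    if not keywords:
        return 0.0, 0.0
    vocab = {term.lower() for term in vocabulary}
    exact = partial = 0
    for keyword in keywords:
        k = keyword.lower()
        if k in vocab:
            exact += 1
        elif any(k in term or term in k for term in vocab):
            partial += 1
    return exact / len(keywords), partial / len(keywords)


# Invented sample data, for illustration only.
KEYWORDS = ["Reproduction", "Zygotes", "seed dispersal", "wood density"]
VOCABULARY = ["Reproduction", "Zygotes", "Dispersal"]

print(match_rates(KEYWORDS, VOCABULARY))  # (0.5, 0.25)
```

Running the same function against each vocabulary in turn shows how much each one covers on its own, which is the comparison behind the conclusion that no single vocabulary is enough.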
66
Goals, status, and design
67
HIVE... as a solution Address CV (controlled vocabulary) cost, interoperability, and usability constraints COST: Expensive to create, maintain, and use INTEROPERABILITY: Developed in silos (structurally and intellectually) USABILITY: Interface design and functionality limitations have been well documented
68
HIVE Goals
Automatic metadata generation approach that dynamically integrates discipline-specific controlled vocabularies encoded with the Simple Knowledge Organisation System (SKOS)
Provide efficient, affordable, interoperable, and user friendly access to multiple vocabularies during metadata creation activities
A model that can be replicated -> model and service
Three phases of HIVE:
1. Building HIVE
- Vocabulary preparation
- Server development
- Primate Life Histories Working Group
- Wood Anatomy and Wood Density Working Group
2. Sharing HIVE
- Continuing education (empowering information professionals)
3. Evaluating HIVE
- Examining HIVE in Dryad
69
HIVE Partners
Vocabulary Partners
Library of Congress: LCSH
The Getty Research Institute (GRI): TGN (Thesaurus of Geographic Names)
United States Geological Survey (USGS): NBII Thesaurus, Integrated Taxonomic Information System (ITIS)
Agrovoc Thesaurus
Advisory Board
Jim Balhoff, NESCent
Libby Dechman, LCSH
Mike Frame, USGS
Alistair Miles, Oxford, UK
William Moen, University of North Texas
Eva Méndez Rodríguez, University Carlos III of Madrid
Joseph Shubitowski, Getty Research Institute
Ed Summers, LCSH
Barbara Tillett, Library of Congress
Kathy Wisser, Simmons
Lisa Zolly, USGS
Workshop Hosts: Columbia Univ.; Univ. of California, San Diego; Univ. of North Texas; Universidad Carlos III de Madrid, Madrid, Spain
70
HIVE Construction
HIVE stores millions of concepts from different vocabularies, and makes them available on the Web by a simple HTTP
– Vocabularies are imported into HIVE using SKOS/RDF format
HIVE is divided into two different modules:
1. HIVE Core
– SKOS/RDF storage and management (SESAME/Elmo)
– SMART HIVE: Automatic Metadata Extraction and Topic Detection (KEA++ and MAUI)
– Concept Retrieval (Lucene and MG4J)
2. HIVE Web
– Web user interface (GWT, Google Web Toolkit)
– Machine-oriented interface (SOAP and REST)
76
A scenario HIVE for scientists, depositors HIVE for information professionals: curators, professional librarians, archivists, museum catalogers
77
Meet Amy Amy Zanne is a botanist. Like every good scientist, she publishes.
79
Amy Zanne is a botanist. Like every good scientist, she publishes. She deposits data in Dryad.
80
Dryad’s workflow ~ low burden submission
85
Usability
LS and IS students (32 students)
- Understanding HIVE: 3.8 on 5 pt. scale
- Ease of navigation: 4.5
- Concept cloud a good idea: 3.3
- Represent document accurately: 2.0 (simple HIVE), 3.3 (smart HIVE)
Advisory board (10 members)
- Systems/technical folks want integration w/systems, Getty - EAD
- Librarians/KO folks want to see term relationships
- Like tag cloud, want relevance percentages
- Color, placement of box, labels..
86
Usability
Formal usability study: 4 biologists, 5 information professionals
~ Tasks, usability ratings, satisfaction ranking
Average time to search a concept:
Librarians: 6.53 minutes
Scientists: 3.82 minutes
~ consistent w/research at NIEHS, 2 times as long
Average time for automatic indexing sequence:
Librarians: 1.91 minutes
Scientists: 2.1 minutes
87
System usability and flow metrics
88
Challenges
Building vs. doing/analysis
Source for HIVE generation, beyond abstracts
Combining many vocabularies during the indexing/term matching phase is difficult, time consuming, inefficient. NLP and machine learning offer promise
Interoperability = dumbing down ontologies
Proof-of-concept/ illustrate the differences between HIVE and other vocabulary registries (NCBO and OBO Foundry)
General large team logistics, and having people from multiple disciplines (also the ++)
89
Concluding facts
Open source, customizable
Uses SKOS, W3C/Semantic Web enabling technology
A hybrid metadata generation process: automatic indexing, plus author suggestions and (depending on the environment) professional metadata creators’ experience
Users’ and developers’ groups on “Google Groups”
Long Term Ecological Research (LTER) Network (http://www.lternet.edu/)
Future plans: integrate aspects of folksonomy into the system, explore HIVE as a front-end for access
90
Conclusions
Vocabularies will enrich Dryad data description, and assist with access, use, reuse, etc.
Nothing novel, but infrastructure is supportive; HIVE is a real-world application using Semantic Web technology
“to HIVE…”
HIVE is applicable beyond Dryad
HIVE wiki: https://www.nescent.org/sites/hive/Main_Page
91
Exploring HIVE http://hive.nescent.org
92
Questions /Comments Hollie White hcwhite1@email.unc.edu Ryan Scherle res20@duke.edu Jane Greenberg janeg@email.unc.edu