SKOS-2-HIVE GWU workshop. Introductions Hollie White Jane Greenberg


SKOS-2-HIVE GWU workshop

Introductions Hollie White Jane Greenberg

Morning Session Schedule
  Introductions
  Section 1: Characterizing Knowledge Organization Structures
  Section 2: Thesauri and What They Represent
  BREAK
  Section 3: From Thesauri to SKOS
  Section 4: From SKOS to HIVE
  Exploring HIVE

Section 1: Characterizing knowledge organization structures

Types of knowledge organization structures
From least to most structure:
  Term lists
  Controlled vocabularies
  Thesauri
  Taxonomy
  Ontology

Languages for aboutness Indexing languages: Terminological tools Thesauri (CV – controlled vocabulary) Subject headings lists Authority files for named entities (people, places, structures, organizations) Classification / Classificatory systems Keyword lists Natural language systems (broad interpretation)

Term lists Controlled but semi-unstructured list Term List in practice

Authority files -standardization of names, subjects and titles for easier identification and interoperability of information Authority Files:

Thesauri Less-structured and structured thesauri Lexical semantic relationships Composed of indexing terms/descriptors Descriptors - representations of concepts Concepts - Units of meaning

Thesaurus basics Preferred terms vs. non-preferred terms --ex. dress vs. clothing Semantic relations between terms --broader, narrower, related How to apply terms (guidelines, rules) Scope notes

Common thesaural identifiers
  SN   Scope Note (instruction, e.g. don’t invert phrases)
  USE  Use (another term in preference to this one)
  UF   Used For
  BT   Broader Term
  NT   Narrower Term
  RT   Related Term

Controlled Vocabularies (less structured thesauri also referred to as subject heading lists) Library of Congress Subject Headings (LCSH) Sears Subject Headings Medical Subject Headings (MeSH)

Thesauri Thesaurus in practice ERIC NBII NASA thesaurus

Taxonomy First used by Carl von Linné (Linnaeus) for zoological classification. A grouping of terms representing topics or subject categories. A taxonomy is typically structured so that its terms exhibit hierarchical relationships to one another, between broader and narrower concepts. taxonomy == a subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy (Garshol 2004)

Ontology In general (in the LIS domain): a tool to help organize knowledge; a way to convey or represent a class (or classes) of things, and relationships among the classes. No exact definition… it depends on the community you are coming from

KOS used in Digital Libraries
Looked at 269 online digital libraries and collections
KOS used:
  Locally developed taxonomy (113)
  LCSH (78)
  Author list (34)
  Thesauri (26)
  Alphabetical listing (20)
  Geographic arrangement (16)
Shiri, A. and Chase-Kruszewski, S. (2009) Knowledge organization systems in North American digital library collections. Program: electronic library and information systems. 43 (2) pp

Discussion: Think about your own organization. What type of controlled vocabularies, thesauri, and ontologies does your organization use for everyday work? How do these vocabulary choices help you meet the goals of your institution?

Organizing Knowledge Organization Structures

Hodge’s Types of Knowledge Organization Systems
  Term Lists: Authority Files, Glossaries, Gazetteers, Dictionaries
  Classifications and Categories: Subject Headings, Classification Schemes, Taxonomies, and Categorization Schemes
  Relationship Lists: Thesauri, Semantic Networks, Ontologies
Hodge, G. (2000) Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files.

(McGuinness, D. L. (2003). Ontologies Come of Age. In Fensel, et al, Spinning the Semantic Web. Cambridge, MIT Press), pp [see also, p ])

Greenberg’s Ontology Continuum (figure: classical view of ILS languages, from least to most structured): key word lists; simple CV; thesauri/taxonomies; deeper thesauri (WordNet); low-level ontologies; full/intricate ontologies (OWL)

Section 2: Thesauri and what they represent

Examples of different types of “thesauri”
  Cook’s Thesaurus
  BZZURKK! Thesaurus of Champions bac.gc.ca/100/200/300/ktaylor/kaboom/bzzurkk.htm
  General Multilingual Environmental Thesaurus

Syndetic Relationships Hierarchical Equivalent Associative

Hierarchical
Level of generality – both preferred terms
  BT (broader term): Birthday cakes BT Cakes
  NT (narrower term): Cakes NT Birthday cakes
…remember inheritance
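
The reciprocal BT/NT pair and the inheritance reminder above can be sketched in a few lines of Python. This is only an illustration, not part of any thesaurus tool: the helper names `add_broader` and `ancestors` and the extra term "Baked goods" are assumptions added for the example.

```python
from collections import defaultdict

broader = defaultdict(set)   # term -> its BT terms
narrower = defaultdict(set)  # term -> its NT terms


def add_broader(term, broader_term):
    """Record 'term BT broader_term' and the reciprocal 'broader_term NT term'."""
    broader[term].add(broader_term)
    narrower[broader_term].add(term)


def ancestors(term):
    """Collect every broader term reachable from term (inheritance up the hierarchy)."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in broader[current]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


add_broader("Birthday cakes", "Cakes")  # pair taken from the slide
add_broader("Cakes", "Baked goods")     # hypothetical extra level, not in the slides

print(sorted(narrower["Cakes"]))
print(sorted(ancestors("Birthday cakes")))
```

Storing both directions at once keeps BT and NT from drifting apart, which is the usual reason thesaurus software treats them as one relationship seen from two ends.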

Equivalent When two or more terms represent the same concept One is the preferred term ( descriptor ), where all the information is collected The other is the non-preferred and helps the user to find the appropriate term

Equivalent
  Non-preferred term USE Preferred term – Biological diversification USE Biodiversity
  Preferred term UF (used for) Non-preferred term – Biodiversity UF Biological diversification
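
A minimal Python sketch of the USE/UF pair, assuming a simple in-memory lookup. The function names `add_equivalence` and `preferred_term` are made up for illustration.

```python
use = {}       # non-preferred term -> preferred term (USE)
used_for = {}  # preferred term -> list of non-preferred terms (UF)


def add_equivalence(non_preferred, preferred):
    """Record 'non_preferred USE preferred' and 'preferred UF non_preferred'."""
    use[non_preferred] = preferred
    used_for.setdefault(preferred, []).append(non_preferred)


def preferred_term(term):
    """Return the descriptor a term should be indexed under."""
    return use.get(term, term)


add_equivalence("Biological diversification", "Biodiversity")

print(preferred_term("Biological diversification"))
print(used_for["Biodiversity"])
```

A term that has no USE entry is returned unchanged, on the assumption that it is already a descriptor.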

Associative One preferred term is related to another preferred term Non-hierarchical “See also” function In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy

Associative
Related Terms (RT) can be used to show these links within the thesaurus
  – Bed RT Bedding
  – Paint Brushes RT Painting
  – Vandalism RT Hostility
  – Programming RT Software

Exercise: Thesauri Building Montages Digital photographs Illustrations Pictures Photographic prints Drawings Photographs Daguerreotypes Negatives

Where to start:
  Look at the overall offering
  Determine the aboutness
  Identify the “root” element or broadest term
  Identify groups/categories of information
  Start structuring based on the syndetic relations you know
  Create hierarchies based on the semantic relations
  Use the appropriate identifiers to show the relationships

Section 3: From Thesauri to SKOS

Simple Knowledge Organization Systems
Classical view of ILS languages (figure, from least to most structured): key word lists; simple CV; thesauri/taxonomies; deeper thesauri (e.g. WordNet); low-level ontologies; full/intricate ontologies (e.g. OWL)
SKOS

Example 1:web view of NBII entry

Descriptive Markup “the markup is used to label parts of the document rather than to provide specific instructions as to how they should be processed. The objective is to decouple the inherent structure of the document from any particular treatment or rendition of it. Such markup is often described as ‘semantic’.” --from Wikipedia

Markup Languages A markup language “is a system for annotating a text in a way which is syntactically distinguishable from that text.” Using tags: content to be rendered Or a keyword in brackets to distinguish texts --from Wikipedia

HTML Hypertext Markup Language --language used to mark up webpages --both descriptive and processing

HTML encoding
<!doctype html>
<html>
  <head>
    <title>Hello HTML</title>
  </head>
  <body>
    <p>Hello World!</p>
  </body>
</html>

NBII in HTML
Heterozygotes
  BT Genotypes
  NT Carriers (genetics)
  RT Heterozygosity
  RT Homozygotes
  SC LSC LifeSciences
Homozygotes
  BT Genotypes
  RT Heterozygotes
  RT Homozygosity
  SC LSC LifeSciences

XML Extensible Markup Language --Created by the World Wide Web Consortium (W3C). --Used to mark up documents on the internet or electronic documents. --Users get to describe the tags that are used and define how they are used.

XML encoding

NBII in XML Zygotes Ookinetes Ova Oocysts Hemizygosity Reproduction Zygosity ASF Aquatic Sciences and Fisheries LSC Life Sciences Approved Descriptor
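
Because XML lets the author define the tags, a thesaurus entry can label each term with its role. The sketch below uses Python's standard `xml.etree.ElementTree`. The element names (`concept`, `descriptor`, `UF`, `BT`, `NT`, `RT`) are hypothetical and are not the tags of the original NBII file; the relation labels follow the Zygotes entry shown on the "What is a SKOS Concept?" slide.

```python
import xml.etree.ElementTree as ET

# Element names are hypothetical; XML lets the author define them.
entry = ET.Element("concept")
ET.SubElement(entry, "descriptor").text = "Zygotes"
ET.SubElement(entry, "UF").text = "Ookinetes"
ET.SubElement(entry, "BT").text = "Ova"
ET.SubElement(entry, "NT").text = "Oocysts"
for related in ("Hemizygosity", "Reproduction", "Zygosity"):
    ET.SubElement(entry, "RT").text = related

# Serialize to a string, then parse it back to show the round trip.
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)

parsed = ET.fromstring(xml_text)
related_terms = [el.text for el in parsed.findall("RT")]
print(related_terms)
```

Once the roles are explicit in the markup, a program can pull out, say, all related terms without guessing from layout.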

RDF Resource Description Framework “is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats” --from Wikipedia

RDF data model is similar to Entity-Relationship or Class diagrams: it makes statements about resources in subject-predicate-object expressions called “triples”. subject = the resource. predicate = a trait or aspect of the resource; it expresses a relationship between the subject and the object.

The sky has the color blue
RDF triple:
  a subject denoting "the sky"
  a predicate denoting "has the color"
  an object denoting "blue"
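
The sky example can be written down as data. The Python sketch below represents one triple and serializes it as an N-Triples line. The `example.org` identifiers are invented for illustration, and `to_ntriples` is a hypothetical helper that only handles a plain literal object.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Identifiers are made up for illustration (example.org is reserved for examples).
sky_is_blue = Triple(
    subject="http://example.org/sky",
    predicate="http://example.org/hasColor",
    object="blue",  # a plain literal value
)


def to_ntriples(triple):
    """Serialize one triple as an N-Triples line: URIs in <>, the literal in quotes."""
    return f'<{triple.subject}> <{triple.predicate}> "{triple.object}" .'


print(to_ntriples(sky_is_blue))
```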

OWL Web Ontology Language --knowledge representation language for displaying ontologies working with logic

SKOS Family of languages used to describe thesauri, controlled vocabulary, subject headings, and taxonomies.

NBII in SKOS/RDF Ookinetes Zygotes ASF Aquatic Sciences and Fisheries LSC LifeSciences

Basic SKOS Tags skos:Concept skos:prefLabel skos:altLabel skos:broader skos:narrower skos:related

SKOS tags
  SN   Scope Note     = skos:scopeNote
  USE  Use            = skos:prefLabel
  UF   Used For       = skos:altLabel
  BT   Broader Term   = skos:broader
  NT   Narrower Term  = skos:narrower
  RT   Related Term   = skos:related
Each entry term has a skos:Concept
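
The mapping on this slide amounts to a lookup table. A minimal Python sketch follows, assuming entry lines of the form "identifier value"; the `translate` helper is hypothetical, and the sample lines come from the NBII Heterozygotes entry earlier in the deck.

```python
# Thesaural identifier -> SKOS property, as listed on the slide.
IDENTIFIER_TO_SKOS = {
    "SN": "skos:scopeNote",
    "USE": "skos:prefLabel",
    "UF": "skos:altLabel",
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}


def translate(line):
    """Turn a line such as 'BT Genotypes' into a (SKOS property, value) pair."""
    identifier, value = line.split(" ", 1)
    return IDENTIFIER_TO_SKOS[identifier], value


for line in ["BT Genotypes", "NT Carriers (genetics)", "RT Homozygotes"]:
    print(translate(line))
```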

Terms vs. Concepts? Example: Table Lexical level : Table Conceptual level :

What is a SKOS Concept?
Zygotes
  BT Ova
  NT Oocysts
  RT Hemizygosity
  RT Reproduction
  RT Zygosity
  UF Ookinetes
All these relationships make up a SKOS concept
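
As a rough sketch of how the Zygotes entry could look as a single SKOS concept, the Python below writes it out in Turtle syntax. The `http://example.org/nbii/` namespace, the `@en` language tags, and the helpers `uri` and `to_turtle` are assumptions for illustration; they are not the real NBII identifiers or the HIVE import code.

```python
BASE = "http://example.org/nbii/"  # hypothetical namespace, not the real NBII URI

zygotes = {
    "prefLabel": "Zygotes",
    "altLabel": ["Ookinetes"],                                # UF
    "broader": ["Ova"],                                       # BT
    "narrower": ["Oocysts"],                                  # NT
    "related": ["Hemizygosity", "Reproduction", "Zygosity"],  # RT
}


def uri(label):
    """Build an illustrative concept URI from a label."""
    return f"<{BASE}{label.replace(' ', '_')}>"


def to_turtle(concept):
    """Serialize one concept dictionary as a Turtle statement."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{uri(concept['prefLabel'])} a skos:Concept ;",
        f'    skos:prefLabel "{concept["prefLabel"]}"@en ;',
    ]
    for alt in concept["altLabel"]:
        lines.append(f'    skos:altLabel "{alt}"@en ;')
    for prop in ("broader", "narrower", "related"):
        for target in concept[prop]:
            lines.append(f"    skos:{prop} {uri(target)} ;")
    # Replace the final ';' with '.' to close the statement.
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)


print(to_turtle(zygotes))
```

Labels stay literals while broader, narrower, and related point at other concepts' URIs; that split is the practical difference between a term and a concept.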

Projects Using SKOS: Library of Congress Europeana HIVE

EXPERIMENTING WITH SKOS Instructions: SKOS tags can easily be mapped to identifiers found in traditional thesauri. For this activity try mapping basic SKOS tags to a TGM: Subject Terms excerpt.

Section 4: From SKOS to HIVE

Overview HIVE—Helping Interdisciplinary Vocabulary Engineering Motivation—Dryad repository HIVE—Goals, status, and design A scenario Usability Conclusion and questions

HIVE model
  Approach for integrating discipline CVs
  Model addressing CV cost, interoperability, and usability constraints (interdisciplinary environment)

Motivation

Partner Journals
  American Society of Naturalists: American Naturalist
  Ecological Society of America: Ecology, Ecological Letters, Ecological Monographs, etc.
  European Society for Evolutionary Biology: Journal of Evolutionary Biology
  Society for Integrative and Comparative Biology: Integrative and Comparative Biology
  Society for Molecular Biology and Evolution: Molecular Biology and Evolution
  Society for the Study of Evolution: Evolution
  Society for Systematic Biology: Systematic Biology
  Commercial journals: Molecular Ecology, Molecular Phylogenetics and Evolution

Dryad’s workflow ~ low burden submission

Vocabulary needs for Dryad
Vocabulary analysis: 600 keywords, Dryad partner journals
Vocabularies: NBII Thesaurus, LCSH, the Getty’s TGN, ERIC Thesaurus, Gene Ontology, ITIS (10 vocabularies)
Facets: taxon, geographic name, time period, topic, research method, genotype, phenotype…
Results
  431 topical terms, exact matches: NBII Thesaurus, 25%; MeSH, 18%
  531 terms (research method and taxon): LCSH, 22% found exact matches, 25% partial
Conclusion: Need multiple vocabularies
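
The kind of analysis summarized above, checking what share of author keywords a vocabulary covers, can be sketched in Python. The slides do not say how the Dryad study defined a partial match, so the rule used here (sharing at least one word with a vocabulary term) is an assumption, and the toy keywords and vocabulary are invented for illustration.

```python
def match_rates(keywords, vocabulary):
    """Return the share of keywords with an exact or a partial match.

    Exact: the keyword equals a vocabulary term (case-insensitive).
    Partial (assumed rule): not exact, but shares at least one word with some term.
    """
    terms = {t.lower() for t in vocabulary}
    term_words = set()
    for t in terms:
        term_words.update(t.split())
    exact = partial = 0
    for kw in keywords:
        k = kw.lower()
        if k in terms:
            exact += 1
        elif term_words & set(k.split()):
            partial += 1
    n = len(keywords)
    return exact / n, partial / n


# Toy data, invented for illustration only.
keywords = ["Biodiversity", "Wood density", "Phylogeny", "Seed size"]
vocabulary = ["Biodiversity", "Wood", "Seeds", "Genotypes"]
print(match_rates(keywords, vocabulary))
```

Running the same function against each candidate vocabulary gives a per-vocabulary coverage figure, which is the comparison behind the conclusion that no single vocabulary is enough.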

Goals, status, and design

HIVE... as a solution Address CV (controlled vocabulary) cost, interoperability, and usability constraints COST: Expensive to create, maintain, and use INTEROPERABILITY: Developed in silos (structurally and intellectually) USABILITY: Interface design and functionality limitations have been well documented

HIVE Goals
  Automatic metadata generation approach that dynamically integrates discipline-specific controlled vocabularies encoded with the Simple Knowledge Organisation System (SKOS)
  Provide efficient, affordable, interoperable, and user friendly access to multiple vocabularies during metadata creation activities
  A model that can be replicated -> model and service
Three phases of HIVE:
  1. Building HIVE
     - Vocabulary preparation
     - Server development
     - Primate Life Histories Working Group
     - Wood Anatomy and Wood Density Working Group
  2. Sharing HIVE
     - Continuing education (empowering information professionals)
  3. Evaluating HIVE
     - Examining HIVE in Dryad

HIVE Partners
Vocabulary Partners
  Library of Congress: LCSH
  The Getty Research Institute (GRI): TGN (Thesaurus of Geographic Names)
  United States Geological Survey (USGS): NBII Thesaurus, Integrated Taxonomic Information System (ITIS)
  Agrovoc Thesaurus
Advisory Board
  Jim Balhoff, NESCent
  Libby Dechman, LCSH
  Mike Frame, USGS
  Alistair Miles, Oxford, UK
  William Moen, University of North Texas
  Eva Méndez Rodríguez, University Carlos III of Madrid
  Joseph Shubitowski, Getty Research Institute
  Ed Summers, LCSH
  Barbara Tillett, Library of Congress
  Kathy Wisser, Simmons
  Lisa Zolly, USGS
Workshop hosts: Columbia Univ.; Univ. of California, San Diego; Univ. of North Texas; Universidad Carlos III de Madrid, Madrid, Spain

HIVE Construction
HIVE stores millions of concepts from different vocabularies, and makes them available on the Web by a simple HTTP
Vocabularies are imported into HIVE using SKOS/RDF format
HIVE is divided into two modules:
  1. HIVE Core
     - SKOS/RDF storage and management (SESAME/Elmo)
     - SMART HIVE: Automatic Metadata Extraction and Topic Detection (KEA++)
     - Concept Retrieval (Lucene)
  2. HIVE Web
     - Web user interface (GWT: Google Web Toolkit)
     - Machine oriented interface (SOAP and REST)

A scenario HIVE for scientists, depositors HIVE for information professionals: curators, professional librarians, archivists, museum catalogers

Meet Amy Amy Zanne is a botanist. Like every good scientist, she publishes. She deposits data in Dryad.

Dryad’s workflow ~ low burden submission

Usability LS and IS students (32 students) - Understanding HIVE: 3.8 on 5 pt. scale - Ease of navigation: Concept cloud a good idea: Represent document accurately: 2.0 (simple HIVE), 3.3 ( smart HIVE) Advisory board (10 members) - Systems/technical folks want integration w/systems, Getty—EAD - Librarians/KO folks, want to see term relationships - Like tag cloud, want relevance percentages - Color, placement of box, labels.. White ; HIVE Team

Usability
Formal usability study: 4 biologists, 5 information professionals ~ Tasks, usability ratings, satisfaction ranking
Average time to search a concept:
  Librarians: 6.53 minutes
  Scientists: 3.82 minutes ~ consistent w/research at NIEHS, 2 times as long
Average time for automatic indexing sequence:
  Librarians: 1.91 minutes
  Scientists: 2.1 minutes
Huang, 2010

System usability and flow metrics Huang, 2010

Challenges
  Building vs. doing/analysis
  Source for HIVE generation, beyond abstracts
  Combining many vocabularies during the indexing/term matching phase is difficult, time consuming, inefficient. NLP and machine learning offer promise
  Interoperability = dumbing down ontologies
  Proof-of-concept/illustrate the differences between HIVE and other vocabulary registries (NCBO and OBO Foundry)
  General large team logistics, and having people from multiple disciplines (also the ++)

Summary and next steps
  Open source, customizable, SKOS, + hybrid metadata generation
  Research and evaluation
    Team project relating to Dryad
    Hollie White: dissertation
    Lesley Skalla: master’s paper
    Craig Willis: MeSH/SKOS conversion
  Curator interface design
  Workshop evaluation
  User’s and developer’s groups on “Google Groups”
  Long Term Ecological Research (LTER) Network

Exploring HIVE

Questions /Comments Hollie White Ryan Scherle Jane Greenberg Craig Willis