IBE312: Information Architecture 2013 Ch. 9 – Metadata


IBE312: Information Architecture 2013. Ch. 9 – Metadata. Many of the slides in this slideset are reproduced and/or modified from publicly available slidesets by Paul Jacobs (2012), The iSchool, University of Maryland. These materials were made available and licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license.

Metadata: “Data about data” – definitional and descriptive documentation/information about data. From the Free On-line Dictionary of Computing: Data about data. In data processing, meta-data is definitional data that provides information about or documentation of other data managed within an application or environment. For example, meta-data would document data about data elements or attributes (name, size, data type, etc.), data about records or data structures (length, fields, columns, etc.), and data about data (where it is located, how it is associated, ownership, etc.). Meta-data may include descriptive information about the context, quality and condition, or characteristics of the data.

Metadata: Why do we need this? Types of metadata: – Descriptive/subjective/content (e.g. author, subject, keywords, …) – Administrative (e.g. owner, rights, cost, creation date, version, …) – Technical (e.g. format, size, dependencies, programs) – … In practical terms: – Metadata helps users locate, navigate, and interpret content – Metadata helps organizations manage content – Metadata helps systems manipulate content
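The three types of metadata listed above can be sketched as one record attached to a piece of content. This is only an illustration: the class names, field names and sample values are assumptions chosen to mirror the slide's examples, not a schema from the course.

```python
from dataclasses import dataclass


@dataclass
class Descriptive:
    """Helps users locate and interpret content."""
    author: str
    subject: str
    keywords: tuple = ()


@dataclass
class Administrative:
    """Helps organizations manage content."""
    owner: str
    rights: str
    creation_date: str
    version: str


@dataclass
class Technical:
    """Helps systems manipulate content."""
    file_format: str
    size_bytes: int


@dataclass
class MetadataRecord:
    descriptive: Descriptive
    administrative: Administrative
    technical: Technical


# Hypothetical sample values, for illustration only.
record = MetadataRecord(
    descriptive=Descriptive("P. Jacobs", "Metadata", ("taxonomy", "thesaurus")),
    administrative=Administrative("iSchool", "CC BY-NC-SA 3.0", "2012-02-29", "1.0"),
    technical=Technical("application/pdf", 1_048_576),
)

print(record.descriptive.keywords)
```

Keeping the three groups separate makes it clear which audience (users, organizations, systems) each field serves.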

Data without Metadata… Who: authored it? to contact about data? What: are contents of database? When: was it collected? processed? finalized? Where: was the study done? Why: was the data collected? How: were data collected? processed? verified? … can be pretty useless!

Early Example of Metadata

Menagerie of Terms: Classification, Hierarchies, Epistemology, Directories, Controlled vocabularies, Knowledge representation. Let’s focus on significant differences. Let’s focus on advantages/disadvantages. Let’s focus on how each is useful.

Controlled Vocabulary: Any defined subset of natural language. List of equivalent terms (synonym rings) – Use search logs. List of preferred terms (authority files) – Commonly also include variant terms – Educating users, enabling browsing – Term rotation (pointers in index), p. 201. Classification scheme / taxonomy – Hierarchical relationships (narrower/broader)

Controlled Vocabulary: Queries can be “exploded” to increase recall.
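A minimal sketch of query explosion, assuming that "exploding" means expanding a term to include all of its narrower terms (as in MeSH-style searching). The vocabulary, the document index and the function names are hypothetical.

```python
# Narrower-term links of a small, hypothetical controlled vocabulary.
NARROWER = {
    "computer": ["notebook", "desktop"],
    "notebook": ["ultraportable", "tablet pc"],
}

# Hypothetical index: document id -> controlled terms assigned to it.
INDEX = {
    "doc1": {"computer"},
    "doc2": {"tablet pc"},
    "doc3": {"printer"},
}


def explode(term):
    """Return the term plus all of its narrower terms, at any depth."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(NARROWER.get(current, []))
    return seen


def search(term, exploded=False):
    """Return ids of documents indexed with the term (or its explosion)."""
    terms = explode(term) if exploded else {term}
    return sorted(doc for doc, tags in INDEX.items() if tags & terms)


print(search("computer"))                 # plain search
print(search("computer", exploded=True))  # exploded search
```

The exploded search also retrieves the document indexed only under "tablet pc", which is how recall goes up.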

Controlled Vocabulary: authority file – inclusive; the preferred term can serve as the unique identifier for a collection of terms and helps educate users.
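One way to picture an authority file is as a lookup from every variant term to its preferred term, so the preferred term acts as the identifier for the whole group. The entries and function names below are hypothetical.

```python
# Hypothetical authority file: preferred term -> variant terms.
AUTHORITY_FILE = {
    "notebook": ["laptop", "notebook computer"],
    "mobile phone": ["cell phone", "cellphone"],
}


def build_lookup(authority_file):
    """Map every preferred and variant term to its preferred term."""
    lookup = {}
    for preferred, variants in authority_file.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


LOOKUP = build_lookup(AUTHORITY_FILE)


def normalize(term):
    """Return the preferred term, or None if the term is not controlled."""
    return LOOKUP.get(term.strip().lower())


print(normalize("Laptop"))
print(normalize("toaster"))
```

Showing the preferred term back to the user ("laptop: see notebook") is where the "educate users" effect comes from.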

Related Terms & Techniques (in order of increasing complexity and richness): Taxonomies – Anything organized in some sort of hierarchical structure. Tagging – Adding almost any kind of metadata to content, but now often descriptive and user-provided. Thesauri – Focus on relations between terms – Focus on “concepts”. Ontologies – Usually model a specific domain or part of the world – Generally machine-readable. Lecture sections: Metadata | Taxonomies & Thesauri | Practical Uses

How are taxonomies, tagging, controlled vocabularies and thesauri used? The semantic gap: What’s the problem? – Synonymy – roughly, different words or phrases can be used to express similar ideas (e.g. “notebook”, “laptop”) – Polysemy – roughly, the same word can have different meanings (e.g. “line”: fishing, code, queue, ...). Taxonomies try to group similar concepts. “Tags” often assign words to concepts, making it easier to find related concepts. Controlled vocabularies avoid ambiguity (like a specific tag set). Thesauri represent attempts to better organize mappings between words and concepts. Do these present precision or recall problems?

Taxonomies – Organization of objects according to some principle – Familiar examples: Linnaean taxonomy (for living organisms); Web directories (e.g., Yahoo or ODP); corporate directories; organization charts; organizational structures previously discussed.

Tagging – e.g. Flickr – popular tags

Flickr – related tags

Del.icio.us – related tags

Thesauri: Motivation. “Semantic gap” between concepts and words. Online thesauri help map many synonyms or word variants onto one preferred term – improving precision in retrieval (p. 203). Words are used to evoke concepts – Concrete objects: MacBook Pro, iPhone – Abstract ideas: freedom, peace. Diagram labels: Concepts, Words, Ideas, Meaning.

Thesauri: Book of synonyms, often including related and contrasting words and antonyms. In this class: – A controlled vocabulary in which equivalence, hierarchical, and associative relationships are identified for purposes of improved retrieval. Technical lingo … Thesauri standards: ISO 2788, …

Thesauri Types

IA Uses of Thesauri: for organization; for navigation; for indexing content; for searching.

Applying IA Principles: Focus on users and user needs – users are different, and have different models. Focus on content – concepts are different, too – different levels, words, complexity, vagueness. Examples: – What’s the difference between laptop, PDA, phone, and convergence device? – When is “cancer research” “oncology”? – When a user browses a furniture catalog for chairs, do you show them ottomans and footstools?

Standard Thesaurus Structure (diagram): Broader term: Computer. Preferred term: Notebook (IS-A Computer). Synonyms (variants): Laptop (AKA Notebook). Narrower terms: Desktop Replacement, Ultraportable, Tablet PC.

Semantic relationships in a thesaurus ( pp ): Abbreviations: PT (preferred term), VT (variant term), BT (broader term), NT (narrower term), RT (related term); Use (U) – VT use PT; Use For (UF) – full list of VTs on the PT record; Scope Note (SN) – meaning of the term, to rule out ambiguity.
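The abbreviations above come in reciprocal pairs in standard thesaurus practice: BT mirrors NT, U mirrors UF, and RT is symmetric. The sketch below derives the reciprocal links and builds a per-term record. The term names and helper functions are hypothetical, and "Docking station" is an invented related term used only for illustration.

```python
# Each relationship code and its reciprocal.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "U": "UF", "UF": "U"}

# Hypothetical assertions: (term, relationship, other term).
ASSERTIONS = [
    ("Laptop", "U", "Notebook"),            # VT "Laptop": use PT "Notebook"
    ("Notebook", "BT", "Computer"),
    ("Notebook", "NT", "Tablet PC"),
    ("Notebook", "RT", "Docking station"),  # invented related term
]


def with_reciprocals(assertions):
    """Add the reciprocal of every asserted relationship."""
    full = set(assertions)
    for term, rel, other in assertions:
        full.add((other, INVERSE[rel], term))
    return full


def term_record(term, relations):
    """Collect all relationships of one term, grouped by code."""
    rec = {}
    for source, rel, target in sorted(relations):
        if source == term:
            rec.setdefault(rel, []).append(target)
    return rec


RELATIONS = with_reciprocals(ASSERTIONS)
print(term_record("Notebook", RELATIONS))
print(term_record("Laptop", RELATIONS))
```

Storing one direction and deriving the other keeps the thesaurus consistent: the UF list on the preferred term's record can never drift from the U pointers on the variants.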

Semantic relationships of a wine thesaurus, p. 206

Some Real Examples: Content tagging and social media (e.g. Flickr, del.icio.us). Special-purpose classification schemes and thesauri (e.g. Art & Architecture Thesaurus – AAT, UMLS). General semantic tools and classification schemes (e.g., Princeton WordNet, Roget’s Thesaurus).

Art & Architecture Thesaurus

UMLS (Unified Medical Language System). Source: National Library of Medicine (NIH). 3 Knowledge Sources, used separately or together: – Metathesaurus: 1 million+ biomedical concepts from over 100 sources – Semantic Network: 135 broad categories and 54 relationships between them – SPECIALIST Lexicon + Tools: lexical information and programs for language processing

E.g. UMLS (Unified Medical Language System). Source: National Library of Medicine (NIH). – Began in 1986 as a long-term R&D project – Designed for systems developers – Develop multi-purpose tools to enhance understanding of medical meaning across systems – Overcome barriers to effective retrieval of machine-readable information – Overcome the variety of ways the same concepts are expressed in machine-readable and human language

UMLS Uses. Source: National Library of Medicine (NIH). – Information retrieval – Thesaurus construction – Natural language processing – Automated indexing – Electronic health records (EHR) – Distribution mechanism for: HIPAA, CHI, PHIN regulatory standards; SNOMED CT

UMLS Metathesaurus

UMLS Metathesaurus

UMLS Thesaurus Browser

Semantic Relationships: Equivalence (PT = VT). Hierarchical: generic (Bird NT Magpie), whole-part (Foot NT Big toe) or instance (Seas NT Mediterranean Sea) – Faceted / multiple hierarchies. Associative – Related terms (Hammer RT Nail). Preferred terms: – Form, selection, definition and specificity. Polyhierarchy (Medline cross-lists viral pneumonia under both...Fig 9-25, p. 220). Faceted classification – multiple taxonomies that focus on different dimensions of the content (e.g. wine.com pp )

Associative Term

Poly-Hierarchies: Concepts can have multiple parents. Example (from the Shoah Foundation’s thesaurus of Holocaust terms): Cracow (Poland : Voivodship); Auschwitz II-Birkenau (Poland : Death Camp); Block 25 (Auschwitz II-Birkenau); German death camps; Kanada (Auschwitz II-Birkenau). What are the advantages and disadvantages? What’s the relationship to polysemy?
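A small sketch of what "multiple parents" means in practice: the same concept can be reached by more than one path from the top of the hierarchy. The terms and the function name are hypothetical and are not taken from the Shoah Foundation thesaurus.

```python
# Hypothetical poly-hierarchy: term -> list of parent terms.
PARENTS = {
    "tablet pc": ["notebook", "touchscreen device"],
    "notebook": ["computer"],
    "touchscreen device": ["hardware"],
    "computer": ["hardware"],
}


def paths_to_root(term):
    """Return every path from a top-level term down to the given term."""
    parents = PARENTS.get(term, [])
    if not parents:
        return [[term]]
    return [path + [term] for parent in parents for path in paths_to_root(parent)]


for path in paths_to_root("tablet pc"):
    print(" > ".join(path))
```

Each path is a valid browsing route, which helps findability, but a site then has to decide which route to show as the breadcrumb.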

Faceted Hierarchies: Alternative to single and poly-hierarchies. Basic idea: – Describe objects along multiple facets – Each facet has its associated hierarchy. Issues: – What’s a facet? – How do you navigate faceted hierarchies?

Faceted Browsing Example

Demo:

Advantages of Facets: Integrates searching and browsing. Easy to build complex queries. Easy to narrow, broaden, shift focus. Helps users avoid getting lost. Helps to prevent “categorization wars”.

Relationship to IA? Architecture (diagram): Database, Web Server, Application Server, Network. Ontologies are implicitly “hidden” here!!! Example ontology (diagram): concepts Flight, Trip, Airplane, Equipment, linked by Part-of; attributes From:, To:, Departure Time:, Arrival Time:, Origin:, Destination:, Type:, Capacity:. Rule: Arrival Time is always after Departure Time. Rule: Distance from Origin to Destination is typically > 100 miles.
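The two rules on this slide are the kind of domain knowledge that usually ends up buried in application code. A minimal sketch of making them explicit, assuming a simplified `Flight` record; the class, its fields, the airport codes and the function name are hypothetical and do not reproduce the slide's exact diagram.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Flight:
    origin: str
    destination: str
    departure_time: datetime
    arrival_time: datetime
    distance_miles: float


def violations(flight):
    """Check a flight against the two rules and list what it breaks."""
    problems = []
    # Hard rule: arrival time is always after departure time.
    if flight.arrival_time <= flight.departure_time:
        problems.append("arrival time must be after departure time")
    # Soft rule: distance is typically more than 100 miles.
    if flight.distance_miles <= 100:
        problems.append("distance is unusually short (typically > 100 miles)")
    return problems


ok = Flight("OSL", "MOL", datetime(2013, 1, 1, 8, 0), datetime(2013, 1, 1, 9, 0), 250.0)
bad = Flight("OSL", "MOL", datetime(2013, 1, 1, 9, 0), datetime(2013, 1, 1, 8, 0), 50.0)

print(violations(ok))
print(violations(bad))
```

The first rule is a constraint and the second only a typicality, so a real system would probably reject on the first and merely warn on the second.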

Putting it all together… Three-Layer Architecture: Database, Application Server, Web Server, Network. Two-Layer Architecture: Database, Web Server, Network. Example stack: Apache, MySQL, PHP.

Popular Implementation: Content and Metadata – SQL Database. Presentation – PHP/HTML.

Content → Presentation. Tree diagram with nodes A; B, C; D, E, F; G, H. You are here: A > C > D. Contents at D. Related: – D – E. Tables: Hierarchy(child, parent); Content(id, attribute 1, attribute 2, attribute 3, …)
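A minimal sketch of how a Hierarchy(child, parent) table can drive the "You are here" trail. It uses Python's built-in sqlite3 module in place of the MySQL/PHP stack named in the slides. Only the path A > C > D comes from the slide; the other parent assignments and the function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Hierarchy (child TEXT PRIMARY KEY, parent TEXT);
    INSERT INTO Hierarchy VALUES
        ('A', NULL), ('B', 'A'), ('C', 'A'),
        ('D', 'C'), ('E', 'C'), ('F', 'C'),
        ('G', 'D'), ('H', 'D');
    """
)


def breadcrumb(node):
    """Walk parent links up to the root and format the trail."""
    trail = []
    while node is not None:
        trail.append(node)
        row = conn.execute(
            "SELECT parent FROM Hierarchy WHERE child = ?", (node,)
        ).fetchone()
        node = row[0] if row else None
    return " > ".join(reversed(trail))


def children(node):
    """List the nodes directly below the given node."""
    rows = conn.execute(
        "SELECT child FROM Hierarchy WHERE parent = ? ORDER BY child", (node,)
    )
    return [row[0] for row in rows]


print("You are here:", breadcrumb("D"))
print("Contents at D:", children("D"))
```

With a single parent column each node has exactly one trail. A poly-hierarchy would need a separate link table allowing several parent rows per child.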

Faceted Browsing: Matching Results. Filter by: – Facet 1 (possible values) – Facet 2 (possible values). Tables: Hierarchy(child, parent); Content(id, attribute 1, attribute 2, attribute 3, …)
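A rough sketch of faceted browsing over Content(id, attribute 1, …) rows, treating each attribute as a facet: filter by the selected values, then count the remaining values of another facet to build the "Filter by" list. The wine data, facet names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical content rows; each attribute doubles as a facet.
CONTENT = [
    {"id": 1, "type": "red", "region": "France", "price": "under $20"},
    {"id": 2, "type": "red", "region": "Italy", "price": "$20-$50"},
    {"id": 3, "type": "white", "region": "France", "price": "under $20"},
    {"id": 4, "type": "white", "region": "Italy", "price": "under $20"},
]


def filter_items(items, selections):
    """Keep items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many of the given items fall under each facet value."""
    return Counter(item[facet] for item in items)


matching = filter_items(CONTENT, {"type": "red"})
print([item["id"] for item in matching])       # matching results
print(dict(facet_counts(matching, "region")))  # values to offer under "Filter by"
```

Recomputing the counts after each selection is what lets users narrow, broaden or shift focus without hitting an empty result list.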

Summary: Meta-data – General function – Types of meta-data. Taxonomies and Thesauri – Role in organizing, navigating and searching content – General-purpose taxonomies – Special-purpose taxonomies. Practical use & implementation.