Enhancing search: An update on taxonomies, metadata and thesauri. Leonard Will, Willpower Information.

Presentation transcript:

1 Enhancing search An update on taxonomies, metadata and thesauri Leonard Will Willpower Information

2 Summary
1. Metadata creation is cataloguing
2. Taxonomies are classifications
3. Thesauri and classifications are complementary ways of grouping concepts
4. Facet analysis is a useful technique for constructing schemes systematically
5. Most computer search interfaces are inadequate

3 Metadata = catalogue records
Resources: any things that can be identified
– documents, web pages, images, sound files, teaching packages, books, museum objects, people, organisations
Metadata: structured information about resources
– May be included with resources (e.g. “CIP”) or collected in separate “union catalogues” (e.g. OAI-PMH)
– Some from the resource itself (size, format), some from external sources (provenance, location, accessibility)

4 Metadata standards
Anglo-American Cataloguing Rules (AACR)
Encoded Archival Description (EAD)
Learning Object Metadata (LOM)
Spectrum standard for museum information
Friend of a Friend (FOAF) and vCard
e-Government Metadata Standard (eGMS)
Dublin Core - lowest common denominator

5 Kinds of standards
Content standards: which pieces of information are to be recorded (DC, AACR)
Value standards: how the information is to be recorded (= DC encoding schemes)
– formats (ISO date format, NCA name formats, AACR)
– lists of valid values (thesauri, authority files)
Structure standards: how the information is to be grouped and labelled for use by computers and humans (XML schemas, MARC)
Application profiles: choices from the above

6 Dublin Core metadata
Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
+ element refinements
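
As a rough illustration of the element set above, the sketch below holds a record as a plain Python dictionary and reports any keys that are not among the fifteen elements. The function name `unknown_elements` and the record layout are assumptions made for this example, not part of Dublin Core; the sample values come from the title slide.

```python
# The fifteen Dublin Core elements listed on the slide, in lower case.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unknown_elements(record):
    """Return the keys of `record` that are not Dublin Core elements."""
    return sorted(key for key in record if key.lower() not in DC_ELEMENTS)


# Illustrative record (hypothetical layout); values taken from the title slide.
record = {
    "title": "Enhancing search: an update on taxonomies, metadata and thesauri",
    "creator": "Leonard Will",
    "subject": ["taxonomies", "metadata", "thesauri"],
}

if __name__ == "__main__":
    print(unknown_elements(record))
    print(unknown_elements({"title": "x", "keywords": "y"}))
```

A check like this covers only the content standard (which elements exist); value standards such as date formats or controlled vocabularies would need separate validation.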

7 Subject “Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.”

8 Taxonomies = controlled vocabularies
“Taxonomy”: woolly meaning -> confusion
– keep it for biological classification systems
Knowledge organization systems (KOS)
– a better expression for the general concept
Main types are
– thesauri
– classification schemes
– ontologies

9 Thesauri and classification schemes
Thesauri and classification schemes are alternative ways of showing concepts and their relationships.
They are complementary and both approaches are needed.
They can both be built on the principles of facet analysis.

10 Building blocks of all knowledge organisation schemes
concepts
relationships
35 mm cameras
  CC: H012
  BT: film cameras
aqualungs
  CC: D002
  BT: diving equipment
camera accessories
  CC: H002
  BT: photographic equipment
  NT: flash guns, light meters, tripods
  RT: cameras

11 Relationships are between concepts, not words
[Diagram: BT/NT links between concepts, each concept labelled by several terms. Terms shown: vehicles, road vehicles, conveyances, voitures, cars, automobiles, autos, private cars]
Choose one term as a descriptor to label the concept: cars USE automobiles

12 Preferred term substitution
“Anything on farming?”
“I use the term agriculture for farming, so I’ll search for that.”
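
The substitution described on this slide can be sketched as a simple lookup from non-preferred terms to descriptors. The table below uses only the two USE references mentioned in the slides (farming USE agriculture, cars USE automobiles); the names `USE` and `preferred_term` are illustrative.

```python
# Non-preferred term -> preferred term (descriptor), as in "cars USE automobiles".
USE = {
    "farming": "agriculture",
    "cars": "automobiles",
}


def preferred_term(term):
    """Return the descriptor for `term`, or the normalised term itself
    when no USE reference exists for it."""
    key = term.strip().lower()
    return USE.get(key, key)


if __name__ == "__main__":
    print(preferred_term("Farming"))
    print(preferred_term("agriculture"))
```

A real thesaurus would also record the reciprocal UF (use for) references so that an indexer can see which entry terms lead to each descriptor.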

13 Relationships between concepts
Paradigmatic, or a priori: apply generally, independently of any specific document
– shoes BT footwear
– shoes RT shoemakers
Syntagmatic, or a posteriori: concepts that are related only in the context of a specific document
– shoes : history
– shoes : prices
A thesaurus can show these
A classification scheme can also show these

14 Searching hierarchies
“I need information on road vehicles.”
“I know that buses, cars and lorries are all kinds of road vehicles, so I’ll search for these terms as well as for road vehicles.”
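
Hierarchical expansion of this kind can be sketched as a recursive walk over NT (narrower term) links. The small hierarchy below is taken from the slides; the function name `expand` is illustrative, and the sketch assumes a simple tree with no cycles.

```python
# Broader term -> list of narrower terms (NT), from the slides' examples.
NARROWER = {
    "vehicles": ["road vehicles"],
    "road vehicles": ["buses", "cars", "lorries"],
}


def expand(term, narrower=NARROWER):
    """Return `term` followed by all of its narrower terms, depth first.

    Assumes the hierarchy is a tree (no cycles).
    """
    result = [term]
    for child in narrower.get(term, []):
        result.extend(expand(child, narrower))
    return result


if __name__ == "__main__":
    # The expanded list would be combined with OR in the actual search.
    print(expand("road vehicles"))
```

If a concept can appear under more than one broader term, the result may contain duplicates and would need de-duplicating before the query is built.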

15 Searching related terms
“Please give me information about agriculture.”
“OK, I’ll look for that. Would you also be interested in items dealing with forestry, livestock or pet breeding?”

16 Paradigmatic relationships in a thesaurus
Many relationships are indicated as RT/RT, but their nature is not specified, so they cannot be used for systematic grouping (ontologies overcome this)
The hierarchical generic-specific relationship (BT/NT) allows (requires) grouping of concepts into facets - the terms have to be in the same facet

17 What is a facet? (Sometimes called a fundamental facet)
A high-level grouping of concepts of the same inherent category, e.g. activities, disciplines, people, materials, places, times.
For example:
• animals, mice, daffodils and bacteria could all be members of a living organisms facet;
• digging, writing and cooking could all be members of an activities facet;
• birthdays, wars and football matches could all be members of an events facet.
A concept cannot belong to more than one facet

18 Facets in the AAT
associated (i.e. abstract) concepts
physical attributes
styles and periods
agents
activities
materials
objects

19 What is an array? (Sometimes called a subfacet)
A grouping of concepts within a facet by some stated characteristic of division.
vehicles
. bicycles
. tricycles
. four-wheeled vehicles
.. automobiles
. goods vehicles
.. lorries
. passenger vehicles
.. automobiles
.. buses
[Diagram callouts: “Node labels showing characteristics of division”; “Array”]
A concept may occur in more than one array

20 Parametric search
Searching for resources that have one or more specified characteristics, e.g. vehicles which
– have three wheels AND
– are used for carrying passengers
This is an important and useful aspect of post-coordinate searching, but it is not faceted classification
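
A minimal sketch of a parametric search, assuming each resource carries explicit attribute values. The `Vehicle` class, the catalogue entries and the function `parametric_search` are all hypothetical and exist only to illustrate the "three wheels AND carries passengers" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vehicle:
    name: str
    wheels: int
    carries_passengers: bool


# Hypothetical catalogue, invented for this example.
CATALOGUE = [
    Vehicle("bicycle", 2, False),
    Vehicle("passenger tricycle", 3, True),
    Vehicle("goods tricycle", 3, False),
    Vehicle("bus", 4, True),
]


def parametric_search(items, **criteria):
    """Return the items whose attributes match every criterion (Boolean AND)."""
    return [
        item for item in items
        if all(getattr(item, field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    hits = parametric_search(CATALOGUE, wheels=3, carries_passengers=True)
    print([vehicle.name for vehicle in hits])
```

The search filters on attribute values rather than on positions in a classification, which is why the slide distinguishes it from faceted classification.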

21 Ways of displaying concepts and their paradigmatic relationships
1. Alphabetically, with their relationships
35 mm cameras
  BT: film cameras
aqualungs
  BT: diving equipment
camera accessories
  BT: photographic equipment
  NT: flash guns, light meters, tripods
  RT: cameras

22 Ways of displaying concepts and their paradigmatic relationships
2. Hierarchically - one tree for each facet
(fields of work)
. diving
. photography
. physics
.. optics
(people)
. infants
. children
. adults
. divers
. models (people)
. photographers
. physicists
(equipment)
. diving equipment
.. aqualungs
.. diving suits
... dry suits
... wet suits
.. face masks
. photo equipment
.. cameras

23 Ways of displaying concepts and their paradigmatic relationships
3. In subject groups or categories (microthesauri)
– one tree for each facet in each category
: DIVING
(fields of work)
. diving
.. scuba diving
.. snorkel diving
(people)
. divers
(equipment)
. diving equipment
.. aqualungs
.. diving suits
... dry suits
770: PHOTOGRAPHY
(fields of work)
. photography
.. colour photography
(people)
. models (people)
. photographers
(equipment)
. photo equipment
.. cameras

24 Combining concepts: syntagmatic relationships
(places) A1 Italy, A2 The Netherlands, A3 Russia
(people) B1 potters, B2 repairers, B3 ceramicists
(activities) C1 moulding, C2 throwing, C3 decoration
(objects) D1 earthenware, D2 porcelain, D3 stoneware
[Diagram callout: “Node labels showing facet names”]
Combine to express compound subjects
– either post-coordinate, for searching: porcelain AND decoration AND Russia
– or pre-coordinate, for browsing: porcelain decoration in Russia: D2C3A3
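
The two ways of combining facets can be sketched in a few lines. The facet table is copied from the slide; the citation order (objects, activities, people, places), the function names and the sample index are assumptions chosen so that the slide's example yields D2C3A3.

```python
# Facets and notation from the slide.
FACETS = {
    "places": {"Italy": "A1", "The Netherlands": "A2", "Russia": "A3"},
    "people": {"potters": "B1", "repairers": "B2", "ceramicists": "B3"},
    "activities": {"moulding": "C1", "throwing": "C2", "decoration": "C3"},
    "objects": {"earthenware": "D1", "porcelain": "D2", "stoneware": "D3"},
}

# Assumed citation order for this sketch: thing, process, agent, space.
CITATION_ORDER = ["objects", "activities", "people", "places"]


def precoordinate(concepts):
    """Build one notation string by citing the chosen concepts in facet order."""
    chosen = set(concepts)
    codes = []
    for facet in CITATION_ORDER:
        for concept, code in FACETS[facet].items():
            if concept in chosen:
                codes.append(code)
    return "".join(codes)


def postcoordinate(index, concepts):
    """Return ids of documents indexed with every concept (Boolean AND)."""
    wanted = set(concepts)
    return sorted(doc for doc, terms in index.items() if wanted <= terms)


# Hypothetical index: document id -> set of index terms.
INDEX = {
    "doc1": {"porcelain", "decoration", "Russia"},
    "doc2": {"porcelain", "Italy"},
}

if __name__ == "__main__":
    print(precoordinate(["Russia", "porcelain", "decoration"]))
    print(postcoordinate(INDEX, ["porcelain", "decoration", "Russia"]))
```

Pre-coordination fixes the order at indexing time and gives a browsable sequence; post-coordination leaves the combination to the searcher.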

25 Order of combining facets
thing - kind - part - property - material - process - operation - system operated on - product - by-product - agent - space - time - form
e.g. porcelain (thing) - decoration (process) - in Russia (space)
A facet may occur more than once in a string

26 Faceted classification with processes subordinated to objects
(processes)
A ceramic production processes in general
AA forming in general
AAA coiling
AAB moulding
AAC throwing
AB decoration in general
ABA glazing
ABB transfer printing
(objects)
B ceramics in general
BB earthenware in general
(processes)
BB.AA forming of earthenware
BB.AAB moulding of earthenware
BB.AB decoration of earthenware
BB.ABA glazing of earthenware
BB.ABB transfer printing of earthenware
BC porcelain in general
(processes)
BC.AA forming of porcelain
BC.AAB moulding of porcelain
Words shown in blue on the original slide may be omitted as they are implied by the hierarchical structure

27 Faceted classification: generation of subject strings
(objects)
B ceramics
BB earthenware
(processes)
BB.AA forming
BB.AAB moulding
BB.AB decoration
BB.ABA glazing
BB.ABB transfer printing
BC porcelain
(processes)
BC.AA forming
BC.AAB moulding
Generated subject strings:
ceramics > earthenware > forming
ceramics > earthenware > forming > moulding
ceramics > earthenware > decoration
ceramics > earthenware > decoration > glazing
ceramics > earthenware > decoration > transfer printing
ceramics > porcelain
ceramics > porcelain > forming
ceramics > porcelain > forming > moulding
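
One possible way to generate such subject strings is to treat every notation that is a prefix of the target notation as an ancestor and join the captions in order of length. This prefix rule is an assumption that happens to fit the notation on this slide; it is not presented as the method used by the author, and the names `SCHEDULE` and `subject_string` are illustrative.

```python
# Notation -> caption, copied from the slide.
SCHEDULE = {
    "B": "ceramics",
    "BB": "earthenware",
    "BB.AA": "forming",
    "BB.AAB": "moulding",
    "BB.AB": "decoration",
    "BB.ABA": "glazing",
    "BB.ABB": "transfer printing",
    "BC": "porcelain",
    "BC.AA": "forming",
    "BC.AAB": "moulding",
}


def subject_string(notation, schedule=SCHEDULE):
    """Join the captions of all classes whose notation is a prefix of
    `notation`, from the broadest class down to the class itself."""
    chain = sorted((n for n in schedule if notation.startswith(n)), key=len)
    return " > ".join(schedule[n] for n in chain)


if __name__ == "__main__":
    for notation in SCHEDULE:
        print(notation, "=", subject_string(notation))
```

Because the hierarchy is carried by the notation here, the approach would not transfer to a scheme whose notation is not expressive; in that case explicit parent links would be needed.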

28 Alphabetical index
ceramic production processes  A
ceramics  B
coiling : forming : ceramic production  AAA
decoration : ceramic production  AB
decoration : earthenware : ceramics  BB.AB
earthenware : ceramics  BB
forming : ceramic production  AA
forming : earthenware : ceramics  BB.AA
forming : porcelain : ceramics  BC.AA
glazing : decoration : ceramic production  ABA
glazing : decoration : earthenware : ceramics  BB.ABA
moulding : earthenware : ceramics  BB.AAB
moulding : forming : ceramic production  AAB
moulding : porcelain : ceramics  BC.AAB
porcelain : ceramics  BC
throwing : forming : ceramic production  AAC
transfer printing : decoration : ceramic production  ABB
transfer printing : decoration : earthenware : ceramics  BB.ABB

29 The same concepts viewed in different ways
Thesaurus view
• Good for searching if you know what you want
• Like a gazetteer
• Like a book’s index
• Gets quickly to individual concepts
• Usually arranged by facet
• Shows paradigmatic relationships
• Lets you combine concepts when searching
Classification view
• Good for browsing or surveying a topic
• Like a map
• Like a book’s contents page
• Shows related concepts together
• Usually arranged by discipline
• Shows syntagmatic and paradigmatic relationships
• Shows compound topics as pre-combined subject strings

30 Some clarifications
A classification can be both hierarchical and faceted
A classification built on faceted principles can be enumerative
A symbolic notation is not essential, and should not determine the structure
A classification can arrange compound topics in a useful linear sequence - a thesaurus cannot
One-to-one mapping between a thesaurus and a classification is not possible
A “guide to popular topics” may be used to supplement a systematic classification

31 Use of a thesaurus
A thesaurus as a search aid with unindexed material
– Allows searching on terms linked to the term asked for
Software support for formulating questions
– Browsing the thesaurus to choose terms
– Combining terms with AND, OR, NOT and ( )

32 An ambiguous search interface Does this mean: (lorries OR cars) AND diesel ? or does it mean: lorries OR (cars AND diesel) ?
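
The difference between the two readings can be made concrete with Python sets, where `|` plays the role of OR and `&` the role of AND. The postings below (which document ids carry each term) are invented for illustration.

```python
# Hypothetical postings: term -> ids of documents indexed with that term.
POSTINGS = {
    "lorries": {1, 2},
    "cars": {3, 4},
    "diesel": {2, 3},
}

# Reading A: (lorries OR cars) AND diesel
reading_a = (POSTINGS["lorries"] | POSTINGS["cars"]) & POSTINGS["diesel"]

# Reading B: lorries OR (cars AND diesel)
reading_b = POSTINGS["lorries"] | (POSTINGS["cars"] & POSTINGS["diesel"])

if __name__ == "__main__":
    print(sorted(reading_a))  # documents on diesel lorries or diesel cars
    print(sorted(reading_b))  # all lorry documents, plus diesel cars
```

With this data the first reading retrieves documents 2 and 3, while the second also retrieves document 1, a lorry document that does not mention diesel. An interface that does not show its grouping leaves the user unable to tell which set they are looking at.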

33 Thesaurus creation and management
Standards
– BS/ISO standards give helpful guidance
– Draft revised BS standard now out for comments
Software
– Many packages available
– Best if integrated with database used for cataloguing
Cooperative thesaurus development and use
– DIY is a major and continuing task

34 Thesaurus development never ends
It is an ongoing task
It needs a knowledgeable thesaurus editor
It needs cooperation and input from indexers and users
User feedback

35 What we need
Software for the combined development of thesaurus and classification
– Thesaurofacet; Classaurus; ROOT; Bliss; Taxomita
Software support for combining facets when searching, using a thesaurus. Often referred to as faceted classification, but not the same thing
– Flamenco; View-based searching; No zero match (NZM)
Software support for browsing in a classified catalogue with notation, captions and an alphabetical index

36 Links and further information