Published by Lindsay Walton. Modified over 9 years ago.
1 Metadata Andy Powell Technical Development and Research UKOLN University of Bath http://www.ukoln.ac.uk/ a.powell@ukoln.ac.uk
2 Metadata What is metadata? an introduction The Dublin Core metadata for the Web Metadata management Models for dealing with Web-site metadata UKOLN metadata projects overviews (and problems)
3 What is metadata?
by definition:
–data about data
–data which provides information about a resource
by example:
–title, author, subject classification, shelf mark
–digital format, terms and conditions, location (URL)
4 What is metadata? (2)
by usage:
Resource discovery
–Searching, location
–Authentication
–Quality/rating
Semantic interoperability
Resource management
User interface
–Grouping resources for printing
–3-D visualisations
5 Range of formats
[Figure. Axis labels: Simple, Rich; robot generated, hand crafted. Formats and services shown: Dublin Core, IAFA, SOIF, MARC, TEI headers, CIMI, Alta Vista, NetFirst, Lycos.]
6 Where is metadata? Embedded within resource HTML tags Linked to resource Remote database distributed union (centralised)
7 Who creates metadata? Publisher side author webmaster institution Service side search service third party creators robot generated hand crafted
8 Dublin Core 15 element core metadata set Primarily intended to aid resource discovery on the Web Main usage currently embedded into HTML META tags All elements optional and repeatable Status? Agreed syntax for embedding in HTML Still discussion about the use of some of the elements http://www.ukoln.ac.uk/metadata/resources/dc.html
9 Dublin Core History
4 DC meetings
–Dublin, Warwick, Dublin, Canberra (DC-5 in Helsinki coming soon)
Mailing list discussions
–meta2@lut.ac.uk
W3C interest
–RDF (PICS-NG), MCF
Various projects
No significant interest yet from the big search engines :-(
10 DC Elements - 1 Title Subject intended to promote use of controlled vocabularies but in practice likely to be used for uncontrolled list of keywords Description abstract Creator Publisher
11 DC Elements - 2 Contributor Date the date ‘the resource was made available in its present form’. Agreed default format uses subset of ISO 8601, e.g. 1997-09-15 Type category of resource - document, image, sound, home page, novel, poem, etc. Still much discussion about the content of this element Format MIME type Identifier
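The default Date format mentioned above (for example 1997-09-15) can be produced and checked with a few lines of Python. This is a minimal sketch that covers only the full YYYY-MM-DD form; the slide says a subset of ISO 8601 is agreed but does not spell it out, so treating this single form as the subset is an assumption. The function names `dc_date` and `is_dc_date` are hypothetical.

```python
from datetime import date, datetime


def dc_date(d):
    """Format a date in the YYYY-MM-DD form used for DC.date."""
    return d.isoformat()


def is_dc_date(text):
    """Return True only for a valid, zero-padded YYYY-MM-DD string."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return False
    # strptime accepts "1997-9-15"; the round trip rejects unpadded forms.
    return parsed.isoformat() == text


print(dc_date(date(1997, 9, 15)))
print(is_dc_date("15/09/1997"))
```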
12 DC Elements - 3 Source Language language of the resource - NOT the metadata Relation no guidelines for usage currently Coverage separate working party looking at usage Rights rights management seen as too complex for DC. This will give a URL to some external information
13 Simple Example UKOLN Home Page...
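The markup for this example did not survive in the transcript. As an illustration only, the sketch below shows one way to generate Dublin Core META tags of the general kind the slides describe (element names prefixed with `DC.` and embedded in HTML). The helper name `dc_meta_tags` and every value other than the page title are hypothetical, and the exact tag layout is an assumption rather than the slide's original text.

```python
from html import escape


def dc_meta_tags(record):
    """Render (element, value) pairs as HTML META tags, one per line."""
    lines = []
    for element, value in record:
        # All DC elements are optional and repeatable, so the input is a
        # list of pairs rather than a dict: repeated elements are kept.
        lines.append(
            '<META NAME="DC.{}" CONTENT="{}">'.format(
                element, escape(value, quote=True)
            )
        )
    return "\n".join(lines)


# Hypothetical record: only the title comes from the slide.
example = [
    ("title", "UKOLN Home Page"),
    ("creator", "Example Author"),
    ("date", "1997-09-15"),
]
print(dc_meta_tags(example))
```

Keeping the record as a list of pairs reflects the "optional and repeatable" rule from the Dublin Core slide.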
14 Element qualifiers
Need to refine meaning in some cases
TYPE
–Refines meaning of element - sub-divides element namespace
SCHEME
–Element value taken from external schema, e.g. LCSH for DC.subject, Z39.53 for DC.language
LANGUAGE
–Language of element value (not of the resource being described!)
15 Examples - TYPE Original DC.creator tag Non-personal author Author’s email address
16 Examples - SCHEME Library of Congress Subject Heading … or …
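The tags for the TYPE and SCHEME examples are missing from the transcript, so the sketch below is not a reconstruction of them. It illustrates two ways a qualifier could be attached to a META tag: as part of the element name or a separate attribute, and as a bracketed prefix inside the content value. Both layouts are assumptions based on conventions discussed for Dublin Core in HTML at the time. The function names and the subject value are hypothetical.

```python
from html import escape


def meta_attr_style(element, value, subtype=None, scheme=None):
    """Qualifier carried in the tag: dotted sub-type and SCHEME attribute."""
    name = "DC." + element + ("." + subtype if subtype else "")
    attrs = ['NAME="{}"'.format(name)]
    if scheme:
        attrs.append('SCHEME="{}"'.format(scheme))
    attrs.append('CONTENT="{}"'.format(escape(value, quote=True)))
    return "<META {}>".format(" ".join(attrs))


def meta_inline_style(element, value, subtype=None, scheme=None):
    """Qualifier carried in the value: bracketed prefix inside CONTENT."""
    prefix = ""
    if subtype:
        prefix += "(TYPE={}) ".format(subtype)
    if scheme:
        prefix += "(SCHEME={}) ".format(scheme)
    return '<META NAME="DC.{}" CONTENT="{}">'.format(
        element, escape(prefix + value, quote=True)
    )


# The e-mail address is from the title slide; "Metadata" is a made-up value.
print(meta_attr_style("creator", "a.powell@ukoln.ac.uk", subtype="email"))
print(meta_inline_style("subject", "Metadata", scheme="LCSH"))
```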
17 Metadata Management Practical issues of using Dublin Core for Internet resource description... UKOLN metadata system Requirements 3 models for metadata management Implementation at UKOLN
18 UKOLN metadata system requirements Easy to use Work with a variety of methods of creating HTML Simple migration to future metadata formats Separate metadata from resource
19 Managing Dublin Core (1)
HTML authoring tool: embed by hand using HTML or text editor
Pros…
–Simple
–May be useful for training and familiarisation
Cons…
–May not be possible with all editors
–Maintenance problems
–Easy to make errors
20 DC-dot A Web based tool for creating Dublin Core tags Automatic generation of some tags based on content of the resource Forms based editing of tags Cut-and-paste output into HTML Conversion to other formats… SOIF, ROADS/WHOIS++, USMARC, GILS... http://www.ukoln.ac.uk/metadata/dcdot/
21 Managing Dublin Core (2)
Web-site management tool: use Web-site management tool, for example NetObjects Fusion
Pros…
–Use of Web-site management tools likely to increase
–Object-oriented database approach
Cons…
–Proprietary formats
–Early days - too early to evaluate use for metadata yet?
22 Managing Dublin Core (3)
On-the-fly generation: hold Dublin Core separately and embed on the fly using server-side include (SSI)
Pros…
–Separates metadata from resource
–Future migration fairly simple
Cons…
–Performance
–Lack of integration with HTML tools
–Server specific
23 UKOLN metadata system (1) Embed on-the-fly Apache SSI script Store metadata using SOIF records Use MS-Access as tool to create the records Associate metadata with resource by co-locating them in the Web server filestore
24 UKOLN metadata system (2)
[Diagram labels: MS-Access Database; HTML editor; intro.html; intro.html.soif; "Apache syntax for calling server-side script". The page markup and the directive itself are not preserved in this transcript.]
intro.html.soif:
@FILE { http://www.ukoln.ac....
keywords{13}: xxx, yyy, zzz
description{14}: blah blah b
author{13}: Stark, Isobel...
}
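A SOIF record of the shape shown on this slide can be read with a short parser. The sketch below assumes the usual SOIF layout: a header line with a template type and URL, then `name{size}:` attributes whose values are exactly `size` long, then a closing brace. It counts characters rather than bytes, which is a simplification that only holds for ASCII data. The URL in the sample is a placeholder because the slide's URL is truncated, and `parse_soif` is a hypothetical name.

```python
import re

ATTR_RE = re.compile(r"([^\s{}]+)\{(\d+)\}:[ \t]")


def parse_soif(text):
    """Parse one SOIF record into (template, url, [(name, value), ...])."""
    header = re.match(r"@(\w+)\s*\{\s*(\S+)[ \t]*\n", text)
    if header is None:
        raise ValueError("not a SOIF record")
    template, url = header.group(1), header.group(2)
    pos = header.end()
    attributes = []
    while True:
        # Skip the line breaks that separate attributes.
        while pos < len(text) and text[pos] in "\r\n":
            pos += 1
        if pos >= len(text):
            raise ValueError("record is missing its closing brace")
        if text[pos] == "}":
            break
        match = ATTR_RE.match(text, pos)
        if match is None:
            raise ValueError("malformed attribute at offset %d" % pos)
        size = int(match.group(2))
        value = text[match.end():match.end() + size]
        if len(value) != size:
            raise ValueError("value shorter than its declared size")
        attributes.append((match.group(1), value))
        pos = match.end() + size
    return template, url, attributes


# Placeholder URL; attribute values follow the slide.
record = (
    "@FILE { http://www.example.org/intro.html\n"
    "keywords{13}:\txxx, yyy, zzz\n"
    "author{13}:\tStark, Isobel\n"
    "}\n"
)
print(parse_soif(record))
```

Reading values by their declared size, rather than line by line, lets a value contain line breaks.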
25 UKOLN metadata system (3) MS-Access front end... Filename browser Text boxes Name choosers UKOLN specific metadata
26 UKOLN metadata system (4)
[Diagram labels: Web robot; UKOLN Web server; intro.html; intro.html.soif; SSI script. Steps are numbered 1 to 6 on the slide; the step descriptions are not preserved in this transcript.]
intro.html.soif:
@FILE { http://www.ukoln.ac....
keywords{13}: xxx, yyy, zzz
description{14}: blah blah b
author{13}: Stark, Isobel...
}
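The on-the-fly approach can be sketched as a function that, given the path of the page being served, looks for a co-located `.soif` file and returns META tags for the server to include. This is an illustration in Python, not the UKOLN script, whose source is not shown on the slides. The mapping from SOIF attribute names to Dublin Core elements, the helper names, and the assumption that each value fits on one line are all hypothetical.

```python
import os
import re
import tempfile
from html import escape

# Assumed mapping from the SOIF attributes on the slide to DC elements.
SOIF_TO_DC = {"keywords": "subject", "description": "description", "author": "creator"}


def read_simple_soif(path):
    """Read name{size}: value lines, assuming single-line values."""
    pairs = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            match = re.match(r"([^\s{}]+)\{\d+\}:\s*(.*)$", line.rstrip("\n"))
            if match:
                pairs.append((match.group(1), match.group(2)))
    return pairs


def meta_for(html_path):
    """Return META tags for a page, or an empty string if it has no record."""
    soif_path = html_path + ".soif"  # metadata sits next to the resource
    if not os.path.exists(soif_path):
        return ""
    tags = []
    for name, value in read_simple_soif(soif_path):
        element = SOIF_TO_DC.get(name)
        if element:
            tags.append(
                '<META NAME="DC.{}" CONTENT="{}">'.format(
                    element, escape(value, quote=True)
                )
            )
    return "\n".join(tags)


# Demonstration with a temporary directory standing in for the filestore.
with tempfile.TemporaryDirectory() as root:
    page = os.path.join(root, "intro.html")
    with open(page + ".soif", "w", encoding="utf-8") as handle:
        handle.write(
            "@FILE { http://www.example.org/intro.html\n"
            "keywords{13}:\txxx, yyy, zzz\n"
            "author{13}:\tStark, Isobel\n"
            "}\n"
        )
    demo_output = meta_for(page)
    missing_output = meta_for(os.path.join(root, "other.html"))
print(demo_output)
```

Because the page itself is never edited, changing the output format later means changing only this function, which is the migration benefit the earlier slide lists.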
27 Issues Performance Interaction with Web caches Dublin Core vs Alta Vista style metadata Granularity Which pages should have metadata?
28 What's the point... …of embedding DC tags? Alta Vista isn't going to look for them But, worth doing... within individual projects within specific communities (e.g. eLib) Improve local search facilities e.g. load SOIF records into a Netscape Catalogue Server Web-site management benefits
29 UKOLN Metadata projects ROADS Software for Subject Service DESIRE European Web indexing NewsAgent Current awareness service for Library and Information Staff BIBLINK Information flow from publishers to National Bibliographic Agencies
30 ROADS Resource Organisation and Discovery in Subject-based Services Web based tools for Subject Services SOSIG, ADAM, OMNI, … Manage and search Internet resource descriptions ROADS templates (based on IAFA templates) WHOIS++ http://www.ukoln.ac.uk/roads/
31 ROADS - WHOIS++ (1) Simple client-server search and retrieve protocol Developed originally for ‘white pages’ applications Offer search facilities across several Subject Services Distribute a Subject Service across several physical servers Query routing - centroids and CIP
32 ROADS - WHOIS++ (2)
[Diagram labels: Web browser; CGI-based WHOIS++ client; SOSIG; OMNI; ADAM; CIP sharing of centroids. Steps are numbered 1 to 6 on the slide; the step descriptions are not preserved in this transcript.]
Centroid generated by ADAM contains… “you’ll find the string ‘mona’ in the ‘title’ attribute of at least one record in the ADAM database”.
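The centroid idea quoted above can be illustrated with a small sketch: for each attribute, a server publishes the set of words that occur in that attribute in at least one of its records, and a client forwards a query only to servers whose centroid contains the search term. The records, tokenisation rule and function names below are hypothetical; this does not implement the WHOIS++ or CIP wire formats.

```python
import re
from collections import defaultdict


def build_centroid(records):
    """Map each attribute to the set of words seen in any record."""
    centroid = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            centroid[attribute].update(re.findall(r"\w+", value.lower()))
    return dict(centroid)


def route(attribute, term, centroids):
    """Return the names of servers that could hold a matching record."""
    return sorted(
        name
        for name, centroid in centroids.items()
        if term.lower() in centroid.get(attribute, set())
    )


# Made-up records standing in for two Subject Service databases.
centroids = {
    "ADAM": build_centroid([{"title": "Mona Lisa", "keywords": "painting"}]),
    "SOSIG": build_centroid([{"title": "Social science resources"}]),
}
print(route("title", "mona", centroids))
```

A centroid match only says a server might have a hit for that term in that attribute, so the query still has to be sent to each server returned.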
33 DESIRE European Web cataloguing Subject Services EuroSOSIG (Bristol), EELS (Lund), Arts (Koninklijke Bibliotheek) Manually created ROADS templates European Web Index based on Nordic Web Index (NWI) Robot generated, all resources Multiple servers linked with Z39.50 GILS http://www.nic.surfnet.nl/surfnet/projects/desire/desire.html
34 DESIRE - current work (1) Internationalisation of ROADS Use of robots to: aid manual cataloguing of resources build indexes based on list of URLs in a ROADS database Robot will use embedded Dublin Core if available
35 DESIRE - current work (2) Re-design of EWI robot - including: support for Dublin Core EWI records GILS-II compatible Allow users to search across subject services and the EWI using Z39.50 by converting ROADS records into GILS records by building a WHOIS++ to Z39.50 gateway http://roads.ukoln.ac.uk/cgi-bin/egwcgi/egwirtcl/targets.egw
36 NewsAgent Current awareness service for LIS... Distributed database servers at LITC, FD, UKOLN - Z39.50 metadata (and some full-text) based on DALI Mixture of content streams Variety of access methods Web, e-mail and Z39.50 clients user-configurable profiles http://www.ukoln.ac.uk/metadata/NewsAgent/
37 NewsAgent - Content Journals Program, VINE, Journal of Librarianship and Information Science News and briefing material LA, IIS, UKOLN (Ariadne), BL, LITC Web pages E-mail lists and USENET news
38 NewsAgent - Harvesting
Web crawler looking for embedded Dublin Core
Limiting the harvest
–simple heuristics
–use of Dublin Core Relation element
E-mail parser
http://www.ukoln.ac.uk/metadata/NewsAgent/dcusage.html
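A crawler that looks for embedded Dublin Core might extract it along the following lines. This is a sketch using Python's standard `html.parser`, not the NewsAgent harvester. How NewsAgent actually used the Relation element to limit the harvest is not stated on the slide; here it is assumed that the crawler follows only the URLs named in `DC.relation` that stay on the same host. The class name, function name and sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class DCHarvester(HTMLParser):
    """Collect DC.* META tags from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.metadata = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            self.metadata.append((name, content))


def next_urls(harvester):
    """Assumed heuristic: follow same-host DC.relation targets only."""
    host = urlparse(harvester.base_url).netloc
    urls = []
    for name, content in harvester.metadata:
        if name.lower().startswith("dc.relation"):
            target = urljoin(harvester.base_url, content)
            if urlparse(target).netloc == host:
                urls.append(target)
    return urls


PAGE = """<html><head>
<META NAME="DC.title" CONTENT="Example page">
<META NAME="DC.relation" CONTENT="part2.html">
<META NAME="DC.relation" CONTENT="http://other.example.net/">
</head><body></body></html>"""

harvester = DCHarvester("http://www.example.org/docs/index.html")
harvester.feed(PAGE)
print(harvester.metadata)
print(next_urls(harvester))
```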
39 BIBLINK Information flow between publishers traditional new - CD-ROM or Web (new to publishing) and National Bibliographic Agencies British Library, UK Biblioteca Nacional, Madrid, Spain Bibliothèque Nationale de France, Paris Koninklijke Bibliotheek, Den Haag, Netherlands Nasjonalbiblioteket, Rana, Norway Universitat Oberta de Catalunya, Barcelona, Spain http://www.ukoln.ac.uk/metadata/BIBLINK/
40 BIBLINK - research Scope Electronic publications suitable for inclusion in National Bibliographies Metadata Dublin Core (with extensions!), SGML DTD Identifiers ISBN, ISSN, SICI, DOI, URN Transmission Simple e-mail or Web crawler Authentication MD5 hash assigned to each resource
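The MD5-based authentication step can be illustrated with Python's standard `hashlib`. This sketch shows only how a checksum for a resource could be computed and later compared; how BIBLINK stores or transmits the hash is not described on the slide, and the function names are hypothetical.

```python
import hashlib


def md5_checksum(data):
    """Return the MD5 digest of a resource's bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def matches(data, expected_hex):
    """Check a resource against a previously recorded checksum."""
    return md5_checksum(data) == expected_hex.lower()


resource = b"example resource content"
recorded = md5_checksum(resource)
print(recorded)
print(matches(resource, recorded))
print(matches(resource + b" (edited)", recorded))
```

A matching checksum indicates the resource has not changed since the hash was recorded; it does not by itself identify who published it.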
41 BIBLINK - data set
Minimum data set
–Author, Title, Publisher, Place of Publication, Price, Extent (size), Keywords, Description, Edition/Version, Date of Publication, System Requirements, Format, Language, Terms and Conditions, Frequency, Identifier, Contributor, Checksum
Similar to DC but some don’t fit…
Issues over conversion to MARC
42 BIBLINK - demonstrator
[Diagram labels: Publishers; NBAs/National Libraries; Dublin Core; UNIMARC; ??MARC; E-mail]
Cataloguing in Publication (CIP) level records
Conversion on to local MARC format using USEMARCON
Enhanced records optionally returned to publishers
43 Conclusions Think about metadata as a ‘process’ Dublin Core syntax now stable enough to use Use within projects initially Choose metadata management model appropriate to your site Consider long term maintenance and transition to other formats