Dublin Core and metadata: a tutorial
Lorcan Dempsey, Andy Powell
UKOLN, University of Bath (with a little help from our friends)
http://www.ukoln.ac.uk/metadata
Lux, 1-2 Dec 1997
Questions for you...
– Metadata?
– EAD, CIMI, TEI?
– PICS, XML, RDF?
– MARC 856?
– Dublin Core?
Are you geeks or people with sensible shoes? Goers or doers?
Overview
– UKOLN and metadata
– Metadata landscape
– Dublin Core
– Metadata management
– Interoperability
– Harvesting
– Future
UKOLN and metadata
Initiatives:
– ROADS: subject gateways, WHOIS++ templates
– BIBLINK: CIP for electronic data, Dublin Core (+ MARC)
– DESIRE: WHOIS++, GILS, Dublin Core, Z39.50/WHOIS++
– NewsAgent: current awareness
– Ariadne: Dublin Core, DC-dot
– MODELS: collection description??
– Agora
– PRIDE
Metadata landscape
What is metadata...?
It’s just cataloguing, isn’t it...? Yes and no...
Data which supports operations carried out on information objects:
– discover, buy, ...
In the company of strangers (Brody): metadata relieves the user of having to have full advance knowledge of the characteristics of resources...
... variety
Metadata model: the library example
Semantics, syntax, content: MARC, ISO 2709, AACR2
[Figure: libraries, with MARC and AACR2. Picture by Stu Weibel]
Variety of formal and informal metadata models
[Figure: Museums, Geospatial, Libraries, Internet Commons, Commerce, Scientific Data, Home Pages, Whatever... Picture by Stu Weibel]
Variety of operations...
– Discovery
– Location
– Selection (fit for use)
– Acquire (terms)
– Manipulate
– Exploit (IPR)
– Document
– Contextualise
– Preserve
– Manage (dates, people, structures, ...)
– Agent/client access
– ...
Variety of sectors...
– Curatorial traditions: ‘cataloguing’/documentation; libraries, archives, text archives, museums, geospatial data, etc.
– Network resource discovery: directory services, search engines, etc.; influence from computer science
– Network information management: web developments, W3C, database, sitemap, time to live, ...; pragmatic: market needs, vendor push
Variety of creation models...
– Author/creator: web pages?
– Repository/site manager: effective disclosure, better management
– Third-party creator: e.g. eLib subject gateways, library
Metadata...
– Variety of metadata models: syntax, semantics, content; scope; sectors/domains
– Variety of operations supported
– Variety of creation models
– Variety of architectures for disclosure/discovery: search and retrieve, disclosure/distribution, management
... complex
Some formats are richer: semantics, structure, domain-specific, ...
Dublin Core in the metadata landscape
Dublin Core
– Metadata model: simple element set; focus on semantics, several target syntaxes
– Operations: resource discovery on the web
– Explicitly cross-sector/domain
– No constraint on creation model or application architecture
[Figure: Dublin Core alongside FGDC, MARC, museum formats...]
Dublin Core... simple and intuitive
Dublin Core - why the success?
– Simple
– Coincides with strategic needs in each of the sectors we identified:
  – Curatorial: semantic interoperability between richer metadata models
  – Resource discovery: a simple format for descriptive metadata (document-like objects, DLOs)
  – Web management: associate metadata with Web resources
– Inclusive (countries/domains/traditions)
(Stu Weibel)
Introduction to Dublin Core
Dublin Core - elements
15-element core metadata set:
Title, Subject, Description, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
Dublin Core - HTML example
[Screenshot: UKOLN Home Page...]
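The UKOLN home page source from the screenshot has not survived. As a stand-in, this minimal Python sketch renders a simple, unqualified Dublin Core record as HTML META tags using the `DC.Element` naming convention seen throughout this tutorial. The function name `dc_meta_tags` and the sample values are hypothetical; only the element list comes from the "elements" slide.

```python
from html import escape

# The 15 Dublin Core elements, in the order given on the "elements" slide.
DC_ELEMENTS = (
    "Title", "Subject", "Description", "Creator", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


def dc_meta_tags(record: dict[str, list[str]]) -> str:
    """Render a simple (unqualified) DC record as HTML META tags.

    Elements may repeat, so each element maps to a list of values.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = []
    for element in DC_ELEMENTS:  # emit in the standard element order
        for value in record.get(element, []):
            # escape() handles &, <, > and quotes so a value cannot break the tag.
            lines.append(f'<META NAME="DC.{element}" CONTENT="{escape(value)}">')
    return "\n".join(lines)


print(dc_meta_tags({
    "Title": ["UKOLN Home Page"],
    "Subject": ["metadata", "Dublin Core"],
}))
```

A repeated element such as Subject simply produces repeated META tags, which is how the HTML embedding handles multiple values.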
Management
Data creation
Practical issues of using Dublin Core for Internet resource description...
– UKOLN metadata system requirements
– 3 models for metadata management
– Implementation at UKOLN
UKOLN metadata system requirements
– Easy to use
– Work with a variety of methods of creating HTML
– Simple migration to future metadata formats
– Separate metadata from resource
Managing Dublin Core (1): HTML authoring tool
Embed by hand using an HTML or text editor
Pros...
– Simple
– May be useful for training and familiarisation
Cons...
– May not be possible with all editors
– Maintenance problems
– Easy to make errors
DC-dot
A Web-based tool for creating Dublin Core tags
– Automatic generation of some tags based on the content of the resource
– Forms-based editing of tags
– Cut-and-paste output into HTML
– Conversion to other formats... SOIF, ROADS/WHOIS++, USMARC, GILS...
http://www.ukoln.ac.uk/metadata/dcdot/
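DC-dot's own generation rules are not described here. To illustrate the general idea of generating some tags from the resource itself, this Python sketch pre-fills DC.Title from the page's `<title>`, DC.Subject from an existing keywords META tag, and DC.Identifier and DC.Format from what is known about the fetched URL, leaving a cataloguer to correct the result in a form. Which tags to derive, and the names `suggest_dc` and `_TitleAndKeywords`, are our assumptions.

```python
from html.parser import HTMLParser


class _TitleAndKeywords(HTMLParser):
    """Collects the <title> text and any keywords META tag from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.keywords: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def suggest_dc(url: str, html: str) -> dict[str, list[str]]:
    """Pre-fill a DC record from a page; a human then edits the suggestions."""
    parser = _TitleAndKeywords()
    parser.feed(html)
    parser.close()
    record = {"Identifier": [url], "Format": ["text/html"]}
    if parser.title.strip():
        record["Title"] = [" ".join(parser.title.split())]  # collapse whitespace
    if parser.keywords:
        record["Subject"] = parser.keywords
    return record
```

Only a few elements can be guessed reliably this way; Creator, Rights or Coverage usually still need a person.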
Managing Dublin Core (2): Web-site management tool
Use a Web-site management tool, for example NetObjects Fusion
Pros...
– Use of Web-site management tools is likely to increase
– Object-oriented database approach
Cons...
– Proprietary formats
– Early days: too early to evaluate their use for metadata yet?
Managing Dublin Core (3): on-the-fly generation
Hold Dublin Core separately and embed it on the fly using a server-side include (SSI)
Pros...
– Separates metadata from resource
– Future migration fairly simple
Cons...
– Performance
– Lack of integration with HTML tools
– Server-specific
UKOLN metadata system (1)
– Embed on the fly using an Apache SSI script
– Store metadata using SOIF records
– Use MS-Access as the tool to create the records
– Associate metadata with a resource by co-locating them in the Web server filestore
UKOLN metadata system (2)
[Diagram: an MS-Access database and an HTML editor produce two co-located files: intro.html, which contains the Apache syntax for calling the server-side script, and intro.html.soif, which holds a SOIF record such as:]
  @FILE { http://www.ukoln.ac....
  keywords{13}: xxx, yyy, zzz
  description{14}: blah blah b
  author{13}: Stark, Isobel...
  }
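The record on this slide is in SOIF (Summary Object Interchange Format), the format used by Harvest. Each attribute carries its value length in braces, which is why `keywords{13}` precedes the 13-character value `xxx, yyy, zzz`; the count lets a reader find the end of a value even if it contains newlines. This minimal Python sketch writes and reads one record. It assumes lengths are counted in UTF-8 bytes and that a tab separates the colon from the value, as in Harvest's SOIF (the slide shows a space; the reader skips exactly one separator character, so either works). The names `soif_write` and `soif_read` are hypothetical.

```python
def soif_write(template: str, url: str, attrs: list[tuple[str, str]]) -> str:
    """Serialise one SOIF record; {n} is the value length in bytes."""
    lines = [f"@{template} {{ {url}"]
    for name, value in attrs:
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


def soif_read(text: str) -> tuple[str, str, list[tuple[str, str]]]:
    """Parse one SOIF record, using the byte counts to find each value's end."""
    data = text.encode("utf-8")
    header_end = data.index(b"\n")
    header = data[:header_end].decode("utf-8")       # e.g. "@FILE { http://..."
    template, _, url = header[1:].partition(" { ")
    attrs = []
    pos = header_end + 1
    while not data[pos:].startswith(b"}"):           # "}" closes the record
        brace = data.index(b"{", pos)
        close = data.index(b"}:", brace)
        name = data[pos:brace].decode("utf-8")
        size = int(data[brace + 1:close])            # int() accepts ASCII bytes
        start = close + 3                            # skip "}:" and one separator
        attrs.append((name, data[start:start + size].decode("utf-8")))
        pos = start + size + 1                       # skip the trailing newline
    return template, url, attrs
```

Because values are read by length rather than up to the next newline, a multi-line description survives a round trip unchanged.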
UKOLN metadata system (3)
[Screenshot: MS-Access front end, showing a filename browser, text boxes, name choosers and UKOLN-specific metadata]
UKOLN metadata system (4)
[Diagram, steps numbered 1 to 6: Web robot; UKOLN Web server; intro.html; SSI script; intro.html.soif, containing a SOIF record like the one above]
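A rough Python sketch of the work done by the SSI script in this diagram: take the attributes of the co-located `intro.html.soif` record and turn them into Dublin Core META tags in the page that the robot receives. In Apache, the SSI directive in the page would simply be replaced by the script's output; here we approximate that by splicing the tags in just before `</head>`. The mapping from UKOLN's SOIF attribute names to DC elements (`SOIF_TO_DC`) and the function name are our assumptions, not UKOLN's documented mapping.

```python
import re
from html import escape

# Hypothetical mapping from the SOIF attribute names seen on the slides
# to Dublin Core elements; the real UKOLN mapping is not given.
SOIF_TO_DC = {"author": "Creator", "keywords": "Subject", "description": "Description"}


def embed_on_the_fly(page: str, soif_attrs: list[tuple[str, str]]) -> str:
    """Insert DC META tags built from a SOIF record into a page's <head>.

    The stored HTML never contains the metadata; it is added per request,
    which is what keeps metadata and resource separate.
    """
    tags = [
        f'<META NAME="DC.{SOIF_TO_DC[name]}" CONTENT="{escape(value)}">'
        for name, value in soif_attrs
        if name in SOIF_TO_DC  # attributes with no DC equivalent are skipped
    ]
    match = re.search(r"</head\s*>", page, flags=re.IGNORECASE)
    if not tags or match is None:
        return page  # nothing to add, or no head: serve the page unchanged
    return page[:match.start()] + "\n".join(tags) + "\n" + page[match.start():]
```

Doing this on every request is exactly where the "performance" and "interaction with Web caches" issues on the next slide come from.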
Issues
– Performance
– Interaction with Web caches
– Dublin Core vs AltaVista-style metadata
– Granularity: which pages should have metadata?
A short history: Dublin to Helsinki
We have borrowed some of this material from Stu Weibel, with permission.
Dublin Core Workshop Series...
DC-1: OCLC/NCSA Metadata Workshop, March 1995
– Limited scope: discovery of document-like objects
– 13-element Dublin Core
– Interdisciplinary consensus
DC-2: OCLC/UKOLN Warwick Workshop, April 1996
– Warwick Framework: modularity
– Syntax issues
...Dublin Core Workshop Series
DC-3: CNI/OCLC Image Metadata Workshop, September 1996
– Images are in scope
– 15-element core; some element name changes
DC-4: Canberra Metadata Workshop, March 1997
– Minimalists and Structuralists
– Canberra Qualifiers (additional information useful for interpretation of metadata)
Dublin Core - qualifiers
– Language: the language of the element value
– Scheme: specifies a context for interpretation
– Sub-element: specifies a facet; narrows the meaning of the element
<META NAME="DC.Creator.Address" CONTENT="l.dempsey@ukoln.ac.uk">
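In the embedded-HTML convention a sub-element is carried in the dotted META name, as in `DC.Creator.Address`, while scheme and language travel separately. This Python sketch splits such a name into element and sub-element and keeps scheme and language alongside the value. The `DCStatement` type and `parse_meta` function are hypothetical. Because a sub-element only narrows its element, one plausible strategy for a simple application is to fall back to the bare element and ignore the facet.

```python
from typing import NamedTuple, Optional


class DCStatement(NamedTuple):
    element: str                  # e.g. "Creator"
    sub_element: Optional[str]    # e.g. "Address", or None if unqualified
    value: str
    scheme: Optional[str] = None  # e.g. "LCSH"
    lang: Optional[str] = None    # language of the value


def parse_meta(name: str, content: str, scheme: Optional[str] = None,
               lang: Optional[str] = None) -> DCStatement:
    """Split a dotted META name such as "DC.Creator.Address"."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0].upper() != "DC":
        raise ValueError(f"not a Dublin Core META name: {name!r}")
    # Everything after the element name is treated as the sub-element.
    sub_element = ".".join(parts[2:]) or None
    return DCStatement(parts[1], sub_element, content, scheme, lang)


s = parse_meta("DC.Creator.Address", "l.dempsey@ukoln.ac.uk")
print(s.element, s.sub_element)
```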
DC-5
DC-5: National Library of Finland/OCLC Workshop, October 1997
– Formal data model (expressed in RDF): many other problems are thereby made simpler
– Resource Description Framework: the return of modularity
– Finnish finish (of unqualified DC): minimalist DC is done and will not be changed
– Semantics for additional sub-structure: a small number of sub-elements will be established
– Closer DC-W3C collaboration
Working groups
– Data model: date, relationship, source; what is a resource? 1:1; RDF
– Relationships
– Typology
– Sub-elements
– Date
RFCs in preparation
– Simple DC semantics (the minimalist position)
– Simple DC syntax for embedded HTML
– DC semantics with qualifiers
– DC syntax with qualifiers: HTML 2.0, HTML 4.0, RDF
Dublin Core implementation
Projects
30 projects; 10 countries
http://purl.org/metadata/dublin_core/projects.html
“Interdisciplinary and international recognition as the lingua franca for resource discovery metadata for electronic resources” (Stu Weibel)
Support for use with non-digital objects
The HTML 2.0 “kludge”
– Convention for simple embedded metadata
– Bootstrapping early Dublin Core deployments
– META tags and standard HTML syntax
– Useful for simple metadata without qualifiers
– Can support Dublin Core qualifiers, but with risks for interoperability and indexing purity
<META NAME="DC.Subject" CONTENT="(SCHEME=LCSH) Information technology -- higher education">
HTML 4.0 - DC influences the web
Richer tag attributes:
– LANG (language of the metadata)
– SCHEME (formal qualifier)
– Sub-elements (dot-syntax extensions)
Allows a syntactically “clean” implementation of metadata with qualifiers
<META NAME="DC.Subject" SCHEME="LCSH" CONTENT="Information technology -- higher education">
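The difference between the two syntaxes can be made concrete with a small Python sketch that renders the same qualified statement both ways; the function names are hypothetical. With the HTML 2.0 convention the scheme is folded into CONTENT, so an indexer that does not know the convention will index "(SCHEME=LCSH)" as part of the subject, which is the "indexing purity" risk above. HTML 4.0 carries SCHEME and LANG as real attributes and leaves CONTENT clean.

```python
from html import escape
from typing import Optional


def meta_html2(name: str, content: str, scheme: Optional[str] = None) -> str:
    """HTML 2.0 'kludge': the scheme becomes a (SCHEME=...) prefix in CONTENT."""
    if scheme:
        content = f"(SCHEME={scheme}) {content}"
    return f'<META NAME="{name}" CONTENT="{escape(content)}">'


def meta_html4(name: str, content: str, scheme: Optional[str] = None,
               lang: Optional[str] = None) -> str:
    """HTML 4.0: SCHEME and LANG are attributes of META in their own right."""
    attrs = [f'NAME="{name}"']
    if scheme:
        attrs.append(f'SCHEME="{escape(scheme)}"')
    if lang:
        attrs.append(f'LANG="{escape(lang)}"')
    attrs.append(f'CONTENT="{escape(content)}"')
    return "<META " + " ".join(attrs) + ">"


subject = "Information technology -- higher education"
print(meta_html2("DC.Subject", subject, scheme="LCSH"))
print(meta_html4("DC.Subject", subject, scheme="LCSH"))
```

With the subject and scheme from the two slides, these functions reproduce the two example tags exactly.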
Some quick statistics
UK (academic sites only):
– Total pages: ~1.5M (a guess!)
– Embedded DC: ‘a few hundred’
http://www.cs.ukc.ac.uk/people/staff/djb1/ (information provided by Dave Beckett)
Sweden:
– Total pages: 1.4M
– Embedded DC: ‘a few dozen’
http://www.lub.lu.se/nwiPaper/ (information provided by Sigfrid Lundburg)
Interoperability
Interoperability
– What do we mean by interoperability?
– Issues
– Z39.50 and Dublin Core
– Metadata registries
Interoperability?
Unify access to data in different domains: Web, library, museums, archives, ...
Issues:
– Protocols (Z39.50, WHOIS++, ...): gateways
– Attribute names (author/creator/...): semantic interoperability, mapping tables
– Format of results: format converters
In real life these can all get mixed up.
Protocol gateways - an example
ZEXI: a Z39.50 to WHOIS++ gateway
– Based on CNIDR's Isite
– Accepts Z39.50 searches
– Converts them to WHOIS++
– Returns SUTRS records
http://roads.ukoln.ac.uk/cgi-bin/egwcgi/egwirtcl/targets.egw
Attribute names
Different databases may use different ‘names’ for the same thing: ‘creator’ vs ‘author’.
We need to be able to construct searches that ‘work’ against different databases irrespective of the ‘names’ in use.
Dublin Core provides a minimal set of agreed ‘names’ with which we can construct searches.
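A mapping table is the simplest way to use Dublin Core as the agreed set of names: the user searches in DC terms and each target's table rewrites the search into that database's own field names. In the Python sketch below, the database names and local field names in `MAPPINGS` are invented for illustration.

```python
# Hypothetical per-database mapping tables: DC element -> local field name.
MAPPINGS = {
    "library_opac": {"Creator": "author", "Title": "title", "Subject": "subject"},
    "museum_db": {"Creator": "maker", "Title": "object_name", "Subject": "classification"},
}


def translate_query(dc_query: dict[str, str], database: str) -> dict[str, str]:
    """Rewrite a search expressed in DC names into one database's own names.

    Raises KeyError if the database has no equivalent for an element, so the
    caller can decide whether to drop that term or skip the database.
    """
    table = MAPPINGS[database]
    return {table[element]: term for element, term in dc_query.items()}


query = {"Creator": "Dempsey", "Title": "metadata"}
for db in MAPPINGS:
    print(db, translate_query(query, db))
```

The same DC query is sent as `author`/`title` to one target and `maker`/`object_name` to the other.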
Format of results
Different databases may return results in different formats: USMARC, GRS-1, SUTRS, IAFA, ...
The early stages of searching ideally need results to be returned in a single ‘simple’ format.
Dublin Core provides a minimal set of agreed data elements with which we can construct results.
Z39.50 and DC - searching
Version 2:
– Searches phrased in terms of a single attribute set only
– Either need to:
  – add DC attributes to Bib-1, or
  – map DC to Bib-1
Version 3:
– Multiple attribute sets allowed for searching
– New simple DC attribute set to be proposed
– Other attributes taken from Bib-1
http://cypress.dev.oclc.org:12345/~rrl/docs/dublincoreandz3950.html
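For Version 2, the "map DC to Bib-1" option can be sketched in Python as a table from DC elements to Bib-1 Use attributes (attribute type 1), with queries written in the prefix query notation (PQF) of Index Data's YAZ toolkit. The numeric Use values are given as we recall them from the Bib-1 attribute set and should be checked against the official list; the function names and the fallback to "Any" are our own choices. Note how lossy the mapping is: Coverage, Rights and several other elements have no close match.

```python
# DC element -> Bib-1 Use attribute value (type 1). Verify against the
# official Bib-1 list before relying on these numbers.
DC_TO_BIB1_USE = {
    "Title": 4,          # Title
    "Creator": 1003,     # Author
    "Subject": 21,       # Subject heading
    "Description": 62,   # Abstract
    "Publisher": 1018,   # Publisher
    "Date": 31,          # Date of publication
    "Language": 54,      # Code--language
}
BIB1_USE_ANY = 1016      # Any: fallback for elements with no close match


def dc_term_to_pqf(element: str, term: str) -> str:
    """Express one DC search term as a Bib-1 query in PQF notation."""
    if '"' in term:
        raise ValueError("this sketch does not escape quotes in terms")
    use = DC_TO_BIB1_USE.get(element, BIB1_USE_ANY)
    return f'@attr 1={use} "{term}"'


def dc_query_to_pqf(terms: list[tuple[str, str]]) -> str:
    """AND together several (element, term) pairs in prefix notation."""
    if not terms:
        raise ValueError("empty query")
    parts = [dc_term_to_pqf(element, term) for element, term in terms]
    # n terms need n - 1 binary @and operators in front of them.
    return "@and " * (len(parts) - 1) + " ".join(parts)


print(dc_query_to_pqf([("Creator", "Dempsey"), ("Subject", "metadata")]))
```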
Z39.50 and DC - retrieval
To return Dublin Core ‘records’ using Z39.50...
– use GRS-1 (Generic Record Syntax)
– elements are assigned tags
– DC elements have been added to tagSet-G
Format conversion - issues
Simple to rich, e.g. DC to MARC:
– may not generate a valid rich record without manual enhancement
– use of DC qualifiers is required for a decent MARC record
Rich to simple, e.g. MARC to DC:
– loss of data
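The "loss of data" in rich-to-simple conversion is easy to see with even a toy crosswalk. This Python sketch maps a handful of MARC fields and subfields (245 title statement, 100 personal-name main entry, 650 topical subject, 520 summary, 260 publication details, 856 electronic location) to DC elements and reports everything it had to drop. It is a deliberately simplified mapping of our own, not a published crosswalk such as those used by the d2m or USEMARCON tools listed below.

```python
# A deliberately simplified MARC -> DC crosswalk: (tag, subfield) -> element.
CROSSWALK = {
    ("245", "a"): "Title",
    ("100", "a"): "Creator",
    ("650", "a"): "Subject",
    ("520", "a"): "Description",
    ("260", "b"): "Publisher",
    ("260", "c"): "Date",
    ("856", "u"): "Identifier",
}

MarcField = tuple[str, list[tuple[str, str]]]  # (tag, [(subfield code, value)])


def marc_to_dc(fields: list[MarcField]) -> tuple[dict[str, list[str]], list[tuple[str, str]]]:
    """Map MARC fields to DC, returning (record, dropped) so losses are visible."""
    record: dict[str, list[str]] = {}
    dropped: list[tuple[str, str]] = []
    for tag, subfields in fields:
        for code, value in subfields:
            element = CROSSWALK.get((tag, code))
            if element is None:
                dropped.append((tag, code))  # no DC home for this subfield
            else:
                record.setdefault(element, []).append(value)
    return record, dropped
```

Even where a mapping exists, structure is flattened: a 100 (personal name) and a 110 (corporate name) would both have to become plain Creator, so the distinction cannot be recovered when going back from DC to MARC.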
Metadata registries
Semantics:
– agreement on element meanings
– agreement on enumerated lists
– qualifiers
– thesaurus naming
Publishing existing metadata sets:
– re-use by others, preventing duplication of work, e.g. administrative metadata
Some pointers
Mapping tables: http://www.ukoln.ac.uk/metadata/interoperability/
Software:
– General: http://www.ukoln.ac.uk/metadata/software-tools/
– d2m (Dublin Core to MARC converter): http://www.bibsys.no/meta/d2m/
– USEMARCON: http://www2.echo.lu/libraries/en/projects/usemarc.html
Harvesting
Harvesting Dublin Core
– General issues
– Building a Web index: Harvest and NWI
– Building a ‘local’ search engine: Harvest, SWISH-E, Isite, Zebra
– DC as cataloguer’s aid
Harvesting - issues
– Mappings
– Multiple element values
– Multiple languages
– Complex data values, e.g. DC.Date, DC.Coverage
– SCHEMEs
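A harvesting robot's summariser has to cope with several of these issues at once. This Python sketch, built on the standard-library `html.parser`, collects DC META tags from a page, keeps repeated elements as separate statements, records LANG and SCHEME when present, and also recognises the HTML 2.0 "(SCHEME=...)" prefix convention. The class name and the output shape (a list of dicts) are our assumptions; mapping element values onward into an index is left out.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# HTML 2.0 convention: CONTENT="(SCHEME=LCSH) value"
KLUDGE = re.compile(r"^\(SCHEME=([^)]+)\)\s*(.*)$", re.DOTALL)


class DCHarvester(HTMLParser):
    """Collect DC META tags, keeping repeats, SCHEME and LANG."""

    def __init__(self) -> None:
        super().__init__()
        self.statements: list[dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)  # attribute names arrive lowercased
        name = a.get("name") or ""
        if not name.upper().startswith("DC."):
            return  # e.g. AltaVista-style "keywords" tags are ignored here
        content = a.get("content") or ""
        scheme = a.get("scheme")
        match = KLUDGE.match(content)
        if scheme is None and match:
            scheme, content = match.group(1), match.group(2)
        self.statements.append({
            "element": name[3:],  # drop "DC.", keep any sub-element suffix
            "value": content,
            "scheme": scheme,
            "lang": a.get("lang"),
        })
```

Both example tags from the HTML 2.0 and HTML 4.0 slides would yield the same statement: element Subject, scheme LCSH, value "Information technology -- higher education".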
Harvesting - issues (continued)
– Frames
– Harvesting non-embedded metadata
– HTML 3.2 vs HTML 4.0
– Hidden pages
– Controlling the robot
Harvest
– Resource discovery suite of tools: robot, summarisers, indexers
– SOIF records
– Supports a variety of indexers
– Supports database brokerage model
– CGI-based user interface
– UKOLN’s HTML summariser is Dublin Core aware
http://www.tardis.ed.ac.uk/harvest/
Nordic Web Index
– Custom robot: NWI/Combine
– Dublin Core aware
– GILS-II records
– Indexed using Zebra
– Searched using Z39.50
– User interface based on Europagate
http://nwi.ub2.lu.se/?lang=uk
Other software
– SWISH-E: system for indexing local collections of Web pages or other text files. http://sunsite.berkeley.edu/SWISH-E/
– Isite: text indexer (Isearch) and Z39.50. http://www.cnidr.org/ir/isite.html
– Zebra: text indexer and Z39.50. http://www.indexdata.dk
DC as cataloguer’s aid: ROADS
– Software to create, manage and search Internet resource descriptions
– WHOIS++
– Records created manually
– ‘Pump-prime’ the metadata record with values based on embedded DC, using a robot
http://www.ukoln.ac.uk/roads/
DC as cataloguer’s aid: BIBLINK
– Flow of information from publishers to National Bibliographic Agencies
– MARC-based catalogues of electronic publications
– Initial MARC record based on a DC description supplied by the publisher by email
http://www.ukoln.ac.uk/metadata/BIBLINK/
Building a Web index
– Centralised databases: Harvest, database brokerage
– Multiple databases, parallel: NWI, Z39.50
– Multiple databases, query routing: WHOIS++, Common Indexing Protocol, centroids
Architecture - centralised
[Diagram: one or more Web robots; central database; client, typically Web/CGI-based]
Architecture - Harvest
[Diagram: robot; gatherers; brokers; SOIF records in a central database; Web/CGI-based clients]
Architecture - multiple databases
[Diagram: robot; separate databases; access protocol Z39.50 or WHOIS++; client, which might be Web/CGI-based]
WHOIS++ - query routing
[Diagram, steps numbered 1 to 6: Web browser; CGI-based WHOIS++ client; databases A, B and C; CIP sharing of centroids]
The centroid generated by database C contains... “you’ll find the string ‘mona’ in the ‘title’ attribute of at least one record in this database”.
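The idea behind centroids can be sketched in a few lines of Python: each database summarises itself as the set of words appearing in each attribute, shares only that summary, and a client routes a query to just those databases whose summary contains the search word. This models the idea only; it does not reproduce the actual WHOIS++ or Common Indexing Protocol exchange formats, and the function names and sample records are hypothetical.

```python
from collections import defaultdict

Record = dict[str, str]  # attribute -> value, as in a WHOIS++ template


def centroid(records: list[Record]) -> dict[str, set[str]]:
    """Summarise a database: for each attribute, the set of words used in it."""
    summary: dict[str, set[str]] = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            summary[attribute.lower()].update(value.lower().split())
    return dict(summary)


def route(attribute: str, word: str,
          centroids: dict[str, dict[str, set[str]]]) -> list[str]:
    """Names of databases whose centroid says the word is in that attribute."""
    attr, w = attribute.lower(), word.lower()
    return sorted(name for name, c in centroids.items() if w in c.get(attr, set()))


centroids = {
    "A": centroid([{"Title": "Sunflowers", "Author": "van Gogh"}]),
    "C": centroid([{"Title": "Mona Lisa", "Author": "Leonardo"}]),
}
print(route("title", "mona", centroids))
```

The query for ‘mona’ in ‘title’ goes only to database C; a centroid is much smaller than the index it summarises, which is what makes sharing it between servers practical.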
Dublin Core - critique
Limits
– In development
– Syntax
– Simple
– Discovery
– Document-like objects
– Weak model
– Administrative metadata
Addressed in Helsinki
Futures
The material on RDF has been adapted from Stu Weibel’s material, with permission.
Dublin Core futures
– Internal: syntax and semantics
– External: environment
Syntax
HTML 2, HTML 4, RDF, ...
RDF: a W3C (World Wide Web Consortium) initiative
“RDF is the realization of the Warwick Framework for the Web”
RDF will be the foundation for an architecture for metadata on the Web:
– resource description
– electronic commerce
– site mapping
– third-party rating
– digital signatures
RDF: why is it important?
RDF provides a coherent data model and syntactical framework for ‘plug-n-play’ metadata:
– the semantics and structure of metadata packages will be determined by stakeholder communities via independently developed and maintained metadata element sets, e.g. MARC, DC, TEI, GILS, CIMI, Ratings...
Political imperatives for deployment
Software infrastructure will be ubiquitous (and come for free in browsers and servers)
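The "coherent data model" is RDF's subject–predicate–object triple, where each predicate is identified by a URI belonging to some element set. This Python sketch expresses a simple DC record as triples. The namespace URI is an assumption: it is the Dublin Core 1.1 element namespace published after these slides were written. The `Triple` type and function name are hypothetical, and serialisation (for example as RDF/XML) is left out.

```python
from typing import NamedTuple

# Assumption: the DC 1.1 element namespace, which postdates these slides.
DC_NS = "http://purl.org/dc/elements/1.1/"


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # property URI, here a DC element
    obj: str        # literal value, kept as a plain string in this sketch


def dc_to_triples(resource: str, record: dict[str, list[str]]) -> list[Triple]:
    """Express a simple DC record in RDF's subject-predicate-object model."""
    return [
        Triple(resource, DC_NS + element.lower(), value)
        for element, values in record.items()
        for value in values
    ]


for t in dc_to_triples("http://www.ukoln.ac.uk/", {"Title": ["UKOLN"]}):
    print(t)
```

Because each property is namespaced, triples from DC and from another element set (a rating scheme, say) can describe the same resource side by side, which is the modular ‘plug-n-play’ idea above.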
Semantics
Tension:
– simple vs complex
– generic vs specific
– interoperability vs self-standing
Development:
– relationship
– sub-elements
– scheme
Environment
‘Save the time of the user’
– Diverse resources
– Broker/middleware/gateway/trading place/...
– Variety of protocols and metadata models
DC: simple (volume); ‘shallow’ (interoperability)
Further information
Dublin Core Home Page: http://purl.org/metadata/dublin_core
W3 Metadata Overview and RDF Working Group Home Page: http://www.w3.org/Metadata/RDF
UKOLN metadata page: http://www.ukoln.ac.uk/metadata/