Dublin Core and metadata: a tutorial
Lorcan Dempsey, Andy Powell
UKOLN, University of Bath (with a little help from our friends)


1 Dublin Core and metadata: a tutorial
Lorcan Dempsey, Andy Powell
UKOLN, University of Bath (with a little help from our friends)
http://www.ukoln.ac.uk/metadata

2 Questions for you...
(Lux, 1-2 Dec 1997)
– Metadata
– EAD, CIMI, TEI
– PICS, XML, RDF
– MARC 856
– Dublin Core
– You are: geeks / people with sensible shoes; goers / doers

3 Overview
– UKOLN and metadata
– Metadata landscape
– Dublin Core
– Metadata management
– Interoperability
– Harvesting
– Future

4 UKOLN and metadata
Initiatives:
– ROADS: subject gateways; WHOIS++ templates
– BIBLINK: CIP for electronic data; Dublin Core (+ MARC)
– Desire: WHOIS++, GILS, Dublin Core; Z39.50/WHOIS++
– NewsAgent: current awareness
– Ariadne: Dublin Core, DC-dot
– MODELS: collection description??
– Agora
– PRIDE

5 Metadata landscape

6 What is metadata...?
– It's just cataloguing, isn't it...? Yes and no...
– Data which supports operations carried out on information objects...
  – discover, buy, ...
– In the company of strangers (Brody)
  – Relieves the user of needing full advance knowledge of the characteristics of resources...
– ... variety

7 Metadata model: the library example
– Libraries: semantics (MARC), syntax (ISO 2709), content (AACR2)
[Diagram: Libraries, MARC, AACR2. Picture by Stu Weibel]

8 Variety of formal and informal metadata models
[Diagram: Libraries, Museums, Geospatial, Internet Commons, Commerce, Scientific Data, Home Pages, Whatever... Picture by Stu Weibel]

9 Variety of operations...
– Discovery
– Location
– Selection: fit for use
– Acquire: terms
– Manipulate
– Exploit: IPR
– Document
– Contextualise
– Preserve
– Manage: dates, people, structures, ...
– Agent/client access
– ...

10 Variety of sectors...
– Curatorial traditions
  – 'cataloguing'/documentation
  – libraries, archives, text archives, museums, geospatial data, etc.
– Network resource discovery
  – directory services, search engines, etc.
  – influence from computer science
– Network information management
  – web developments, W3C, database
  – sitemap, time to live, ...
  – pragmatic: market needs, vendor push

11 Variety of creation models...
– Author/creator: web pages?
– Repository/site manager: effective disclosure, better management
– Third-party creator: e.g. eLib subject gateways, library

12 Metadata...
– Variety of metadata models
  – syntax, semantics, content
  – scope
  – sectors/domains
– Variety of operations supported
– Variety of creation models
– Variety of architectures for disclosure/discovery
  – search and retrieve
  – disclosure/distribution
  – management
– ... complex

13 Some formats are richer...
– semantics, structure, domain-specific, ...

14 Dublin Core in the metadata landscape

15 Dublin Core
– Metadata model
  – simple element set
  – focus on semantics: several target syntaxes
– Operations
  – resource discovery on the web
– Explicitly cross-sector/domain
– No constraint on creation model or application architecture
[Diagram: FGDC, MARC, Museum..., Dublin Core]
– Dublin Core... simple and intuitive

16 Dublin Core: why a success?
– Simple
– Coincides with strategic needs in each of the sectors we identified
  – Curatorial: semantic interoperability between richer metadata models
  – Resource discovery: a simple format for descriptive metadata (document-like objects, DLOs)
  – Web management: associate metadata with Web resources
– Inclusive (countries/domains/traditions)
(Stu Weibel)

17 Introduction to Dublin Core

18 Dublin Core: elements
15-element core metadata set:
– Title, Subject, Description, Creator, Publisher
– Contributor, Date, Type, Format, Identifier
– Source, Language, Relation, Coverage, Rights

19 Dublin Core: HTML example
– UKOLN Home Page...
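The HTML example itself did not survive in this transcript. As a minimal sketch, assuming a record held as a dictionary that maps element names to lists of values, the 15 elements above can be rendered as embedded META tags in the "DC.Element" naming convention used elsewhere in this tutorial. The function name dc_meta_tags and the sample values are illustrative only.

```python
from html import escape

# The 15 elements of the unqualified Dublin Core set, as listed on slide 18.
DC_ELEMENTS = (
    "Title", "Subject", "Description", "Creator", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


def dc_meta_tags(record):
    """Render a {element: [values]} record as HTML META tags.

    Unknown element names are rejected so that typos are caught early;
    values are HTML-escaped so quotes and ampersands cannot break the tag.
    """
    lines = []
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            lines.append(
                f'<META NAME="DC.{element}" CONTENT="{escape(value, quote=True)}">'
            )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample record for illustration.
    print(dc_meta_tags({"Title": ["UKOLN Home Page"], "Creator": ["UKOLN"]}))
```

Repeating a tag for each value is the simplest way to carry repeatable elements such as several Creators.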

20 Management

21 Data creation
Practical issues of using Dublin Core for Internet resource description...
UKOLN metadata system:
– Requirements
– 3 models for metadata management
– Implementation at UKOLN

22 UKOLN metadata system requirements
– Easy to use
– Work with a variety of methods of creating HTML
– Simple migration to future metadata formats
– Separate metadata from resource

23 Managing Dublin Core (1): HTML authoring tool
Embed by hand using an HTML or text editor.
– Pros...
  – Simple
  – May be useful for training and familiarisation
– Cons...
  – May not be possible with all editors
  – Maintenance problems
  – Easy to make errors

24 DC-dot
– A Web-based tool for creating Dublin Core tags
– Automatic generation of some tags based on the content of the resource
– Forms-based editing of tags
– Cut-and-paste output into HTML
– Conversion to other formats... SOIF, ROADS/WHOIS++, USMARC, GILS...
http://www.ukoln.ac.uk/metadata/dcdot/ (Run demo)
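DC-dot's own extraction rules are not described here. The sketch below only illustrates the general idea of seeding a few DC values from a page's existing HTML (its TITLE and any keywords or description META tags) so that a person can correct them afterwards. The helper seed_dc and the choice of source tags are assumptions, not DC-dot's behaviour.

```python
from html.parser import HTMLParser


class _SeedExtractor(HTMLParser):
    """Collect the <title> text and any name/content META tags."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def seed_dc(html_text, url):
    """Guess initial DC values from a page; a cataloguer then edits them."""
    parser = _SeedExtractor()
    parser.feed(html_text)
    parser.close()
    record = {"Identifier": url}
    if parser.title.strip():
        record["Title"] = parser.title.strip()
    if "keywords" in parser.meta:
        record["Subject"] = parser.meta["keywords"]
    if "description" in parser.meta:
        record["Description"] = parser.meta["description"]
    return record
```

Only the Identifier is certain here; everything else is a suggestion for forms-based editing.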

25 Managing Dublin Core (2): Web-site management tool
Use a Web-site management tool, for example NetObjects Fusion.
– Pros...
  – Use of Web-site management tools likely to increase
  – Object-oriented database approach
– Cons...
  – Proprietary formats
  – Early days: too early to evaluate use for metadata yet?

26 Managing Dublin Core (3): on-the-fly generation
Hold Dublin Core separately and embed it on the fly using a server-side include (SSI).
– Pros...
  – Separates metadata from resource
  – Future migration fairly simple
– Cons...
  – Performance
  – Lack of integration with HTML tools
  – Server-specific

27 UKOLN metadata system (1)
– Embed on the fly: Apache SSI script
– Store metadata using SOIF records
– Use MS-Access as the tool to create the records
– Associate metadata with resource by co-locating them in the Web server filestore

28 UKOLN metadata system (2)
[Diagram: MS-Access database and HTML editor; intro.html (containing the Apache syntax for calling the server-side script) alongside intro.html.soif, which holds a SOIF record such as:]
    @FILE { http://www.ukoln.ac....
    keywords{13}: xxx, yyy, zzz
    description{14}: blah blah b
    author{13}: Stark, Isobel...
    }
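Each SOIF attribute on the slide carries a size in braces: keywords{13} is followed by the 13 characters "xxx, yyy, zzz". A minimal sketch of writing and reading such a record, assuming sizes are byte counts over UTF-8 and a tab follows the colon (the slide renders it as a space). soif_encode and soif_decode are hypothetical helpers; the size field is what lets a value span several lines.

```python
def soif_encode(template, url, attrs):
    """Serialise (name, value) pairs as a SOIF record: Name{bytes}:<TAB>value."""
    lines = [f"@{template} {{ {url}"]
    for name, value in attrs:
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


def soif_decode(text):
    """Parse one SOIF record, using the byte counts to delimit values."""
    data = text.encode("utf-8")
    header_end = data.index(b"\n")
    header = data[:header_end].decode("utf-8")  # e.g. "@FILE { http://..."
    template, _, url = header[1:].partition(" { ")
    pos = header_end + 1
    attrs = []
    while not data.startswith(b"}", pos):
        open_brace = data.index(b"{", pos)
        close_brace = data.index(b"}", open_brace)
        name = data[pos:open_brace].decode("utf-8")
        size = int(data[open_brace + 1:close_brace])
        start = close_brace + 3  # skip "}", ":" and the tab
        value = data[start:start + size].decode("utf-8")
        attrs.append((name, value))
        pos = start + size + 1  # skip the newline after the value
    return template, url.strip(), attrs
```

Reading by size rather than by line is the design choice that makes multi-line descriptions safe.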

29 UKOLN metadata system (3)
MS-Access front end...
– Filename browser
– Text boxes
– Name choosers
– UKOLN-specific metadata

30 UKOLN metadata system (4)
[Diagram: Web robot, UKOLN Web server, intro.html, intro.html.soif and SSI script, connected by steps numbered 1-6; the SOIF record is the same as on slide 28.]
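The exact call sequence in the diagram is not recoverable, but the core of the SSI step can be sketched: look for the record co-located with the requested page (intro.html and intro.html.soif) and turn it into META tags. This assumes single-line SOIF values and a hypothetical SOIF_TO_DC mapping from the attribute names shown on slide 28 to DC elements; how Apache invokes the script is not shown.

```python
import re
from html import escape
from pathlib import Path

# Hypothetical mapping from SOIF attribute names to DC elements.
SOIF_TO_DC = {"title": "Title", "author": "Creator",
              "keywords": "Subject", "description": "Description"}

# One attribute per line: Name{bytes}: value (single-line values only).
_ATTR = re.compile(r"^([\w-]+)\{(\d+)\}:\s?(.*)$")


def meta_for(html_path):
    """Return META tags for the .soif record co-located with html_path."""
    soif = Path(str(html_path) + ".soif")
    if not soif.exists():
        return ""  # no metadata held for this page: include nothing
    tags = []
    for line in soif.read_text(encoding="utf-8").splitlines():
        m = _ATTR.match(line)  # header "@FILE {" and closing "}" never match
        if m and m.group(1).lower() in SOIF_TO_DC:
            element = SOIF_TO_DC[m.group(1).lower()]
            value = escape(m.group(3), quote=True)
            tags.append(f'<META NAME="DC.{element}" CONTENT="{value}">')
    return "\n".join(tags)
```

Because the page never contains the metadata itself, migrating to a later syntax means changing only this function, which is the "future migration fairly simple" point of slide 26.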

31 Issues
– Performance
– Interaction with Web caches
– Dublin Core vs AltaVista-style metadata
– Granularity: which pages should have metadata?

32 A short history: Dublin to Helsinki
(We have borrowed some of this material from Stu Weibel, with permission.)

33 Dublin Core Workshop Series...
– DC-1: OCLC/NCSA Metadata Workshop, March 1995
  – Limited scope: discovery of document-like objects
  – 13-element Dublin Core
  – Interdisciplinary consensus
– DC-2: OCLC/UKOLN Warwick Workshop, April 1996
  – Warwick Framework: modularity
  – Syntax issues

34 ...Dublin Core Workshop Series
– DC-3: CNI/OCLC Image Metadata Workshop, September 1996
  – Images are in scope
  – 15-element core; some element name changes
– DC-4: Canberra Metadata Workshop, March 1997
  – Minimalists and Structuralists
  – Canberra Qualifiers (additional information useful for interpretation of metadata)

35 Dublin Core: qualifiers
– Language: language of the element value
– Scheme: specifies a context for interpretation
– Sub-element: specifies a facet; narrows the element
Example: <META NAME="DC.Creator.Address" CONTENT="l.dempsey@ukoln.ac.uk">
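A harvester or indexer has to split such dotted names into element and sub-element before it can index or map them. A minimal sketch, assuming at most one sub-element level and case-insensitive element names (parse_meta_name is a hypothetical helper):

```python
def parse_meta_name(name):
    """Split a META NAME such as 'DC.Creator.Address' into its parts.

    Returns (element, sub_element); sub_element is None when unqualified.
    Names without the 'DC.' prefix are rejected.
    """
    prefix, _, rest = name.partition(".")
    if prefix.upper() != "DC" or not rest:
        raise ValueError(f"not a Dublin Core name: {name}")
    element, _, sub = rest.partition(".")
    # DC element names are single words, so capitalize() normalises case.
    return element.capitalize(), (sub or None)
```

A client that does not understand a sub-element can still fall back on the element part, e.g. treating DC.Creator.Address as some kind of DC.Creator information.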

36 DC-5
DC-5: National Library of Finland/OCLC Workshop, October 1997
– Formal Data Model (expressed in RDF)
  – many other problems are hereby made simpler
– Resource Description Framework
  – the return of modularity
– Finnish finish (of unqualified DC)
  – minimalist DC is done and will not be changed
– Semantics for additional sub-structure
  – a small number of sub-elements will be established
– Closer DC-W3C collaboration

37 Working groups
– Data Model: date, relationship, source; what is a resource? 1:1; RDF
– Relationships
– Typology
– Sub-elements
– Date

38 RFCs in preparation
– Simple DC semantics (the minimalist position)
– Simple DC syntax for embedded HTML
– DC semantics with qualifiers
– DC syntax with qualifiers: HTML 2.0, HTML 4.0, RDF

39 Dublin Core implementation

40 Projects
– 30 projects; 10 countries: http://purl.org/metadata/dublin_core/projects.html
– "Interdisciplinary and international recognition as the lingua franca for resource discovery metadata for electronic resources" (Stu Weibel)
– Support for use with non-digital objects

41 The HTML 2.0 "kludge"
– Convention for simple embedded metadata
– Bootstrapping early Dublin Core deployments
– META tags and standard HTML syntax
– Useful for simple metadata without qualifiers
– Can support Dublin Core qualifiers, but with risks for interoperability and indexing purity
Example: <META NAME="DC.Subject" CONTENT="(SCHEME=LCSH) Information technology -- higher education">

42 HTML 4.0: DC influences the web
– Richer tag attributes
  – LANG (language of the metadata)
  – SCHEME (formal qualifier)
  – SUB-ELEMENTS (dot syntax extensions)
– Allows a syntactically "clean" implementation of metadata with qualifiers
Example: <META NAME="DC.Subject" SCHEME="LCSH" CONTENT="Information technology -- higher education">
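A harvester may meet both conventions in practice. A sketch using Python's standard html.parser that reads the scheme from the HTML 4.0 SCHEME attribute and otherwise falls back to the HTML 2.0 "(SCHEME=...)" prefix inside CONTENT. DCMetaParser and its (name, scheme, lang, value) output layout are assumptions for illustration.

```python
import re
from html.parser import HTMLParser

# HTML 2.0 convention: the scheme is smuggled into CONTENT, e.g. "(SCHEME=LCSH) ..."
_INLINE_SCHEME = re.compile(r"^\(SCHEME=([^)]+)\)\s*(.*)$", re.IGNORECASE | re.DOTALL)


class DCMetaParser(HTMLParser):
    """Collect DC.* META tags as (name, scheme, lang, value) tuples."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = {k: (v or "") for k, v in attrs}  # attribute names arrive lowercased
        name = a.get("name", "")
        if not name.upper().startswith("DC."):
            return
        scheme = a.get("scheme") or None  # HTML 4.0 attribute
        value = a.get("content", "")
        m = _INLINE_SCHEME.match(value)
        if scheme is None and m:  # fall back to the HTML 2.0 kludge
            scheme, value = m.group(1), m.group(2)
        self.records.append((name, scheme, a.get("lang") or None, value))
```

Normalising both forms to one tuple means the index does not need to know which HTML version the page used.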

43 Some quick statistics
– UK (academic sites only)
  – Total pages: ~1.5M (a guess!)
  – Embedded DC: 'a few hundred'
  – http://www.cs.ukc.ac.uk/people/staff/djb1/ (information provided by Dave Beckett)
– Sweden
  – Total pages: 1.4M
  – Embedded DC: 'a few dozen'
  – http://www.lub.lu.se/nwiPaper/ (information provided by Sigfrid Lundberg)

44 Interoperability

45 Interoperability
– What do we mean by interoperability?
– Issues
– Z39.50 and Dublin Core
– Metadata registries

46 Interoperability?
– Unify access to data in different domains: Web, library, museums, archives, ...
– Issues
  – Protocols (Z39.50, WHOIS++, ...): gateways
  – Attribute names (author/creator/...): semantic interoperability, mapping tables
  – Format of results: format converters
– In real life these can all get mixed up

47 Protocol gateways: an example
ZEXI, a Z39.50 to WHOIS++ gateway:
– Based on CNIDR's Isite
– Accepts Z39.50 searches
– Converts them to WHOIS++
– Returns SUTRS records
http://roads.ukoln.ac.uk/cgi-bin/egwcgi/egwirtcl/targets.egw

48 Attribute names
– Different databases may use different 'names' for the same thing: 'creator' vs 'author'
– We need to be able to construct searches that 'work' against different databases irrespective of the 'names' in use
– Dublin Core provides a minimal set of agreed 'names' with which we can construct searches
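The mapping-table approach can be sketched as follows. The two databases and their local attribute names are hypothetical; the point is that a search is phrased once in terms of a DC element and rewritten for each database.

```python
# Hypothetical per-database mapping tables: local attribute name -> DC element.
MAPPINGS = {
    "library_opac": {"author": "Creator", "title": "Title", "subject": "Subject"},
    "web_index": {"creator": "Creator", "title": "Title", "keywords": "Subject"},
}


def translate_query(dc_element, term, database):
    """Rewrite a search on a DC element into the database's own attribute name."""
    reverse = {dc: local for local, dc in MAPPINGS[database].items()}
    if dc_element not in reverse:
        raise KeyError(f"{database} has no attribute mapped to DC.{dc_element}")
    return (reverse[dc_element], term)


def fan_out(dc_element, term):
    """Build one translated query per known database."""
    return {db: translate_query(dc_element, term, db) for db in MAPPINGS}
```

For example, fan_out("Creator", "Dempsey") asks the OPAC for author=Dempsey and the Web index for creator=Dempsey.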

49 Format of results
– Different databases may return results in different formats: USMARC, GRS-1, SUTRS, IAFA, ...
– Early stages of searching ideally need results to be returned in a single 'simple' format
– Dublin Core provides a minimal set of agreed data elements with which we can construct results

50 Z39.50 and DC: searching
– Version 2
  – Searches phrased in terms of a single attribute set only
  – Either need to add DC attributes to Bib-1, or map DC to Bib-1
– Version 3
  – Multiple attribute sets allowed for searching
  – New simple DC attribute set to be proposed
  – Other attributes taken from Bib-1
http://cypress.dev.oclc.org:12345/~rrl/docs/dublincoreandz3950.html
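For the Version 2 route of mapping DC to Bib-1, a crosswalk can be as small as a lookup table. The Use attribute values below are standard Bib-1 ones (4 Title, 21 Subject heading, 31 Date of publication, 62 Abstract, 1003 Author, 1016 Any, 1018 Publisher); the element-to-attribute pairings and the fallback to Any are illustrative choices, not a published mapping. The query string is written in YAZ's prefix query notation (PQF), and quoting of terms that themselves contain quotes is not handled.

```python
# Illustrative DC -> Bib-1 Use attribute crosswalk (attribute type 1 = Use).
BIB1_USE = {
    "Title": 4,
    "Subject": 21,        # Subject heading
    "Date": 31,           # Date of publication
    "Description": 62,    # Abstract
    "Creator": 1003,      # Author
    "Publisher": 1018,
}
BIB1_ANY = 1016           # fallback for elements with no obvious equivalent


def dc_to_bib1(element, term):
    """Return an (attribute_type, attribute_value, term) triple for a v2 search."""
    return (1, BIB1_USE.get(element, BIB1_ANY), term)


def to_pqf(element, term):
    """Render the search in PQF, e.g. '@attr 1=4 "metadata"'."""
    _, use, _ = dc_to_bib1(element, term)
    return f'@attr 1={use} "{term}"'
```

Falling back to Any keeps recall at the cost of precision; a stricter gateway might reject unmapped elements instead.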

51 Z39.50 and DC: retrieval
To return Dublin Core 'records' using Z39.50...
– use GRS-1 (Generic Record Syntax)
– elements are assigned tags
– DC elements have been added to tagset-G

52 Format conversion: issues
– Simple to rich, e.g. DC to MARC
  – May not generate a valid rich record without manual enhancement
  – Use of DC qualifiers required for a decent MARC record
– Rich to simple, e.g. MARC to DC
  – Loss of data
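The loss of data in rich-to-simple conversion is easy to see with a crosswalk that keeps only what it can map. A sketch with a small, illustrative subset of MARC tag/subfield pairs; a record is modelled as a hypothetical list of (tag, [(code, value), ...]) tuples, and indicators, punctuation and the many other fields a real converter would handle are ignored.

```python
# Simplified, illustrative subset of a MARC -> DC crosswalk.
MARC_TO_DC = {
    ("245", "a"): "Title",
    ("100", "a"): "Creator",
    ("650", "a"): "Subject",
    ("260", "b"): "Publisher",
    ("260", "c"): "Date",
    ("520", "a"): "Description",
    ("856", "u"): "Identifier",
}


def marc_to_dc(fields):
    """Map MARC fields to DC; return (dc_record, dropped_subfields).

    Anything not in the crosswalk is reported in `dropped`: this is the
    "loss of data" that rich-to-simple conversion implies.
    """
    dc, dropped = {}, []
    for tag, subfields in fields:
        for code, value in subfields:
            element = MARC_TO_DC.get((tag, code))
            if element:
                dc.setdefault(element, []).append(value)
            else:
                dropped.append((tag, code, value))
    return dc, dropped
```

Reporting what was dropped, rather than silently discarding it, makes the loss visible to whoever reviews the converted record.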

53 Metadata registries
– Semantics
  – Agreement on element meanings
  – Agreement on enumerated lists
  – Qualifiers
  – Thesaurus naming
– Publishing existing metadata sets
  – Re-use by others: prevent duplication of work, e.g. administrative metadata

54 Some pointers
– Mapping tables: http://www.ukoln.ac.uk/metadata/interoperability/
– Software
  – General: http://www.ukoln.ac.uk/metadata/software-tools/
  – d2m (Dublin Core to MARC converter): http://www.bibsys.no/meta/d2m/
  – USEMARCON: http://www2.echo.lu/libraries/en/projects/usemarc.html

55 Harvesting

56 Harvesting Dublin Core
– General issues
– Building a Web index: Harvest and NWI
– Building a 'local' search engine: Harvest, SWISH-E, Isite, Zebra
– DC as cataloguer's aid

57 Harvesting: issues
– Mappings
– Multiple element values
– Multiple languages
– Complex data values, e.g. DC.Date, DC.Coverage
– SCHEMEs
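Two of these issues, multiple values (possibly in several languages) and complex values such as DC.Date, can be sketched together. This assumes harvested (element, language, value) triples and checks dates against a simple ISO 8601 profile (YYYY, YYYY-MM or YYYY-MM-DD); both the profile and the helper names are assumptions.

```python
import re
from collections import defaultdict
from datetime import date

_ISO_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def valid_dc_date(value):
    """Accept YYYY, YYYY-MM or YYYY-MM-DD, and reject impossible dates."""
    m = _ISO_DATE.match(value.strip())
    if not m:
        return False
    # Missing month/day default to 1 so the calendar check still works.
    year, month, day = (int(g) if g else 1 for g in m.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def merge_values(harvested):
    """Group (element, lang, value) triples, keeping every value and language.

    Returns (record, problems); DC.Date values failing the profile go to problems.
    """
    record, problems = defaultdict(list), []
    for element, lang, value in harvested:
        if element == "Date" and not valid_dc_date(value):
            problems.append((element, value))
            continue
        record[element].append((lang, value))
    return dict(record), problems
```

Keeping values as lists of (language, value) pairs avoids the common mistake of letting a later tag overwrite an earlier one.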

58 Harvesting: issues
– Frames
– Harvesting non-embedded metadata
– HTML 3.2 vs HTML 4.0
– Hidden pages
– Controlling the robot
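Controlling the robot usually starts with the Robots Exclusion convention (robots.txt). A sketch using Python's standard urllib.robotparser to filter candidate URLs before harvesting; the robot name and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, agent="DC-harvester"):
    """Filter candidate URLs through a site's robots.txt before harvesting.

    `agent` is a hypothetical robot name used for illustration.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in urls if rp.can_fetch(agent, u)]
```

Parsing an already-fetched robots.txt (rather than calling read()) keeps the sketch offline; a real robot would fetch it once per site and cache it.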

59 Harvest
– Resource discovery suite of tools: robot, summarisers, indexers
– SOIF records
– Supports a variety of indexers
– Supports a database brokerage model
– CGI-based user interface
– UKOLN's HTML summariser is Dublin Core aware
http://www.tardis.ed.ac.uk/harvest/

60 Nordic Web Index
– Custom robot: NWI/Combine
– Dublin Core aware
– GILS-II records
– Indexed using Zebra
– Searched using Z39.50
– User interface based on Europagate
http://nwi.ub2.lu.se/?lang=uk

61 Other software
– SWISH-E: system for indexing local collections of Web pages or other text files (http://sunsite.berkeley.edu/SWISH-E/)
– Isite: text indexer (Isearch) and Z39.50 (http://www.cnidr.org/ir/isite.html)
– Zebra: text indexer and Z39.50 (http://www.indexdata.dk)

62 DC as cataloguer's aid: ROADS
– Software to create, manage and search Internet resource descriptions
– WHOIS++
– Records created manually
– 'Pump-prime' metadata records with values based on embedded DC, using a robot
http://www.ukoln.ac.uk/roads/

63 DC as cataloguer's aid: BIBLINK
– Flow of information from publishers to National Bibliographic Agencies
– MARC-based catalogues of electronic publications
– Initial MARC record based on a DC description supplied by the publisher by email
http://www.ukoln.ac.uk/metadata/BIBLINK/

64 Building a Web index
– Centralised databases: Harvest, database brokerage
– Multiple databases, parallel: NWI, Z39.50
– Multiple databases, query routing: WHOIS++, Common Indexing Protocol, centroids

65 Architecture: centralised
[Diagram: one or more Web robots feed a central database; the client is typically Web/CGI based.]

66 Architecture: Harvest
[Diagram: robot, gatherers, brokers and Web/CGI-based clients; SOIF records in a central database.]

67 Architecture: multiple databases
[Diagram: robot, separate databases and a client (might be Web/CGI based); access protocol Z39.50 or WHOIS++.]

68 WHOIS++: query routing
The centroid generated by database C contains... "you'll find the string 'mona' in the 'title' attribute of at least one record in this database".
[Diagram: Web browser, CGI-based WHOIS++ client, databases A, B and C with CIP sharing of centroids; steps numbered 1-6.]
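The routing idea can be sketched in a few lines: each database summarises the words that occur in each attribute, and a client forwards a query only to databases whose summary contains the term. Real WHOIS++ centroids are exchanged via the Common Indexing Protocol in their own format; the data structures and the helper names centroid and route here are assumptions.

```python
import re
from collections import defaultdict


def centroid(records):
    """Summarise a database as {attribute: set of lowercased words}."""
    summary = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            summary[attribute].update(re.findall(r"\w+", value.lower()))
    return summary


def route(query_attr, query_word, centroids):
    """Return the databases whose centroid says the word occurs in that attribute."""
    word = query_word.lower()
    return sorted(
        name for name, c in centroids.items() if word in c.get(query_attr, set())
    )
```

A centroid can only rule databases out: a match means "worth asking", and the full search still happens at the database itself.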

69 Dublin Core - critique

70 Limits
– In development: syntax
– Simple
– Discovery
– Document-like objects
– Weak model
– Administrative metadata
– Addressed in Helsinki

71 Futures
(The material on RDF has been adapted from Stu Weibel's material, with permission.)

72 Dublin Core futures
– Internal: syntax and semantics
– External environment

73 Syntax
– HTML 2, HTML 4, RDF, ...
– RDF: a W3C (World Wide Web Consortium) initiative
  – "RDF is the realization of the Warwick Framework for the Web"
  – RDF will be the foundation for an architecture for metadata on the Web: resource description, electronic commerce, site mapping, third-party rating, digital signatures

74 RDF: why is it important?
– RDF provides a coherent data model and syntactical framework for 'plug-n-play' metadata
– The semantics and structure of metadata packages will be determined by stakeholder communities via independently developed and maintained metadata element sets, e.g. MARC, DC, TEI, GILS, CIMI, ratings...
– Political imperatives for deployment
– Software infrastructure will be ubiquitous (and come for free in browsers and servers)
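To make the data model concrete: in RDF a DC record becomes a set of statements about a resource. A sketch emitting N-Triples, assuming the element namespace http://purl.org/dc/elements/1.1/ that DCMI later standardised (the 1997 drafts discussed here predate both it and the N-Triples notation); dc_to_ntriples is a hypothetical helper.

```python
# Namespace later standardised by DCMI for the 15 elements (an assumption here).
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_to_ntriples(resource_uri, record):
    """Express a {element: [values]} DC record as N-Triples statements."""
    lines = []
    for element, values in record.items():
        # Element names in this namespace are lowercase (title, creator, ...).
        predicate = f"<{DC_NS}{element.lower()}>"
        for value in values:
            literal = (value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
            lines.append(f'<{resource_uri}> {predicate} "{literal}" .')
    return "\n".join(lines)
```

Each value becomes its own statement, so repeated elements such as two Creators need no special syntax.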

75 Semantics
– Tension
  – simple vs complex
  – generic vs specific
  – interoperability vs self-standing
– Development
  – relationship
  – sub-elements
  – scheme

76 Environment
– 'Save the time of the user'
– Diverse resources
– Broker/middleware/gateway/trading place/...
– Variety of protocols and metadata models
– DC: simple (volume); 'shallow' (interop)

77 Further information
– Dublin Core Home Page: http://purl.org/metadata/dublin_core
– W3C Metadata Overview and RDF Working Group Home Page: http://www.w3.org/Metadata/RDF
– UKOLN metadata page: http://www.ukoln.ac.uk/metadata/
