© Tefko Saracevic, Rutgers University
metadata considerations for digital libraries

the Web
fastest growing technology in history
explosive growth of WWW provided
  – ubiquity of information and access
  – but also information chaos & anarchy
growing difficulty in identifying, searching & retrieving
‘lost in an ocean’ metaphors

problem
to organize & search the Web
needed: knowledge about the structure of data
  – but Web data & databases are fuzzy
  – structures vary widely; no consistency
  – constantly evolve over time
  – lack of agreement about meaning of even simple terms & concepts in structure

solution
some standardized description or language to increase functionality
  – a mechanism for a more precise description of things on the Web
going from machine-readable to machine-understandable
  – missing in original Web architecture
METADATA!

metadata

what?
metadata: ‘data about data’
  – machine-understandable information for the Web; emphasis on machine
  – description of what a text (or any object), or a part of it, is all about, e.g. labeling title, author, source …
many evolving standards suggested, to be applied in various domains

where?
in volatile digital environments
  – metadata describe electronic resources, texts & multimedia
  – metadata exist or have meaning only in relation to the referenced document or object
provide information about the object

why?
to standardize description of what is what in electronic resources
in order to aid in identification, organization, & location of a great variety of resources
to enable effective search of a variety of objects (documents) distributed all over
sometimes also to provide controls (e.g. validation, rights, provenance, ratings...)

importance
standard metadata descriptions are a prerequisite to
  – common use
  – effective searching
  – ‘intelligent’ roaming by agents
  – validation, ratings, ...

markup languages
SGML - granddaddy (standard in 1986)
  – marks elements within documents
derived from old markups for typesetting
adapted by communities producing electronic documents
machine independent - reason for success
  – transportable from one hardware & software to another; substitutes strings
many extensions & specific applications

principles
all markup languages must specify
  – what markup means
  – what markup is allowed
  – what markup is required
  – how markup is distinguished from text
all markup languages & applications follow these principles
underlying concepts are fairly simple but they get very confusing real fast.

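The "allowed" and "required" principles can be made concrete with a small sketch. It assumes a made-up "memo" document type; the element names and the rule tables below are hypothetical and stand in for what a real document type definition would declare. They are not DTD syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a tiny "memo" document type (invented for illustration).
ALLOWED = {"memo": {"to", "from", "subject", "body"}}
REQUIRED = {"memo": {"to", "from", "body"}}


def check(xml_text):
    """Return a list of problems: elements not allowed, required elements missing."""
    root = ET.fromstring(xml_text)
    if root.tag not in ALLOWED:
        return [f"unknown document type: {root.tag}"]
    problems = []
    children = [child.tag for child in root]
    # What markup is allowed?
    for tag in children:
        if tag not in ALLOWED[root.tag]:
            problems.append(f"element not allowed: {tag}")
    # What markup is required?
    for tag in sorted(REQUIRED[root.tag] - set(children)):
        problems.append(f"required element missing: {tag}")
    return problems


if __name__ == "__main__":
    print(check("<memo><to>a</to><date>x</date></memo>"))
```

How markup is distinguished from text is handled here by the parser itself (angle brackets delimit tags); what the markup means is left to whoever defines the document type.
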
specifications
types of documents defined by DTDs (Document Type Definitions)
  – many types & applications formulated
vary greatly in complexity and use
RDF - Resource Description Framework
  – a common syntax, data model & scheme for describing

extensions
HTML - most famous & successful
  – allows for metatags in the Head
not used much, even discouraged
in the body could be indirect
XML - the next big thing (hopefully)
data format for structured document interchange & interoperability on WWW
keeps much of the functionality of SGML & combines it with the ease of use of HTML

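As an illustration of metatags in the Head, here is a minimal sketch that reads name/content pairs out of a page using Python's standard html.parser module. The sample page, its title, and its metatag values are invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical page; the title, names and values are invented for illustration.
PAGE = """<html>
<head>
<title>Sample page</title>
<meta name="author" content="A. Author">
<meta name="keywords" content="metadata, digital libraries">
</head>
<body><p>Body text.</p></body>
</html>"""


class HeadMetaParser(HTMLParser):
    """Collect name/content pairs from <meta> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            fields = dict(attrs)
            if "name" in fields and "content" in fields:
                self.meta[fields["name"]] = fields["content"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_head_meta(html_text):
    """Return a dict of metatag name -> content for one HTML document."""
    parser = HeadMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.meta


if __name__ == "__main__":
    print(extract_head_meta(PAGE))
```

The metatags never show up in the rendered body; they are there for programs, which is the "emphasis on machine" point.
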
who specifies standards?
formal groups
  – national & international standards organizations - ISO, ANSI, NISO
informal groups
  – WWW Consortium (W3C)
  – Dublin Core
  – Library of Congress

proliferation
currently: proliferation of metadata standards activities - many domains
  – a lot of confusion & incompatibility
  – in document description & libraries
coordination through liaisons & a number of projects in the U.S. & internationally
  – strength: domain experts' involvement
  – weakness: limited perspective; re-invention

libraries
in libraries metadata has a very long tradition, long preceding the Web (but not called metadata)
  – cataloging rules, standards
MARC (Machine Readable Cataloging) enabled worldwide exchange of cataloging records
but long-standing problems with searching

sample of projects
Encoded Archival Description (EAD)
Text Encoding Initiative (TEI)
Federal Geographic Data Committee (FGDC) - geospatial data
Z39.50 standards - searching
crosswalks: mapping e.g. DC to MARC

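A crosswalk is essentially a lookup table from the elements of one scheme to the fields of another. The sketch below assumes an illustrative, partial mapping from a few Dublin Core elements to MARC tag/subfield pairs. The MARC fields named are real, but the table is not an official crosswalk and the sample record is invented.

```python
# Illustrative, partial DC-to-MARC mapping (assumption: not an official crosswalk).
# Each Dublin Core element maps to a (MARC tag, subfield code) pair.
DC_TO_MARC = {
    "Title": ("245", "a"),        # title statement
    "Description": ("520", "a"),  # summary note
    "Publisher": ("260", "b"),    # name of publisher
    "Date": ("260", "c"),         # date of publication
    "Subject": ("653", "a"),      # uncontrolled index term
    "Rights": ("540", "a"),       # terms governing use
}


def crosswalk(dc_record):
    """Map a dict of DC element -> value to MARC-style triples.

    Returns (marc_fields, unmapped), where unmapped lists the DC elements
    that have no entry in the mapping table.
    """
    marc_fields = []
    unmapped = []
    for element, value in dc_record.items():
        if element in DC_TO_MARC:
            tag, subfield = DC_TO_MARC[element]
            marc_fields.append((tag, subfield, value))
        else:
            unmapped.append(element)
    return marc_fields, unmapped


if __name__ == "__main__":
    sample = {"Title": "Sample report", "Date": "1999", "Coverage": "New Jersey"}
    print(crosswalk(sample))
```

Elements that land in the unmapped list show where a crosswalk loses information, since the two schemes do not line up one to one.
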
Dublin Core (DC)
international initiative to define a core set of elements for describing Web resources
  – a set of 15 elements: Title; Creator; Subject; Description; Publisher; Contributor; Date; Type; Format; Identifier; Source; Language; Relation; Coverage; Rights
wide interest & a lot of work
but not widely applied on the Web

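A minimal sketch of how a record built on the 15 elements might be checked and written out as HTML metatags. The "DC." name prefix follows a common convention for embedding Dublin Core in HTML; the sample values and the function names are assumptions made for illustration.

```python
from html import escape

# The 15 Dublin Core elements as listed on the slide.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


def unknown_elements(record):
    """Return the names in a record that are not among the 15 DC elements."""
    return sorted(name for name in record if name not in DC_ELEMENTS)


def to_meta_tags(record):
    """Render a record (element -> list of values) as HTML <meta> tags.

    Every element is treated as optional and repeatable, so each value
    becomes its own tag. Names outside the 15 elements are skipped.
    """
    tags = []
    for name in DC_ELEMENTS:
        for value in record.get(name, []):
            tags.append(f'<meta name="DC.{name}" content="{escape(value)}">')
    return tags


if __name__ == "__main__":
    # Hypothetical record, invented for the example.
    record = {"Title": ["Sample report"], "Creator": ["A. Author", "B. Author"]}
    print("\n".join(to_meta_tags(record)))
```
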
library interoperability
library catalogs bound by proprietary software & hardware
middleware needed
  – protocols (based on Z39.50) provide for interaction of clients with many servers (catalogs)
problems remain with semantic interoperability

digitization
metadata assignment (cataloging) a key component in digitization or electronic publishing
choices: a spectrum of possibilities to select & apply metadata
search for automation - e.g. templates
connection with cataloging, indexing

decisions, decisions
  – how & what to plan for metadata creation in conjunction with a digital library (dl)?
  – target audience?
  – scope and depth?
  – what to adopt? plug into a scheme?
  – how to integrate metadata projects?
  – needed skills? training? staffing?

$$$$
costs of metadata: HUGE
  – involved operations
  – time, personnel, effort
  – learning many new things included
  – making decisions complex & involved
cooperative activities essential
libraries pushed out of libraries