Cataloging the Internet: Successes and the Future
* "The World Wide Web (or the Web) is a system of interlinked, hypertext documents that runs over the Internet. With a Web browser, a user views Web pages that may contain text, images, and other multimedia and navigates between them using hyperlinks." (Wikipedia)
World Wide Web
The World Wide Web is governed by three standards:
* URL – Uniform Resource Locator
* HTTP – HyperText Transfer Protocol
* HTML – HyperText Markup Language
But it contains no knowledge infrastructure:
* No classification system
* Data is unstructured
* No content information linked to data
Internet Retrieval Systems
* Vary widely in design
* Built upon different protocols
* Lack controlled vocabularies
* No authority control
* Incompatible with other systems
Subject Gateways: A Cataloger’s Approach
* Based upon resource description
* Supports systematic resource discovery through the application of traditional library tools:
  * Taxonomies, usually subject-based
  * Thesauri
  * Controlled vocabularies
  * Entry vocabularies
  * Authority control
  * Metadata – “Data about data”
Dublin Core
* Used in OCLC’s CORC
* Captures 15 characteristics about the data, including:
  * Title
  * Creator
  * Subject
  * Description
  * Publisher
  * Date
  * Language
* May be mapped into MARC records
* The emergence of a metadata quality vocabulary could allow its use as a quality assurance measure for web site content.
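One way to picture the mapping into MARC is a small crosswalk table. The sketch below is only illustrative: the tag and subfield choices loosely follow the Library of Congress crosswalk for unqualified Dublin Core and should be read as assumptions, and the names `DC_TO_MARC`, `dc_to_marc`, and the sample record are hypothetical.

```python
# Illustrative crosswalk: Dublin Core element -> (MARC tag, subfield code).
# The tag choices are assumptions for this sketch, not a definitive mapping.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "language": ("546", "a"),
}


def dc_to_marc(record):
    """Return a sorted list of (tag, subfield, value) tuples for a Dublin Core record."""
    fields = []
    for element, values in record.items():
        if element not in DC_TO_MARC:
            continue  # elements without a mapping are skipped in this sketch
        tag, code = DC_TO_MARC[element]
        if isinstance(values, str):
            values = [values]  # allow a single value or a list of repeated values
        for value in values:
            fields.append((tag, code, value))
    return sorted(fields)


# Hypothetical record describing a web resource.
sample = {
    "title": "Cataloging the Internet",
    "subject": ["Cataloging", "Metadata"],
    "language": "en",
    "rights": "not mapped in this sketch",
}
marc_fields = dc_to_marc(sample)
for tag, code, value in marc_fields:
    print(f"{tag} ${code} {value}")
```

Repeatable elements such as Subject become repeated MARC fields, which is why the function accepts either a single string or a list.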
Practical Issues
* Consistent information in MARC records
* URL maintenance – link checking
  * Web page movement
  * Web page changes in content
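The link-checking task can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular library system: the catalog is assumed to store, for each URL, a SHA-256 digest of the page as it looked when cataloged, and the names `fetch`, `check_link`, and `fake_fetcher` are hypothetical.

```python
import hashlib
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Fetch a URL; return (status, final_url, body). status is None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # geturl() reports the address after any redirects were followed.
            return resp.status, resp.geturl(), resp.read()
    except urllib.error.HTTPError as err:
        return err.code, url, b""
    except (urllib.error.URLError, OSError, ValueError):
        return None, url, b""


def check_link(url, stored_digest, fetcher=fetch):
    """Classify a cataloged URL as 'broken', 'moved', 'changed', or 'ok'."""
    status, final_url, body = fetcher(url)
    if status is None or status >= 400:
        return "broken"   # unreachable or an HTTP error such as 404
    if final_url != url:
        return "moved"    # the server redirected to a different address
    if hashlib.sha256(body).hexdigest() != stored_digest:
        return "changed"  # same address, different content than when cataloged
    return "ok"


# Demonstration with a stand-in fetcher so that no network access is needed.
page = b"<html>catalog me</html>"
digest = hashlib.sha256(page).hexdigest()


def fake_fetcher(url):
    return 200, url, page


print(check_link("http://example.org/page", digest, fake_fetcher))
```

Passing the fetcher as a parameter keeps the classification logic testable offline; in real use the default `fetch` would be called on a schedule for every URL held in the catalog records.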
Globalization of Information
* Demand for interoperability of:
  * Thesauri
  * Authority Records
  * Controlled vocabularies
* New platform technologies:
  * Resource Description Framework (RDF)
  * Unicode Standard
* International standards for all library tools
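As a small illustration of why the Unicode Standard matters for interoperable authority records, the same heading can arrive from two systems as different code point sequences. The sketch below assumes headings are compared after case folding and NFC normalization; the function name `heading_key` and the sample heading are hypothetical.

```python
import unicodedata


def heading_key(heading):
    """Build a comparison key: case-folded, NFC-normalized, whitespace collapsed."""
    normalized = unicodedata.normalize("NFC", heading.casefold())
    return " ".join(normalized.split())


# The same name with precomposed letters and with base letters plus combining marks.
composed = "Dvo\u0159\u00e1k, Anton\u00edn"
decomposed = "Dvor\u030ca\u0301k, Antoni\u0301n"

print(composed == decomposed)                            # raw strings differ
print(heading_key(composed) == heading_key(decomposed))  # keys match
```

Without a shared normalization step, two systems exchanging authority records could treat these as different headings even though they display identically.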