1 CS 430: Information Discovery. Lecture 5: Descriptive Metadata 1. Libraries, Catalogs, Dublin Core
2 Course Administration
3 Descriptive Metadata Some methods of information discovery search descriptive metadata about the objects. Metadata typically consists of a catalog record, an indexing record, or an abstract, with one record for each object. Catalog: metadata records that have a consistent structure, organized according to systematic rules. Abstract: a free-text record that summarizes a longer document. Indexing record: less formal than a catalog record, but more structured than a simple abstract.
4 Descriptive Metadata Usually stored separately from the objects that it describes, but sometimes embedded in the objects. Usually the metadata is a set of text fields. Textual metadata can be used to describe non-textual objects, e.g., software, images, music.
5 Descriptive Metadata Information discovery is often most effective when applied to metadata rather than to the raw information. Metadata allows fielded searching, e.g., author = "Goethe"; is suitable for non-textual material, e.g., type = "picture" and subject = "Ithaca"; and can be used with a controlled vocabulary, e.g., language = "en".
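The fielded searches on this slide can be sketched in a few lines of Python. This is only an illustration, not how any particular catalog system works: the three records and the helper name fielded_search are made up, and matching is plain equality on each field.

```python
# Toy metadata records. The field names follow the slide's examples;
# the records themselves are invented for illustration.
RECORDS = [
    {"author": "Goethe", "title": "Faust", "type": "text", "language": "de"},
    {"title": "Cascadilla Gorge", "type": "picture", "subject": "Ithaca", "language": "en"},
    {"title": "Ithaca street map", "type": "map", "subject": "Ithaca", "language": "en"},
]


def fielded_search(records, **criteria):
    """Return the records whose fields equal every field=value pair given."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


# author = "Goethe"
goethe = fielded_search(RECORDS, author="Goethe")

# type = "picture" and subject = "Ithaca": works although the object is not text
pictures = fielded_search(RECORDS, type="picture", subject="Ithaca")

# language = "en": a controlled vocabulary makes exact matching reliable
english = fielded_search(RECORDS, language="en")
```

Exact matching is what makes a controlled vocabulary useful here: "en" matches only because every record uses the same code for English.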
6 Origins of Library Catalogs Bibliographic objective: to bring together like items and to differentiate among similar ones. Sir Anthony Panizzi, Keeper of Books at the British Museum ( ). His Ninety-One Rules (1841) were the basis of modern catalogue rules.
7 Origins of Library Catalogs Information discovery: (1) to enable a person to find a book of which either the author, title or subject is known; (2) to show what the library has by a given author, on a given subject, or in a given kind of literature; (3) to assist in the choice of a book as to its edition (bibliographically) or as to its character (literary or topical). Charles Ammi Cutter, Librarian of the Boston Athenaeum, Rules for a Dictionary Catalog, 1874.
8 Origins of Library Catalogs Classification: division of subject matter into a hierarchy. Typically used in libraries to provide a subject-based order for shelving books. Melvil Dewey, Acting Librarian of Amherst College (1874). The Dewey Decimal system of book classification uses the numbers 000 to 999 to cover the general fields of knowledge and decimals to fit special subjects.
9 Technology Materials to be catalogued: originally books; extended to serials, maps, music, etc., but concepts still rely heavily on experience with books. Form of catalog: entries in books (Panizzi); index cards (Cutter); online databases (Kilgour). [Library cataloguing will be continued in Lecture 6.]
10 Catalogs as Investments Costs: conventional catalog records are created by skilled librarians (estimated cost: $100 per record). OCLC's catalog has 43 million records, so the total investment is several billion dollars (43 million x $100 is about $4.3 billion). Cataloguing standards: enable libraries to share records; combine records of the past with records created today; allow readers and librarians to move between libraries.
11 Dublin Core A simple set of metadata elements for online information. 15 basic elements, intended for all types and genres of material; all elements optional; all elements repeatable. Developed by an international group chaired by Stuart Weibel since (Diane Hillmann and Carl Lagoze of Cornell are very active in this group.)
13 Dublin Core publisher: OCLC; creator: Weibel, Stuart L.; creator: Miller, Eric J.; title: Dublin Core Reference Page; date: ; format: text/html (MIME type); language: en (English); identifier:
14 Dublin Core with Meta Tags
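The example from this slide is not preserved in these notes. As a stand-in, the sketch below shows one common way to embed a Dublin Core record in a Web page: one HTML meta tag per element, with names of the form DC.Element. The function name dublin_core_meta_tags is hypothetical, and the values are those of the record on slide 13 (date and identifier are left out because their values are missing there).

```python
from html import escape


def dublin_core_meta_tags(record):
    """Render (element, value) pairs as HTML <meta> tags, one per line.

    A repeated element (e.g. two creators) simply yields two tags.
    """
    lines = []
    for element, value in record:
        name = "DC." + element.capitalize()
        lines.append(
            '<meta name="{}" content="{}">'.format(
                escape(name, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(lines)


# Values taken from the record on slide 13.
record = [
    ("publisher", "OCLC"),
    ("creator", "Weibel, Stuart L."),
    ("creator", "Miller, Eric J."),
    ("title", "Dublin Core Reference Page"),
    ("format", "text/html"),
    ("language", "en"),
]

print(dublin_core_meta_tags(record))
```

The tags would be placed in the head section of the HTML page, so the metadata travels with the object instead of being stored separately.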
15 Dublin Core elements 1. Title The name given to the resource by the creator or publisher. 2. Creator The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources. 3. Subject The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.
16 Dublin Core elements 4. Description A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. 5. Publisher The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity. 6. Contributor A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).
17 Dublin Core elements 7. Date A date associated with the creation or availability of the resource. 8. Type The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary. 9. Format The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource. 10. Identifier A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.
18 Dublin Core elements 11. Source Information about a second resource from which the present resource is derived. 12. Language The language of the intellectual content of the resource. 13. Relation An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), or a chapter of a book (IsPartOf).
19 Dublin Core elements 14. Coverage The spatial locations and temporal durations characteristic of the resource. 15. Rights A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.
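The rules stated earlier (15 elements, all optional, all repeatable) suggest a very simple data structure. The following is a minimal sketch under that reading, not an official schema: a record is a mapping from element name to a list of values, and the names DC_ELEMENTS and make_record are made up for this example.

```python
# The 15 element names listed on the preceding slides, in slide order.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(pairs):
    """Build a record from (element, value) pairs.

    Every element is optional (it may be absent from the result) and
    repeatable (its values accumulate in a list). Names outside the
    15 elements are rejected.
    """
    record = {}
    for element, value in pairs:
        key = element.lower()
        if key not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        record.setdefault(key, []).append(value)
    return record


example = make_record([
    ("Creator", "Weibel, Stuart L."),
    ("Creator", "Miller, Eric J."),
    ("Title", "Dublin Core Reference Page"),
])
```

An empty record is valid under these rules, which is part of why the element set is easy for non-specialists to use.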
20 Qualifiers Element qualifier Example: Date DC.Date -> Created: DC.Date -> Issued: DC.Date -> Available: / DC.Date -> Valid: /
21 Qualifiers Value qualifiers Example: Subject DC.Subject -> DDC: DC.Subject -> LCSH: Digital libraries-United States
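The two kinds of qualifier can be modeled side by side. The sketch below is an assumption about representation, not part of any Dublin Core specification: the Statement type and both helper functions are hypothetical, and the date value is invented because the dates on the slide are missing. An element qualifier narrows the meaning of the element; a value qualifier names the scheme the value is taken from.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    element: str
    value: str
    element_qualifier: Optional[str] = None  # refines the element, e.g. Date -> Created
    value_qualifier: Optional[str] = None    # names the value's scheme, e.g. LCSH


def label(statement):
    """Format a statement in the slides' 'DC.Element -> Qualifier: value' style.

    If both qualifiers are set, only the element qualifier is shown.
    """
    qualifier = statement.element_qualifier or statement.value_qualifier
    head = f"DC.{statement.element}"
    if qualifier:
        head += f" -> {qualifier}"
    return f"{head}: {statement.value}"


def unqualified(statements):
    """Drop all qualifiers, leaving plain (element, value) pairs."""
    return [(s.element, s.value) for s in statements]


statements = [
    Statement("Date", "2000-01-15", element_qualifier="Created"),  # hypothetical date
    Statement("Subject", "Digital libraries-United States", value_qualifier="LCSH"),
    Statement("Title", "Digital Libraries and the Problem of Purpose"),
]

for s in statements:
    print(label(s))
```

Dropping the qualifiers still leaves a usable simple record, which is why qualifiers can be layered on top of the 15 elements without breaking software that knows only the basic set.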
23 Dublin Core with qualifiers Digital Libraries and the Problem of Purpose; David M. Levy; Corporation for National Research Initiatives; January 2000; article; /january2000-levy; English; Copyright (c) David M. Levy
24 Limits of Dublin Core Complex objects: an article within a journal; a thumbnail of another image; the March 28 final edition of a newspaper. [Figure: complete object, sub-objects, metadata records]
25 Flat v. linked records Flat record: all information about an item is held in a single Dublin Core record, including information about related items. Convenient for access and preservation, but information is repeated, which is a maintenance problem. Linked record: related information is held in separate records, with a link from the item record. Less convenient for access and preservation, but information is stored once. Compare with normal forms in relational databases.
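The trade-off between flat and linked records can be seen in a small sketch. Everything here is invented for illustration (the articles, the journal, the identifier "j1", and the field names journal_title and journal_publisher, which are not Dublin Core elements); the point is only where the journal's details are stored.

```python
# Flat: every article record repeats the journal's details.
flat_records = [
    {"title": "Article A", "journal_title": "Example Journal", "journal_publisher": "Example Press"},
    {"title": "Article B", "journal_title": "Example Journal", "journal_publisher": "Example Press"},
]

# Linked: the journal's details are stored once; articles point to them.
journals = {"j1": {"title": "Example Journal", "publisher": "Example Press"}}
linked_records = [
    {"title": "Article A", "relation": "j1"},
    {"title": "Article B", "relation": "j1"},
]


def resolve(record, journals):
    """Follow the link to rebuild the view a flat record gives directly."""
    journal = journals[record["relation"]]
    return {
        "title": record["title"],
        "journal_title": journal["title"],
        "journal_publisher": journal["publisher"],
    }


# A change of publisher is a single update in the linked form...
journals["j1"]["publisher"] = "New Press"

# ...but has to be repeated in every flat record that mentions the journal.
for record in flat_records:
    record["journal_publisher"] = "New Press"
```

Reading a linked record needs the extra resolve step, which is the access cost the slide mentions; the flat form avoids it at the price of the repeated update.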
26 Dublin Core with flat record extension Continuation D-Lib Magazine
27 Events Version 1, plus new material, becomes Version 2. Should Version 2 have its own record, or should extra information be added to the Version 1 record? How are these represented in Dublin Core?
28 Minimalist versus structuralist Minimalist: 15 elements, no qualifiers; suitable for non-professionals; encourages creators to provide metadata. Structuralist: 15 elements, qualifiers, RDF, detailed coding rules; will require trained metadata experts. [For an example of how complex Dublin Core can become, see the source of: htm#]
29 Dublin Core in many languages See: Thomas Baker, Languages for Dublin Core, D-Lib Magazine December 1998,