CS 430: Information Discovery

CS 430: Information Discovery Lecture 13 Descriptive Metadata: Dublin Core

Course Administration •

Notes on MARC
A great achievement:
• Developed in the 1960s
• Magnetic tape exchange format for printing catalog records
• The dawn of computing: mixed upper and lower case; variable length fields, repeated fields; non-Roman scripts
• 100(?) million records with standard content and format
• Thousands of trained librarians (millions?)

Notes on MARC
A great problem:
• Not designed for computer algorithms
• One record per item (poor links between records)
• Tied to traditional materials and traditional practices
• Not Unicode
• 100 million records at $100 each = $10 billion
• A classic legacy system!

IFLA Model
Work. A work is the underlying abstraction, e.g.,
• The Iliad
• The Computer Science departmental web site
• Beethoven's Fifth Symphony
• Unix operating system
• The 1996 U.S. census
This is roughly equivalent to the concept of "literary work" used in copyright law.

IFLA Model
Expression. A work is realized through an expression, e.g.,
• The Iliad has oral expressions and written expressions.
• A musical work has score and performance(s).
• Software has source code and machine code.
Many works have only a single expression, e.g., a web page or a book.

IFLA Model
Manifestation. An expression is given form in one or more manifestations, e.g.,
• The text of The Iliad has been manifest in numerous manuscripts and printed books.
• A musical performance can be distributed on CD, or broadcast on television.
• Software is manifest as files, which may be stored or transmitted in any digital medium.

IFLA Model
Item. When many copies are made of a manifestation, each is a separate item, e.g.,
• a specific copy of a book
• a computer file
[Works, expressions, manifestations and items are explored in CS 502, Architecture of Web Information Systems.]
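The four levels of the IFLA model nest naturally, so they can be sketched as a small hierarchy of Python dataclasses. This is only an illustration under stated assumptions: the class and field names (kind, form, location) and the example copies are invented here and are not part of the IFLA model or of the lecture.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single copy of a manifestation, e.g. one book or one file."""
    location: str  # hypothetical field, for illustration only


@dataclass
class Manifestation:
    """A concrete form of an expression, e.g. a printed book or a CD."""
    form: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of a work, e.g. a written text or a performance."""
    kind: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The underlying abstraction, e.g. The Iliad."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def count_items(self) -> int:
        """Count every item reachable from this work."""
        return sum(
            len(m.items)
            for e in self.expressions
            for m in e.manifestations
        )


# Illustrative data only; the copies listed here are invented.
iliad = Work(
    title="The Iliad",
    expressions=[
        Expression(kind="oral"),
        Expression(
            kind="written",
            manifestations=[
                Manifestation(
                    form="printed book",
                    items=[Item("library copy 1"), Item("library copy 2")],
                ),
                Manifestation(form="manuscript", items=[Item("archive copy")]),
            ],
        ),
    ],
)

print(iliad.count_items())
```

Note that the oral expression has no manifestations in this sketch, which is allowed: each level simply holds zero or more objects of the level below.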

Dublin Core
Simple set of metadata elements for online information
• 15 basic elements
• intended for all types and genres of material
• all elements optional
• all elements repeatable
Developed by an international group chaired by Stuart Weibel since 1995. (Diane Hillmann and Carl Lagoze of Cornell are very active in this group.)

Dublin Core
publisher: OCLC
creator: Weibel, Stuart L.
creator: Miller, Eric J.
title: Dublin Core Reference Page
date: 1996-05-28
format: text/html (MIME type)
language: en (English)
identifier: http://purl.org/dc/documents/rec-dces-199809.htm#

Dublin Core elements
1. Title: The name given to the resource by the creator or publisher.
2. Creator: The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents; artists, photographers, or illustrators in the case of visual resources.
3. Subject: The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.

Dublin Core elements
4. Description: A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.
5. Publisher: The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.
6. Contributor: A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).

Dublin Core elements
7. Date: A date associated with the creation or availability of the resource.
8. Type: The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary.
9. Format: The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.
10. Identifier: A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.

Dublin Core elements
11. Source: Information about a second resource from which the present resource is derived.
12. Language: The language of the intellectual content of the resource.
13. Relation: An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), or a chapter of a book (IsPartOf).

Dublin Core elements
14. Coverage: The spatial locations and temporal durations characteristic of the resource.
15. Rights: A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.
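The rule that all 15 elements are optional and repeatable can be made concrete with a small container. The following is a minimal sketch, not a standard API: the class name DublinCoreRecord and its methods are hypothetical, and the lower-case element names follow the example record in these slides.

```python
from collections import defaultdict

# The 15 element names from the slides, in lower case.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


class DublinCoreRecord:
    """Hypothetical container: every element is optional and repeatable."""

    def __init__(self):
        # Each element maps to a list, so repeated values are kept in order.
        self._values = defaultdict(list)

    def add(self, element, value):
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        self._values[element].append(value)

    def get(self, element):
        # A missing element yields an empty list rather than an error,
        # because no element is required.
        return list(self._values.get(element, []))


record = DublinCoreRecord()
record.add("publisher", "OCLC")
record.add("creator", "Weibel, Stuart L.")
record.add("creator", "Miller, Eric J.")
record.add("title", "Dublin Core Reference Page")

print(record.get("creator"))
print(record.get("rights"))
```

Storing a list per element is one simple way to honor "repeatable"; returning an empty list for absent elements is one simple way to honor "optional".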

Qualifiers
Element qualifier
Example: Date
DC.Date.Created    1997-11-01
DC.Date.Issued     1997-11-15
DC.Date.Available  1997-12-01/1998-06-01
DC.Date.Valid      1998-01-01/1998-06-01

Qualifiers
Value qualifiers
Example: Subject
DC.Subject.DDC   509.123
DC.Subject.LCSH  Digital libraries-United States

Representations of Dublin Core: Meta Tags
<meta name="publisher" content="OCLC">
<meta name="creator" content="Weibel, Stuart L.">
<meta name="creator" content="Miller, Eric J.">
<meta name="title" content="Dublin Core Reference Page">
<meta name="date" content="1996-05-28">
<meta name="format" content="text/html">
<meta name="language" content="en">
<meta name="identifier" content="http://purl.org/dc/documents/rec-dces-199809.htm#">
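Meta tags of this form can be generated mechanically from a list of element/value pairs. The sketch below assumes a hypothetical helper named to_meta_tags; it uses a list of pairs rather than a dictionary so that repeated elements such as creator survive, and it escapes values so that characters like & or " cannot break the HTML.

```python
from html import escape


def to_meta_tags(pairs):
    """Render (element, value) pairs as HTML meta tags, one per pair."""
    return [
        f'<meta name="{escape(name, quote=True)}" '
        f'content="{escape(value, quote=True)}">'
        for name, value in pairs
    ]


# Part of the example record from the slide, as ordered pairs.
record_pairs = [
    ("publisher", "OCLC"),
    ("creator", "Weibel, Stuart L."),
    ("creator", "Miller, Eric J."),
    ("title", "Dublin Core Reference Page"),
]

tags = to_meta_tags(record_pairs)
for tag in tags:
    print(tag)
```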

Representations of Dublin Core: XML (with qualifiers)
<title>Digital Libraries and the Problem of Purpose</title>
<creator>David M. Levy</creator>
<publisher>Corporation for National Research Initiatives</publisher>
<date date-type = "publication">January 2000</date>
<type resource-type = "work">article</type>
<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>
<identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
<language>English</language>
<rights>Copyright (c) David M. Levy</rights>
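A record in this XML form can be read with Python's standard xml.etree.ElementTree module. The sketch below makes two assumptions that are not in the slide: the elements are wrapped in a hypothetical <record> root (the slide shows no root element), and each element carries at most one attribute, which is treated as its qualifier.

```python
import xml.etree.ElementTree as ET

# A subset of the slide's example, wrapped in a hypothetical <record> root.
XML = """<record>
  <title>Digital Libraries and the Problem of Purpose</title>
  <creator>David M. Levy</creator>
  <date date-type="publication">January 2000</date>
  <identifier uri-type="DOI">10.1045/january2000-levy</identifier>
  <identifier uri-type="URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
</record>"""


def parse_record(xml_text):
    """Return a list of (element, qualifier, value) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for child in root:
        # Assumption: the single attribute, if present, is the qualifier.
        qualifier = next(iter(child.attrib.values()), None)
        triples.append((child.tag, qualifier, child.text))
    return triples


triples = parse_record(XML)
for element, qualifier, value in triples:
    print(element, qualifier, value)
```

A client that does not understand the qualifiers can simply ignore the middle item of each triple and still obtain a usable unqualified record.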

Representations of Dublin Core: Text (with qualifiers) See next two slides for an example of a Dublin Core record for a web site prepared by a professional cataloguer at the Library of Congress. Note that the record does not follow the principle of dumbing-down.

Old Midterm Examination
What is the Dublin Core principle of dumbing-down? Are there any fields in this record that do not satisfy the principle?
"The theory behind this principle is that consumers of metadata should be able to strip off qualifiers and return to the base form of a property. ... this principle makes it possible for client applications to ignore qualifiers in the context of more coarse-grained, cross-domain searches." Lagoze 2001

Old Midterm Examination
Dumbing-down failures:
• Description.note "Title from home page as viewed on Nov. 1, 2000." dumbs down to Description "Title from home page as viewed on Nov. 1, 2000.", which is not a description of the object.
• Publisher.place "Nashville, Tenn. :" dumbs down to Publisher "Nashville, Tenn. :", which is not the publisher of the object.
Correct dumbing-down:
• Subject.class.LCC "E840.8.G65" dumbs down to Subject "E840.8.G65", which is a subject code.
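Dumbing-down is mechanical: strip the qualifiers and keep the base element. The sketch below assumes field names written as in the examination record (base element first, qualifiers after a dot, no DC. prefix); the function names are hypothetical.

```python
def dumb_down(qualified_name):
    """Return the base element: everything before the first dot."""
    return qualified_name.split(".", 1)[0]


def dumb_down_record(fields):
    """Strip the qualifiers from every (name, value) pair; values are unchanged."""
    return [(dumb_down(name), value) for name, value in fields]


# The three fields discussed on the slide.
qualified = [
    ("Description.note", "Title from home page as viewed on Nov. 1, 2000."),
    ("Publisher.place", "Nashville, Tenn. :"),
    ("Subject.class.LCC", "E840.8.G65"),
]

simple = dumb_down_record(qualified)
for name, value in simple:
    print(name, "=", value)
```

The code cannot tell whether the result still makes sense. That depends on how the qualified fields were chosen: the first two values above are no longer valid for their base elements, while the third still is.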

Old Midterm Examination
4(b) The metadata in the fields Publisher and Publisher place end in punctuation marks. Can you suggest any reasons for doing so?
This is a historic curiosity. It comes from the concept that the metadata will be printed, so that the metadata is stored in a printable format.
Publisher: Gore/Lieberman,
Publisher.place: Nashville, Tenn. :
is intended to be combined with a date as follows:
Nashville, Tenn. : Gore/Lieberman, 2001
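Because the punctuation is stored inside the field values, assembling the printed line needs nothing more than concatenation with spaces. A minimal sketch, with a hypothetical function name:

```python
def imprint(place, publisher, date):
    """Join the stored fields; the punctuation is already in the values."""
    return " ".join([place, publisher, date])


line = imprint("Nashville, Tenn. :", "Gore/Lieberman,", "2001")
print(line)
```

The cost of this design is that a program wanting the bare publisher name has to strip the trailing comma itself, which is one reason print-oriented storage is awkward for computer processing.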

Old Midterm Examination 4(c) This record has no Creator field. It has a Contributor.nameCorporate field with value "Gore/Lieberman, Inc." Do you consider that this is correct use of Dublin Core? What would you put in the Creator and Contributor fields? Why?

Old Midterm Examination
Specification of Dublin Core:
A. All fields are optional. It is not necessary to have a Creator.
B. Definitions of fields
Creator: The person or organization primarily responsible for the intellectual content of the resource.
Contributor: A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element.
Gore/Lieberman, Inc. is the corporate author of this web site and is therefore the Creator.

Limits of Dublin Core
Complex objects
[Diagram labels: Metadata records; Complete object; Sub-objects]
• Article within a journal
• A thumbnail of another image
• The March 28 final edition of a newspaper

Flat v. linked records
Flat record: All information about an item is held in a single Dublin Core record, including information about related items
• convenient for access and preservation
• information is repeated (maintenance problem)
Linked record: Related information is held in separate records with a link from the item record
• less convenient for access and preservation
• information is stored once
Compare with normal forms in relational databases

Dublin Core with flat record extension
Continuation
<relation rel-type = "InSerial">
  <serial-name>D-Lib Magazine</serial-name>
  <issn>1082-9873</issn>
  <volume>6</volume>
  <issue>1</issue>
</relation>
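The trade-off between flat and linked records can be sketched with plain dictionaries. The field names (serial_name, issn, serial) and the resolve helper are assumptions made for illustration; the titles and ISSN are taken from these slides.

```python
# Flat records: the serial's details are repeated in every article record.
flat_articles = [
    {"title": "Digital Libraries and the Problem of Purpose",
     "serial_name": "D-Lib Magazine", "issn": "1082-9873"},
    {"title": "Languages for Dublin Core",
     "serial_name": "D-Lib Magazine", "issn": "1082-9873"},
]

# Linked records: the serial is stored once, keyed here by its ISSN,
# and each article holds only a link to it.
serials = {"1082-9873": {"serial_name": "D-Lib Magazine"}}
linked_articles = [
    {"title": "Digital Libraries and the Problem of Purpose",
     "serial": "1082-9873"},
    {"title": "Languages for Dublin Core", "serial": "1082-9873"},
]


def resolve(article, serials):
    """Follow the link and rebuild the equivalent flat record."""
    merged = dict(article)
    issn = merged.pop("serial")
    merged["issn"] = issn
    merged.update(serials[issn])
    return merged


print(resolve(linked_articles[0], serials))
```

With the linked form, a correction to the serial's name is made in one place; with the flat form, every article record has to be updated, which is the maintenance problem noted above. The price is the extra lookup in resolve.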

Events
[Diagram: Version 1, Version 2, New material]
Should Version 2 have its own record or should extra information be added to the Version 1 record? How are these represented in Dublin Core?

Minimalist versus structuralist
Minimalists
• 15 elements, no qualifiers, suitable for non-professionals
• encourage creators to provide metadata
Structuralists
• 15 elements, qualifiers, RDF, detailed coding rules
• will require trained metadata experts
[For an example of how complex Dublin Core can become, see the source of: http://purl.org/dc/documents/rec-dces-199809.htm#]

Dublin Core: Personal Opinion
Dublin Core is a simple way to describe digital content that:
• is a single, self-contained object ("document-like")
• is static with time
• has few relationships
Some web sites satisfy these criteria.
Dublin Core is not suitable for digital content that:
• is heavily structured
• changes dynamically
Dublin Core contains limited descriptive metadata for information discovery.

Dublin Core in Many Languages See: Thomas Baker, Languages for Dublin Core, D-Lib Magazine December 1998, http://www.dlib.org/dlib/december98/12baker.html