CS 430: Information Discovery

Similar presentations
Ali Alshowaish. dc.coverage element articulates limitations in the scope of the resource, typically along the following lines: geographical, temporal,

Metadata and Search at Boeing Julie Martin Library & Learning Center Services
Developing a Metadata Exchange Format for Mathematical Literature David Ruddy Project Euclid Cornell University Library DML 2010 Paris 7 July 2010.
1 CS 502: Computing Methods for Digital Libraries Lecture 18 Descriptive Metadata: Metadata Models.
RDA & Serials. RDA Toolkit CONSER RDA Cataloging Checklist for Textual Serials (DRAFT) CONSER RDA Core Elements Where’s that Tool? CONSER RDA Cataloging.
Content and Bibliographic Theory CS 431 Architecture of Web Information Systems Carl Lagoze Cornell University Acks to H. Van de Sompel.
SLIDE 1IS 257 – Fall 2007 Codes and Rules for Description: History 2 University of California, Berkeley School of Information IS 245: Organization.
Dublin Core A meta future Kara Luedke & Manny Brown.
8/28/97Information Organization and Retrieval Metadata and Data Structures University of California, Berkeley School of Information Management and Systems.
William Y. Arms Corporation for National Research Initiatives March 22, 1999 Object models, overlay journals, and virtual collections.
1 CS 502: Computing Methods for Digital Libraries Lecture 17 Descriptive Metadata: Dublin Core.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
1 CS 430: Information Discovery Lecture 15 Library Catalogs 3.
Metadata and identifiers for e- journals Copenhagen Juha Hakala Helsinki University Library
1 Open-source platform for accessible content management Museo & Web CMS.
Publishing Digital Content to a LOR Publishing Digital Content to a LOR 1.
RDF (Resource Description Framework) Why?. XML XML is a metalanguage that allows users to define markup XML separates content and structure from formatting.
Cornell CS Bibliographic Concepts CS 502 – Carl Lagoze – Cornell University Acks to H. Van de Sompel.
1 © Netskills Quality Internet Training, University of Newcastle Metadata Explained © Netskills, Quality Internet Training.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
Metadata: An Overview Katie Dunn Technology & Metadata Librarian
1 CS 430: Information Discovery Lecture 17 Library Catalogs 2.
1 CS 430: Information Discovery Lecture 14 Automatic Extraction of Metadata.
Metadata Xiangming Mu. What is metadata? What is metadata? (cont’) Data about data –Any data aids in the identification, description and location of.
1 CS 502: Computing Methods for Digital Libraries Lecture 28 Current work in preservation.
Metadata Considerations Implementing Administrative and Descriptive Metadata for your digital images 1.
1 CS/INFO 430 Information Retrieval Lecture 20 Metadata 2.
Meta Tagging / Metadata Lindsay Berard Assisted by: Li Li.
1 CS/INFO 430 Information Retrieval Lecture 16 Metadata 3.
1 CS 430: Information Discovery Lecture 6 Descriptive Metadata 2 Library Catalogs Dublin Core.
1 CS 430: Information Discovery Lecture 7 Descriptive Metadata 3 Dublin Core Automatic Generation of Catalog Records.
1 herbert van de sompel CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel
1 CS 502: Computing Methods for Digital Libraries Lecture 19 Interoperability Z39.50.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
LIS654 lecture 5 DC metadata and omeka tables Thomas Krichel
Modularization and Interoperability: Dublin Core and the Warwick Framework Sandra D. Payette Digital Library Research Group Cornell University November.
1 Discussion Class 4 The Dublin Core Metadata Initiative.
Metadata and Documentation Iain Wallace Performing Arts Data Service.
Metadata Bridget Jones Information Architecture I February 23, 2009.
APPLYING FRBR TO LIBRARY CATALOGUES A REVIEW OF EXISTING FRBRIZATION PROJECTS Martha M. Yee September 9, 2006 draft.
EEL 5937 Ontologies EEL 5937 Multi Agent Systems Lecture 5, Jan 23 th, 2003 Lotzi Bölöni.
Evidence from Metadata INST 734 Doug Oard Module 8.
RDA DAY 1 – part 2 web version 1. 2 When you catalog a “book” in hand: You are working with a FRBR Group 1 Item The bibliographic record you create will.
1 CS 430: Information Discovery Sample Midterm Examination Notes on the Solutions.
1 CS 430: Information Discovery Lecture 5 Descriptive Metadata 1 Libraries Catalogs Dublin Core.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
FRBR: Cataloging’s New Frontier Emily Dust Nimsakont Nebraska Library Commission NCompass Live December 15, 2010 Photo credit:
Functional Requirements for Bibliographic Records The Changing Face of Cataloging William E. Moen Texas Center for Digital Knowledge School of Library.
1 CS 430: Information Discovery Lecture 6 Descriptive Metadata 2 Library Catalogs.
1 CS 430: Information Discovery Lecture 8 Collection-Level Metadata Vector Methods.
8/28/97Information Organization and Retrieval Introduction University of California, Berkeley School of Information Management and Systems SIMS 245: Organization.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
An Application Profile and Prototype Metadata Management System for Licensed Electronic Resources Adam Chandler Information Technology Librarian Central.
Dublin Core Basics Workshop Lisa Gonzalez KB/LM Librarian.
1 CS 430: Information Discovery Lecture 7 Automatic Generation of Catalog Records.
Attributes and Values Describing Entities. Metadata At the most basic level, metadata is just another term for description, or information about an entity.
1 Midterm Examination. 2 General Observations Examination was too long! Most people submitted by .
Professional Development Programme: Design and Development of Institutional Repository Using DSpace Nipul G Shihora INFLIBNET Centre Gandhinagar
Lecture 12 Why metadata? CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel
CS 430: Information Discovery
Chapter Eight Interoperability How to Build a Digital Library
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Catherine Lai MUMT-611 MIR January 27, 2005
Introduction to Metadata
Attributes and Values Describing Entities.
Metadata - Catalogues and Digitised works
Some Options for Non-MARC Descriptive Metadata
Attributes and Values Describing Entities.
Presentation transcript:

CS 430: Information Discovery Lecture 14 Library Catalogs 2

Course Administration

Dublin Core
Dublin Core is an attempt to apply cataloguing methods to online materials, notably the Web.
History
The methods of full-text indexing that were used by the early Web search engines, such as Lycos, would not scale up.
"... indexes are most useful in small collections within a given domain. As the scope of their coverage expands, indexes succumb to problems of large retrieval sets and problems of cross disciplinary semantic drift. Richer records, created by content experts, are necessary to improve search and retrieval." Weibel 1995

Dublin Core
Simple set of metadata elements for online information:
• 15 basic elements
• intended for all types and genres of material
• all elements optional
• all elements repeatable
Developed since 1995 by an international group chaired by Stuart Weibel. (Diane Hillmann and Carl Lagoze of Cornell have been very active in this group.)

Dublin Core elements
1. Title: The name given to the resource by the creator or publisher.
2. Creator: The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents; artists, photographers, or illustrators in the case of visual resources.
3. Subject: The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.

Dublin Core elements
4. Description: A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.
5. Publisher: The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.
6. Contributor: A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).

Dublin Core elements
7. Date: A date associated with the creation or availability of the resource.
8. Type: The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary.
9. Format: The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.
10. Identifier: A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.

Dublin Core elements
11. Source: Information about a second resource from which the present resource is derived.
12. Language: The language of the intellectual content of the resource.
13. Relation: An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf) or a chapter of a book (IsPartOf).

Dublin Core elements
14. Coverage: The spatial locations and temporal durations characteristic of the resource.
15. Rights: A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.
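
Because every one of the 15 elements is optional and repeatable, a record can be modelled as a map from element name to a list of values. The following is a minimal Python sketch under that assumption; the class `DublinCoreRecord`, its methods, and the use of lowercase element names as keys are hypothetical, not part of any Dublin Core library:

```python
from collections import defaultdict

# The 15 Dublin Core elements, in the order listed on the slides above.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


class DublinCoreRecord:
    """Hypothetical record: every element optional, every element repeatable."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, element, value):
        name = element.lower()
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        self._values[name].append(value)  # repeatable: append, never overwrite

    def get(self, element):
        # Optional: an absent element is just an empty list, not an error.
        return list(self._values.get(element.lower(), []))

    def items(self):
        # Yield (element, value) pairs in the canonical element order.
        for name in DC_ELEMENTS:
            for value in self._values.get(name, []):
                yield name, value


record = DublinCoreRecord()
record.add("publisher", "OCLC")
record.add("creator", "Weibel, Stuart L.")
record.add("creator", "Miller, Eric J.")  # a second creator
record.add("title", "Dublin Core Reference Page")
print(record.get("creator"))   # ['Weibel, Stuart L.', 'Miller, Eric J.']
print(record.get("coverage"))  # []
```

Rejecting unknown names is a design choice of this sketch; a looser application might accept local extensions instead.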

Dublin Core
publisher: OCLC
creator: Weibel, Stuart L.
creator: Miller, Eric J.
title: Dublin Core Reference Page
date: 1996-05-28
format: text/html (MIME type)
language: en (English)
identifier: http://purl.org/dc/documents/rec-dces-199809.htm#

Representations of Dublin Core: Meta Tags
<meta name="publisher" content="OCLC">
<meta name="creator" content="Weibel, Stuart L.">
<meta name="creator" content="Miller, Eric J.">
<meta name="title" content="Dublin Core Reference Page">
<meta name="date" content="1996-05-28">
<meta name="format" content="text/html">
<meta name="language" content="en">
<meta name="identifier" content="http://purl.org/dc/documents/rec-dces-199809.htm#">
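
The meta-tag representation is mechanical: one tag per value, so a repeated element simply produces repeated tags. A small Python sketch of that mapping follows; the function name `to_meta_tags` and its `prefix` parameter are hypothetical (a prefix such as "DC." is a common convention for marking Dublin Core names, whereas the slide uses bare names):

```python
from html import escape


def to_meta_tags(pairs, prefix=""):
    """Render (element, value) pairs as HTML <meta> tags, one tag per value."""
    lines = []
    for element, value in pairs:
        # Escape &, <, > and quotes so values are safe inside attributes.
        name = escape(prefix + element, quote=True)
        content = escape(value, quote=True)
        lines.append(f'<meta name="{name}" content="{content}">')
    return "\n".join(lines)


record = [
    ("publisher", "OCLC"),
    ("creator", "Weibel, Stuart L."),
    ("creator", "Miller, Eric J."),  # repeated element -> repeated tag
    ("title", "Dublin Core Reference Page"),
    ("date", "1996-05-28"),
    ("format", "text/html"),
    ("language", "en"),
]
print(to_meta_tags(record))
# first line printed: <meta name="publisher" content="OCLC">
```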

Qualifiers
Element qualifier
Example: Date
DC.Date.Created 1997-11-01
DC.Date.Issued 1997-11-15
DC.Date.Available 1997-12-01/1998-06-01
DC.Date.Valid 1998-01-01/1998-06-01

Qualifiers
Value qualifiers
Example: Subject
DC.Subject.DDC 509.123
DC.Subject.LCSH Digital libraries-United States
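
The two kinds of qualifier look alike on the page (both are the third part of a dotted name), so software has to know which is which. The Python sketch below splits a qualified name and classifies the qualifier using a lookup table built only from the two example slides; the table and the function `parse_qualified_name` are illustrative assumptions, not a complete list of Dublin Core qualifiers:

```python
# Qualifiers drawn only from the two example slides; the table is
# illustrative, not a complete list of Dublin Core qualifiers.
ELEMENT_QUALIFIERS = {"Date": {"Created", "Issued", "Available", "Valid"}}
VALUE_QUALIFIERS = {"Subject": {"DDC", "LCSH"}}


def parse_qualified_name(name):
    """Split 'DC.Element.Qualifier' into (element, qualifier, kind)."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0] != "DC":
        raise ValueError(f"not a DC name: {name!r}")
    element = parts[1]
    qualifier = ".".join(parts[2:]) or None  # None when unqualified
    if qualifier is None:
        kind = None
    elif qualifier in ELEMENT_QUALIFIERS.get(element, set()):
        kind = "element"  # narrows the meaning of the element itself
    elif qualifier in VALUE_QUALIFIERS.get(element, set()):
        kind = "value"    # names the scheme the value is taken from
    else:
        kind = "unknown"
    return element, qualifier, kind


print(parse_qualified_name("DC.Date.Created"))  # ('Date', 'Created', 'element')
print(parse_qualified_name("DC.Subject.LCSH"))  # ('Subject', 'LCSH', 'value')
```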

Dumbing Down Principle
"The theory behind this principle is that consumers of metadata should be able to strip off qualifiers and return to the base form of a property. ... this principle makes it possible for client applications to ignore qualifiers in the context of more coarse-grained, cross-domain searches." Lagoze 2001

Dumbing Down Principle
Qualified version
DC.Date.Created 1997-11-01
DC.Subject.LCSH Digital libraries-United States
Dumbed-down version
DC.Date 1997-11-01 (a valid date)
DC.Subject Digital libraries-United States (a valid subject description)
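
Mechanically, dumbing down is just dropping everything after the element name while leaving the value untouched. A minimal Python sketch, assuming names of the form `DC.Element.Qualifier` as on the slides (the function `dumb_down` is hypothetical):

```python
def dumb_down(qualified_record):
    """Strip qualifiers: keep 'DC.<Element>' and the value unchanged.

    qualified_record is a list of (name, value) pairs such as
    ("DC.Date.Created", "1997-11-01").
    """
    simple = []
    for name, value in qualified_record:
        parts = name.split(".")
        # Keep only the 'DC' prefix and the element name.
        simple.append((".".join(parts[:2]), value))
    return simple


qualified = [
    ("DC.Date.Created", "1997-11-01"),
    ("DC.Subject.LCSH", "Digital libraries-United States"),
]
for name, value in dumb_down(qualified):
    print(name, value)
# DC.Date 1997-11-01
# DC.Subject Digital libraries-United States
```

The code can strip qualifiers, but it cannot check that each remaining value is still valid for the base element; that depends on how the cataloguer chose the qualifiers in the first place.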

Representations of Dublin Core: Text (with qualifiers)
See next two slides for an example of a Dublin Core record for a web site prepared by a professional cataloguer at the Library of Congress. Note that the record does not follow the principle of dumbing-down.

Old Midterm Examination
Dumbing-down failures:
Description.note: Title from home page as viewed on Nov. 1, 2000.
becomes Description: Title from home page as viewed on Nov. 1, 2000.
which is not a description of the object
Publisher.place: Nashville, Tenn. :
becomes Publisher: Nashville, Tenn. :
which is not the publisher of the object
Correct dumbing-down:
Subject.class.LCC: E840.8.G65
becomes Subject: E840.8.G65
which is a subject code

What to Catalog: IFLA Model
Work
A work is the underlying abstraction, e.g.,
• The Iliad
• The Computer Science departmental web site
• Beethoven's Fifth Symphony
• Unix operating system
• The 1996 U.S. census
This is roughly equivalent to the concept of "literary work" used in copyright law.

IFLA Model
Expression
A work is realized through an expression, e.g.,
• The Iliad has oral expressions and written expressions.
• A musical work has a score and performance(s).
• Software has source code and machine code.
Many works have only a single expression, e.g., a web page or a book.

IFLA Model
Manifestation
An expression is given form in one or more manifestations, e.g.,
• The text of The Iliad has been manifested in numerous manuscripts and printed books.
• A musical performance can be distributed on CD or broadcast on television.
• Software is manifested as files, which may be stored or transmitted in any digital medium.

IFLA Model
Item
When many copies are made of a manifestation, each is a separate item, e.g., a specific copy of a book or of a computer file.
[Works, expressions, manifestations and items are explored in CS 431, Architecture of Web Information Systems.]
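
The four IFLA levels form a strict hierarchy: a work has expressions, an expression has manifestations, and a manifestation has items. The Python sketch below models that hierarchy with dataclasses; the class and field names, and the sample manifestations and copies, are illustrative assumptions rather than any standard encoding of the model:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    description: str  # one specific copy


@dataclass
class Manifestation:
    description: str  # a physical or digital embodiment
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    description: str  # a realization of the work, e.g. a written text
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str  # the underlying abstraction
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self):
        """Walk work -> expression -> manifestation -> item."""
        return [
            item
            for expr in self.expressions
            for man in expr.manifestations
            for item in man.items
        ]


# Hypothetical sample data: one oral and one written expression.
iliad = Work("The Iliad", [
    Expression("oral expression"),
    Expression("written text", [
        Manifestation("a printed edition", [Item("copy 1"), Item("copy 2")]),
        Manifestation("an e-book file", [Item("a downloaded file")]),
    ]),
])
print(len(iliad.all_items()))  # 3
```

A catalogue record for one copy can then point upward to its manifestation, expression and work instead of repeating their descriptions.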

Limits of Dublin Core and MARC: Complex Objects
[Diagram: metadata records describing a complete object and its sub-objects]
• Article within a journal
• Page within a Web site
• A thumbnail of another image
• The March 28 final edition of a newspaper

Flat v. linked records
Flat record
All information about an item is held in a single Dublin Core record, including information about related items.
• convenient for access and preservation
• information is repeated: a maintenance problem
Linked record
Related information is held in separate records, with a link from the item record.
• less convenient for access and preservation
• information is stored once
Compare with normal forms in relational databases.
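
The trade-off can be seen in a few lines of Python. In this sketch (record layout, article titles, and the helpers `resolve` and `update_flat` are hypothetical; the serial details come from the D-Lib Magazine example in these slides), both models present the same data to a reader, but a change to the serial touches one record in the linked model and every article in the flat model:

```python
# Flat model: the serial's details are copied into every article record.
flat_records = [
    {"title": "Article A", "serial": "D-Lib Magazine", "issn": "1082-9873"},
    {"title": "Article B", "serial": "D-Lib Magazine", "issn": "1082-9873"},
]

# Linked model: the serial is described once; articles hold a link (a key).
serials = {"dlib": {"serial": "D-Lib Magazine", "issn": "1082-9873"}}
linked_records = [
    {"title": "Article A", "serial_id": "dlib"},
    {"title": "Article B", "serial_id": "dlib"},
]


def resolve(record):
    """Follow the link to build the view a flat record gives directly."""
    merged = dict(record)
    merged.update(serials[merged.pop("serial_id")])
    return merged


def update_flat(records, issn, key, value):
    """Flat model: every repeated copy must be edited; returns the count."""
    changed = 0
    for rec in records:
        if rec["issn"] == issn:
            rec[key] = value
            changed += 1
    return changed


# Same data for a reader (the linked model needs one extra lookup).
print([resolve(r) for r in linked_records] == flat_records)  # True

# Adding the publisher: one edit in the linked model...
publisher = "Corporation for National Research Initiatives"
serials["dlib"]["publisher"] = publisher
# ...but one edit per article in the flat model.
n_edits = update_flat(flat_records, "1082-9873", "publisher", publisher)
print(n_edits)  # 2
print([resolve(r) for r in linked_records] == flat_records)  # True
```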

Representations of Dublin Core: XML (with qualifiers)
<title>Digital Libraries and the Problem of Purpose</title>
<creator>David M. Levy</creator>
<publisher>Corporation for National Research Initiatives</publisher>
<date date-type = "publication">January 2000</date>
<type resource-type = "work">article</type>
<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>
<identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
<language>English</language>
<rights>Copyright (c) David M. Levy</rights>
(to be continued)

Dublin Core with flat record extension
Continuation of D-Lib Magazine record
<relation rel-type = "InSerial">
  <serial-name>D-Lib Magazine</serial-name>
  <issn>1082-9873</issn>
  <volume>6</volume>
  <issue>1</issue>
</relation>
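
A record like this one can be built and queried with Python's standard `xml.etree.ElementTree`. The slides show only the child elements, so the root element name `record` is an assumption of this sketch, and only some of the fields are reproduced:

```python
import xml.etree.ElementTree as ET


def build_record():
    # "record" as the root element is an assumption; the slides show
    # only the child elements.
    root = ET.Element("record")
    ET.SubElement(root, "title").text = "Digital Libraries and the Problem of Purpose"
    ET.SubElement(root, "creator").text = "David M. Levy"
    ET.SubElement(root, "date", {"date-type": "publication"}).text = "January 2000"
    ET.SubElement(root, "identifier", {"uri-type": "DOI"}).text = "10.1045/january2000-levy"
    # Flat-record extension: the serial's details nested inside <relation>.
    relation = ET.SubElement(root, "relation", {"rel-type": "InSerial"})
    for tag, text in [("serial-name", "D-Lib Magazine"), ("issn", "1082-9873"),
                      ("volume", "6"), ("issue", "1")]:
        ET.SubElement(relation, tag).text = text
    return root


root = build_record()
print(ET.tostring(root, encoding="unicode"))
# Reading it back: the ISSN is reached through the nested path.
print(root.findtext("relation/issn"))  # 1082-9873
```

Nesting the serial inside the article record is exactly the flat-record approach: convenient to read, but the same serial block is repeated in every article from that issue.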

Limits of Dublin Core and MARC: Events
[Diagram: Version 1 plus new material produces Version 2]
Should Version 2 have its own record, or should the extra information be added to the Version 1 record?
How are these represented in Dublin Core or MARC?

Using Catalog Data for Information Retrieval
The basic operation of information retrieval is to match the way that a user describes an information requirement (a query) against the way that items are described (an index).
The success of conventional catalogs (e.g., MARC + Anglo-American Cataloguing Rules) or indexing services (e.g., Medline) comes from the use of precise language to describe items, combined with trained and experienced users who formulate the queries.
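
The matching step itself can be sketched as a fielded inverted index: each (field, term) pair points to the records that contain it, and a query matches only where every requested field holds the requested term. This toy Python version (function names and sample records are hypothetical; the records reuse titles and creators from these slides) shows only the mechanics. It does not model the controlled vocabularies or trained searchers that the slide credits for catalog precision:

```python
import re


def tokenize(text):
    # Lowercase word tokens; punctuation such as the comma in "Weibel," is dropped.
    return re.findall(r"\w+", text.lower())


def build_index(records):
    """Inverted index: (field, term) -> set of record ids."""
    index = {}
    for rec_id, record in records.items():
        for fld, value in record.items():
            for term in tokenize(value):
                index.setdefault((fld, term), set()).add(rec_id)
    return index


def search(index, query):
    """query is a list of (field, term) pairs; every pair must match (AND)."""
    result = None
    for fld, term in query:
        hits = index.get((fld, term.lower()), set())
        result = set(hits) if result is None else result & hits
    return result if result is not None else set()


records = {
    1: {"title": "Digital Libraries and the Problem of Purpose",
        "creator": "David M. Levy"},
    2: {"title": "Dublin Core Reference Page",
        "creator": "Weibel, Stuart L."},
}
index = build_index(records)
print(search(index, [("title", "digital"), ("creator", "levy")]))  # {1}
print(search(index, [("creator", "dublin")]))  # set()
```

The second query returns nothing because "dublin" appears in a title, not a creator field; fielded matching of this kind is where structured catalog data adds precision.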

Why is Dublin Core not used to Index and Search the Web?
Technology: The methods used in early Infoseek, Lycos and AltaVista have been greatly enhanced. (Note that these methods provide quite good precision at the expense of recall.)
Users: The typical user who searches the Web has limited training and does not understand catalogs.
Economics: The size of the Web makes human indexing of every important site impossible. The rate of change requires frequent re-indexing.

Dublin Core in Many Languages
See: Thomas Baker, "Languages for Dublin Core," D-Lib Magazine, December 1998, http://www.dlib.org/dlib/december98/12baker.html