1 CS/INFO 430 Information Retrieval Lecture 20 Metadata 2.

Presentation transcript:

1 CS/INFO 430 Information Retrieval Lecture 20 Metadata 2

2 Course Administration

3 Cataloguing Online Materials: Dublin Core Dublin Core is an attempt to apply cataloguing methods to online materials, notably the Web. History It was anticipated that the methods of full text indexing that were used by the early Web search engines, such as Lycos, would not scale up. "... [automated] indexes are most useful in small collections within a given domain. As the scope of their coverage expands, indexes succumb to problems of large retrieval sets and problems of cross disciplinary semantic drift. Richer records, created by content experts, are necessary to improve search and retrieval." Weibel 1995

4 Dublin Core Simple set of metadata elements for online information 15 basic elements intended for all types and genres of material all elements optional all elements repeatable Developed by an international group chaired by Stuart Weibel since (Diane Hillmann of Cornell has been very active in this group.)
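A minimal sketch of what "all elements optional, all elements repeatable" could mean for a record held in memory. The function name make_record and the two subject values are illustrative assumptions, not part of the lecture or of any Dublin Core software.

```python
# The 15 element names of the Dublin Core element set, as listed in this lecture.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_record(pairs):
    """Build a record from (element, value) pairs.

    Every element is optional (it may be absent) and repeatable (it may
    occur several times), so each element maps to a list of values.
    """
    record = {}
    for element, value in pairs:
        name = element.lower()
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        record.setdefault(name, []).append(value)
    return record


record = make_record([
    ("title", "Dublin Core Metadata Initiative (DCMI) Home Page"),
    ("language", "en"),
    ("format", "text/html"),
    ("subject", "metadata"),      # hypothetical value, for illustration only
    ("subject", "cataloguing"),   # the same element repeated
])
```

Storing a list per element keeps repeated values in order, and an absent element is simply a missing key.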

5

6 Dublin Core record for the Dublin Core Web Site contributor: Dublin Core Metadata Initiative description: The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models... title: Dublin Core Metadata Initiative (DCMI) Home Page date: format: text/html (MIME type) language: en (English)

7 Dublin Core elements Element Name: Title Definition: A name given to the resource. Comment: Typically, Title will be a name by which the resource is formally known. Element Name: Creator Definition: An entity primarily responsible for making the content of the resource. Comment: Examples of Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.

8 Dublin Core elements Element Name: Subject Definition: A topic of the content of the resource. Comment: Typically, Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. Element Name: Description Definition: An account of the content of the resource. Comment: Examples of Description include, but are not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.

9 Dublin Core elements Element Name: Publisher Definition: An entity responsible for making the resource available Comment: Examples of Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity. Element Name: Contributor Definition: An entity responsible for making contributions to the content of the resource. Comment: Examples of Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.

10 Dublin Core elements Element Name: Date Definition: A date of an event in the lifecycle of the resource. Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and includes (among others) dates of the form YYYY-MM-DD.
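A rough sketch of checking a Date value against the date-only forms of the W3CDTF profile (YYYY, YYYY-MM and YYYY-MM-DD). The function name is_w3cdtf_date is an assumption, and the forms of the profile that include a time of day are deliberately left out.

```python
import re
from datetime import date

# Year, optionally followed by a month, optionally followed by a day.
_DATE_FORMS = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_w3cdtf_date(value):
    """Return True if value is a calendar date written as YYYY[-MM[-DD]]."""
    match = _DATE_FORMS.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # A missing month or day defaults to 1 just to test calendar validity.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```

For example, is_w3cdtf_date("2000-01-15") holds, while "15/01/2000" and "2000-02-30" are rejected.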

11 Dublin Core elements Element Name: Type Definition: The nature or genre of the content of the resource. Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary [DCT1]). To describe the physical or digital manifestation of the resource, use the FORMAT element.

12 Dublin Core elements Element Name: Format Definition: The physical or digital manifestation of the resource. Comment: Typically, Format may include the media-type or dimensions of the resource. Format may be used to identify the software, hardware, or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats).

13 Dublin Core elements Element Name: Identifier Definition: An unambiguous reference to the resource within a given context. Comment: Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system. Formal identification systems include but are not limited to the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).

14 Dublin Core elements Element Name: Source Definition: A reference to a resource from which the present resource is derived. Comment: The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.

15 Dublin Core elements Element Name: Language Definition: A language of the intellectual content of the resource. Comment: Recommended best practice is to use RFC 3066 [RFC3066] which, in conjunction with ISO 639 [ISO639], defines two- and three-letter primary language tags with optional subtags. Examples include "en" or "eng" for English, "akk" for Akkadian, and "en-GB" for English used in the United Kingdom. Element Name: Relation Definition: A reference to a related resource. Comment: Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.

16 Dublin Core elements Element Name: Coverage Definition: The extent or scope of the content of the resource. Comment: Typically, Coverage will include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names [TGN]) and to use, where appropriate, named places or time periods in preference to numeric identifiers such as sets of coordinates or date ranges.

17 Dublin Core elements Element Name: Rights Definition: Information about rights held in and over the resource. Comment: Typically, Rights will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions may be made about any rights held in or over the resource.

18 Qualifiers Example: element qualifier Example: Date DC.Date.Created DC.Date.Issued DC.Date.Available / DC.Date.Valid / A qualifier refines the element name to add specificity

19 Qualifiers Example: value qualifiers Example: Subject DC.Subject.DDC (Dewey Decimal Classification) DC.Subject.LCSH Digital libraries-United States (Library of Congress Subject Heading)

20 Dumbing Down Principle "The theory behind this principle is that consumers of metadata should be able to strip off qualifiers and return to the base form of a property.... this principle makes it possible for client applications to ignore qualifiers in the context of more coarse-grained, cross-domain searches." Lagoze 2001

21 Dumbing Down Principle Qualified version DC.Date.Created DC.Subject.LCSH Digital libraries-United States Dumbed-down version DC.Date a valid date DC.Subject Digital libraries-United States a valid subject description
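The dumbing-down step can be sketched as stripping everything after the element name. This assumes the dotted DC.Element.Qualifier naming used on the slides; the function name dumb_down and the date value are hypothetical.

```python
def dumb_down(qualified_pairs):
    """Strip qualifiers, keeping only the base element name and the value.

    "DC.Date.Created" becomes "DC.Date"; a name that has no qualifier is
    returned unchanged.
    """
    result = []
    for name, value in qualified_pairs:
        base = ".".join(name.split(".")[:2])
        result.append((base, value))
    return result


qualified = [
    ("DC.Date.Created", "2000-01-15"),  # hypothetical date, for illustration
    ("DC.Subject.LCSH", "Digital libraries-United States"),
]
simple = dumb_down(qualified)
# simple == [("DC.Date", "2000-01-15"),
#            ("DC.Subject", "Digital libraries-United States")]
```

The values are untouched, so a client that ignores qualifiers still sees a valid date and a valid subject description.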

22 Dublin Core with qualifiers See the next two slides for an example of a Dublin Core record for a web site prepared by a professional cataloguer at the Library of Congress. Note that the record does not follow the principle of dumbing-down.

23

24

25 Theoretical Problems in Metadata: What to Catalog The IFLA Model Work A work is the underlying abstraction, e.g., The Iliad The Computer Science departmental web site Beethoven's Fifth Symphony Unix operating system The 1996 U.S. census This is roughly equivalent to the concept of "literary work" used in copyright law.

26 IFLA Model Expression. A work is realized through an expression, e.g., The Iliad has oral expressions and written expressions. A musical work has a score and performance(s). Software has source code and machine code. Many works have only a single expression, e.g., a Web page or a book.

27 IFLA Model Manifestation. An expression is given form in one or more manifestations, e.g., The text of The Iliad has been manifest in numerous manuscripts and printed books. A musical performance can be distributed on CD, or broadcast on television. Software is manifest as files, which may be stored or transmitted in any digital medium.

28 IFLA Model Item. When many copies are made of a manifestation, each is a separate item, e.g., a specific copy of a book, a computer file. [Works, expressions, manifestations and items are explored in CS 431, Architecture of Web Information Systems.]
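One way to picture the four levels of the IFLA model is as nested containers. The class and field names below are assumptions chosen for illustration, and the shelf locations are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    location: str                      # one specific copy


@dataclass
class Manifestation:
    form: str                          # e.g. a printed edition
    items: list = field(default_factory=list)


@dataclass
class Expression:
    kind: str                          # e.g. oral or written
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    name: str                          # the underlying abstraction
    expressions: list = field(default_factory=list)


def count_items(work):
    """Count every individual copy that descends from a work."""
    return sum(
        len(manifestation.items)
        for expression in work.expressions
        for manifestation in expression.manifestations
    )


iliad = Work("The Iliad", [
    Expression("written", [
        Manifestation("printed book", [Item("shelf A"), Item("shelf B")]),
        Manifestation("manuscript", [Item("archive")]),
    ]),
    Expression("oral"),  # an expression with no recorded manifestation
])
```

Here count_items(iliad) walks work, expression and manifestation to reach the three items.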

29 Theoretical Problems in Metadata: Events Version 1 New material Version 2 Should Version 2 have its own record or should extra information be added to the Version 1 record? How are these represented in Dublin Core or MARC?

30 Theoretical Problems in Metadata: Complex Objects Complex objects Article within a journal Page within a Web site A thumbnail of another image The March 28 final edition of a newspaper Complete object Sub-objects Metadata records

31 Theoretical Problems in Metadata: Packaging Rules When an object consists of various parts, how should their interaction be described? Example: An object on the Web may consist of several html pages with images, applets, etc. Metadata Object Description Schema (MODS) MPEG 21

32 MPEG 21

33 Theoretical Problems in Metadata: Flat v. linked records Flat record All information about an item is held in a single record (e.g., a Dublin Core record), including information about related items convenient for access and preservation information is repeated -- maintenance problem Linked record Related information is held in separate records with a link from the item record less convenient for access and preservation information is stored once Compare with normal forms in relational databases
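The trade-off between flat and linked records can be sketched with two small collections. The field names, the key "j1", the second article and the replacement journal title are hypothetical.

```python
# Flat: each article record repeats the information about its journal.
flat_records = [
    {"title": "Digital Libraries and the Problem of Purpose",
     "journal_title": "D-Lib Magazine"},
    {"title": "A second article (hypothetical)",
     "journal_title": "D-Lib Magazine"},
]

# Linked: the journal is described once and referenced by key.
journals = {"j1": {"title": "D-Lib Magazine"}}
linked_records = [
    {"title": "Digital Libraries and the Problem of Purpose", "journal": "j1"},
    {"title": "A second article (hypothetical)", "journal": "j1"},
]


def rename_journal_flat(records, old, new):
    """Update every flat record that repeats the journal title."""
    changed = 0
    for record in records:
        if record["journal_title"] == old:
            record["journal_title"] = new
            changed += 1
    return changed


def journal_title(record):
    """Reading a linked record needs one extra lookup."""
    return journals[record["journal"]]["title"]


# Maintenance cost: two updates in the flat form, one in the linked form.
updates = rename_journal_flat(flat_records, "D-Lib Magazine", "D-Lib")
journals["j1"]["title"] = "D-Lib"
```

The flat form is self-contained for access and preservation, while the linked form stores the journal once, much as normalization does in a relational database.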

34

35 Representations of Dublin Core: XML (with qualifiers) Digital Libraries and the Problem of Purpose David M. Levy Corporation for National Research Initiatives January 2000 article /january2000-levy English Copyright (c) David M. Levy to be continued

36 Dublin Core with flat record extension Continuation of D-Lib Magazine record D-Lib Magazine
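A hedged sketch of serializing a simple (unqualified) Dublin Core record as XML with Python's standard library. The XML markup of the record is not preserved in this transcript, so the assignment of each value to an element, the wrapper element name "record" and the omission of qualifiers are all assumptions. The namespace URI is the one published for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Assumed mapping of the values on the slide to element names.
fields = [
    ("title", "Digital Libraries and the Problem of Purpose"),
    ("creator", "David M. Levy"),
    ("publisher", "Corporation for National Research Initiatives"),
    ("date", "January 2000"),
    ("type", "article"),
    ("language", "English"),
]

root = ET.Element("record")  # hypothetical wrapper element
for name, value in fields:
    # Names in {namespace}local form are written with the dc: prefix.
    ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value

xml_text = ET.tostring(root, encoding="unicode")
```

Parsing xml_text again with ET.fromstring and searching for the namespaced title element returns the same title string.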

37 Theoretical Problems in Metadata: Many Languages See: Thomas Baker, Languages for Dublin Core, D-Lib Magazine December 1998,