1 CS 430: Information Discovery Lecture 7 Descriptive Metadata 3 Dublin Core Automatic Generation of Catalog Records.


1 CS 430: Information Discovery Lecture 7 Descriptive Metadata 3 Dublin Core Automatic Generation of Catalog Records

2 Course Administration Relationship between Library of Congress, OCLC and American Memory

3 Dublin Core elements
1. Title: The name given to the resource by the creator or publisher.
2. Creator: The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents; artists, photographers, or illustrators in the case of visual resources.
3. Subject: The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.

4 Dublin Core elements
4. Description: A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.
5. Publisher: The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.
6. Contributor: A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).

5 Dublin Core elements
7. Date: A date associated with the creation or availability of the resource.
8. Type: The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary.
9. Format: The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.
10. Identifier: A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.

6 Dublin Core elements
11. Source: Information about a second resource from which the present resource is derived.
12. Language: The language of the intellectual content of the resource.
13. Relation: An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), or a chapter of a book (IsPartOf).

7 Dublin Core elements
14. Coverage: The spatial locations and temporal durations characteristic of the resource.
15. Rights: A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.
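As a rough illustration of the fifteen elements, a record can be sketched as a mapping from element name to a list of values. The function name make_record and the choice to store every value in a list are assumptions made for this sketch, not part of the lecture; the sample values come from the D-Lib Magazine article used later in these slides.

```python
# The fifteen Dublin Core element names, in the order listed in the slides.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Build a simple record: element name -> list of string values.

    Hypothetical helper. Every element is treated as optional and
    repeatable, so values are always stored as lists. Names that are
    not among the fifteen elements are rejected.
    """
    record = {}
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        record[name] = [value] if isinstance(value, str) else list(value)
    return record


record = make_record(
    title="Digital Libraries and the Problem of Purpose",
    creator="David M. Levy",
    language="English",
    type="article",
)
```

Elements that are not supplied are simply absent from the record, which matches the minimalist use of Dublin Core where creators fill in only what they know.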

8 Qualifiers
Element qualifier
Example: Date
DC.Date -> Created:
DC.Date -> Issued:
DC.Date -> Available: /
DC.Date -> Valid: /

9 Qualifiers
Value qualifiers
Example: Subject
DC.Subject -> DDC:
DC.Subject -> LCSH: Digital libraries-United States
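The two kinds of qualifier can be sketched as optional attributes on a single statement: an element qualifier narrows the meaning of the element, and a value qualifier names the scheme the value is taken from. The class name Statement, its field names, and the date value are hypothetical and used only for illustration; the LCSH heading is the one shown on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """One Dublin Core statement with optional qualifiers (illustrative only)."""
    element: str
    value: str
    refinement: Optional[str] = None  # element qualifier, e.g. Created
    scheme: Optional[str] = None      # value qualifier, e.g. LCSH or DDC

    def label(self):
        """Render the qualified name in the slide's 'DC.X -> Y' notation."""
        parts = ["DC." + self.element.capitalize()]
        if self.refinement:
            parts.append(self.refinement)
        if self.scheme:
            parts.append(self.scheme)
        return " -> ".join(parts)

    def unqualified(self):
        """Drop both qualifiers, leaving a plain element/value pair."""
        return (self.element, self.value)


# The date value below is a made-up placeholder, not taken from the slide.
created = Statement("date", "2000-01-15", refinement="Created")
heading = Statement("subject", "Digital libraries-United States", scheme="LCSH")
```

Calling unqualified() shows why qualifiers are optional extras: a reader that ignores them still gets a usable, if less precise, statement.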

10 Metadata about subjects
(a) Classification (usually manual)
    Dewey Decimal Classification (DDC) political web site
    Library of Congress classification system (LCC) E840.8.G65 political web site
(b) Subject headings (usually manual)
    Keywords assigned from controlled vocabulary, e.g., Medical Subject Headings (MeSH)
    Library of Congress subject headings (LCSH) Political campaigns - United States
(c) Terms extracted from text (automatic)
    Automatic indexing [CS 430]
    Methods from computational linguistics [CS 374/474]

11 Dewey Decimal Classification
Main classes:
000 Computers, information, & general reference
100 Philosophy & psychology
200 Religion
300 Social sciences
400 Language
500 Science
600 Technology
700 Arts & recreation
800 Literature
900 History & geography

12 Dewey Decimal Classification
Hierarchy, e.g.:
600 Technology (Applied sciences)
  630 Agriculture and related technologies
    636 Animal husbandry
      636.7 Dogs
      636.8 Cats
Uses:
  Shelving collections of physical objects so that items on similar subjects are shelved together
  Crude subject access
Scorpion project (OCLC): Automatic subject recognition and assignment of DDC classes
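The hierarchy in the example can be read off the class number itself: each broader class is obtained by shortening the notation. The sketch below assumes this holds for every number, which is a simplification (real DDC schedules have exceptions), and the function name ddc_chain is hypothetical.

```python
def ddc_chain(number):
    """Return the broader classes for a DDC number, most general first.

    Assumes a three-digit class optionally followed by decimal digits,
    and assumes the hierarchy always follows digit truncation.
    """
    whole, _, decimals = number.partition(".")
    if len(whole) != 3 or not whole.isdigit():
        raise ValueError(f"expected three digits before the point: {number}")
    if decimals and not decimals.isdigit():
        raise ValueError(f"expected digits after the point: {number}")

    # Main class (600), division (630), section (636).
    candidates = [whole[0] + "00", whole[:2] + "0", whole]
    # Each further decimal digit is one level deeper (636.7, ...).
    for i in range(1, len(decimals) + 1):
        candidates.append(whole + "." + decimals[:i])

    chain = []
    for candidate in candidates:
        if candidate not in chain:  # "600" would otherwise repeat itself
            chain.append(candidate)
    return chain


dogs = ddc_chain("636.7")
shelf_order = sorted(["636.8", "600", "636.7", "630", "636"])
```

Because related subjects share a prefix, a plain sort of the class numbers places them next to each other, which is what makes the scheme useful for shelving.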

13

14

15 Limits of Dublin Core
Complex objects:
  Article within a journal
  A thumbnail of another image
  The March 28 final edition of a newspaper
[Figure labels: Complete object, Sub-objects, Metadata records]

16 Flat v. linked records
Flat record: All information about an item is held in a single Dublin Core record, including information about related items.
  convenient for access and preservation
  information is repeated -- maintenance problem
Linked record: Related information is held in separate records with a link from the item record.
  less convenient for access and preservation
  information is stored once
Compare with normal forms in relational databases.
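The trade-off can be made concrete with a small sketch. All titles, identifiers, and the helper name resolve are hypothetical; the point is only that the flat form repeats the journal details in every article record, while the linked form stores them once and follows a link when a complete view is needed.

```python
# Flat: each article record carries its own copy of the journal details.
flat_records = [
    {"title": "Article A", "journal_title": "Example Journal",
     "journal_publisher": "Example Press"},
    {"title": "Article B", "journal_title": "Example Journal",
     "journal_publisher": "Example Press"},
]

# Linked: journal details are stored once and referenced by identifier.
journals = {
    "journal-1": {"title": "Example Journal", "publisher": "Example Press"},
}
linked_records = [
    {"title": "Article A", "journal": "journal-1"},
    {"title": "Article B", "journal": "journal-1"},
]


def resolve(record, journals):
    """Follow the link to build the same view a flat record would give."""
    journal = journals[record["journal"]]
    return {
        "title": record["title"],
        "journal_title": journal["title"],
        "journal_publisher": journal["publisher"],
    }


# A change of publisher is one update in the linked form ...
journals["journal-1"]["publisher"] = "New Example Press"
resolved = [resolve(r, journals) for r in linked_records]

# ... but must be repeated in every flat record that mentions the journal.
stale = [r for r in flat_records if r["journal_publisher"] == "Example Press"]
```

The extra lookup in resolve is the access cost the slide refers to; the stale list is the maintenance problem.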

17

18 Dublin Core with qualifiers
title: Digital Libraries and the Problem of Purpose
creator: David M. Levy
publisher: Corporation for National Research Initiatives
date: January 2000
type: article
identifier: /january2000-levy
language: English
rights: Copyright (c) David M. Levy

19 Dublin Core with flat record extension Continuation D-Lib Magazine

20 Events
[Diagram: Version 1 + New material -> Version 2]
Should Version 2 have its own record, or should extra information be added to the Version 1 record?
How are these represented in Dublin Core?

21 Minimalist versus structuralist
Minimalist:
  15 elements, no qualifiers, suitable for non-professionals
  encourage creators to provide metadata
Structuralist:
  15 elements, qualifiers, RDF, detailed coding rules
  will require trained metadata experts
[For an example of how complex Dublin Core can become, see the source of: htm#]

22 Dublin Core in many languages See: Thomas Baker, Languages for Dublin Core, D-Lib Magazine December 1998,

23 Dublin Core: Personal Opinion
Dublin Core is a simple way to describe digital content that:
  is a single, self-contained object ("document-like")
  is static with time
  has few relationships
Some web sites satisfy these criteria.
Dublin Core is not suitable for digital content that:
  is heavily structured
  changes dynamically

24 Automatic extraction of catalog data
Example: Dublin Core records for web pages
Strategies:
  Manual by trained cataloguers: high quality records, but expensive and time consuming
  Entirely automatic: fast, almost zero cost, but poor quality
  Automatic followed by human editing: cost and quality depend on the amount of editing
  Manual collection-level record, automatic item-level record: moderate quality, moderate cost

25 DC-dot
DC-dot is a Dublin Core metadata editor for web pages, created by Andy Powell at UKOLN.
DC-dot has two parts:
(a) A skeleton Dublin Core record is created automatically from clues in the web page.
(b) A user interface is provided for cataloguers to edit the record.

26

27 Automatic record for CS 430 home page
DC-dot applied to
(continued on next slide)

28 Automatic record for CS 430 home page (continued)
DC-dot applied to

29 Observations on DC-dot applied to CS 430 home page
DC.Title is a copy of the html <title> field.
DC.Publisher is the owner of the IP address where the page was stored.
DC.Subject is a list of headings and noun phrases presented for editing.
DC.Date is taken from the Last-Modified field in the http header.
DC.Type and DC.Format are taken from the MIME type of the http response.
DC.Identifier was supplied by the user as input.
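The observations above suggest how a skeleton record might be assembled. The following is a minimal sketch of that idea using only the Python standard library; it is not DC-dot's actual code. The function and class names, the use of h1-h3 headings as subject candidates, the rule that a text/* MIME type maps to the type "Text", and the sample page and headers are all assumptions. The publisher lookup from the IP address is left out because it would need a network query.

```python
from html.parser import HTMLParser


class TitleAndHeadings(HTMLParser):
    """Collect the <title> text and the text of h1-h3 headings."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self._current = None  # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.headings.append(text)
            self._current = None


def skeleton_record(url, headers, html):
    """Build a draft record from a page, its HTTP headers, and its URL."""
    parser = TitleAndHeadings()
    parser.feed(html)
    parser.close()
    mime = headers.get("Content-Type", "").split(";")[0].strip()
    return {
        "title": parser.title,                      # copy of <title>
        "subject": parser.headings,                 # candidates for editing
        "date": headers.get("Last-Modified", ""),   # from the HTTP header
        "type": "Text" if mime.startswith("text/") else "",  # assumed mapping
        "format": mime,                             # MIME type
        "identifier": url,                          # supplied by the user
    }


# Hypothetical page and headers, for illustration only.
page = ("<html><head><title>CS 430: Information Discovery</title></head>"
        "<body><h1>Course  Description</h1><p>Text.</p></body></html>")
response_headers = {
    "Content-Type": "text/html; charset=iso-8859-1",
    "Last-Modified": "Mon, 04 Sep 2000 12:00:00 GMT",
}
draft = skeleton_record("http://example.org/cs430/", response_headers, page)
```

The subject list is deliberately rough: as with DC-dot, the output is meant as a starting point for a cataloguer to edit, not a finished record.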

30

31 Automatic record for George W. Bush home page
DC-dot applied to
(continued on next slide)

32 Automatic record for George W. Bush home page (continued)
DC-dot applied to

33 Observations on DC-dot applied to George W. Bush home page
The home page has several meta tags:
[The page has no html <title>]
<META NAME="KEYWORDS" CONTENT="George W. Bush, Bush, George Bush, President, republican, 2000 election and more
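When a page supplies meta tags, an extractor can read them directly instead of guessing from headings. The sketch below shows one way to do that with the standard library; the class and function names are hypothetical, and the sample page is a shortened, made-up stand-in for the real home page, whose full tag is not preserved in the slide.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect <meta name=... content=...> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, so <META NAME=...>
        # arrives here as tag "meta" with attribute "name".
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.meta[name.lower()] = content


def keywords_to_subjects(html):
    """Split the KEYWORDS meta tag into candidate subject terms."""
    reader = MetaTagReader()
    reader.feed(html)
    reader.close()
    keywords = reader.meta.get("keywords", "")
    return [term.strip() for term in keywords.split(",") if term.strip()]


# Shortened, hypothetical sample; not the full tag from the original page.
sample_page = ('<html><head><META NAME="KEYWORDS" '
               'CONTENT="George W. Bush, President, republican"></head></html>')
subjects = keywords_to_subjects(sample_page)
```

Keywords written by the page owner are not a controlled vocabulary, so terms obtained this way are still only candidates for the subject field.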

34 Collection-level metadata
Several of the most difficult fields to extract automatically are the same across all pages in a web site.
Therefore create a collection record manually and combine it with automatic extraction of other fields at item level.
For the CS 430 home page, collection-level metadata:
See: Jenkins and Inman
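Combining the two sources can be sketched as a merge of (field, qualifier, content) entries. The rule used here, that a manually written collection-level entry replaces an automatically extracted entry with the same field and qualifier, is an assumption chosen for illustration, as is the function name combine. The sample values are taken from the D-Lib Magazine example in the later slides.

```python
def combine(item_level, collection_level):
    """Merge automatic item-level entries with a manual collection record.

    Entries are (field, qualifier, content) tuples. Where both sources
    give the same field and qualifier, the collection-level entry is kept
    (an assumed rule, since the manual record is likely more reliable).
    """
    claimed = {(field, qualifier) for field, qualifier, _ in collection_level}
    kept = [entry for entry in item_level
            if (entry[0], entry[1]) not in claimed]
    return kept + list(collection_level)


item_level = [
    ("title", "", "Digital Libraries and the Problem of Purpose"),
    ("publisher", "", "Corporation for National Research Initiatives"),
    ("type", "DCMIType", "Text"),
    ("format", "", "text/html"),
]
collection_level = [
    ("publisher", "", "Corporation for National Research Initiatives"),
    ("type", "", "article"),
    ("type", "resource", "work"),
    ("relation", "serial-name", "D-Lib Magazine"),
    ("language", "", "English"),
]
combined = combine(item_level, collection_level)
```

Fields that vary from page to page, such as title and format, still come from the automatic extraction, while fields shared across the site are written once by hand.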

35 Collection-level metadata
Compare:
(a) Metadata extracted automatically by DC-dot
(b) Collection-level record
(c) Combined item-level record (DC-dot plus collection-level)
(d) Manual record

36

37 Metadata extracted automatically by DC-dot
D.C. Field | Qualifier | Content
title      |           | Digital Libraries and the Problem of Purpose
subject    |           | not included in this slide
publisher  |           | Corporation for National Research Initiatives
date       | W3CDTF    |
type       | DCMIType  | Text
format     |           | text/html
format     | bytes     |
identifier |           |

38 Collection-level record
D.C. Field | Qualifier   | Content
publisher  |             | Corporation for National Research Initiatives
type       |             | article
type       | resource    | work
relation   | rel-type    | InSerial
relation   | serial-name | D-Lib Magazine
relation   | issn        |
language   |             | English
rights     |             | Permission is hereby given for the material in D-Lib Magazine to be used for...

39 Combined item-level record (DC-dot plus collection-level)
D.C. Field | Qualifier | Content
title      |           | Digital Libraries and the Problem of Purpose
publisher  |           | (*) Corporation for National Research Initiatives
date       | W3CDTF    |
type       |           | (*) article
type       | resource  | (*) work
type       | DCMIType  | Text
format     |           | text/html
format     | bytes     |
(*) indicates collection-level metadata
(continued on next slide)

40 Combined item-level record (DC-dot plus collection-level)
D.C. Field | Qualifier   | Content
relation   | rel-type    | (*) InSerial
relation   | serial-name | (*) D-Lib Magazine
relation   | issn        | (*)
language   |             | (*) English
rights     |             | (*) Permission is hereby given for the material in D-Lib Magazine to be used for...
identifier |             |
(*) indicates collection-level metadata

41 Manually created record
D.C. Field | Qualifier   | Content
title      |             | Digital Libraries and the Problem of Purpose
creator    |             | (+) David M. Levy
publisher  |             | Corporation for National Research Initiatives
date       | publication | January 2000
type       |             | article
type       | resource    | work
(+) entry that is not in the automatically generated records
(continued on next slide)

42 Manually created record
D.C. Field | Qualifier   | Content
relation   | rel-type    | InSerial
relation   | serial-name | D-Lib Magazine
relation   | issn        |
relation   | volume      | (+) 6
relation   | issue       | (+) 1
identifier | DOI         | (+) /january2000-levy
identifier | URL         |
language   |             | English
rights     |             | (+) Copyright (c) David M. Levy
(+) entry that is not in the automatically generated records