CS 430: Information Discovery, Lecture 7: Automatic Generation of Catalog Records

Presentation transcript:

1 CS 430: Information Discovery Lecture 7 Automatic Generation of Catalog Records

2 Course Administration
Textbooks are now in the Campus Store. You will need a textbook for Wednesday's reading.
Laptop computers
- Everybody who has (a) signed up and (b) submitted an assignment should receive an email about collecting a laptop
- Laptop surveys will be handed out in class

3 Automatic extraction of catalog data
Example: Dublin Core records for web pages
Strategies:
Manual by trained cataloguers [See Lecture 6] - high quality records, but expensive and time consuming
Entirely automatic - fast, almost zero cost, but poor quality
Automatic followed by human editing - cost and quality depend on the amount of editing
Manual collection level record, automatic item level record - moderate quality, moderate cost

4 DC-dot
DC-dot is a Dublin Core metadata editor for web pages, created by Andy Powell at UKOLN.
DC-dot has two parts:
(a) A skeleton Dublin Core record is created automatically from clues in the web page
(b) A user interface is provided for cataloguers to edit the record

5

6 Automatic record for CS 430 home page DC-dot applied to continued on next slide

7 Automatic record for CS 430 home page (continued) DC-dot applied to

8 Observations on DC-dot applied to CS 430 home page
DC.Title is a copy of the HTML <title> field
DC.Publisher is the owner of the IP address where the page was stored
DC.Subject is a list of headings and noun phrases presented for editing
DC.Date is taken from the Last-Modified field in the HTTP header
DC.Type and DC.Format are taken from the MIME type of the HTTP response
DC.Identifier was supplied by the user as input
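The observations above can be sketched in a few lines of Python. This is only an illustration of the kind of clues described on the slide, not DC-dot's actual code: the function name skeleton_record, the example URL, and the header values are hypothetical, the mapping from a text/* MIME type to the type "Text" is an assumption, and the publisher lookup and noun-phrase extraction are left out.

```python
from html.parser import HTMLParser


class _TitleAndHeadings(HTMLParser):
    """Collects the <title> text and the text of h1-h3 headings."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Normalize runs of whitespace to single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.headings.append(text)
            self._current = None


def skeleton_record(url, headers, html):
    """Build a skeleton Dublin Core record from a URL, HTTP headers, and HTML."""
    parser = _TitleAndHeadings()
    parser.feed(html)
    parser.close()
    mime = headers.get("Content-Type", "").split(";")[0].strip()
    return {
        "DC.Title": parser.title,                  # copy of the HTML <title>
        "DC.Subject": "; ".join(parser.headings),  # headings only (assumption)
        "DC.Date": headers.get("Last-Modified", ""),
        "DC.Type": "Text" if mime.startswith("text/") else "",  # assumed mapping
        "DC.Format": mime,
        "DC.Identifier": url,                      # supplied by the user
    }


if __name__ == "__main__":
    # Hypothetical page and headers, for illustration only.
    page = "<html><head><title>Example page</title></head><body><h1>Syllabus</h1></body></html>"
    headers = {"Content-Type": "text/html; charset=utf-8"}
    for name, value in skeleton_record("http://example.org/", headers, page).items():
        print(name, "=", value)
```

A record built this way would still need the human editing step that DC-dot's user interface provides, since headings are only a rough stand-in for subject terms.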

9

10 DC-dot applied to continued on next slide Automatic record for George W. Bush home page

11 DC-dot applied to Automatic record for George W. Bush home page (continued)

12 Observations on DC-dot applied to George W. Bush home page
The home page has several meta tags: [The page has no HTML <title>]
<META NAME="KEYWORDS" CONTENT="George W. Bush, Bush, George Bush, President, republican, 2000 election and more

13

14

15 Collection-level metadata
Several of the most difficult fields to extract automatically are the same across all pages in a web site. Therefore, create a collection record manually and combine it with automatic extraction of other fields at item level.
For the CS 430 home page, collection-level metadata:
See: Jenkins and Inman

16 Collection-level metadata (Example from Lecture 5)
Compare:
(a) Metadata extracted automatically by DC-dot
(b) Collection-level record
(c) Combined item-level record (DC-dot plus collection-level)
(d) Manual record

17

18 Metadata extracted automatically by DC-dot
D.C. Field | Qualifier | Content
title | | Digital Libraries and the Problem of Purpose
subject | | not included in this slide
publisher | | Corporation for National Research Initiatives
date | W3CDTF |
type | DCMIType | Text
format | | text/html
format | | bytes
identifier | |

19 Collection-level record
D.C. Field | Qualifier | Content
publisher | | Corporation for National Research Initiatives
type | | article
type | resource | work
relation | rel-type | InSerial
relation | serial-name | D-Lib Magazine
relation | issn |
language | | English
rights | | Permission is hereby given for the material in D-Lib Magazine to be used for...

20 Combined item-level record (DC-dot plus collection-level)
D.C. Field | Qualifier | Content
title | | Digital Libraries and the Problem of Purpose
publisher | | (*) Corporation for National Research Initiatives
date | W3CDTF |
type | | (*) article
type | resource | (*) work
type | DCMIType | Text
format | | text/html
format | | bytes
(*) indicates collection-level metadata
continued on next slide

21 Combined item-level record (DC-dot plus collection-level)
D.C. Field | Qualifier | Content
relation | rel-type | (*) InSerial
relation | serial-name | (*) D-Lib Magazine
relation | issn | (*)
language | | (*) English
rights | | (*) Permission is hereby given for the material in D-Lib Magazine to be used for...
identifier | |
(*) indicates collection-level metadata
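One way to picture how a combined record comes about is a simple merge keyed on (field, qualifier). The sketch below is an assumption about the combination rule, not a description of any particular tool: it supposes that a collection-level value replaces an automatically extracted value with the same key, which matches the publisher row marked (*) above. The function name combine_records is hypothetical, and only values that appear on the slides are used.

```python
def combine_records(item_level, collection_level):
    """Merge two records keyed by (field, qualifier).

    Returns a dict mapping each key to (content, from_collection), where
    from_collection plays the role of the (*) marker on the slides.
    Assumption: on a key conflict the collection-level value wins.
    """
    combined = {}
    for key, content in item_level.items():
        combined[key] = (content, False)
    for key, content in collection_level.items():
        combined[key] = (content, True)
    return combined


# Values taken from the slides; rows whose content is missing are omitted.
dc_dot = {
    ("title", ""): "Digital Libraries and the Problem of Purpose",
    ("publisher", ""): "Corporation for National Research Initiatives",
    ("type", "DCMIType"): "Text",
    ("format", ""): "text/html",
}
collection = {
    ("publisher", ""): "Corporation for National Research Initiatives",
    ("type", ""): "article",
    ("type", "resource"): "work",
    ("relation", "rel-type"): "InSerial",
    ("relation", "serial-name"): "D-Lib Magazine",
    ("language", ""): "English",
}

if __name__ == "__main__":
    for (name, qualifier), (content, starred) in combine_records(dc_dot, collection).items():
        print(name, qualifier, "(*)" if starred else "", content)
```

Keeping the provenance flag alongside each value makes it easy to see which fields a cataloguer would still need to check at item level.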

22 Manually created record
D.C. Field | Qualifier | Content
title | | Digital Libraries and the Problem of Purpose
creator | | (+) David M. Levy
publisher | | Corporation for National Research Initiatives
date | publication | January 2000
type | | article
type | resource | work
(+) entry that is not in the automatically generated records
continued on next slide

23 Manually created record
D.C. Field | Qualifier | Content
relation | rel-type | InSerial
relation | serial-name | D-Lib Magazine
relation | issn |
relation | volume | (+) 6
relation | issue | (+) 1
identifier | DOI | (+) /january2000-levy
identifier | URL |
language | | English
rights | | (+) Copyright (c) David M. Levy
(+) entry that is not in the automatically generated records

24 Metadata about subjects
(a) Classification (usually manual)
- Dewey Decimal Classification (DDC) political web site
- Library of Congress classification system (LCC) E840.8.G65 political web site
(b) Subject headings (usually manual)
- Keywords assigned from controlled vocabulary, e.g., Medical Subject Headings (MeSH)
- Library of Congress subject headings (LCSH) Political campaigns - United States
(c) Terms extracted from text (automatic)
- Automatic indexing [CS 430]
- Methods from computational linguistics [CS 374/474]

25 Dewey Decimal Classification
Main classes:
000 Computers, information, & general reference
100 Philosophy & psychology
200 Religion
300 Social sciences
400 Language
500 Science
600 Technology
700 Arts & recreation
800 Literature
900 History & geography

26 Dewey Decimal Classification
Hierarchy, e.g.:
600 Technology (Applied sciences)
630 Agriculture and related technologies
636 Animal husbandry
636.7 Dogs
636.8 Cats
Uses:
- Shelving collections of physical objects so that items on similar subjects are shelved together
- Crude subject access
Scorpion project (OCLC): Automatic subject recognition and assignment of DDC classes
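The hierarchy is encoded in the notation itself: each broader class is obtained by shortening the number. A minimal sketch, assuming a well-formed three-digit class number with optional decimals (the function name ddc_hierarchy is hypothetical, and the captions are the ones from the slide):

```python
def ddc_hierarchy(number):
    """Return the chain of DDC classes leading to `number`, broadest first."""
    whole, _, decimals = number.partition(".")
    if len(whole) != 3 or not whole.isdigit():
        raise ValueError("expected three digits before the decimal point")
    if decimals and not decimals.isdigit():
        raise ValueError("expected digits after the decimal point")
    chain = []
    # Main class, division, section: pad the first 1, 2, 3 digits with zeros.
    for width in (1, 2, 3):
        candidate = whole[:width].ljust(3, "0")
        if candidate not in chain:
            chain.append(candidate)
    # Each further decimal digit narrows the subject one more step.
    for width in range(1, len(decimals) + 1):
        chain.append(whole + "." + decimals[:width])
    return chain


# Captions from the slide.
CAPTIONS = {
    "600": "Technology (Applied sciences)",
    "630": "Agriculture and related technologies",
    "636": "Animal husbandry",
    "636.7": "Dogs",
    "636.8": "Cats",
}

if __name__ == "__main__":
    for cls in ddc_hierarchy("636.7"):
        print(cls, CAPTIONS.get(cls, ""))
```

Because shelf order follows the numbers, items whose class numbers share a long prefix end up next to each other, which is the shelving use noted above.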

27 IFLA Model
Work. A work is the underlying abstraction, e.g.,
- The Iliad
- The Computer Science departmental web site
- Beethoven's Fifth Symphony
- Unix operating system
- The 1996 U.S. census
This is roughly equivalent to the concept of "literary work" used in copyright law.

28 IFLA Model
Expression. A work is realized through an expression, e.g.,
- The Iliad has oral expressions and written expressions.
- A musical work has score and performance(s).
- Software has source code and machine code.
Many works have only a single expression, e.g., a web page or a book.

29 IFLA Model
Manifestation. An expression is given form in one or more manifestations, e.g.,
- The text of The Iliad has been manifest in numerous manuscripts and printed books.
- A musical performance can be distributed on CD, or broadcast on television.
- Software is manifest as files, which may be stored or transmitted in any digital medium.

30 IFLA Model
Item. When many copies are made of a manifestation, each is a separate item, e.g.,
- a specific copy of a book
- computer file
[Works, expressions, manifestations and items are explored in CS 502, Computing Methods for Digital Libraries.]
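The four levels of the model nest naturally, which a small data-structure sketch can make concrete. The class and field names below are hypothetical choices for illustration, not part of the IFLA model's own vocabulary beyond the four level names, and the example reuses The Iliad from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    description: str  # one specific copy


@dataclass
class Manifestation:
    form: str  # e.g., a printed book or a manuscript
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    form: str  # e.g., oral or written
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str  # the underlying abstraction
    expressions: List[Expression] = field(default_factory=list)


def count_items(work):
    """Count every item reachable from a work through its expressions."""
    return sum(
        len(manifestation.items)
        for expression in work.expressions
        for manifestation in expression.manifestations
    )


iliad = Work(
    title="The Iliad",
    expressions=[
        Expression(form="oral"),
        Expression(
            form="written",
            manifestations=[
                Manifestation(form="manuscript"),
                Manifestation(
                    form="printed book",
                    items=[Item(description="a specific copy of a book")],
                ),
            ],
        ),
    ],
)

if __name__ == "__main__":
    print(iliad.title, "has", count_items(iliad), "catalogued item(s)")
```

A catalog record can attach to any of these levels, which is why the distinction matters when deciding what a given metadata field describes.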

31 Cataloguing Objectives
Functions of catalogs: finding, collocating (recall and precision), choosing, acquiring, navigating... among items in a bibliographic universe
Compare use cases in software design.

32 Cataloguing Principles
User convenience
- common usage
Representation
Sufficiency and necessity
- parsimony
Avoid using one device to serve multiple functions (e.g., to disambiguate and order)