Chapter Two Preliminaries: Sorting out the ingredients How to Build a Digital Library Ian H. Witten and David Bainbridge.

Similar presentations
Information Dissemination New Digital Opportunities IMARK Investing in Information for Development.
Module 5a: Authority Control and Encoding Schemes IMT530: Organization of Information Resources Winter 2007 Michael Crandall.
Internet Research Techniques Graham Seibert Copyright 2006 This is a segment of the draft version of a large syllabus. I need your feedback to improve.
University of Adelaide Library Life Impact The University of Adelaide The well connected catalogue Patricia Scott, Denise Tobin and Helen Attar.
Providing Online Access to the HKUST University Archives: EAD to INNOPAC Sintra Tsang and K.T. Lam The Hong Kong University of Science and Technology 7th.
Metadata: An Introduction By Wendy Duff October 13, 2001 ECURE.
WMES3103 : INFORMATION RETRIEVAL
Linking Electronic Reserves and Library Database Articles in Blackboard John Burke Gardner-Harvey Library or November 3, 2004.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT. © 2007 Microsoft Corporation.
Introducing Symposia : “ The digital repository that thinks like a librarian”
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT. © 2007 Microsoft Corporation.
Greenstone Digital Library Usage and Implementation By: Paul Raymond A. Afroilan Network Applications Team Preginet, ASTI-DOST.
Software Development Unit 2 Databases What is a database? A collection of data organised in a manner that allows access, retrieval and use of that data.
Research Methods & Data AD140Brendan Rapple 2 March, 2005.
Document Delivery Formats for the Web and Legal Digital Collections Kevin Reiss June 18 th, 2004 Law Library Rutgers-Newark School of Law.
Website Content, Forms and Dynamic Web Pages. Electronic Portfolios Portfolio: – A collection of work that clearly illustrates effort, progress, knowledge,
Introduction to digital libraries How to Build a Digital Library Ian H. Witten and David Bainbridge.
Cornell CS Bibliographic Concepts CS 502 – Carl Lagoze – Cornell University Acks to H. Van de Sompel.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
Connecticut History Online A digital library? By Todd Vandenbark.
Planning a digital library How to Build a Digital Library Ian H. Witten and David Bainbridge.
Cataloguing Electronic resources Prepared by the Cataloguing Team at Charles Sturt University.
Section 4.1 Format HTML tags Identify HTML guidelines Section 4.2 Organize Web site files and folder Use a text editor Use HTML tags and attributes Create.
LIS 506 (Fall 2006) LIS 506 Information Technology Week 11: Digital Libraries & Institutional Repositories.
7/14/09. Robert L. Maxwell RDA Lecture Series National Library of South Africa 22 July /14/09 Cataloging: Still a Professional Asset to Become Excited.
Business Software What is database software? p. 145 Allows you to create, access, and manage data Add, change, delete, sort, and retrieve data Next.
Information Retrieval and Knowledge Organisation Knut Hinkelmann.
The Library Cataloging Tradition Marty Kurth CS 431 February 9, 2005 [slides stolen from Diane Hillmann]
OpenURL Link Resolvers 101
System Analysis and Design
Chapter One Orientation: The world of digital libraries How to Build a Digital Library Ian H. Witten and David Bainbridge.
Library needs and workflows Diane Boehr Head of Cataloging National Library of Medicine, NIH, DHHS
Discovering Computers Fundamentals Fifth Edition Chapter 9 Database Management.
Software Project Planning Defining the Project Writing the Software Specification Planning the Development Stages Testing the Software.
Planning a digital library How to Build a Digital Library Ian H. Witten and David Bainbridge.
1 herbert van de sompel CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel
Chapter 1 Review Chapter 2 Whatcha Gonna Do???
PACSCL Consortial Survey Initiative Group Training Session February 12, 2008 at The Historical Society of Pennsylvania.
ITGS Databases.
Document Solutions Document Solutions Confidential Property of FileMark Corporation Document Solutions Document Solutions July 2009 Repository for Submission.
RDA DAY 1 – part 2 web version 1. 2 When you catalog a “book” in hand: You are working with a FRBR Group 1 Item The bibliographic record you create will.
Information in the Digital Environment Information Seeking Models Dr. Dania Bilal IS 530 Spring 2005.
Alternative Architecture for Information in Digital Libraries Onno W. Purbo
The physical parts of a computer are called hardware.
EndNote: The Next Steps Rebecca Starkey Reference Librarian The Joseph Regenstein Library
Intellectual Works and their Manifestations Representation of Information Objects IR Systems & Information objects Spring January, 2006 Bharat.
UoS Libraries 2011 EndNote X5 - basic graduate session.
Collecting History: Profiles in Science Alexa T. McCray National Library of Medicine Bethesda, MD Stanford University August 21, 1999.
FRBR: Cataloging’s New Frontier Emily Dust Nimsakont Nebraska Library Commission NCompass Live December 15, 2010 Photo credit:
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
ENDNOTE X7 ….. Bibliographies Made Easy RESEARCH SUPPORT DIVISION PERPUSTAKAAN SULTANAH ZANARIAH.
IN THE NAME OF GOD. Reference Citing Software.
IMT530- Organization of Information Resources1 Feedback Lectures –More practical examples –Like guest lecturers –Generally helpful in understanding concepts.
1/16/2016I. Revels Digital Imaging Workshop 1 Selection Considerations For Digital Imaging Projects.
Project Planning Defining the project Software specification Development stages Software testing.
Chapter Three Presentation: User interface How to Build a Digital Library Ian H. Witten and David Bainbridge.
Presenting Documents How to Build a Digital Library Ian H. Witten and David Bainbridge.
The ___ is a global network of computer networks Internet.
Digitizing Historical Newspapers South Carolina Digital Newspaper Program's participation with the Library of Congress' Chronicling America: Historic American.
Catalogs, MARC and other metadata Kathryn Lybarger March 25, 2009.
DIGITIZATION IN THEORY AND PRACTICE WEBSITE: Helen Nneka Okpala Presentation done at University of.
Some basic concepts Week 1 Lecture notes INF 384C: Organizing Information Spring 2016 Karen Wickett UT School of Information.
1 Midterm Examination. 2 General Observations Examination was too long! Most people submitted by .
Theory, Tools, History: A Brief Introduction August 17, 2016.
7th Annual Hong Kong Innovative Users Group Meeting
Digital Stewardship Curriculum
About SharePoint Server 2007 My Sites
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Computer Literacy BASICS: A Comprehensive Guide to IC3, 3rd Edition
Presentation transcript:

Planning a Digital Library
- Responsibilities
- Technology to be used
  - Greenstone software
- Metadata
  - Summary information
- Types of access
- Digitizing documents
  - Majority of the work

Responsibilities
- Legal issues
  - Distributing information carries responsibilities
  - Copyright
- Social issues
  - Respect the customs of the community
  - Both source and user communities
- Ethical issues

Fundamental Questions
- What is the purpose of the library?
- What are the principles for including documents?
- When does one document differ from another?

Sources of Material
- Existing library to be converted to digital form
- An existing collection of material to be made available as a digital library
- Material already existing on the Web to be organized and presented via a portal

Sources of Material
- Ideology
- Converting an existing library
- Building a new collection
- Virtual libraries

Ideology
- Ideology: a clear conception of what you plan to achieve with the collection of information
- Ideology of a collection:
  - Purpose
  - Objectives
  - Principles
- These guide what is to be included in the collection

Introduction to Digital Library
- State the purpose of the collection
- Describe how the collection is organized

Document versus Work
- Work
  - The disembodied content of a message
  - Pure information
- Document
  - Traditional library: a physical object that embodies the work
  - Digital library: a particular electronic encoding of a work
- How are distinctions made between different manifestations of a single work?
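
The work/document distinction can be modelled as two record types, where many documents point at one work. The sketch below is a hypothetical illustration: identifying a work by a plain (author, title) pair is an assumption for the example, not how any particular catalogue or Greenstone does it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Work:
    """The disembodied content, keyed here (hypothetically) by author and title."""
    author: str
    title: str

@dataclass(frozen=True)
class Document:
    """One particular electronic encoding of a work."""
    work: Work
    encoding: str  # e.g. "PDF", "HTML", "page images"
    edition: str

def manifestations(documents):
    """Group documents under the work they embody."""
    groups = {}
    for doc in documents:
        groups.setdefault(doc.work, []).append(doc)
    return groups

book = Work("Witten, Ian H.", "How to Build a Digital Library")
docs = [
    Document(book, "PDF", "1st"),
    Document(book, "HTML", "1st"),
    Document(Work("Austen, Jane", "Emma"), "HTML", "1816 text"),
]
by_work = manifestations(docs)
print(len(by_work), len(by_work[book]))  # 2 2
```

Frozen dataclasses are hashable, so a `Work` can serve directly as a grouping key; the hard part in practice is deciding when two descriptions name the same work.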

Converting an Existing Library
- Digitizing an existing paper-based collection is the most expensive kind of project
- Consider whether it is worth the effort and expense

Advantages of Digital Libraries
- Easier to access remotely than conventional libraries
- Powerful search and browsing
- Easier to add additional services

Questions
- Will the digital library coexist with an existing physical one?
- What is the collection’s growth rate?
- How dynamic is the collection?
- Should you consider outsourcing the whole digital library operation?
- Could user needs be satisfied in alternative ways?

Prioritizing Materials
- Special collections and unique materials
  - Rare books and manuscripts
- High-use items
  - Research and teaching materials
- Low-use items

Criteria for Digital Conversion
- Intellectual content
  - Scholarly value
  - Desire to enhance access to information
- Funding available
- Educational value
  - Classroom support
  - Background reading
  - Distance education
- Institutional
  - Resource sharing
  - Promote strengths of an institution
  - Reduce handling of fragile originals
  - Cost and space savings
- Copyright

Principles for Development
- Utility
- Local imperative
- Novelty
- Intertextuality
- Resources
- Commitment to the transition

Building a New Collection
- New material
  - The copyright holder may be the best one to create a digital collection
- Metadata
  - Where will it come from?

Virtual Libraries
- A portal to information that is in electronic form but located elsewhere on the Internet
- Source information is already available
- Some metadata is available

Virtual Libraries
- Select the content
  - Define a purpose or theme for the library
  - Seek and filter information
  - Focused Web crawling
- Obtain additional metadata
  - Aids in the organization of the collection
  - The higher the educational value of a resource, the more time should be taken in generating its description
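
Focused Web crawling combines "seek" and "filter": the crawler follows links best-first and only keeps (and expands) pages that match the library's theme. The sketch below is a minimal offline illustration under stated assumptions: the Web is replaced by hypothetical in-memory `pages` and `links` dictionaries, and relevance is simple term overlap with the theme rather than a trained classifier.

```python
import heapq

def relevance(text, theme_terms):
    """Fraction of (non-empty) theme terms that occur in the page text."""
    words = set(text.lower().split())
    return len(words & theme_terms) / len(theme_terms)

def focused_crawl(seeds, pages, links, theme_terms, threshold=0.5, limit=100):
    """Best-first focused crawl.

    pages: url -> page text, links: url -> outgoing urls.
    Both are hypothetical stand-ins for actually fetching pages.
    """
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    while frontier and len(collected) < limit:
        _, url = heapq.heappop(frontier)
        score = relevance(pages.get(url, ""), theme_terms)
        if score < threshold:
            continue  # filter: off-theme pages are neither kept nor expanded
        collected.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                # Links inherit the score of the page that points to them.
                heapq.heappush(frontier, (-score, target))
    return collected

pages = {
    "a": "digital library metadata",
    "b": "library metadata standards",
    "c": "football scores",
    "d": "digital library software",
}
links = {"a": ["b", "c"], "b": [], "c": ["d"]}
theme = {"digital", "library", "metadata"}
print(focused_crawl(["a"], pages, links, theme))  # ['a', 'b']
```

Note that page `d` is on-theme but never reached, because the only path to it runs through an off-theme page; real focused crawlers trade this kind of recall for staying on topic.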

Generating Metadata in a Virtual Library
- Automatically generated
  - URL
  - Author-supplied metadata
  - Keyword extraction
- Manual review
  - Edit and enrich the automatically generated metadata
- Intensive description by a human expert
  - Provides extensive metadata
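
The automatically generated tier (URL, author-supplied metadata, keyword extraction) can be sketched with the Python standard library's `html.parser`. This is a simplified illustration, not Greenstone's extractor: it reads `<title>` and `<meta name=... content=...>` tags, and its keyword extraction is plain word frequency with a small hypothetical stopword list (it would also count script or style text, which a real extractor should skip).

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on", "with"}

class MetaParser(HTMLParser):
    """Collects the title, author-supplied <meta> tags and body text."""
    def __init__(self):
        super().__init__()
        self.meta, self.title, self.text = {}, "", []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.text.append(data)

def generate_metadata(url, html, n_keywords=3):
    parser = MetaParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    words = [w.strip(".,;:!?()").lower() for w in " ".join(parser.text).split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return {
        "url": url,
        "host": urlparse(url).netloc,
        "title": parser.title.strip(),
        "author_supplied": parser.meta,
        "keywords": [w for w, _ in counts.most_common(n_keywords)],
    }

page = ('<html><head><title>Greenstone Notes</title>'
        '<meta name="author" content="A. Librarian"></head>'
        '<body><p>Greenstone builds collections. '
        'Greenstone indexes collections of documents.</p></body></html>')
print(generate_metadata("http://example.org/notes.html", page))
```

The output is a starting record for the manual-review tier to edit and enrich; the URL and `author` value here are made up for the example.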

Bibliographic Organization
- Objectives of a bibliographic system
- Bibliographic entities

Original Objectives of a Bibliographic System
- Finding
  - User seeks a known document when information such as author, title or subject is known
- Collocation
  - “To place together or in proper order”
  - Locating similar information by subject matter, author, etc.
- Choice
  - User must choose between similar documents
  - Bibliographically, in terms of edition
  - Topically, in terms of character

Current Objectives of a Bibliographic System
- Locate
  - Find entities in a file or database as the result of a search using attributes or relationships of the entities
- Identify
  - Confirm that the entity described in a record is the one sought
- Select
  - Verify that the entity is what the user needs
- Acquire
  - Obtain access through purchase, loan or online access
- Navigate
  - Move through a bibliographic database
  - Find works related by generalization, association and aggregation
  - Find attributes related by equivalence, association and hierarchy

Documents in Digital Libraries
- Document
  - A particular electronic encoding of a work
  - Can be easily duplicated
  - Uncertain boundaries
- Digital libraries should present users with an image of stability and continuity
  - as though electronic documents were identifiable, discrete objects like physical ones
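
Because electronic documents are easily duplicated, a collection builder usually needs a way to notice copies. One common technique, sketched here as an illustration rather than as the book's method, is to fingerprint each file's bytes with a cryptographic hash (SHA-256 via `hashlib`) and group files whose fingerprints match. The file names below are hypothetical.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the raw bytes of a document."""
    return hashlib.sha256(data).hexdigest()

def find_duplicates(files):
    """files: name -> bytes. Returns groups of names with identical content."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(fingerprint(data), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

files = {
    "report.pdf": b"%PDF-1.4 annual report",
    "copy-of-report.pdf": b"%PDF-1.4 annual report",
    "report.html": b"<html>annual report</html>",
}
print(find_duplicates(files))  # [['copy-of-report.pdf', 'report.pdf']]
```

This only catches byte-identical copies: the HTML version of the same report hashes differently, which is exactly the document-versus-work distinction made above.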

Bibliographic Entities
- Documents
- Works
  - Distinction between document and work
- Editions
  - Electronic documents use terms such as version, release and revision
- Authors
  - Authority control: standardized names for authors
- Titles
  - Attributes of works
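
Authority control maps every variant form of a name to one standardized heading, which is also what makes collocation by author possible. The sketch below assumes a tiny hypothetical authority file held in a dictionary; real authority files are far larger and are looked up with more careful matching than this normalization (case-folding, accent stripping, whitespace collapsing).

```python
import unicodedata
from collections import defaultdict

# Hypothetical authority file: normalized variant form -> authorized heading.
AUTHORITY = {
    "twain, mark": "Twain, Mark, 1835-1910",
    "mark twain": "Twain, Mark, 1835-1910",
    "clemens, samuel langhorne": "Twain, Mark, 1835-1910",
}

def normalize(name):
    """Case-fold, strip accents and collapse whitespace before lookup."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())

def authorized_form(name):
    """Return the authorized heading, or the name unchanged if unknown."""
    return AUTHORITY.get(normalize(name), name)

def collocate(records):
    """Group (author, title) records under authorized author headings."""
    groups = defaultdict(list)
    for author, title in records:
        groups[authorized_form(author)].append(title)
    return dict(groups)

records = [
    ("Mark Twain", "Adventures of Huckleberry Finn"),
    ("Clemens, Samuel Langhorne", "The Prince and the Pauper"),
    ("TWAIN,  Mark", "Roughing It"),
]
print(collocate(records))
```

All three records land under one heading even though the title pages name the author differently.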

Bibliographic Entities
- Subjects
  - Two approaches to automatically assigning subjects:
    - Key-phrase extraction
    - Key-phrase assignment
- Literary and artistic works
  - Style, form, content, genre
- Library of Congress Subject Headings (LCSH)
  - Controlled vocabularies: 30,000 pages, 2,000,000 entries
  - Hierarchical relationship of broader and narrower topics
- Subject classifications
  - Traditional libraries have a linear arrangement
  - A digital collection can be rearranged at the click of a mouse
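
The two automatic approaches differ in where the phrases come from. Extraction picks phrases that occur in the document itself; assignment picks headings from a controlled vocabulary. The sketch below is deliberately simplified: extraction scores single words by TF-IDF against a small corpus (real extractors consider multi-word phrases and more features), and assignment uses a hypothetical vocabulary that maps each heading to trigger words.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def extract_keyphrases(doc, corpus, k=3):
    """Extraction: candidate terms come from the document, scored by TF-IDF."""
    tf = Counter(tokens(doc))
    n_docs = len(corpus)

    def idf(term):
        df = sum(1 for d in corpus if term in set(tokens(d)))
        return math.log((1 + n_docs) / (1 + df))  # smoothed IDF

    scored = {t: count * idf(t) for t, count in tf.items()}
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

def assign_keyphrases(doc, vocabulary):
    """Assignment: headings come from a controlled vocabulary."""
    words = set(tokens(doc))
    return sorted(h for h, triggers in vocabulary.items() if words & triggers)

doc = "OCR errors: OCR output needs checking. The library corrects OCR errors."
corpus = ["The library scans books", "The library lends books", doc]
vocabulary = {  # hypothetical headings and trigger words
    "Optical character recognition": {"ocr"},
    "Libraries": {"library", "libraries"},
    "Cataloging": {"cataloging", "catalog"},
}
print(extract_keyphrases(doc, corpus))    # ['ocr', 'errors', 'checking']
print(assign_keyphrases(doc, vocabulary)) # ['Libraries', 'Optical character recognition']
```

Words that occur in every corpus document ("the", "library") get an IDF of zero and drop out of extraction, while assignment can still produce "Libraries" because the heading comes from the vocabulary rather than from term weights.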

Modes of Access
- Web
- Terminal in physical library
- Standalone computer with CD-ROM or DVD
- Distributed system
- Restricting access
  - Firewalls
  - Password protection
  - Watermarking
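
If password protection is used to restrict access, passwords should be stored as salted, slow hashes rather than in plain text. A minimal sketch using the standard library's `hashlib.pbkdf2_hmac` and a constant-time comparison; the iteration count is an assumed value to tune for your hardware, and a production system would normally rely on its Web server's or framework's authentication rather than hand-rolled code.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; raise it as hardware allows

def hash_password(password, salt=None):
    """Return (salt, digest) for storage; a fresh random salt per user."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def check_password(password, salt, expected):
    """Recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("s3cret")
print(check_password("s3cret", salt, stored))  # True
print(check_password("guess", salt, stored))   # False
```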

Digitizing Documents
- Digitization
  - The process of taking traditional library materials and converting them to electronic form
  - Allows storage and manipulation by a computer
  - The process is time-consuming and expensive

Stages of Digitization
- Scanning
  - Creates a digitized image of each page
  - Usually presented to the user
- Optical Character Recognition (OCR)
  - Creates a digital representation of the textual content of the pages
  - Necessary for full-text indexing
  - Allows searching
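
OCR text is what makes full-text searching possible, typically through an inverted index that maps each term to the pages containing it. A minimal sketch, assuming the OCR output is already available as a dictionary of hypothetical page numbers to text, with Boolean AND queries:

```python
import re
from collections import defaultdict

def build_index(pages):
    """pages: page id -> OCR text. Returns term -> sorted list of page ids."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(page_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return the pages that contain every query term (Boolean AND)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

pages = {
    1: "Greenstone digital library software",
    2: "Scanning produces a page image",
    3: "OCR text enables digital search",
}
index = build_index(pages)
print(search(index, "digital"))         # [1, 3]
print(search(index, "digital search"))  # [3]
```

The page images from scanning are what the user sees; the index is built only from the OCR text, so OCR errors directly become missed search hits.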

Digitizing Documents
- Scanning
- Optical character recognition
- Interactive OCR
- Page handling
- Planning an image digitization project
- Inside an OCR shop
- An example project

Scanning
- Produces a digitized image of each page
- Resembles a digitized photograph

Decisions in Scanning
- Black-and-white, grayscale or color
- Resolution
  - Number of pixels per linear unit
- Bits per pixel
  - Monochrome display: 16 or 256 levels of gray
  - Color display: up to 24 or 32 bits per pixel
- Quality
  - Higher quality increases storage space and time to access
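
The storage cost of these decisions follows from simple arithmetic: uncompressed size in bytes = (width × dpi) × (height × dpi) × bits per pixel ÷ 8. The sketch below assumes a US Letter page (8.5 × 11 inches) and ignores compression, which in practice reduces the figures considerably.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size of one scanned page, before any compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

for label, bpp in [("black-and-white", 1), ("256-level grayscale", 8), ("24-bit color", 24)]:
    size = uncompressed_bytes(8.5, 11, 300, bpp)
    print(f"{label:>20}: {size:>12,} bytes")
# black-and-white:      1,051,875 bytes
# 256-level grayscale:  8,415,000 bytes
# 24-bit color:        25,245,000 bytes
```

Doubling the resolution to 600 dpi quadruples every figure, since both dimensions double.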

Optical Character Recognition
- Produces a character-by-character representation of the document
- Transforms the scanned image into a digitized representation of the page content
- Manual cleanup is necessary
- Less efficient than manual keying when accuracy drops below 95 percent
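
OCR accuracy is commonly measured at the character level: the edit (Levenshtein) distance between the OCR output and a correct transcription, divided by the length of the transcription. The sketch below shows that measure plus the arithmetic behind the 95 percent threshold; the figure of 2,000 characters per page is an assumption for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def character_accuracy(ocr_text, ground_truth):
    """1 - (character errors / length of the correct text), floored at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1 - edit_distance(ocr_text, ground_truth) / len(ground_truth))

def expected_errors(accuracy, chars_per_page=2000):
    """Errors left to correct on a page of the assumed length."""
    return round((1 - accuracy) * chars_per_page)

print(character_accuracy("Digita1 library", "Digital library"))  # about 0.933
print(expected_errors(0.95))  # 100 errors on a 2,000-character page
```

At 95 percent accuracy, roughly 100 corrections per page of this size helps explain why rekeying can become the cheaper option.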

Interactive OCR
- Optical character recognition should be done as an interactive process
- Acquisition
  - Input from a scanner, or read from a file
- Cleanup
  - Filtering, deskewing and manual cleanup of unwanted areas
- Page analysis
  - Examine the layout
- Recognition
  - The “OCR” part
- Checking
- Saving
  - Plain text, HTML, RTF, PDF, MS Word
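
The filtering step in cleanup removes scanner noise such as isolated specks before recognition. A 3 × 3 median filter is one standard technique for this; it is shown here as an illustration, not as what any specific OCR package does. The image is assumed to be a binary page held as a list of rows of 0 (white) and 1 (black); border pixels use only their in-bounds neighbours.

```python
def median_filter(image):
    """3x3 median filter on a binary image (list of rows of 0/1)."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                image[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            row.append(window[len(window) // 2])  # median (upper median if even)
        out.append(row)
    return out

speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 1  # a single dark noise pixel
print(median_filter(speck)[2][2])  # 0: the speck is removed
```

The trade-off is that a median filter also rounds the corners of thin strokes, so in an interactive process the operator can check that characters survive the cleanup.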

Page Handling
- Unbinding
- Microfiche or microfilm
- Two most expensive parts
  - Handling the paper
  - OCR

Planning a Digitization Project
- Outsourcing
- Cost
  - $1 to $2 for scanning and OCR
- Quality control
  - Verification
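
A rough budget and a verification plan can be sketched together. This assumes the $1 to $2 figure is per page, and the 1 percent sampling rate for quality control is a hypothetical choice; a fixed random seed makes the sample reproducible, so the vendor and the library check the same pages.

```python
import math
import random

def cost_range(pages, low_per_page=1.0, high_per_page=2.0):
    """Outsourced scanning + OCR cost, assuming the quoted range is per page."""
    return pages * low_per_page, pages * high_per_page

def verification_sample(pages, fraction=0.01, seed=0):
    """Page numbers to check by hand (hypothetical sampling rate)."""
    k = max(1, math.ceil(pages * fraction))
    return sorted(random.Random(seed).sample(range(1, pages + 1), k))

low, high = cost_range(10_000)
print(f"${low:,.0f} to ${high:,.0f}")   # $10,000 to $20,000
print(len(verification_sample(10_000)))  # 100 pages to verify
```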