Presenting Documents How to Build a Digital Library Ian H. Witten and David Bainbridge.

Questions
- What form are the documents in?
- What structure do the documents have?
- Which kinds of access do you want to provide?
- What metadata is available?
- How do you want to present the documents?

Presenting Documents
- Structured documents (hierarchy)
- Unstructured text documents
- Page images
- Page images and extracted text
- Audio and photographic images
- Video
- Music
- Foreign language

Hierarchically Structured Text
- Table of contents
- Chapter, section, subsection, etc.
- Granularity of document?
- Example: Humanity Development Library
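As a rough illustration of how a table of contents can be derived from a chapter/section/subsection hierarchy, the sketch below walks a nested structure and emits numbered, indented entries. The `title` and `children` keys, the function name, and the sample sections are all hypothetical; this is not how any particular digital library system stores its documents.

```python
def table_of_contents(sections, prefix=""):
    """Return numbered, indented table-of-contents lines for nested sections."""
    lines = []
    for i, section in enumerate(sections, start=1):
        number = f"{prefix}{i}"
        # One level of indentation per dot in the section number.
        indent = "  " * number.count(".")
        lines.append(f"{indent}{number} {section['title']}")
        # Recurse into subsections, extending the numbering (2 -> 2.1 -> 2.1.1).
        lines.extend(table_of_contents(section.get("children", []), prefix=f"{number}."))
    return lines


# Made-up sample document structure, for illustration only.
BOOK = [
    {"title": "Introduction"},
    {"title": "Soil", "children": [
        {"title": "Composting", "children": [{"title": "Materials"}]},
        {"title": "Irrigation"},
    ]},
]

TOC = table_of_contents(BOOK)
print("\n".join(TOC))
```

The granularity question on the slide shows up here as a design choice: any level of the hierarchy (chapter, section, or subsection) could be treated as the unit that is indexed and returned as a "document".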

Unstructured Text
- Long scroll of plain text
- Structure unknown to the digital library system
- Browsing is less convenient
- Pages of document may not correspond to physical pages of book
- Example: Project Gutenberg Collection

Page Images
- Digitized images of the document's pages
- Document accuracy
  - OCR is error-prone
  - Duplicating layout is difficult
- Space requirements
  - Requires 20 times more storage space than text
  - Increased download time
- Need for text representation for searching
  - Difficult to highlight search terms on an image

Page Images and Extracted Text
- Provide page images and extracted text
- Search on extracted text
- View image or extracted text
- Example: Maori Newspaper Collection

Other Document Types
- Audio and photographic images
  - Example: Oral History Collection
- Video
  - Example: Music Video Collection
- Music
  - Representations: printed notation, MIDI, synthesized performance, human performance
  - Example: Music Digital Library
- Multiple languages
  - Interface and/or documents
  - Example: Arabic Collection

Metadata
- Provides information to facilitate access
- Structured
- Standardized

Metadata Examples
- Conventional bibliographic listing
  - Title
  - Author
  - Date
  - Publication
  - Volume number
  - Issue number
  - Page numbers
- MARC
- Dublin Core
- METS
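To make the relationship between a conventional bibliographic listing and a standardized scheme more concrete, here is a minimal sketch that maps the listing's fields onto Dublin Core element names. The mapping itself is an assumption made for illustration (in particular, sending "Publication" to `source` and folding volume, issue, and pages into `bibliographicCitation`), and the sample record is invented.

```python
# Hypothetical mapping from bibliographic-listing fields to Dublin Core
# element names; the choice of target elements is an assumption.
FIELD_TO_DC = {
    "Title": "title",
    "Author": "creator",
    "Date": "date",
    "Publication": "source",
}


def to_dublin_core(record):
    """Return a dict of Dublin Core style elements for a bibliographic record."""
    dc = {}
    for field, value in record.items():
        element = FIELD_TO_DC.get(field)
        if element is not None:
            dc[element] = value
    # Volume, issue and pages have no dedicated element among the fifteen
    # simple Dublin Core elements, so this sketch folds them into a single
    # citation string (an illustrative choice, not a prescribed practice).
    parts = []
    if "Volume Number" in record:
        parts.append(f"vol. {record['Volume Number']}")
    if "Issue Number" in record:
        parts.append(f"no. {record['Issue Number']}")
    if "Page Numbers" in record:
        parts.append(f"pp. {record['Page Numbers']}")
    if parts:
        dc["bibliographicCitation"] = ", ".join(parts)
    return dc


# Invented sample record, for illustration only.
SAMPLE = {
    "Title": "An Example Article",
    "Author": "A. Author",
    "Date": "1999",
    "Publication": "Journal of Examples",
    "Volume Number": "12",
    "Issue Number": "3",
    "Page Numbers": "45-67",
}

DC_RECORD = to_dublin_core(SAMPLE)
print(DC_RECORD)
```

MARC and METS are considerably richer than this flat dictionary; the sketch only shows the basic idea of expressing the same facts in structured, standardized fields.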

Metadata Aspects
- Historical
  - Describes provenance and preservation history
- Functional
  - Describes usage, condition and audience
- Technical
  - Describes interoperability requirements
- Relational
  - Describes links and citations
- Intellectual
  - Describes content or subject

Searching
- Types of query
  - Boolean
  - Ranked
- Case-folding and stemming
- Phrase searching
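The search options listed above can be sketched with a small positional inverted index. This is a toy under stated assumptions: the stemmer is a crude suffix stripper (not the Porter algorithm or anything a real system would ship), the ranking score is just a count of query-term occurrences, and the documents and function names are made up for illustration.

```python
import re
from collections import defaultdict


def normalize(word):
    """Case-fold, then strip a few common suffixes (a toy stemmer)."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    return [normalize(w) for w in re.findall(r"[A-Za-z]+", text)]


def build_index(docs):
    """Map each normalized term to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def boolean_and(index, query):
    """Boolean query: documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(set(index.get(t, {})) for t in terms))


def ranked(index, query):
    """Ranked query: documents ordered by total query-term occurrences."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, positions in index.get(term, {}).items():
            scores[doc_id] += len(positions)
    return sorted(scores, key=lambda d: (-scores[d], d))


def phrase(index, query):
    """Phrase query: documents where the terms occur consecutively."""
    terms = tokenize(query)
    hits = set()
    for doc_id in boolean_and(index, query):
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


# Invented sample collection, for illustration only.
DOCS = {
    "d1": "Searching digital documents",
    "d2": "A document search tool",
    "d3": "Digital page images",
}
INDEX = build_index(DOCS)
print(boolean_and(INDEX, "Document SEARCH"))
print(ranked(INDEX, "digital document"))
print(phrase(INDEX, "document search"))
```

Because the same `normalize` function is applied to both documents and queries, "Searching" and "search" match each other. A Boolean query returns an unordered set, a ranked query returns a list ordered by score, and a phrase query additionally requires the word positions to be adjacent.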

Browsing
- Based on metadata
- Browsing alphabetical lists
  - Chinese is not alphabetic
- Browsing by date
- Browsing structures
  - Hierarchical classification structures
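A minimal sketch of metadata-based browsing follows: one function groups titles into an alphabetical list, the other groups them by date. The `title` and `year` field names and the sample records are assumptions for illustration. Titles that do not start with an ASCII letter are placed in a catch-all "#" bucket, which is one simple (and rather unsatisfying) way of coping with the fact that scripts such as Chinese have no alphabetical order.

```python
from collections import defaultdict


def browse_by_title(records):
    """Group titles under their upper-cased first letter, sorted within groups."""
    groups = defaultdict(list)
    for rec in records:
        title = rec["title"]
        first = title[:1].upper()
        # Non-alphabetic or non-ASCII initials go to a catch-all bucket.
        key = first if first.isascii() and first.isalpha() else "#"
        groups[key].append(title)
    return {k: sorted(v, key=str.casefold) for k, v in sorted(groups.items())}


def browse_by_date(records):
    """Group titles by year, with years in ascending order."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["year"]].append(rec["title"])
    return {year: sorted(titles) for year, titles in sorted(groups.items())}


# Invented sample metadata records, for illustration only.
RECORDS = [
    {"title": "Butterfly Farming", "year": 1995},
    {"title": "aquaculture basics", "year": 1993},
    {"title": "Beekeeping", "year": 1995},
    {"title": "中文文献", "year": 1993},
]

BY_TITLE = browse_by_title(RECORDS)
BY_DATE = browse_by_date(RECORDS)
print(BY_TITLE)
print(BY_DATE)
```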

Phrase Browsing
- Phrase: any sequence of words appearing more than once in the collection
- Automatic phrase extraction
- Key phrases
- Phrase browser
  - Phrase hierarchy
  - Sorted by document and collection frequencies
  - Leaves are documents
- Example: The Complete Works of Shakespeare
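The definition of a phrase above can be tried out with a brute-force sketch: count every word sequence up to a fixed length and keep those that occur more than once, recording both collection and document frequencies. This is only an illustration; it restricts phrases to two or three words (an assumption made here to keep the example small), and it does not scale the way a real phrase-extraction algorithm must. The function names and sample texts are hypothetical.

```python
import re
from collections import Counter, defaultdict


def extract_phrases(docs, max_len=3):
    """Return {phrase: (collection_freq, doc_freq)} for word sequences of
    2..max_len words that occur more than once in the collection."""
    collection_freq = Counter()
    containing_docs = defaultdict(set)
    for doc_id, text in docs.items():
        words = re.findall(r"[a-z']+", text.lower())
        for n in range(2, max_len + 1):
            for i in range(len(words) - n + 1):
                candidate = " ".join(words[i:i + n])
                collection_freq[candidate] += 1
                containing_docs[candidate].add(doc_id)
    return {
        p: (count, len(containing_docs[p]))
        for p, count in collection_freq.items()
        if count > 1
    }


def expansions(phrases, word):
    """Phrases containing `word`, most frequent first: one level of a
    phrase hierarchy as a phrase browser might list it."""
    hits = [(p, freqs) for p, freqs in phrases.items() if word in p.split()]
    return sorted(hits, key=lambda item: (-item[1][0], -item[1][1], item[0]))


# Invented two-document collection, for illustration only.
DOCS = {
    "s1": "to be or not to be",
    "s2": "not to be trusted",
}
PHRASES = extract_phrases(DOCS)
print(PHRASES)
print(expansions(PHRASES, "not"))
```

In a phrase browser, selecting one of the returned phrases would list its longer expansions in turn, with the documents that contain the phrase as the leaves of the hierarchy.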

Browsing Using Extracted Metadata
- Acronyms
  - Example: Acronym Extraction Demo
- Language identification
  - Example: Language Extraction Demo
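As one possible illustration of extracting acronyms as metadata, the sketch below looks for the common pattern "Expanded Form (EF)" and accepts a match only when the initials of the preceding words spell the acronym. This heuristic is an assumption made for the example, not the method used by the demo named above, and the sample sentence is invented.

```python
import re


def extract_acronyms(text):
    """Find 'Expanded Form (EF)' patterns whose word initials spell the acronym."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = match.group(1)
        # Take as many preceding words as the acronym has letters.
        words = re.findall(r"[A-Za-z]+", text[: match.start()])
        candidate = words[-len(acronym):]
        initials = "".join(w[0].upper() for w in candidate)
        if initials == acronym:
            found[acronym] = " ".join(candidate)
    return found


# Invented sample text, for illustration only.
TEXT = (
    "Optical character recognition (OCR) is error-prone. Music may be "
    "stored as Musical Instrument Digital Interface (MIDI) files."
)
ACRONYMS = extract_acronyms(TEXT)
print(ACRONYMS)
```

The heuristic misses expansions that contain small words or skip letters (for example, "Metadata Encoding and Transmission Standard (METS)" would not match), so a practical extractor needs looser matching rules.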