Chapter Three Presentation: User interface How to Build a Digital Library Ian H. Witten and David Bainbridge.

Questions
- What form are the documents in?
- What structure do the documents have?
- Which kinds of access do you want to provide?
- What metadata is available?
- How do you want to present the documents?

Presenting Documents
- Structured documents (hierarchy)
- Unstructured text documents
- Page images
- Page images and extracted text
- Audio and photographic images
- Video
- Music
- Foreign language

Hierarchically Structured Text
- Table of contents
- Chapter, section, subsection, etc.
- Granularity of document?
- Example: Humanity Development Library
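
A hierarchy of chapters, sections, and subsections can be modelled as a tree, and the table of contents then falls out of a depth-first walk. The sketch below is only an illustration: the `Section` class, the `table_of_contents` function, and the sample book are hypothetical names, not part of any particular digital library system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """One node in a document hierarchy (chapter, section, subsection, ...)."""
    title: str
    children: List["Section"] = field(default_factory=list)


def table_of_contents(node: Section, depth: int = 0) -> List[str]:
    """Return indented table-of-contents lines via a depth-first walk."""
    lines = ["  " * depth + node.title]
    for child in node.children:
        lines.extend(table_of_contents(child, depth + 1))
    return lines


# Hypothetical document used only for illustration.
book = Section("Example Book", [
    Section("Chapter 1", [Section("Section 1.1"), Section("Section 1.2")]),
    Section("Chapter 2"),
])

print("\n".join(table_of_contents(book)))
```

The granularity question from the slide corresponds to choosing which level of this tree is treated as one retrievable document.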

Unstructured Text
- Long scroll of plain text
- Structure unknown to the digital library system
- Browsing is less convenient
- Pages of document may not correspond to physical pages of book
- Example: Project Gutenberg Collection

Page Images
- Digitized images of the document’s pages
- Document accuracy
  - OCR is error-prone
  - Duplicating layout is difficult
- Space requirements
  - Requires 20 times more storage space than text
  - Increased download time
- Need for text representation for searching
  - Difficult to highlight search terms on an image

Page Images and Extracted Text
- Provide page images and extracted text
- Search on extracted text
- View image or extracted text
- Example: Maori Newspaper Collection

Other Document Types
- Audio and photographic images
  - Example: Oral History Collection
- Video
  - Example: Music Video Collection
- Music
  - Representations: printed notation, MIDI, synthesized performance, human performance
  - Example: Music Digital Library
- Multiple languages
  - Interface and/or documents
  - Example: Arabic Collection

Metadata
- Provides information to facilitate access
- Structured
- Standardized

Metadata Examples
- Conventional bibliographic listing
  - Title
  - Author
  - Date
  - Publication
  - Volume number
  - Issue number
  - Page numbers
- MARC
- Dublin Core
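
To make the relationship between a conventional bibliographic listing and Dublin Core concrete, here is a minimal sketch that renames a few bibliographic fields to Dublin Core element names (`title`, `creator`, `date`, `source`). The mapping table and the `to_dublin_core` function are assumptions made for illustration; a real collection may map fields differently, and the volume value is invented.

```python
from typing import Dict, List

# Illustrative mapping from conventional bibliographic fields to
# Dublin Core element names; a real collection may map differently.
FIELD_TO_DC = {
    "title": "title",
    "author": "creator",
    "date": "date",
    "publication": "source",
}


def to_dublin_core(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Rename known bibliographic fields to Dublin Core elements.

    Fields without a mapping (volume, issue, pages) are skipped here;
    deciding where they belong is left to the collection designer.
    """
    dc: Dict[str, List[str]] = {}
    for name, values in record.items():
        element = FIELD_TO_DC.get(name)
        if element is not None:
            dc.setdefault(element, []).extend(values)
    return dc


record = {
    "title": ["How to Build a Digital Library"],
    "author": ["Ian H. Witten", "David Bainbridge"],
    "volume": ["1"],  # hypothetical value, no mapping defined
}
dc_record = to_dublin_core(record)
print(dc_record)
```

Every element holds a list because Dublin Core elements are repeatable, which is why two authors become two `creator` values.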

Metadata Aspects
- Historical
  - Describes provenance and preservation history
- Functional
  - Describes usage, condition and audience
- Technical
  - Describes interoperability requirements
- Relational
  - Describes links and citations
- Intellectual
  - Describes content or subject

Searching
- Types of query
- Case-folding and stemming
- Phrase searching
- Different query interfaces

Types of Queries
- Boolean queries
  - Combine terms with AND, OR, and NOT
  - Exact match
- Ranked queries
  - List of terms to find
  - Inexact match
  - Relevance ranking by some heuristic measure
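
The difference between the two query types can be sketched over a tiny inverted index. Boolean queries reduce to set operations and return an exact match set; the ranked query below scores documents by how often the query terms occur, which is one simple heuristic chosen for illustration rather than the measure any particular system uses. The collection and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Hypothetical three-document collection.
DOCS = {
    1: "digital library software",
    2: "library catalogue search",
    3: "digital image search and digital video search",
}

# Inverted index: term -> set of documents containing it.
index: Dict[str, Set[int]] = defaultdict(set)
for doc_id, text in DOCS.items():
    for term in text.split():
        index[term].add(doc_id)


def boolean_and(a: str, b: str) -> Set[int]:
    return index.get(a, set()) & index.get(b, set())


def boolean_or(a: str, b: str) -> Set[int]:
    return index.get(a, set()) | index.get(b, set())


def boolean_and_not(a: str, b: str) -> Set[int]:
    return index.get(a, set()) - index.get(b, set())


def ranked(query_terms: List[str]) -> List[int]:
    """Order documents by total occurrences of the query terms."""
    scores: Dict[int, int] = {}
    for doc_id, text in DOCS.items():
        counts = Counter(text.split())
        score = sum(counts[t] for t in query_terms)
        if score:
            scores[doc_id] = score
    # Highest score first; ties broken by document number.
    return sorted(scores, key=lambda d: (-scores[d], d))


print(boolean_and("digital", "library"))
print(ranked(["digital", "search"]))
```

Note that the ranked query returns documents containing only some of the terms, which is what "inexact match" means on the slide.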

Case Folding and Stemming
- Case folding
  - Upper case folded to lower case
  - Not relevant to some languages
- Stemming
  - Reducing a word to its root form
  - Morphological reduction
  - Not appropriate for all parts of documents
  - Language dependent
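
As a rough illustration of both normalizations, the sketch below folds case and then strips a handful of English suffixes. The suffix list and the `naive_stem` function are assumptions made for this example; this is a deliberately crude stemmer, not an established algorithm such as Porter's, and it only makes sense for English.

```python
from typing import List

SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def case_fold(word: str) -> str:
    """Fold upper case to lower case."""
    return word.lower()


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> List[str]:
    """Apply case folding, then stemming, to each whitespace-separated word."""
    return [naive_stem(case_fold(w)) for w in text.split()]


print(normalize("Searching searched Searches search"))
```

A rule this blunt also mangles words it should leave alone (for example, it shortens "this"), which is one reason stemming is not appropriate for all parts of a document, such as author names.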

Phrase Searching
- Searching for a contiguous group of words
- Two types of phrase searching:
  - Postretrieval scan
    - Determine if terms are consecutive by looking inside documents containing query terms
    - Smaller index, slower
    - Proximity searching is more difficult
  - Word-level index
    - Index contains word number and document number
    - Determines if terms are consecutive by comparing indexes
    - Larger index, faster
- Phrases containing punctuation and white space?
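
The two approaches can be compared side by side on a toy collection. In this sketch, `postretrieval_scan` uses a document-level index to find candidates and then reads each candidate's text, while `word_level_search` answers the same question from stored word positions alone. The collection, the index layouts, and the function names are all hypothetical, and tokenization is plain whitespace splitting.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical two-document collection.
DOCS = {
    1: "the digital library holds page images",
    2: "a library of digital images",
}
TOKENS: Dict[int, List[str]] = {d: t.split() for d, t in DOCS.items()}

# Document-level index: term -> documents containing it.
doc_index: Dict[str, Set[int]] = defaultdict(set)
# Word-level index: term -> document -> word positions.
word_index: Dict[str, Dict[int, List[int]]] = defaultdict(lambda: defaultdict(list))
for doc_id, tokens in TOKENS.items():
    for pos, term in enumerate(tokens):
        doc_index[term].add(doc_id)
        word_index[term][doc_id].append(pos)


def candidates(phrase: List[str]) -> Set[int]:
    """Documents that contain every term of the phrase, in any order."""
    sets = [doc_index.get(t, set()) for t in phrase]
    return set.intersection(*sets) if sets else set()


def postretrieval_scan(phrase: List[str]) -> Set[int]:
    """Look inside each candidate document for the contiguous phrase."""
    n = len(phrase)
    hits = set()
    for doc_id in candidates(phrase):
        tokens = TOKENS[doc_id]
        if any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)):
            hits.add(doc_id)
    return hits


def word_level_search(phrase: List[str]) -> Set[int]:
    """Compare word positions in the index without reading the documents."""
    hits = set()
    for doc_id in candidates(phrase):
        for start in word_index[phrase[0]][doc_id]:
            if all(start + k in word_index[t][doc_id] for k, t in enumerate(phrase)):
                hits.add(doc_id)
                break
    return hits


print(postretrieval_scan(["digital", "library"]))
print(word_level_search(["digital", "library"]))
```

Both functions return the same documents; the trade-off is that the word-level index stores a position for every word occurrence, while the scan has to re-read document text at query time.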

Different Query Interfaces
- Ranked or boolean
- Fielded or non-fielded
- Case-folding and/or stemming
- Ranked or natural order result list
- Use search history or not

Browsing
- Based on metadata
- Browsing alphabetical lists
  - Chinese is not alphabetic
- Browsing by date
- Browsing structures
  - Hierarchical classification structures
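
An alphabetical browsing list can be sketched by bucketing a metadata field, here the title, under its first letter. The function name, the sample titles, and the decision to place anything that does not start with an A-Z letter under an "Other" bucket are assumptions made for this example; a real interface for a non-alphabetic script would need its own ordering scheme.

```python
from collections import defaultdict
from typing import Dict, List


def alphabetical_browse(titles: List[str]) -> Dict[str, List[str]]:
    """Group titles under their first letter, with non-A-Z titles last."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        first = title.strip()[:1].upper()
        key = first if first.isascii() and first.isalpha() else "Other"
        buckets[key].append(title)
    # Letter buckets in A-Z order, then the "Other" bucket.
    ordered = sorted(buckets.items(), key=lambda kv: (kv[0] == "Other", kv[0]))
    return {key: sorted(group, key=str.lower) for key, group in ordered}


# Hypothetical titles used only for illustration.
shelf = alphabetical_browse([
    "Metadata basics",
    "maps of the Pacific",
    "Audio archives",
    "中文文献",
])
for letter, group in shelf.items():
    print(letter, group)
```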

Phrase Browsing
- Phrase: any sequence of words appearing more than once in the collection
- Automatic phrase extraction
  - Key phrases
- Phrase browser
  - Phrase hierarchy
  - Sorted by document and collection frequencies
  - Leaves are documents
- Example: The Complete Works of Shakespeare
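
The definition of a phrase given above suggests a brute-force extraction: count every word sequence in the collection and keep those seen more than once. The sketch below does this for sequences of two to three words; the length limits, the function name, and the two sample documents are assumptions for illustration, and a real phrase browser would need a far more efficient method for a large collection.

```python
from collections import Counter
from typing import Dict, List


def repeated_phrases(docs: List[str], max_len: int = 3) -> Dict[str, int]:
    """Count word sequences of length 2..max_len; keep those seen more than once."""
    counts: Counter = Counter()
    for text in docs:
        tokens = text.lower().split()
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c > 1}


# Hypothetical two-document collection.
phrases = repeated_phrases(["to be or not to be", "not to be trusted"])
for phrase, count in sorted(phrases.items(), key=lambda kv: (-kv[1], kv[0])):
    print(count, phrase)
```

The counts here are collection frequencies; a phrase hierarchy would additionally link shorter phrases such as "to be" to the longer phrases that contain them.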

Browsing Using Extracted Metadata
- Acronyms
  - Example: Acronym Extraction Demo
- Language identification
  - Example: Language Extraction Demo