Building Collections Using Greenstone Tod A. Olson Sr. Programmer/Analyst Digital Library Development Center University of Chicago Library

Presentation transcript:

Building Collections Using Greenstone Tod A. Olson Sr. Programmer/Analyst Digital Library Development Center University of Chicago Library talks/2003/dlf-greenstone/

Greenstone
New Zealand Digital Library Project at the University of Waikato
In cooperation with UNESCO, Human Info NGO
International, every continent
Examples:
Academic
  – Digitization projects
  – Classes on digital libraries
Non-academic
  – UNESCO humanitarian documentation

Greenstone features
Works with existing documents
  – Imports several formats
Searching: full text and metadata
  – Dublin Core, custom metadata
Browse
Structured documents
  – Indexing, access
Extensible & customizable
Open source software (GPL)

User Interface overview
Finding documents
  – Search full text and metadata indexes
  – Classifiers: browse lists for navigating collections
Navigating documents
  – Navigate hierarchical documents by logical structure
  – Simple page turning (not shown)
  – Single page for simple documents (not shown)

Greenstone Architecture
[Diagram: Receptionists communicate over a protocol with Collection Servers; each Collection Server sits on DB & Indexes produced by importing a collection. Redrawn from Witten & Bainbridge, How to Build a Digital Library, p. 356.]

Greenstone Architecture
Receptionist
  – Provides user interface
  – Accept user input
  – Send to appropriate collection server
  – Accept results
  – Dynamic page generation
Collection Server
  – Handle collection content
  – Search and filter information
  – Return results
  – multiple collections
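The division of labour between receptionist and collection server can be sketched in a few lines of Python. This is a toy model only, not Greenstone code: the class names, the `handle` and `search` methods, and the substring search are all assumptions made for illustration, and the real protocol between the two components is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionServer:
    """Holds one collection's documents and answers search requests."""
    name: str
    documents: dict[str, str] = field(default_factory=dict)  # doc id -> text

    def search(self, term: str) -> list[str]:
        # A naive substring match stands in for a real index lookup.
        term = term.lower()
        return sorted(doc_id for doc_id, text in self.documents.items()
                      if term in text.lower())


class Receptionist:
    """Accepts user input, forwards it to a collection server, renders a page."""

    def __init__(self, servers: list[CollectionServer]):
        self.servers = {s.name: s for s in servers}

    def handle(self, collection: str, term: str) -> str:
        server = self.servers[collection]   # pick the appropriate collection server
        hits = server.search(term)          # send the query, accept the results
        lines = [f"Results for '{term}' in {collection}:"]
        lines += [f"  {doc_id}" for doc_id in hits] or ["  (no matches)"]
        return "\n".join(lines)             # the dynamically generated "page"
```

One receptionist can front several collection servers, which is the point of keeping the user interface separate from collection content.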

Building Collections
[Diagram: source documents (HTML, PDF, ???) are imported into GSAF, then built into DB & Indexes.]

Building collections
Create a collection framework
  – or work with an old collection
Select documents
Import documents
  – Converts to internal XML format (GSAF)
Build collection
  – creates search indexes and browse listings
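To make the build step concrete, here is a minimal Python sketch of what "creates search indexes and browse listings" amounts to in principle. It is not how Greenstone implements indexing; the function names are hypothetical, the index is a plain word-to-document map, and titles are assumed to be non-empty.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return dict(index)


def build_browse_list(titles: dict[str, str]) -> dict[str, list[str]]:
    """Group document ids under the first letter of their title (an A-Z list)."""
    browse: dict[str, list[str]] = defaultdict(list)
    # Sort by title so each letter's bucket comes out in title order.
    for doc_id, title in sorted(titles.items(), key=lambda kv: kv[1].lower()):
        browse[title[0].upper()].append(doc_id)
    return dict(browse)
```

Both structures are computed once at build time, so searching and browsing at request time become simple lookups.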

[Text, images, links, etc.] … GSAF: internal XML format

Section:
Description
  – Metadata fields
Content
  – Text, internal markup, images
Section
  – No limit in number or depth
Hierarchical documents
Sections nest, tree structure
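The nesting described above can be illustrated with Python's standard XML library. The element names Section, Description, Metadata and Content follow the slide; the input dictionary layout, the `make_section` helper and the sample values are assumptions for this sketch, and the output is not guaranteed to be a complete, valid Greenstone archive file.

```python
import xml.etree.ElementTree as ET


def make_section(node: dict) -> ET.Element:
    """Turn {'metadata': {...}, 'content': str, 'children': [...]} into a Section."""
    section = ET.Element("Section")
    description = ET.SubElement(section, "Description")
    for name, value in node.get("metadata", {}).items():
        meta = ET.SubElement(description, "Metadata", name=name)
        meta.text = value
    content = ET.SubElement(section, "Content")
    content.text = node.get("content", "")
    for child in node.get("children", []):   # sections nest to any depth
        section.append(make_section(child))
    return section


# Hypothetical sample document: one score with two page sections.
score = {
    "metadata": {"Title": "Nocturne, no.15"},
    "content": "Score description",
    "children": [
        {"metadata": {"Title": "Page 1"}, "content": "page1.jpg"},
        {"metadata": {"Title": "Page 2"}, "content": "page2.jpg"},
    ],
}
archive_xml = ET.tostring(make_section(score), encoding="unicode")
```

Because `make_section` calls itself for each child, the same code handles a flat document and a deeply nested one.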

Config file: collect.cfg
Collection-specific configuration file, collect.cfg, specifies:
file types to import
Indexes and browse lists
  – Document or section level
  – paragraph (text index only)
display of results and browse listings
document displays
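As a rough illustration of the kind of settings involved, the Python sketch below renders a few configuration lines from a dictionary. The directive names (indexes, plugin, classify) and the plugin and classifier names are recalled from Greenstone 2 documentation and should be treated as assumptions to verify against the release in use; the `render_collect_cfg` helper is hypothetical and is not part of Greenstone.

```python
def render_collect_cfg(settings: dict[str, list[str]]) -> str:
    """Render one 'directive value' line per entry, in the order given."""
    lines = []
    for directive, values in settings.items():
        for value in values:
            lines.append(f"{directive} {value}")
    return "\n".join(lines) + "\n"


# Illustrative values only; check every name against the Greenstone docs.
settings = {
    "indexes": ["document:text section:Title"],    # document- and section-level
    "plugin": ["GAPlug", "HTMLPlug", "PDFPlug"],   # file types to import
    "classify": ["AZList -metadata Title"],        # one browse list
}
cfg_text = render_collect_cfg(settings)
```

Display formatting for results and documents, also mentioned on the slide, is left out of the sketch.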

Chopin Early Editions
Over 400 early edition Chopin scores
1830s to 1880s
Target audience: music scholars & musicians.
On web, page-turnable JPEG images.
Online in March 2003
Currently 372 scores in online collection
Usage: Nearly 100 hits per day, > 30% of use is international.

Build overview
[Diagram: catalog records, scanned images, and structural metadata (human processing) are assembled into METS & MODS, transformed via XSLT into Greenstone Archive Format, and loaded into the Greenstone Dig. Library Software (XML-based automated processing).]

Catalog records
Detailed MARC/AACR2 record for each score
Luxury: few print music collections have this much metadata

Scanned score images
Scanned by Preservation staff
Archival TIFF images
  – 400dpi, 24-bit color, uncompressed
JPEG derivatives for web-based delivery

Structural and other metadata
"chopin","108","001","","1",""
"chopin","108","002","","1",""
"chopin","108","003","1","1","Nocturne, no.15"
"chopin","108","004","2","1",""
"chopin","108","005","3","1",""

Structural metadata
Identify each image
  – document (score) no. & sequential image no.
Image content:
  – page no. as printed, milestones
Staff use familiar RDB product
Export data in CSV format
Technical metadata recorded, not yet used
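A CSV export of this shape is easy to read programmatically. The Python sketch below parses the five sample rows from the previous slide. The column names are guesses based on the description above (the slides do not label the columns, and the meaning of the fifth column is not stated), so treat the field names as assumptions.

```python
import csv
import io
from typing import NamedTuple


class ImageRow(NamedTuple):
    collection: str
    score_no: str
    image_seq: str
    printed_page: str   # empty when the image carries no printed page number
    unknown: str        # fifth column: meaning not stated in the slides
    milestone: str      # e.g. the piece that starts on this image


def read_rows(csv_text: str) -> list[ImageRow]:
    """Parse the exported CSV text into one ImageRow per non-empty line."""
    return [ImageRow(*row) for row in csv.reader(io.StringIO(csv_text)) if row]


SAMPLE = '''"chopin","108","001","","1",""
"chopin","108","002","","1",""
"chopin","108","003","1","1","Nocturne, no.15"
"chopin","108","004","2","1",""
"chopin","108","005","3","1",""
'''
rows = read_rows(SAMPLE)
milestones = {r.image_seq: r.milestone for r in rows if r.milestone}
```

Using the `csv` module rather than splitting on commas matters here, because a milestone such as "Nocturne, no.15" contains a comma inside the quoted field.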


METS & MODS
[Diagram: the catalog record (MARC) feeds dmdSec (MODS); the scanned images (JPEG) feed fileSec (URL: page1.jpg, URL: page2.jpg); the structural metadata feeds structMap (div DMDID=1 containing div FILEID=1 and div FILEID=2).]

Program uses structural metadata to:
Generate structMap
Generate image URLs for fileSec
  – Images stored by naming convention
Structural md carries catalog record no.
Extract MARC from catalog
crosswalk to MODS
Embed in dmdSec
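The fileSec and structMap generation can be sketched with Python's standard XML library. This is not the project's actual program: the `build_mets` function, the file ID scheme, the TYPE values and the URL naming convention are hypothetical, and the MARC extraction and MODS crosswalk for dmdSec are omitted entirely. The METS and XLink namespace URIs and the fileSec, fileGrp, file, FLocat, structMap, div and fptr elements are standard METS.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def build_mets(score_no: str, image_seqs: list[str], base_url: str) -> ET.Element:
    mets = ET.Element(q("mets"))
    file_grp = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    struct_map = ET.SubElement(mets, q("structMap"))
    score_div = ET.SubElement(struct_map, q("div"), TYPE="score")
    for seq in image_seqs:
        file_id = f"F{seq}"
        # Hypothetical naming convention: <base>/<score no.>/<image seq>.jpg
        url = f"{base_url}/{score_no}/{seq}.jpg"
        file_el = ET.SubElement(file_grp, q("file"), ID=file_id)
        ET.SubElement(file_el, q("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK}}}href": url})
        # One page div per image, pointing back at the file entry.
        page_div = ET.SubElement(score_div, q("div"), TYPE="page")
        ET.SubElement(page_div, q("fptr"), FILEID=file_id)
    return mets
```

Because the image URL is derived from the score and image numbers alone, the structural metadata is enough to populate both sections without inspecting the files themselves.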

GSAF
XML format for internal storage
Hierarchical document structure
  – Nested sections: e.g. part 1, chapt. 2
METS to GSAF via XSLT
Natural mapping from METS to GSAF
  – Map structural hierarchy
  – Follow links
    Descriptive metadata
    File content

METS to GSAF
[Diagram: METS on one side (dmdSec with MODS: Title, …; fileSec with page1.jpg and page2.jpg; structMap with div: Score containing div: Page 1 and div: Page 2) maps to GSAF on the other: a top-level Section whose Description holds Metadata: Title, … and whose Content holds Title, …, containing one nested Section per page (Content: Page 1, page1.jpg; Content: Page 2, page2.jpg).]

Walk structural metadata to create the tree of Section elements
Descriptive metadata:
  – Metadata: Crosswalk to desired metadata names
  – Content: Format metadata desired for display
File data
  – Content: Inline text, link to images, etc.
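The project performs this walk in XSLT. As a stand-in, the Python sketch below shows the same idea: recurse over the structMap, follow each file pointer into fileSec, and emit nested sections. It is an illustration under assumptions, not the project's stylesheet: taking section titles from the div LABEL attribute is a simplification (the slides crosswalk titles from MODS in dmdSec, which is not handled here), and the sample METS document is made up.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def mets_to_sections(mets: ET.Element) -> ET.Element:
    # Index file URLs by ID so that fptr links can be followed.
    urls = {f.get("ID"): f.find("m:FLocat", NS).get(XLINK_HREF)
            for f in mets.iterfind(".//m:file", NS)}

    def convert(div: ET.Element) -> ET.Element:
        section = ET.Element("Section")
        description = ET.SubElement(section, "Description")
        title = ET.SubElement(description, "Metadata", name="Title")
        title.text = div.get("LABEL", "")        # assumption: title from LABEL
        content = ET.SubElement(section, "Content")
        content.text = " ".join(urls[p.get("FILEID")]
                                for p in div.findall("m:fptr", NS))
        for child in div.findall("m:div", NS):   # recurse down the hierarchy
            section.append(convert(child))
        return section

    return convert(mets.find("m:structMap/m:div", NS))


SAMPLE = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec><fileGrp>
    <file ID="F1"><FLocat LOCTYPE="URL" xlink:href="page1.jpg"/></file>
    <file ID="F2"><FLocat LOCTYPE="URL" xlink:href="page2.jpg"/></file>
  </fileGrp></fileSec>
  <structMap>
    <div LABEL="Score">
      <div LABEL="Page 1"><fptr FILEID="F1"/></div>
      <div LABEL="Page 2"><fptr FILEID="F2"/></div>
    </div>
  </structMap>
</mets>"""
root = mets_to_sections(ET.fromstring(SAMPLE))
```

The recursion mirrors the structural hierarchy one-to-one, which is why the mapping from METS to nested sections is described as natural.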

Customizing Chopin collection
Focus on navigation
  – Metadata for custom access
    E.g. genre, dedicatee not in MARC/AACR2
    Can support with METS, MODS, Greenstone
  – Custom document navigation
    Separate description from scores
    Custom page navigation
  – Improves usability
Branding in next phase

Comments on Chopin Early Editions
Data created by staff using familiar tools
  – Structural md created in desktop application
Catalog records a luxury
Catalog is DB of record
  – Project IDs in 909
  – POIs point into Greenstone
METS/MODS assembled by program
  – Expect to repurpose METS for other applications
Customization: navigation, not branding
  – Faster to bring up collection, get user reaction

Greenstone benefits for Chopin
Robust, mature system
Recovered time in project
  – Fast to bring up
  – UI out of the box
  – Dynamic page generation
  – Incremental customization
XML compliant
  – Natural mapping from METS to GSAF

Future work: Chopin
Add DjVu image format
Repurpose METS for other applications
  – OAI
Standardize new digitization production flow
  – Project was first for METS, MODS, GS, & 6 depts.
  – Standardize collection of structural metadata
  – Plug in descriptive metadata as appropriate
Store archival descriptive metadata in METS object
Repurpose via XSLT for delivery

Other custom UI examples
Lehigh Digital Bridges
  – Extensive changes to look
Washington Research Libraries Consortium (WRLC)
  – Custom page banner
  – Popup page turner in Perl
  – GS as component of DL suite

Ongoing work: Greenstone
Greenstone Librarian Interface (GLI)
Greenstone 3

Greenstone Librarian Interface (GLI)
Collection management
  – Informed by work at GS sites
  – Assist collection designer
  – Support all phases of collection build process
  – Do not specify workflow
Java-based GUI tool
  – Formerly called the “Gatherer”
2 yrs in development
In beta outside of lab
  – Bangalore, other sites
  – in current distribution

GLI functions
Establish new collection (or work on old)
Select files to include in collection
Enrich files with metadata
Select indexes, classifiers
Build collection
Customize appearance
Preview collection

Greenstone 3
GS2 mature, 5+ yrs., wide deployment
  – Constraints: support legacy systems
  – Other technologies have matured: Java, XML
GS3: rewrite in Java, XML, XSLT
Distributed architecture, SOAP
METS as internal format
  – Group assembled for Greenstone METS profile(s)
OAI support planned
1 year in dev; alpha testing in lab

Conclusion
Positive experiences
Good direction for development
Strong user community
Proven in real digital library projects

Links & Further Information
Chopin Early Editions:
Greenstone: Downloads, documentation, examples
New Zealand Digital Library Project: UNESCO & related collections, many demos
Witten & Bainbridge. How to Build a Digital Library. Morgan Kaufmann, 2003.