Architecting an Extensible Digital Repository
Anoop Kumar, Ranjani Saigal, Rob Chavez, Nikolai Schwertner
Tufts University, Medford, MA

Overview
Background information on the evolution of TDL
Design requirements
TDL architecture
Applications that interface with TDL
– Tufts DL search
– VUE

History of Digital Collections at Tufts
About Tufts
– Interdisciplinary
– Focus on teaching and learning
Digital Collections at Tufts
– Perseus (Classics)
– Tufts University Sciences Knowledgebase (TUSK, Medicine)
– Artifact (Art History)
– Digital Collections and Archives (DCA): Bolles, etc.
– Other (Crime and Punishment)

Projects, Materials, and Tools
Perseus DL
– Materials: 50 million words; highly structured TEI-encoded XML texts of many types; 50,000 images
– Tools: Perseus document management system and tools
DCA
– Materials: 13 million words, 35,000 images, geospatial datasets, multimedia objects
– Tools: Perseus document management system and tools
TUSK
– Materials: 15,000 documents, including full-text syllabi, digital slide images, lecture recordings (audio and video), text notes, exam questions, evaluation forms, and bibliographies linked to full-text articles
– Tools: Networked course management system interface
Artifact
– Materials: 2,500 images; links to the Art History slide collection database containing 120,000 entries
– Tools: On-demand viewing and searching with Internet-based adaptations of traditional learning aids, such as flashcards, for review and study

Why TDL? (Tufts Digital Library)
The collections were continuously expanding, adding content in a variety of formats.
The architecture of these libraries was not built to accommodate such expansion.
Tufts needed a university-wide digital repository that could manage the ever-increasing content while continuing to serve discipline-specific needs and leveraging existing and new tools and services.

Designing TDL
Digital Collections and Archives partnered with Academic Technology to create a digital library that can manage the content while supporting teaching and learning.
Commitment to comply with standards in the library and the open source community.
Ensure scalability, flexibility, reusability, extensibility, and interoperability.

Design Requirements
Ingest:
– Ability to enforce archival standards
Management:
– Use of information packages to facilitate storage and dissemination
– Ability to incorporate content models
Persistence:
– Use of persistent identifiers
– Mapped URNs
Requirements and the system services that meet them:
– Unique and persistent identification of materials: Naming Service
– Use of archival information packages (AIP): Digital Object Provider (DOP) Service (Fedora)
– Use of submission information packages (SIP): Drop Box, Ingestion Service
– Use of dissemination information packages (DIP): DOP Service
– Authentication and integrity checking: DOP Service
– Dissemination: Disseminators, Caching Service, Digital Library Application, Search Service
– Access: Search Service and other applications

Tufts DL Architecture (diagram)
Components shown: Fedora, Drop Box, Fedora Ingestion Service, Application Creation Service, Search Indexing Service, Naming Service, Search Index, Search Interface, Application Data, Application Interface, Fedora Client.
Actor legend: U = Users, M = Manager, A = Administrators.

Components of TDL
Component and role:
– Drop Box and Ingestion Service: validation, tagging, preprocessing, ingestion
– Naming Service: unique persistent identifiers mapped to objects (“tufts:dca:central:MS ”)
– Fedora Repository: management and access framework for digital objects
– Search and Indexing Service: provides a search mechanism
– Application Creation Service: provides a mechanism for external applications to interface with the repository

TDL Architecture
– Drop Box and Ingestion Service
– Naming Service
– Fedora Repository Service at Tufts
– Indexing Service and Search Engine
– Application Creation Service

Drop Box and Ingestion Service
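The four stages listed for this component (validation, tagging, preprocessing, ingestion) can be read as a simple pipeline. The following is a minimal sketch under stated assumptions, not the TDL implementation: the `Submission` record, the stage functions, the required metadata fields, and the list standing in for the repository are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """Hypothetical stand-in for a package left in the drop box."""
    filename: str
    metadata: dict
    tags: list = field(default_factory=list)
    log: list = field(default_factory=list)


# Assumed minimum metadata, chosen only for illustration.
REQUIRED_FIELDS = ("title", "creator")


def validate(sub):
    """Reject submissions that lack the assumed required metadata."""
    missing = [f for f in REQUIRED_FIELDS if not sub.metadata.get(f)]
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    sub.log.append("validated")
    return sub


def tag(sub):
    """Tag by file extension so a content model could be chosen later."""
    if "." in sub.filename:
        ext = sub.filename.rsplit(".", 1)[-1].lower()
    else:
        ext = "binary"
    sub.tags.append(ext)
    sub.log.append("tagged")
    return sub


def preprocess(sub):
    """Normalize whitespace in string metadata values."""
    sub.metadata = {
        k: v.strip() if isinstance(v, str) else v
        for k, v in sub.metadata.items()
    }
    sub.log.append("preprocessed")
    return sub


def ingest(sub, repository):
    """Store the submission; a plain list stands in for the repository."""
    repository.append(sub)
    sub.log.append("ingested")
    return sub


def run_pipeline(sub, repository):
    """Run the stages in order; a failure in any stage stops ingestion."""
    for stage in (validate, tag, preprocess):
        sub = stage(sub)
    return ingest(sub, repository)
```

Running validation first means a rejected submission never reaches the repository, which is one way to enforce archival standards at ingest.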

Naming Service
Assigns, reserves, and resolves URNs
URN format
– tufts:school name:owner:[collection:]item name
– Example: tufts:dca:central:MS 
URN properties
– Provides a unique ID to objects deposited into the repository
– Service assures resolution to a unique resource
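The URN format on this slide (a fixed "tufts" prefix, then school, owner, an optional collection, and an item name) can be parsed and rebuilt with a few lines of code. This is a sketch under assumptions: the `TuftsURN` type, the function names, and the in-memory `NamingService` class are hypothetical, the item name in the usage is invented for illustration, and the sketch assumes no field contains a colon.

```python
from typing import NamedTuple, Optional


class TuftsURN(NamedTuple):
    school: str
    owner: str
    collection: Optional[str]  # the bracketed, optional part of the format
    item: str


def parse_urn(urn: str) -> TuftsURN:
    """Split 'tufts:school:owner:[collection:]item' into its fields."""
    parts = urn.split(":")
    if parts[0] != "tufts" or len(parts) not in (4, 5) or not all(parts):
        raise ValueError(f"not a TDL-style URN: {urn!r}")
    if len(parts) == 4:
        _, school, owner, item = parts
        return TuftsURN(school, owner, None, item)
    _, school, owner, collection, item = parts
    return TuftsURN(school, owner, collection, item)


def format_urn(u: TuftsURN) -> str:
    """Inverse of parse_urn."""
    fields = ["tufts", u.school, u.owner]
    if u.collection:
        fields.append(u.collection)
    fields.append(u.item)
    return ":".join(fields)


class NamingService:
    """Hypothetical in-memory registry that reserves, assigns, and resolves."""

    def __init__(self):
        self._table = {}  # URN -> location, or None while only reserved

    def reserve(self, urn):
        parse_urn(urn)  # raises ValueError on a malformed URN
        if urn in self._table:
            raise KeyError(f"URN already taken: {urn}")
        self._table[urn] = None

    def assign(self, urn, location):
        parse_urn(urn)
        if self._table.get(urn) is not None:
            raise KeyError(f"URN already assigned: {urn}")
        self._table[urn] = location

    def resolve(self, urn):
        location = self._table.get(urn)
        if location is None:
            raise LookupError(f"URN not assigned: {urn}")
        return location
```

For example, `parse_urn("tufts:dca:central:example-item")` yields a `TuftsURN` with no collection, and `format_urn` turns it back into the same string. Refusing to reassign an existing URN is what keeps each identifier resolving to a single resource.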

Fedora Repository
Fedora: key features
Repository at Tufts
Content models at Tufts
– Objects, behaviors, and disseminators
Implementation challenges

Flexible Extensible Digital Object Repository Architecture (Fedora)
Support for heterogeneous data types
Accommodation of new types as they emerge
Aggregation of mixed, possibly distributed, data into complex objects
The ability to specify multiple content disseminations of these objects
The ability to associate rights management schemes with these disseminations

Repository Model (diagram)
Components shown: Storage Device, Fedora Processing Service, Caching Service, HTTP Server (stores URLs for user applications).
Links shown: High Bandwidth, Medium Bandwidth, and Internet Bandwidth, carrying HTTP requests, a 20Mb TIFF, and a 200Kb JPEG.
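The diagram labels on this slide place a Caching Service alongside the Fedora Processing Service, with a 20Mb TIFF on one link and a 200Kb JPEG on another. One plausible reading, offered here as an assumption rather than a description of the Tufts system, is that derived access copies are cached so that an expensive transformation runs once per object and dissemination. The `CachingService` class below and its interface are hypothetical.

```python
class CachingService:
    """Hypothetical cache in front of an expensive dissemination call."""

    def __init__(self, produce):
        # produce(object_id, method) is the slow path, for example a
        # processing service deriving a small JPEG from a large TIFF.
        self._produce = produce
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, object_id, method):
        """Return the cached result, computing it on the first request."""
        key = (object_id, method)
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._produce(object_id, method)
        return self._cache[key]
```

Keying the cache on both the object identifier and the dissemination method keeps different renderings of the same object from overwriting each other.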

Content Model (CM) Hierarchy
Specific implementations (TEI text, EAD text, Encyclopedia, Directory, TIFF image, etc.)
Text CM
– getTOC, getChunksList, getChunk, etc.
Image CM
– getThumbnail, getAccessHigh, getImageStats, etc.
Binary CM
– getObject, getMIME, etc.
Collection CM
– getObjects, getInfo, etc.
VUE CM
– getConceptMap, getResource, etc.
Indexing Disseminators
– getIndexTerms, getForIndexing, etc.
Repository-Level Disseminators
– getArchivalCopy, getPreview, getClass, etc.
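The hierarchy on this slide maps naturally onto class inheritance: repository-level disseminators sit on a base type, and each content model adds its own methods. The sketch below is an illustration under assumptions, not Fedora's API. The method names come from the slide, but the class names, signatures, return values, and the in-memory data layout are hypothetical.

```python
class DigitalObject:
    """Base type carrying the repository-level disseminators."""

    def __init__(self, pid, data):
        self.pid = pid
        self.data = data

    def getArchivalCopy(self):
        return self.data

    def getClass(self):
        return type(self).__name__

    def getPreview(self):
        # Each content model decides what a preview looks like.
        raise NotImplementedError


class TextObject(DigitalObject):
    """Text CM; data is assumed to be a list of (heading, body) chunks."""

    def getTOC(self):
        return [heading for heading, _ in self.data]

    def getChunksList(self):
        return list(range(len(self.data)))

    def getChunk(self, index):
        return self.data[index][1]

    def getPreview(self):
        return self.getChunk(0)[:40]


class BinaryObject(DigitalObject):
    """Binary CM; data is assumed to be raw bytes plus a MIME type."""

    def __init__(self, pid, data, mime):
        super().__init__(pid, data)
        self.mime = mime

    def getObject(self):
        return self.data

    def getMIME(self):
        return self.mime

    def getPreview(self):
        return f"{self.mime}, {len(self.data)} bytes"
```

A caller can then request `getPreview()` from any object without knowing its content model, which is the kind of uniform access that repository-level disseminators suggest.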

Implementation Challenges
Processing large XML documents
Transforming large images
Modeling collections
Advanced search
Customized search
Caching disseminations

Indexing Service and Search Engine
Indexing
– Specialized polymorphic disseminators
Implementation
– Lucene
Supported types of search
– Basic keyword
– Advanced, metadata-based
Accessing the service
– HTTP GET/POST
– SOAP
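The two supported search types (basic keyword and metadata-based) can be illustrated with a toy inverted index. This sketch is a stand-in for explanation only: the slide names Lucene as the actual implementation, and the `ToyIndex` class, its methods, and the sample field names are hypothetical. In TDL the text passed to `add` would presumably come from an indexing disseminator such as getIndexTerms.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical in-memory index; not Lucene."""

    def __init__(self):
        self._keyword = defaultdict(set)  # term -> object ids
        self._fields = defaultdict(set)   # (field, term) -> object ids

    def add(self, obj_id, text, metadata):
        """Index an object's full text and its metadata fields."""
        for term in tokenize(text):
            self._keyword[term].add(obj_id)
        for name, value in metadata.items():
            for term in tokenize(value):
                self._fields[(name, term)].add(obj_id)

    def search(self, query):
        """Basic keyword search: objects containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        matches = [self._keyword.get(t, set()) for t in terms]
        return set.intersection(*matches)

    def search_metadata(self, name, query):
        """Metadata search: every query term must occur in the named field."""
        terms = tokenize(query)
        if not terms:
            return set()
        matches = [self._fields.get((name, t), set()) for t in terms]
        return set.intersection(*matches)
```

Keeping the field-scoped postings separate from the full-text postings lets a metadata query such as creator or type avoid matching the same word in body text.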

Application Creation Service
An important design requirement for TDL was to let current digital library applications interface easily with TDL and give seamless access to its content from within their own environments.
Current applications such as Perseus can interface with this service so that their tools can disseminate the content that resides in TDL.
The service is designed to support current applications and also to accommodate future, yet-to-be-defined applications such as course management systems, learning tools, and portals.

Applications Accessing TDL Content
Tufts DL Search
Visual Understanding Environment (VUE)

Why TDL? (Tufts Digital Library)
The collections are continuously expanding, adding content in a variety of formats.
The current architecture of these libraries is not built to accommodate such expansion.
Tufts needs a university-wide digital repository that can manage the ever-increasing content while continuing to serve discipline-specific needs and leveraging existing and new tools and services.

Future Direction
Authentication and authorization service
Customization and enhancement to address a wide variety of needs
Provide an automated browsing service for the repository