An Interim Report from DAWG (Digital Architecture and Infrastructure Working Group), April 9, 2003


1 DAWG - April 9, 2003: An Interim Report from DAWG
Digital Architecture and Infrastructure Working Group
Chartered by Grace Agnew to:
– Develop policies and procedures to support an integrated, secure, and effective common infrastructure
– Develop a digital library infrastructure to support an integrated, sustainable digital library initiative
Goals include:
– Provide sustainability of the digital content and technology platform
– Support the RUL Data Architecture
– Apply new interoperability protocols
– Support state-wide initiatives

2 DAWG Team
Anne Butman, Tom Frusciano, Judy Gardner, Michael Giarlo, Nick Gonzaga, Dave Hoover, Patrick Huey, Ron Jantz (chair), Sam McDonald, Ann Montanaro, Lynn Mullins, Robert Nahory, Jeffery Triggs, Karen Wenk, Yang Yu

3 Challenges in Digital Libraries
– Integration across diverse digital collections
– Scale to millions of objects
– Flexibility to handle many digital formats
– Ability to customize by adding special tools and services
– Preservation of digital objects
– Sustainability and interoperability

4 Initial Focus of DAWG
– Infrastructure: evaluating and selecting a large mass storage system to accommodate millions of digital objects
– Architecture: developing the architecture and prototype for an RU digital library network

5 Concepts and Terminology
RU Digital Library Network (DLN)
A system of people, standards, and software/hardware that provides access to, management of, and preservation of digital repositories of interest to RU.
RUL Digital Library Repository (DLR)
A repository that is designed and managed by RUL to contain and provide access to digital resources created by RU and RUL. The DLR is part of the DLN.
Digital Object Architecture – support of complex objects
– multiple manifestations, e.g. a book represented as images, text, and digital sound
– multiple formats, e.g. a map represented as TIFF, DjVu, and MrSID
– multiple behaviors, e.g. display at different resolutions, rotate a 3D object, etc.

6 Architecture Design Philosophy
– Design principles: interoperability, sustainability, and extensibility.
– Informed by the Open Archival Information System (OAIS) Reference Model.
– Designed to contain the output of RU (both scholarly material and administrative data). Policy decisions will determine content and how distributed or centralized the repository will ultimately become.
– Will accommodate a virtual network of repositories, enabling access to existing metadata repositories (IRIS, Luna) as well as providing a framework for accessing and searching external metadata resources.
– The technological framework and content must be sustainable.
– All information resources, on submittal to the repository, should have, at a minimum, a core set of metadata that can be mapped to RU Core.
– The architecture is flexible (customizable) and extensible. For example, discipline-specific portals can be developed.

7 RU Digital Library Network - Features
– Large scale, stable, digital repository
– Searching across multiple repositories
– Searching and browsing using RU Core
– Flexible metadata support
– Access through portals by community, content, and format
– Easy-to-use submission process
– Digital preservation with persistent identifiers
– Flexible, digital object architecture
– Access to existing digital collections
– Sustainability through open-source, standards, and support of critical workflow processes

8 RU Digital Library Network - Possible Content
– Maps (e.g. digitized historic New Jersey Maps)
– Historic documents
– Electronic Journals
– 3D objects (e.g. glass art, Roman coins, scrapbooks)
– Multimedia objects (e.g. digital video)
– Special ebook collections
– Numeric data
– Preprints, learning objects from RU faculty
– Dissertations
– Operational and Administrative RU Reports
– Object level access to existing digital collections (e.g. NJEDL)
– Searchable metadata collected through harvesting

9 RU Digital Library Network
[Diagram: a Search and Browse Interface connected to the RU Digital Library Repository, IRIS, NJEDL, LUNA, and other nodes. Links are labeled either Federated (Z39.50, tightly coupled) or Harvested (OAI-PMH, loosely coupled).]
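
A minimal sketch of the two coupling styles named on this slide, using in-memory stand-ins only. The class and function names (`SearchInterface`, `fake_catalog`, `Record`) are hypothetical and not part of the DLN prototype; a real federated node would be queried through a Z39.50 client, and a real harvested index would be filled through OAI-PMH.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    """A minimal metadata record; the fields are illustrative only."""
    source: str
    title: str


@dataclass
class SearchInterface:
    # Tightly coupled nodes: queried live at search time (e.g. over Z39.50).
    federated: Dict[str, Callable[[str], List[Record]]] = field(default_factory=dict)
    # Loosely coupled nodes: metadata copied ahead of time (e.g. via OAI-PMH).
    harvested: List[Record] = field(default_factory=list)

    def search(self, term: str) -> List[Record]:
        hits: List[Record] = []
        # Federated search: forward the query to every live node.
        for query in self.federated.values():
            hits.extend(query(term))
        # Harvested search: look only at the local copy of the metadata.
        needle = term.lower()
        hits.extend(r for r in self.harvested if needle in r.title.lower())
        return hits


def fake_catalog(term: str) -> List[Record]:
    """Stand-in for a live catalog query (hypothetical data)."""
    titles = ["Historic New Jersey maps", "Roman coins"]
    return [Record("catalog", t) for t in titles if term.lower() in t.lower()]


dln = SearchInterface(
    federated={"catalog": fake_catalog},
    harvested=[
        Record("harvest", "New Jersey glass art"),
        Record("harvest", "Digital video"),
    ],
)
results = dln.search("new jersey")
```

The trade-off this is meant to show: a federated node always returns current data but must be reachable at query time, while a harvested node can be searched locally but is only as fresh as the last harvest.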

10 Cross-Repository Searching: An Early DLN Prototype

11 Digital Object Structure – Three Types
[Diagram labels: Metadata; Ptr to External Digital Object; Harvested Metadata; Digital Objects; METS Wrapper; Metadata; Byte stream; Persistent ID]
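
One possible reading of the three types, sketched as a single Python data structure. The grouping is an assumption drawn from the diagram labels (harvested metadata only; metadata plus a pointer to an external object; a stored object with metadata, byte stream, and persistent ID). The class name, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DigitalObject:
    metadata: Dict[str, str]
    persistent_id: Optional[str] = None   # e.g. a handle; placeholder values below
    external_url: Optional[str] = None    # pointer to an object held elsewhere
    byte_stream: Optional[bytes] = None   # content stored in the repository

    def kind(self) -> str:
        """Classify the object by which parts are present (assumed rule)."""
        if self.byte_stream is not None:
            return "stored object"
        if self.external_url is not None:
            return "pointer to external object"
        return "harvested metadata"


harvested = DigitalObject(metadata={"title": "Harvested record"})
pointer = DigitalObject(
    metadata={"title": "Externally held map"},
    external_url="https://repository.example.org/maps/1",  # hypothetical URL
)
stored = DigitalObject(
    metadata={"title": "Locally stored map"},
    persistent_id="example-id-1",  # hypothetical identifier
    byte_stream=b"...image bytes...",
)
```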

12 Repository Architecture and Metadata
– METS (Metadata Encoding and Transmission Standard) will be used to encapsulate descriptive, preservation, structural and behavior metadata.
– For interoperability, all metadata schemas must map to NJCore and Dublin Core.
– The architecture must support creation of simple (NJCore, Dublin Core) and complex metadata (FGDC, MPEG-7, IEEE LOM, etc.)
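
The mapping requirement can be illustrated with a small crosswalk, sketched here under stated assumptions. The source record is a flat dictionary keyed by FGDC short element names, and the target names are Dublin Core elements; the specific pairings are illustrative and are not the NJCore or RU Core mapping, which the slides do not give.

```python
from typing import Dict

# Illustrative crosswalk: FGDC short name -> Dublin Core element (assumed pairs).
FGDC_TO_DC: Dict[str, str] = {
    "title": "title",
    "origin": "creator",
    "abstract": "description",
    "pubdate": "date",
}


def to_dublin_core(record: Dict[str, str], crosswalk: Dict[str, str]) -> Dict[str, str]:
    """Keep only the fields the crosswalk knows about, renamed to the core schema."""
    core: Dict[str, str] = {}
    for field_name, value in record.items():
        target = crosswalk.get(field_name)
        if target is not None:
            core[target] = value
    return core


# Hypothetical map record; "westbc" has no core equivalent in this crosswalk.
fgdc_record = {
    "title": "Historic map of New Jersey",
    "origin": "Unknown surveyor",
    "pubdate": "1850",
    "westbc": "-75.6",
}
core_record = to_dublin_core(fgdc_record, FGDC_TO_DC)
```

Fields without a core equivalent stay in the richer schema, which is why the architecture keeps both the simple and the complex metadata.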

13 Metadata and Dynamic Mapping - An Example
[Diagram labels: Object Repository; FGDC – for maps; Preservation; Structure; Object Input – METS Wrapper; Global Search/Retrieval Via RU Core; FGDC Search Via lat & long]
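
A rough sketch of the two search paths in this example: a global search over a core field and a map-specific search by latitude and longitude. All names here (`BoundingBox`, `MapObject`, `search_by_title`, `search_by_point`) are hypothetical, and the coordinates are approximate values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoundingBox:
    west: float
    east: float
    south: float
    north: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass
class MapObject:
    title: str                          # core field, searchable for every object
    bbox: Optional[BoundingBox] = None  # schema-specific field, maps only


def search_by_title(objects: List[MapObject], term: str) -> List[MapObject]:
    """Global search: uses only the core metadata."""
    return [o for o in objects if term.lower() in o.title.lower()]


def search_by_point(objects: List[MapObject], lat: float, lon: float) -> List[MapObject]:
    """Schema-specific search: uses the bounding box when one is present."""
    return [o for o in objects if o.bbox is not None and o.bbox.contains(lat, lon)]


collection = [
    # Rough bounding box around New Jersey (approximate, for illustration).
    MapObject("Historic New Jersey map", BoundingBox(-75.6, -73.9, 38.9, 41.4)),
    MapObject("Dissertation on glass art"),
]
by_title = search_by_title(collection, "glass")
by_point = search_by_point(collection, 40.49, -74.45)  # near New Brunswick
```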

14 Open Source Digital Repositories
DSpace – A digital library repository
DSpace is a specialized type of digital asset management or content management system: it manages and distributes digital items, made up of digital files (or “bitstreams”) and allows for the creation, indexing, and searching of associated metadata to locate and retrieve the items. It is designed to support the long-term preservation of the digital material stored in the repository. (http://dspace.rutgers.edu)
Fedora – A digital object repository
Fedora is a foundation upon which interoperable web-based digital libraries can be built. Fedora consists of APIs (application program interfaces) for creating access and management applications.

15 Archival Storage and Preservation
– A physically separate archive is managed for preservation purposes.
– The archive is separate from the presentation form (website) and the daily backup.
– The intent of the archive is to capture all the required forms of the digital material in non-proprietary format.
– Each digital object would have preservation metadata and a persistent ID.

16 Mass Storage System - Requirements
– Initial capacity of 10 to 20 Terabytes (TB)
– Extensible to 100 TB, 200 TB, and beyond
– Low management overhead
– Information must survive migrations across software and platforms
– History/audit trails required for each object
– Mirroring to a remote cluster (e.g. a cluster in NB and one in Newark) to provide offsite backup
– Global name space across all RUL locations
– Platforms required: Windows 2000, Unix, Linux
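
A back-of-the-envelope check on these figures, as a sketch only. It assumes decimal units (1 TB = 10^12 bytes), a hypothetical count of one million objects, and that mirroring to a second cluster doubles the raw capacity needed; none of these assumptions come from the slides.

```python
TB = 10**12  # bytes in a decimal terabyte (assumption; vendors may quote binary units)
MB = 10**6   # bytes in a decimal megabyte


def average_object_budget_mb(capacity_tb: float, object_count: int) -> float:
    """Average space available per object, in MB, for a given usable capacity."""
    return capacity_tb * TB / object_count / MB


def raw_capacity_tb(usable_tb: float, clusters: int = 2) -> float:
    """Raw capacity needed when every cluster holds a full copy."""
    return usable_tb * clusters


# One million objects is a hypothetical figure for "millions of objects".
budget_low = average_object_budget_mb(10, 1_000_000)
budget_high = average_object_budget_mb(20, 1_000_000)
mirrored = raw_capacity_tb(20, clusters=2)
```

Under these assumptions the initial capacity leaves an average of 10 to 20 MB per object, which suggests why the requirement to extend to 100 TB and beyond matters for large formats such as high-resolution images or video.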

18 Technologies and Standards
– Persistent ID – CNRI Handle System
– OAI-PMH – Protocol for metadata harvesting
– METS – Metadata Encoding and Transmission Standard
– OpenURL
– SCORM
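
To make the harvesting protocol concrete, here is a minimal sketch of an OAI-PMH `ListRecords` request URL and of parsing titles out of a response in the `oai_dc` format. The base URL, identifier, and sample response are hypothetical, and the response is trimmed to the elements the parser reads (a real response also carries `responseDate` and `request` elements). No network call is made.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request for a repository's OAI-PMH base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Trimmed, hypothetical response used in place of a live harvest.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample map</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_titles(xml_text: str) -> List[Tuple[str, str]]:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs: List[Tuple[str, str]] = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext("oai:metadata/oai_dc:dc/dc:title", default="", namespaces=NS)
        pairs.append((ident, title))
    return pairs


url = list_records_url("https://repository.example.org/oai")  # hypothetical base URL
harvested_titles = parse_titles(SAMPLE_RESPONSE)
```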

19 Detailed Requirements
Ingest, Administration, Access, Data Management, Preservation, Storage, System Level

20 Progress To Date
Infrastructure
– Commercial product discussions and quote from EMC for mass storage. Also examining ADIC and IBM’s Storage Tank (an open source product).
– CamdenBase directories/permissions standardized for transfer to systems.
– Developed initial criteria for an RUL server registry
Architecture
– Educating ourselves in various technologies: 1) OAI-PMH, 2) CNRI Handle System, 3) SCORM, 4) Z39.50/YAZ, 5) METS, 6) OpenURL
– Draft for requirements and architecture
– Cross-repository search prototype
– Downloaded DSpace (from MIT) and started evaluation
– UVa (Fedora) visit planned for early March

21 Next Steps
– Fedora: visit UVa for half day tutorial
– Continue reviewing and select mass storage system
– Prepare interim communication package
  – High level architecture and requirements
  – Cross-repository searching prototype
  – Preliminary assessment of DSpace and Fedora
– Communicate and Get Feedback
– Begin more detailed evaluation of DSpace and Fedora
– Produce architecture/functional specification
– Develop prototype with sample content

22 Tasks and Timeline for DAWG-A
January, 2003 – Requirements/Architecture document
February, 2003 – Discussion, Feedback with RUL and RU
March – May, 2003 – Evaluation of candidate systems (DSpace, Fedora, et al)
June – August, 2003 – Select system and prototype sample content
September – November, 2003 – Prototype-trial of multiple repositories

23 Tasks and Timeline for DAWG-I
December, 2002 – Determine requirements for mass storage system
December – January, 2003 – Transfer CamdenBase to Systems, test, and evaluate process
December – January, 2003 – Research and evaluate possible mass storage products
March, 2003 – Recommend mass storage solution
March, 2003 – Develop RUL server registry criteria

