John OckerbloomJan.. 23, 2007 Planning for the next generation John Mark Ockerbloom Open Repositories 2007 January 23, 2007 A summary of recommendations.

Slides:



Advertisements
Similar presentations
Adding OAI-ORE Support to Repository Platforms Alexey Maslov, Adam Mikeal, Scott Phillips, John Leggett, Mark McFarland Texas Digital Library TCDL09.
Advertisements

Richard Jones, Systems Developer Technical Issues for Repository Software Theses Alive! Edinburgh University Library SHERPA Nottingham.
Introducing Vireo ETD Submittal and Management for DSpace Adam Mikeal, Scott Phillips, John Leggett, Mark McFarland Texas Digital Library.
IRRA DSpace April 2006 Claire Knowles University of Edinburgh.
October 28, 2003Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
Data Modeling and Database Design Chapter 1: Database Systems: Architecture and Components.
1 DB2 Access Recording Services Auditing DB2 on z/OS with “DBARS” A product developed by Software Product Research.
CNRIS CNRIS 2.0 Challenges for a new generation of Research Information Systems.
Funded by: © AHDS Sherpa DP – a Technical Architecture for a Disaggregated Preservation Service Mark Hedges Arts and Humanities Data Service King’s College.
Manakin Workshop DSpace User Group, February 2006 Scott Phillips Texas A&M University
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice DSpace Architecture Progress and.
1 Archiving Workflow between a Local Repository and the National Library Archive Experiences from the DiVA Project Eva Müller, Peter Hansson, Uwe Klosa,
MIT’s DSpace A good fit for ETDs Margret Branschofsky Keith Glavash MIT LIBRARIES.
1 ISO – Metadata Next Generation International consensus being built on structured metadata within a broader Geomatics Standard under ISO Technical.
DSpace Rea Devakos and Gabriela Mircea University of Toronto Libraries.
Robust Tools for Archiving and Preserving Digital Data Joseph JaJa, Mike Smorul, and Mike McGann Institute for Advanced Computer Studies Department of.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
Dspace – Digital Repository Dawn Petherick, University Web Services Team Manager Information Services, University of Birmingham MIDESS Dissemination.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
Microsoft Office Open XML Formats Brian Jones Lead Program Manager Microsoft Corporation.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall.
Archival Prototypes and Lessons Learned Mike Smorul UMIACS.
DSpace XML UI Project Texas A&M University Digital Initiatives, Research and Technology Scott Phillips, Cody Green, Alexey Maslov, Adam Mikeal, Brian Surratt,
Massachusetts Institute of Technology Page 1 Open Knowledge Initiative CSG - Princeton, 05/07/03.
LHC Experiment Dashboard Main areas covered by the Experiment Dashboard: Data processing monitoring (job monitoring) Data transfer monitoring Site/service.
ETD Repositories Using DSpace Software Andrew Penman The Robert Gordon University 27 th September 2004.
Dr. Kurt Fendt, Comparative Media Studies, MIT MetaMedia An Open Platform for Media Annotation and Sharing Workshop "Online Archives:
Addressing Metadata in the MPEG-21 and PDF-A ISO Standards NISO Workshop: Metadata on the Cutting Edge May 2004 William G. LeFurgy U.S. Library of Congress.
MAHI Research Database Data Validation System Software Prototype Demonstration September 18, 2001
DSpace. TM 2 Agenda  Introduction to DSpace  DSpace community  Institutional Repository  Easy to add/find content in DSpace  Building Online Communities.
Architecture for a Database System
© 2005 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice The China Digital Museum Project.
Doug Tody E2E Perspective EVLA Advisory Committee Meeting December 14-15, 2004 EVLA Software E2E Perspective.
One Platform, Two Stories. Willamette University Oregon State University.
AIP Backup & Restore Sunita Barve NCRA, Pune. AIP The latest version of DSpace 1.7.0, supports backup and restore of all its contents as a set of AIP.
Design Analysis builds a logical model that delivers the functionality. Design fully specifies how this functionality will be delivered. Design looks from.
Managing the Impacts of Change on Archiving Research Data A Presentation for “International Workshop on Strategies for Preservation of and Open Access.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
John OckerbloomApr.. 16, 2007 Planning for the next generation John Mark Ockerbloom CNI Task Force Meeting April 16, 2007 A summary of recommendations.
This presentation describes the development and implementation of WSU Research Exchange, a permanent digital repository system that is being, adding WSU.
Richard Jones, July 2005 Integrating Local Developments to DSpace.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
Funded by: © AHDS Preservation in Institutional Repositories Preliminary conclusions of the SHERPA DP project Gareth Knight Digital Preservation Officer.
Digital Library Repositories and Instructional Support Systems: Repository Interoperability Working Group Leslie Johnston University of Virginia Library.
Digital Preservation across the technologies, strategies, open standards & interoperability aspects including the legal issues Pratik Shrivastava Scientist.
Firmware - 1 CMS Upgrade Workshop October SLHC CMS Firmware SLHC CMS Firmware Organization, Validation, and Commissioning M. Schulte, University.
DSpace - Digital Library Software
April 14, 2005MIT Libraries Visiting Committee Libraries Strategic Plan Theme III Work to shape the future MacKenzie Smith Associate Director for Technology.
DSpace System Architecture 11 July 2002 DSpace System Architecture.
ATLAS Database Access Library Local Area LCG3D Meeting Fermilab, Batavia, USA October 21, 2004 Alexandre Vaniachine (ANL)
Lifecycle Metadata for Digital Objects November 15, 2004 Preservation Metadata.
Collection Management Systems
Enterprise Library 3.0 Memi Lavi Solution Architect Microsoft Consulting Services Guy Burstein Senior Consultant Advantech – Microsoft Division.
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
IPDA Registry Definitions Project Dan Crichton Pedro Osuna Alain Sarkissian.
Distributing Repository functions with DSpace Richard Jones.
Self-Contained Systems
Joseph JaJa, Mike Smorul, and Sangchul Song
DSpace Architecture Review Meeting
Introduction, Features & Technology
Flexible Extensible Digital Object Repository Architecture
Flexible Extensible Digital Object Repository Architecture
API Documentation Guidelines
Introduction to DSpace
Implementing an Institutional Repository: Part II
Health Ingenuity Exchange - HingX
Open Archival Information System
Implementing an Institutional Repository: Part II
How to Implement an Institutional Repository: Part II
Presentation transcript:

John OckerbloomJan.. 23, 2007 Planning for the next generation John Mark Ockerbloom Open Repositories 2007 January 23, 2007 A summary of recommendations for the next DSpace architecture

John OckerbloomJan.. 23, 2007 Why a new DSpace architecture? Use, scale, and dependence on DSpace growing –New applications continue to develop –Repositories growing older (with preservation needs growing too) –Patching, ad-hoc development only gets you so far (and may lead you into dead ends) Architectural needs –Set priorities for DSpace development, functionality –Handle the variety of content, metadata institutions manage –Make it easier to develop, customize, compose with other systems –No DSpace is an island Set directions for an evolutionary, practical system design –Serves community needs for several years, but take no more than a couple of years to produce

John OckerbloomJan.. 23, 2007 Architecture review group John Mark Ockerbloom, Penn (chair) Tim Di Lauro, Johns Hopkins Mark Diggory, MIT John Erickson, Hewlett Packard Jim Downing, Cambridge University Henry Jerez, CNRI Richard Jones, Imperial College Gabriela Mircea, University of Toronto Scott Phillips, Texas A&M University Richard Rodgers, MIT Mackenzie Smith, MIT Robert Tansley, Google Graham Triggs, Biomed Central

John OckerbloomJan.. 23, 2007 The process DSpace 2 discussions started in 2004 In summer 2006, group chosen to review complete architecture –From DSpace committers, major developers, other stakeholders and architectural experts Online discussion –On Wiki, DSpace-devel and review group list –Manifesto, issues lists, survey Week-long “summit” –October 2006, Cambridge, MA –Came up with recommendations, proposals for DSpace 2 Follow-on activity –Subgroups to address workflow, extension framework issues –Further development of data model, development roadmap –Report and presentation <-- (You Are Here)

John OckerbloomJan.. 23, 2007 Survey Questions and comments about use and customization of DSpace repositories –Reponses solicited on DSpace mailing lists –116 responses in one week Adaptation common –Many customize metadata –About 1/4 change database schema –1/2 made significant code changes –Problems keeping customizations, new versions in sync Commonly desired –Better modularity –More customizable UI –Complex objects –Versioning Full results, comments on DSpace Wiki

John OckerbloomJan.. 23, 2007 DSpace architecture manifesto: Part 1: DSpace nature 1.DSpace is primarily open source software for building digital repositories. -- Avoid scope creep (e.g. into general purpose CMS, Wiki…) 2. DSpace will be usable based purely on free and open source software. -- Avoid proprietary dependencies -- May still support closed source as option (e.g. Oracle) 3. DSpace will have a decoupled, stable, and application-neutral core, -- Not the full distribution -- Applications and extensions built on it 4. While usable for a variety of applications, DSpace will retain useful "out-of-the-box" functionality for common use cases. -- Standard distribution includes full open access archive

John OckerbloomJan.. 23, 2007 DSpace architecture manifesto: Part 2: DSpace development 5. DSpace will employ and support existing, open standards where possible and practical. -- Makes DSpace easier to develop, interoperate -- May make it easier to integrate other open source SW 6. DSpace releases should be minimally disruptive. -- Keep repositories stable -- Ease customization, maintenance 7. DSpace will support an exit strategy for content. -- “It’s the content, stupid.” 8. DSpace will continue to evolve. -- Because what DSpace users do and need to do evolves

John OckerbloomJan.. 23, 2007 Scalability Three scale dimensions of primary concern –Size of repository –Intensity of use –Rate of ingestion, other processing Group did not see major architectural limits to scale –but revisions need to accommodate large scale use Desired performance goals: –10M items –10 simultaneous depositors, 100 simultaneous users –1 sec addition overhead at full size scale –Accommodation of clusters, unlimited size files –Ponies for everyone! (Okay, maybe not that…)

John OckerbloomJan.. 23, 2007 Interoperability Aspects of interoperability –Data interoperability (can I reuse, import, export info) –Service interoperability (protocols others can invoke) –API-level interoperability (extensions, new implementations) Needed for this: –Published concrete data model for content and metadata, fully exportable and importable –Published, documented, stable core interface »Designed with extensions in mind –Common, standard protocols supported in standard distribution

John OckerbloomJan.. 23, 2007 DSpace 2 highlights More powerful, flexible data model Shift in user interface model Core overhaul, documentation –to make extensions, customizations easier to add, maintain Focus on extended lifecycle of content More reuse of third-party development –Extension frameworks, workflow managers

John OckerbloomJan.. 23, 2007 Revised Item data model Manifestations: Replace Bundles, used for content only (some old Bundles would become metadata) Content Files: Replace Bitstreams Can have multiple metadata records, attached to Items or sub-components

John OckerbloomJan.. 23, 2007 Identifiers Handles still default identifiers for content, but system should support others Persistent identifiers for Epeople Components within items should also have persistent identifiers –Proposal: URIs based on the Item identifier, with various qualifiers for Manifestation, Content File, and Version

John OckerbloomJan.. 23, 2007 Versioning Used for non-semantic revisions of content and metadata –Format migrations –Revised metadata –Possibly minor content corrections (typos, etc.) Semantic revisions (e.g. published vs. pre-print) can be separate items with Relation metadata to link –Not enforced by the system, but makes citation clearer –Will need metadata, UI support for ease of use Versions have their identifiers –No version specified = use latest version Retention of old versions matter of repository policy –But can be cheap to retain if not much changes

John OckerbloomJan.. 23, 2007 An example Item Version and its identifiers

John OckerbloomJan.. 23, 2007 Further data model recommendations Metadata made more flexible, preservable –Managed and preserved in the persistent store –Multiple records supported –Serializable –Not constrained to be flat –Default schemas for Items, Content files… –Views of metadata can be projected into DB schemas for efficiency of access Separate abstract data model from concrete data storage Generalize Collections, Communities –But not yet recommending mixed-content Containers

John OckerbloomJan.. 23, 2007 User interface We like Manakin –XML-based interface makes it easier to customize, pipeline DSpace Recommend adding it to DSpace 1 standard distribution Should become standard UI for DSpace 2 Requires an add-on mechanism to integrate –There’s a simple one now published for this purpose –But a more generalized approach could make it easier to add new DSpace applications, customizations…

John OckerbloomJan.. 23, 2007 Extension frameworks Extension or add-on mechanisms needed to integrate certain components (like Manakin) Reusing existing one preferred over doing one from scratch –There are a number of possible candidates (OSGi and Spring have come up as possibilities) –Requirements, discussion on Wiki –Implementation work, community input helpful in settling on one In the meantime, simple add-on mechanism released for handling Manakin and similar packages

John OckerbloomJan.. 23, 2007 Event mechanism Core should include event notification mechanism –Allows loosely coupled, open-ended components –Can be used to support history mechanism, view maintenance, UI How it works –Listeners register interest in certain types of events –Changes in data, other phenomena, raise events that notify appropriate listeners Prototype system developed under DSpace 1 –Led by Larry Stone at MIT Details of DSpace 2 implementation may depend on decisions made for implementation frameworks

John OckerbloomJan.. 23, 2007 Workflow It’s not just for ingestion any more Again, can be supported by existing packages instead of rolling our own –Open WFE, OSWorkflow, jBPM identified as promising candidates –Other profiles, discussion on DSpace Wiki Better tools for specifying, modifying repository workflows needed in DSpace –Like user interfaces for non-programmers

John OckerbloomJan.. 23, 2007 The road to DSpace 2 Core group (~3 people) does detailed specs of core, documentation, reimplementation –Not from scratch, but review needed of all core interfaces in light of new design –Goal: Have working DSpace 2 core within 2 years. Would not be responsible for entire standard distribution Architectural oversight committee (different from review group, but some overlap) monitors progress Wider community supports DSpace distribution effort –developing extensions, applications, supporting and giving feedback to core group’s specs and docs DSpace 1 continues to evolve in the meantime –Manakin, Events, etc. Help build the next generation! –See full report, discussion on