Foundations of Excellence DSpace vs Fedora: Or what I do on my summer vacation
TRLN: Staff Enrichment Series: 8 Nov, 2007
Objectives
- Background: Why we even considered a digital repository
- FOE: version 1
- DSpace & Fedora: 50,000 foot view
- FOE: version 2
- FOE: version 3
- Where to from here?
Background
75th Anniversary
- Duke University School of Medicine established in 1930
- 2005: year-long celebration
- New published history
- Articles, videos, speeches
- Alumni weekend gala event
- Josiah C. Trent Foundation Grant
Digitization Project
- 500 images documenting the first 3 decades of the School of Medicine and Hospital
- Image groups:
  - Buildings
  - Education
  - Events
  - Clinical
  - People
  - Technology
Digitization Project (cont.)
- Selection: Whole staff
- Digitization: Outsourced to University Photography
- Description: Technical services and Reference coordinators
- Subject terms: Technical services coordinator; Head, Cataloging services
- Controlled vocabulary: NoteTab templates and libraries
FOE 1.0: XML, XSLT, and PostgreSQL
FOE
- 600 images = 600 XML files = 2 XSLT stylesheets
- XML = EAD2002
- XSLT = 1) convert XML to HTML; 2) convert XML to SQL statements
- PostgreSQL database used only for search
- Result: HTML
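The second stylesheet's job (turn each XML description into SQL so a database can answer searches) can be sketched in a few lines. This is only an illustration, not the FOE 1.0 code: it uses Python instead of XSLT, SQLite as a stand-in for PostgreSQL, and a flattened record rather than a full EAD2002 file. The table name, the record layout, and the sample title and abstract are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified record. A real EAD2002 file nests these
# elements much deeper; only the element names are borrowed from EAD.
SAMPLE_RECORD = """
<record>
  <unitid>bld00012</unitid>
  <unittitle>Hypothetical building photograph</unittitle>
  <abstract>Placeholder description used only for this sketch.</abstract>
</record>
"""


def record_to_row(xml_text):
    """Pull the three searchable fields out of one XML record."""
    root = ET.fromstring(xml_text)
    tags = ("unitid", "unittitle", "abstract")
    return tuple((root.findtext(tag) or "").strip() for tag in tags)


def load_records(conn, xml_records):
    """Create the search table (assumed schema) and insert one row per record."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images "
        "(item_id TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    # Parameterized inserts avoid the quoting problems of generated SQL text.
    conn.executemany(
        "INSERT INTO images VALUES (?, ?, ?)",
        [record_to_row(xml_text) for xml_text in xml_records],
    )
    conn.commit()


def search(conn, term):
    """Return item ids whose title or abstract contains the search term."""
    pattern = f"%{term}%"
    cursor = conn.execute(
        "SELECT item_id FROM images "
        "WHERE title LIKE ? OR abstract LIKE ? ORDER BY item_id",
        (pattern, pattern),
    )
    return [row[0] for row in cursor]
```

Usage: open `sqlite3.connect(":memory:")`, call `load_records(conn, [SAMPLE_RECORD])`, then `search(conn, "building")`. The database only holds what is needed to find an item; the HTML page is still produced separately from the XML.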
Issues
- SQL search statements worked…not
- No indexing by search engines
- JDBC
- I am not a programmer
- Definite need for improvements
DSpace & Fedora: A Bird's-eye View
Need for a Digital Repository
DSpace
- First released in [year missing in source]
- Developed by MIT Libraries and Hewlett-Packard (USA Today)
- Current version (download)
- Optimal performance in a *nix environment, but should operate in any environment
- Written in Java
- VERY active listservs
- Manakin: a "front-end" created at TAMU that makes UI localization easier
Need for a Digital Repository (cont.)
FEDORA (Flexible Extensible Digital Object and Repository Architecture)
- Began as a DARPA and NSF-funded research project at Cornell in 1997
- 2001, UVA and Cornell: $1M Mellon grant
- Version 1.0 released 2003
- Current version (download)
- Optimal performance in a *nix environment, but will run on Windows-based systems
- Written in Java
- Several front-end tools developed (more in a moment)
Side-by-side testing
- Testing environment: Lenovo T60, 120 GB hard drive, 2 GB memory, Fedora 7 (the Linux distribution, not the repository), kernel, Java 1.5
Requirements
DSpace
- Java 1.4+
- Apache Ant
- PostgreSQL (or Oracle 9+)
- Jakarta Tomcat 4.x/5.x (I used 6.x)
- Can also run on Jetty or Caucho Resin
Fedora
- JDK
- Optional:
  - MySQL
  - PostgreSQL
  - Oracle 9
  - Jakarta Tomcat
  - Ant, if building from source code
File Size & Download Times
DSpace
- 16 MB
- 1:43 over a T1 line
- 1:13 on a T line
Fedora
- 72 MB
- 7:49 over a T1 line
- 1:53 over a T line
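As a rough sanity check on the T1 figures, the best-case transfer time can be worked out from the file size and the nominal T1 rate of 1.544 Mbit/s. The sketch below assumes 1 MB = 1,000,000 bytes and ignores protocol overhead and server speed, so real downloads should come in slower. The function names are made up for this example.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def ideal_transfer_seconds(size_megabytes, bits_per_second=T1_BITS_PER_SECOND):
    """Best-case transfer time, assuming 1 MB = 1,000,000 bytes and no overhead."""
    size_bits = size_megabytes * 1_000_000 * 8
    return size_bits / bits_per_second


def as_minutes_seconds(seconds):
    """Format a duration in seconds as m:ss, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


if __name__ == "__main__":
    for name, size_mb in (("DSpace", 16), ("Fedora", 72)):
        ideal = as_minutes_seconds(ideal_transfer_seconds(size_mb))
        print(f"{name}: {size_mb} MB, ideal T1 time {ideal}")
```

Under these assumptions the ideal times are about 1:23 for the 16 MB DSpace download and 6:13 for the 72 MB Fedora download, so the measured 1:43 and 7:49 are in the range one would expect from a real T1 line.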
Installation Time
DSpace
- PostgreSQL installation and setup: 8 minutes
- Ant build and configuration: 8 minutes
- DSpace/Tomcat configuration and deployment: 8 minutes
- Total time to live: 24 minutes
Fedora
- PostgreSQL installation and setup: 8 minutes
- Fedora install: 5 minutes
- Total time to live: 13 minutes
Initial Live View
- DSpace: Front Page
- Fedora: Front Page
FOE 2.0: Choosing our Digital Repository
Deciding Factors
DSpace
- Off-the-shelf view
- Workflow process
- Individual submitters, one project admin
- Item submission form (link here)
- Bulk load script (dc, item, mapfile)
- Searchbot harvestable
- OAI harvestable
Fedora
- Off-the-shelf view
- One submitter
- Item submission not intuitive (link)
- Bulk load script (FOXML)
- Content Models (will return)
- Disseminators
- Behavior Definitions
- Would require extensive programming
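For the DSpace bulk load, the "dc" and "item" pieces map onto what DSpace's batch importer expects, as I understand its Simple Archive Format: one directory per item holding a `dublin_core.xml` file and a `contents` file that lists the bitstreams. The mapfile is written by the import tool itself. The helper below is a hedged sketch of preparing one such directory. The function name, the directory name, and the sample file name are hypothetical, and the image file itself would still have to be copied in.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def write_archive_item(archive_root, item_dir_name, dc_fields, bitstream_names):
    """Write dublin_core.xml and contents for one item directory.

    dc_fields is a list of (element, qualifier, value) tuples.
    bitstream_names lists the files that belong to the item; copying the
    files into the directory is left to the caller.
    """
    item_dir = Path(archive_root) / item_dir_name
    item_dir.mkdir(parents=True, exist_ok=True)

    # One <dcvalue> per metadata field, all under a <dublin_core> root.
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_fields:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The contents file names one bitstream per line.
    contents = "".join(f"{name}\n" for name in bitstream_names)
    (item_dir / "contents").write_text(contents, encoding="utf-8")
    return item_dir
```

Usage: `write_archive_item("archive", "item_000", [("title", "none", "A. Jack Tannenbaum")], ["per00165.jp2"])`. Generating these directories from existing XML descriptions is what makes a bulk load practical compared with filling in the submission form item by item.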
FOE 2.0 = DSpace: Cup is Half Full
- March 2006
- Foundations new home
- Data submission form
- Item View bld00012
- Item Update
- Access Restrictions
- Handle server
FOE 2.0 = DSpace: Cup is Half Empty
- Object is entered as one item
- DSpace is self-contained
- No real way to show complex relationships
- All-or-nothing metadata
- Access Restrictions
- Handle server
- Searchbot indexing:
  Item 2193/77 Title:, A. Jack Tannenbaum. Issue Date:, 10-Nov Abstract:, A. Jack Tannenbaum received his medical degree from Duke University in
FOE 3.0: "Our goal is to never be satisfied"
Content Models: Reusing datastreams (next 2 slides borrowed from EDUCAUSE 2004 presentation by Grizzle, Wayland, and Wilper)
Atomistic Model
Compound Model
An old favorite blanket
- Fedora minimally utilized
- Primarily used for archiving Library Administrative documents (Council and Management Team minutes, and Policies and procedures)
- Use of XACML policies to restrict access (156\.16\.\d{1,3}\.\d{1,3} lock down)
- Began looking at front-end GUIs
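The lock-down pattern is an ordinary regular expression, so what it admits can be checked outside the repository. The sketch below only illustrates which addresses the pattern matches. How the XACML policy applies it is not shown here, and the names `ALLOWED_RANGE` and `is_allowed_address` are made up for this example.

```python
import re

# The pattern from the slide: any address beginning with 156.16.
ALLOWED_RANGE = re.compile(r"156\.16\.\d{1,3}\.\d{1,3}")


def is_allowed_address(ip_address):
    """True when the whole string matches the 156.16.x.x pattern."""
    return ALLOWED_RANGE.fullmatch(ip_address) is not None


if __name__ == "__main__":
    for candidate in ("156.16.10.200", "156.160.1.1", "10.156.16.1"):
        print(candidate, is_allowed_address(candidate))
```

Note that `\d{1,3}` also accepts values such as 999, so the pattern restricts the prefix rather than validating the address. The escaped dots matter: without them, `156.16` would match any character in the dot positions.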
Front-End Tools
- Fez: A web front-end management system for Fedora, developed in PHP. Fez functionality includes: web-based browsing and searching; semi-advanced searching; complex security; basic image handling; Dublin Core. espace.library.uq.edu.au/documentation/
- Elated: ELATED is a lightweight, general-purpose application for managing digital files. ELATED is built on top of the Fedora Repository system, and can be used as a digital assets management system, an institutional repository, or to meet other collection archiving, publishing and searching needs. Features: Dublin Core metadata entry and search; custom metadata by collection; automatic previews for images; collections with simple editorial workflow; indexing and searching of content; user feedback, enabled by collection; select and import existing Fedora objects.
- Both require extensive programming for localization
External Forces at Play
- In fall 2006 we began a project to digitize 10,000+ cytopathology slides
- Images converted to JPEG2000 to improve the user experience (example)
- Archives purchased Aware JPEG2000 Image Server
- History of Medicine image database, Historical Images in Medicine (HIM), needed a new platform
Call out of the blue
- VTLS: Vital
- Open Repositories
FOE 3.0 = Fedora/Vital: Cup is Half Full
- June 2007
- Foundations new home (link)
- Data submission (3 ways to enter items)
- Item View bld00012
- Object is entered as many datastreams (fedora view)
- Vital/Fedora/Aware interoperability
- Complex relationships
- Multiple metadata streams
- Handle server
- Searchbot indexing:
  A. Jack Tannenbaum. | MeDSpace Description: A. Jack Tannenbaum received his medical degree from Duke University in per00165, A. Jack Tannenbaum kB, JPEG 2000 Image...
FOE 3.0 = Fedora/Vital: Cup is Half Empty
- Fedora is open source, Vital is not
- Customization possible with programming knowledge
- No way at this time to implement XACML policies (workarounds exist)
- Vital upgrades require full software installation
- Local customization can cause breaks in certain functions
Conclusions and obligatory links
Selected Links
- DSpace
- Manakin
- Fedora
- Elated
- Fez
- Vital
- MeDSpace