Automation, Virtualization, and Integration of a Digital Repository Server Architecture or How to Deploy Three Production DSpaces in One Night.


1 Automation, Virtualization, and Integration of a Digital Repository Server Architecture or How to Deploy Three Production DSpaces in One Night and Be Home for Dinner TAMU Libraries Digital Initiatives James Creel, Micah Cooper, Jeremy Huff TCDL 2015 Austin, TX

2 Talk Outline
The OAK Trust digital repository
Technical Debts: the cost of customization; the evolution of server architecture
A Trio of Innovations: automation of deployments; virtualization of infrastructure; modularization of customizations
Lessons Learned for IT in libraries

3 The OAK Trust Digital Repository
A brief overview and history

4 OAK Trust
A branded, customized DSpace instance hosted in-house at TAMU Libraries. Launched as “The Texas A&M University Digital Repository” in 2005 with an eye toward archiving ETDs. Rebranded with the launch of the OAK (Open Access to Knowledge) Fund, which underwrites TAMU researchers’ publication fees for open-access journals if they agree to submit their articles to the repository. Has grown to host ~70,000 items, including articles, books, maps, and photography from diverse sources.

5 OAK Trust - Hosting
From inception to 2013, hosted on dedicated Solaris hardware. Database, assetstore, and SSO authentication were each hosted on their own separate hardware.

6 OAK Trust Customizations
Over half a dozen XMLUI themes: TAMU, Image Gallery, Primeros Libros, Periodicals, Geofolios, ESL, Capstone, Fanzine
Extensive custom Java:
Expanding/collapsing community/collection browser
Links to collection handles from the group listing on the profile page
Recorded context to keep the user on the same page after login
Metadata-tree browser within a collection
Export of metadata from search results

7 Technical Debt
Cumulative Costs of Customization

8 Manually Upgrading DSpace: Preparations in Development Environment
Compare old and new configuration files line by line
Get a realistic duplicate of the production db
mvn package, ant install
Tweak configurations as needed
Test basic things: themes look good, widgets work, search and browse work, webapps run OK
Configuration tweaks include server addresses and directory paths
This can be an iterative process rather than three discrete steps; you might end up having to go back into development after problems become apparent in pre-production.

9 Problems with the Development Deployment
Configuration files are big, and the old config and new config must be compared line by line.
Java files reference each other’s contents in structurally and nominally particular ways.
A change to core code on which your customization depends requires that the customization be rewritten.
Coding to Java interfaces helps, but interfaces change too.
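
As a hedged illustration of the line-by-line comparison described on this slide, the following Python sketch parses two properties-style configuration files and reports which keys were added, removed, or changed. The function names are illustrative assumptions and not part of DSpace. The parser also ignores values continued across lines with a backslash, which a real configuration file may contain.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            props[key.strip()] = value.strip()
    return props


def diff_properties(old_text, new_text):
    """Summarize the key-level differences between two configuration files."""
    old = parse_properties(old_text)
    new = parse_properties(new_text)
    shared = old.keys() & new.keys()
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(key for key in shared if old[key] != new[key]),
    }
```

Calling diff_properties on the text of the old and new configuration files narrows the manual review to the keys that actually differ, although comments and ordering changes are deliberately not reported.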

10 Manually Upgrading DSpace: Preparations in Pre-production
Mount assetstore and log directory
mvn package, ant install
Tweak configurations and environment as needed
Test more extensive things: authentication works, communication with other servers works
Configuration tweaks include server addresses and directory paths
Environment changes may include Java version, Tomcat version, build tool versions

11 Problems with the Pre-production Deployment
The pre-production environment on a physically provisioned machine is rather different from your development one. We typically develop on Macs and historically deployed to Solaris.
Surprises in the tweaks (e.g. “Oh, we need Java 1.7 not 1.6”) must be meticulously recorded in anticipation of the ultimate production deployment.
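
One way to record such surprises is to keep the required tool versions in a small manifest and check each environment against it. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the versions are supplied by the caller rather than detected, and version strings are assumed to be purely numeric and dot-separated (a string such as 1.7.0_80 would need extra handling).

```python
def version_tuple(version):
    """Turn '1.7.0' into (1, 7, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_mismatches(required, installed):
    """Return {tool: (required_minimum, installed_or_None)} for unmet requirements.

    Both arguments map a tool name to a version string. A tool that is
    missing from 'installed' is reported with None as its installed version.
    """
    problems = {}
    for tool, minimum in required.items():
        found = installed.get(tool)
        if found is None or version_tuple(found) < version_tuple(minimum):
            problems[tool] = (minimum, found)
    return problems
```

Running the same check in development, pre-production, and production turns a remembered note like "we need Java 1.7" into something that fails visibly before the deployment starts.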

12 Manually Upgrading DSpace in Production
Announce plans for downtime to customers and family members
Mount assetstore and log directory
mvn package, ant install
Tweak configurations and environment as needed
Test even more things: authentication still works, handle server works, statistics still show up, all the webapps reply
Configuration tweaks include server addresses and directory paths
Environment changes may include Java version, Tomcat version, build tool versions
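
The check that all the webapps reply lends itself to a small script. The following Python sketch is a minimal illustration, not the authors' tooling: the function names are hypothetical, the base URL and paths are supplied by the caller, and authentication, handle server, and statistics checks are site-specific and not covered.

```python
from urllib.request import urlopen


def http_ok(url, timeout=10):
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def run_smoke_checks(base_url, paths, probe=http_ok):
    """Probe each path under base_url and return {path: True or False}."""
    root = base_url.rstrip("/")
    return {path: probe(root + path) for path in paths}


def failed_checks(results):
    """List the paths whose probe did not succeed, in sorted order."""
    return sorted(path for path, ok in results.items() if not ok)
```

The probe is passed in as a parameter so the same checks can be exercised against a stub before they are pointed at a live server.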

13 Problems with the Production Deployment
The expanded to-do list from the pre-production deployment may be lengthy, and the team is then expected to perform the procedure identically on the production box with minimal downtime.
The production environment on a physically provisioned machine is always at least a little different from the pre-production one.

14 Summary of Problems
Rewriting features to work after changes in the stock code base
Hardware and software environment differences
Reproducing an extensive, detailed process perfectly, by hand, late into the evening

15 Three Remedies to the Deployment Problems
Relieving the burden of technical debt:
Modularization of Code
Virtualization of Infrastructure
Automation of Deployment

16 Modularization of code
Problem: Rewriting custom features to work after changes in the stock code base
Solution: Separate out customizations and cleanly integrate them with core code
Solution in context: DSpace modules, modularizing XSL, pull requests to DuraSpace

17 Modularization of code
DSpace modules: Since DSpace 3.x, customizations to core DSpace are possible by overriding core files with custom files placed in a modules directory. Adding your own customization need not disturb the core code base.
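
The override idea can be pictured as a lookup that prefers a file in the modules directory and falls back to the core file. The Python sketch below is a conceptual model only, under the assumption that overrides mirror the relative paths of the core files; it does not reproduce how the DSpace build actually assembles its webapps, and the function name is hypothetical.

```python
from pathlib import Path


def resolve_file(relative_path, modules_dir, core_dir):
    """Return the custom override if one exists, otherwise the core file.

    'relative_path' is the path of the file relative to either directory,
    so a customization is just a file placed at the same relative location
    under 'modules_dir'. The core directory is never modified.
    """
    override = Path(modules_dir) / relative_path
    if override.is_file():
        return override
    return Path(core_dir) / relative_path
```

Because the core tree is left untouched, upgrading replaces the core directory wholesale and only the files under the modules directory need to be reviewed.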

18 Modularization of code
Modularizing XSL: We have continued in this principle of hierarchical modularization by putting empty placeholders in stock XSL, enabling extension in sub-themes. This has led to an extreme reduction in redundant code, with some files reduced by more than 90%. (The slide shows a before-and-after comparison.)

19 Modularization of code
Pull requests to DuraSpace: Technical debt can be further reduced by adopting the open source mindset of developing for the larger community first, as opposed to an institutionally centric approach. If a custom feature is integrated into the core code, it need not be locally rewritten when upgrading.

20 Virtualization of Infrastructure
Problem: Server environments are inevitably unique and idiosyncratic on physically provisioned hardware for development, pre-production, and production
Solution: Deploy virtual machines with standardized environments, abstracting away hardware concerns
Solution in context: OpenStack, VMware, Vagrant

21 Virtualization of Infrastructure
VMware: a framework for the creation and management of completely virtualized sets of hardware.
Vagrant: a tool for lightweight, reproducible, and portable virtual development environments.

22 Automation of Deployment
Problem: People make mistakes when forced to execute detailed procedures in a hurry, and it’s stressful anyway!
Solution: Script the deployment so it is programmatically identical with each execution
Solution in context: Chef

23 Automation of Deployment
Chef: “Infrastructure as Code”. Chef is a framework for the scripted automation of application deployment. It is versionable, testable, and repeatable.
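
The principle of scripting the deployment so that every execution is identical can be shown without Chef. The Python sketch below is a plain stand-in, not a Chef recipe and not the authors' actual scripts: the function name and step list format are assumptions, and the commands to run are supplied by the caller.

```python
import subprocess


def deploy(steps, dry_run=False, run=subprocess.run):
    """Run each (description, argv) step in order and stop at the first failure.

    'steps' is a list of (description, command) pairs, where command is an
    argument list such as ["mvn", "package"]. With dry_run=True nothing is
    executed and only the planned descriptions are returned. A failing
    command raises subprocess.CalledProcessError, so later steps never run.
    """
    log = []
    for description, command in steps:
        log.append(description)
        if dry_run:
            continue
        run(command, check=True)  # check=True turns a nonzero exit into an error
    return log
```

Keeping the step list in version control gives the same properties the slide attributes to Chef on a small scale: the procedure can be reviewed, rehearsed with dry_run=True, and repeated without relying on handwritten notes.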

24 Amusing Anecdotes and Takeaways

25 Big Technical Changes are Expensive
Implementing a virtual infrastructure and automating deployment is a huge cultural and technical shift. Many stakeholders have to buy in to the long-term investment. Lots of work has to be done before any benefit is realized.

26 Production deployments of DSpace at TAMU are now fast!
The work is nearly all front-loaded. The production deployment is a “one-click” process, undertaken with a higher degree of certainty.

27 Thanks for coming! Any questions?
TAMU Libraries Digital Initiatives James Creel, Micah Cooper, Jeremy Huff TCDL 2015 Austin, TX

