Building a hosted repository service on DSpace Matthew Cockerill Director of Operations BioMed Central Ltd. Open Repository.

1 Building a hosted repository service on DSpace. Matthew Cockerill, Director of Operations, BioMed Central Ltd. Open Repository

2 What is Open Repository? • A hosted repository service • Based on DSpace • Operated by BioMed Central

3 Outline • Background on BioMed Central • Why is there a need for a hosted repository service? • Why build it on DSpace? • Why choose Open Repository? • Technical implementation challenges • Other challenges

4 Background on BioMed Central • Scientific publisher, founded in 1999 • All research articles Open Access • 130+ peer-reviewed journals • 10,000+ articles published • Continuing to grow rapidly

5 Open Access research • All research distributed under the Creative Commons Attribution License • Allows: – Redistribution – Reuse – Creation of derivative works – Commercial or non-commercial use

6 Institutional repositories and Open Access publishing • Sometimes seen as alternative roads to Open Access • In fact, the two roads are highly complementary • Repositories can contain both: – Manuscript copies of articles from 'traditional journals' – Final, structured versions of articles from open access journals • We expect growth in repositories to go hand in hand with growth in Open Access publishing

7 Outline • Background on BioMed Central • Why is there a need for a hosted repository service? • Why build it on DSpace? • Why choose Open Repository? • Technical implementation challenges • Other challenges

8 Why is there a need for a hosted repository service? • Not all institutions want to operate, maintain and customize their own repository • Small institutions – Hosted solution can offer better value, due to economies of scale – Alternative 'shoestring' solutions are possible but do not give reliability or flexibility • Large institutions – Hosted solution may give greater flexibility

9 BioMed Central's track record as a service provider • Has developed and operated a 24/7 web-based journal workflow system for thousands of authors, reviewers, and journal editors since 2000 • 25,000+ manuscripts have been submitted to BioMed Central journals to date

10 Outline • Background on BioMed Central • Why is there a need for a hosted repository service? • Why build it on DSpace? • What does OR offer compared to regular DSpace? • Technical implementation challenges • Other challenges

11 Why was DSpace chosen as the foundation for Open Repository? • Java-based • Large, active and diverse community of developers • Designed with the big issues in mind – Modularity/extensibility – Scalability – Interoperability – Long-term digital preservation • BSD-licensed

12 BSD License Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. Neither the name of the nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

13 Outline • Background on BioMed Central • Why is there a need for a hosted repository service? • Why build it on DSpace? • Why choose Open Repository? • Technical implementation challenges • Other challenges

14 Why choose Open Repository? • Does not require extensive in-house IT skills/resources • Flexible customization • High availability, for a fraction of the price of a dedicated HA solution • Additional features compared to standard DSpace software

15 Why not choose OR? • Not for every institution • Some institutions choose to make a major investment in developing and extending the repository platform • In return for greater investment of staff and resources, an institution can – arbitrarily customize DSpace to its precise needs – steer the overall direction of the DSpace platform

16 Impact of RCUK position statement • The draft position statement on Open Access from RCUK proposes to mandate deposition of articles in an Open Access repository where one is available • Only a small minority of UK institutions currently have repositories • RCUK policy is likely to encourage many smaller institutions to consider setting up repositories

17 High Availability • Commercial Tier-1 network datacentre • 24x7 monitoring, troubleshooting and fault resolution • Fully redundant infrastructure: power / internet / firewall / LAN etc. • High-end fibre-channel/RAID storage • DSpace Tomcat servers configured as an active/passive cluster • Oracle database: 2-node RAC cluster + offsite standby database

18 Examples of functionality added to core DSpace platform • Automatic population of repository with Open Access content • Improvements to ease of use of submission system • Automated conversion of proprietary file formats to PDF suitable for archiving • XML markup of submitted articles • Enhanced usage reporting tools

19 Enhanced access statistics

20 Additional access stats reporting

21 Easy entry of metadata for items that are in PubMed

25 Keeping track of DOI/PubMed for items

27 XML full text rendering

29 Outline • Background on BioMed Central • Why is there a need for a hosted repository service? • Why build it on DSpace? • Why choose Open Repository? • Technical implementation challenges • Other challenges

30 Tomcat application • Running multiple instances of DSpace within Tomcat is fairly straightforward and works well enough • Ultimately we may need to tweak the DSpace code to allow a single DSpace application instance to have many 'faces' (different repositories), i.e. break the 1:1 relationship between application instance and repository • That is the approach we use to operate our 70 independent journal websites
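
The 'many faces' idea on this slide could be sketched roughly as follows: a single application instance works out which repository a request belongs to from the request's host name. This is only an illustration under stated assumptions. The slides do not describe the mechanism, and the class name `RepositoryResolver`, the host names and the repository ids are all hypothetical rather than part of DSpace or Open Repository.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: one application instance serving several repositories
// ("faces"), selected by the Host header of the incoming request.
// Nothing here is a DSpace class; all names are illustrative.
final class RepositoryResolver {
    private final Map<String, Integer> reposIdByHost;

    RepositoryResolver(Map<String, Integer> reposIdByHost) {
        this.reposIdByHost = Map.copyOf(reposIdByHost);
    }

    // Lower-cases the host, strips any ":port" suffix, and looks up the
    // repository id. IPv6 literals are not handled in this sketch.
    Optional<Integer> resolve(String hostHeader) {
        if (hostHeader == null) {
            return Optional.empty();
        }
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        int colon = host.indexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon);
        }
        return Optional.ofNullable(reposIdByHost.get(host));
    }

    public static void main(String[] args) {
        RepositoryResolver resolver = new RepositoryResolver(Map.of(
                "repo-a.example.org", 1,
                "repo-b.example.org", 2));
        System.out.println(resolver.resolve("Repo-B.example.org:8080"));
        System.out.println(resolver.resolve("unknown.example.org"));
    }
}
```

With a lookup like this, the repository id becomes a per-request value instead of a per-deployment setting, which is what breaking the 1:1 relationship between application instance and repository would require.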

31 Database issues • Each repository needs its own database schema (for metadata etc.) • Don't want to have to independently manage dozens or hundreds of database schemas • Need to maintain good performance • Would also like all DSpace instances to effectively share a pool of connections, which is difficult if each connection is tied to a different user/schema

32 Database solution: Part 1 1. Partition all tables by a new repos_id column 2. Create a series of schemas, one for each Open Repository, identified by repos_id 3. Generate a set of views in each schema, which filter the underlying tables by the relevant repos_id 4. End result: • Schema appears to DSpace code to be indistinguishable from a dedicated schema • Single set of tables provides easy manageability • Partitioning ensures high performance
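
Step 3 above (one filtering view per table in each repository's schema) might be generated along these lines. This is a minimal sketch, not BioMed Central's actual tooling: the shared schema name `OR_SHARED`, the `OR_REPOS_<id>` naming pattern and the choice of tables are assumptions made for illustration. A real setup would also have to populate or hide the `repos_id` column on insert, which the slides do not cover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: builds the CREATE VIEW statements that give each
// repository schema a filtered view of the shared, partitioned tables.
final class ReposViewDdl {
    // Assumed name of the schema that owns the shared tables.
    static final String SHARED_SCHEMA = "OR_SHARED";

    // Only simple identifiers are accepted, since they are spliced into SQL.
    private static final Pattern IDENT = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // Assumed naming pattern for the per-repository schemas.
    static String schemaName(int reposId) {
        if (reposId <= 0) {
            throw new IllegalArgumentException("repos_id must be positive");
        }
        return "OR_REPOS_" + reposId;
    }

    // One view per table, restricted to the rows of a single repository.
    static String viewDdl(String table, int reposId) {
        if (!IDENT.matcher(table).matches()) {
            throw new IllegalArgumentException("bad table name: " + table);
        }
        return "CREATE OR REPLACE VIEW " + schemaName(reposId) + "." + table
                + " AS SELECT * FROM " + SHARED_SCHEMA + "." + table
                + " WHERE repos_id = " + reposId;
    }

    static List<String> viewsFor(int reposId, List<String> tables) {
        List<String> ddl = new ArrayList<>();
        for (String table : tables) {
            ddl.add(viewDdl(table, reposId));
        }
        return ddl;
    }

    public static void main(String[] args) {
        // Table names are illustrative examples only.
        for (String stmt : viewsFor(7, List.of("item", "bitstream", "collection"))) {
            System.out.println(stmt);
        }
    }
}
```

Because the application only ever sees the views, code written against a dedicated schema should not need to know that the underlying tables are shared.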

33 Database solution: Part 2 1. To allow efficient sharing of database connections, all connections use the same username 2. ALTER SESSION SET CURRENT_SCHEMA is used to point each connection at the correct schema 3. Oracle's connection attribute functionality is used to ensure that connections already pointing at the correct schema are reused when possible
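
The reuse policy in steps 2 and 3 can be illustrated with a small, database-free simulation. This is a sketch under assumptions and does not use Oracle's connection attribute API: `SchemaAwarePool` and `Conn` are hypothetical stand-ins, and instead of executing SQL the pool hands each statement it would run to a callback. It is also not thread-safe.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a shared pool that prefers an idle connection
// already pointing at the requested schema, and otherwise re-points the
// least recently used one with ALTER SESSION SET CURRENT_SCHEMA.
final class SchemaAwarePool {
    // Stand-in for a physical connection; a real pool would wrap java.sql.Connection.
    static final class Conn {
        String currentSchema; // null until the connection is first used
    }

    private final Deque<Conn> idle = new ArrayDeque<>();
    private final Consumer<String> sqlSink; // receives the SQL a real pool would execute
    private int switches = 0;

    SchemaAwarePool(int size, Consumer<String> sqlSink) {
        this.sqlSink = sqlSink;
        for (int i = 0; i < size; i++) {
            idle.add(new Conn());
        }
    }

    Conn borrow(String schema) {
        if (!schema.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("bad schema name: " + schema);
        }
        // First choice: an idle connection whose session already uses this schema.
        for (Iterator<Conn> it = idle.iterator(); it.hasNext();) {
            Conn c = it.next();
            if (schema.equals(c.currentSchema)) {
                it.remove();
                return c;
            }
        }
        // Otherwise take the least recently released connection and switch it.
        Conn c = idle.pollLast();
        if (c == null) {
            throw new IllegalStateException("pool exhausted");
        }
        sqlSink.accept("ALTER SESSION SET CURRENT_SCHEMA = " + schema);
        c.currentSchema = schema;
        switches++;
        return c;
    }

    // Most recently released connections sit at the front of the queue.
    void release(Conn c) {
        idle.addFirst(c);
    }

    int switchCount() {
        return switches;
    }

    public static void main(String[] args) {
        List<String> executed = new ArrayList<>();
        SchemaAwarePool pool = new SchemaAwarePool(2, executed::add);
        pool.release(pool.borrow("OR_REPOS_1"));
        pool.release(pool.borrow("OR_REPOS_2"));
        pool.release(pool.borrow("OR_REPOS_1")); // reuses the first connection
        System.out.println(executed);
        System.out.println("schema switches: " + pool.switchCount());
    }
}
```

The point of the preference order is to avoid a schema switch on every request: a connection is only re-pointed when no idle connection already matches.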

34 Each DSpace instance has its own connection pool [Diagram: webserver; Tomcat applications OR1 to OR5, each with its own database connections (active and inactive); database. Labelled INEFFICIENT]

35 DSpace instances share a connection pool [Diagram: webserver; Tomcat applications OR1 to OR5 drawing on one shared connection pool of database connections (active and inactive); database. Labelled EFFICIENT]

36 Contributing code back to DSpace • BioMed Central intends to contribute many of its tweaks to the core DSpace code back to the DSpace project • Where possible, all proprietary functionality is being added as distinct modules • DSpace's architectural evolution will hopefully make this easier to achieve • BioMed Central's goal is for Open Repository to remain in sync, as far as possible, with the core DSpace code

37 Outline • Background on BioMed Central • Why is there a need for a hosted repository service? • Why build it on DSpace? • Why choose Open Repository? • Technical implementation challenges • Other challenges

38 Biggest challenge • Persuading authors to contribute content to the repository • Not trivial • Need to: – Make it as easy as possible – Use carrots and sticks

39 Ease of use of BioMed Central’s manuscript submission system 96.8% rate ease of use as "good" or "very good"

40 End-to-end service • The Open Repository service is not just about providing the technology • Training and ongoing technical support for the institution's repository administrators • Guidelines on best practice for successfully launching a repository

41 First live customer - INSERM

42 INSERM’s Open Repository

43 Acknowledgements • Open Repository team – Mark Merifield – Liam Lynch – Tom Mowlam – Marie Martens

