European digital repositories: an overview ELAG 2006, Bucharest 26.-28.4.2006 Juha Hakala Helsinki University Library.


1 European digital repositories: an overview ELAG 2006, Bucharest 26.-28.4.2006 Juha Hakala Helsinki University Library

2 Contents
- The problem
- The solution: legal, organisational and technical aspects
- Standards: metadata and other stuff
- Open source applications
  - Web archives (IIPC)
  - Institutional repositories
- Commercial applications

3 The problem
- “…digital preservation is going to be an enormous issue – a very fundamental problem at all levels from the nation-state to the individual. In my view, it’s going to attract increasing commercial interest, as well as growing unease and concern from the general public, over the next decade.”
  Clifford Lynch, Where do we go from here? The next decade for digital libraries. D-Lib Magazine, July/August 2005

4 The problem (2)
- Digital preservation is not only a technical but also a legal and organisational problem
  - There has to be a legal / contractual basis for preserving e.g. publications, governmental publications and research data
  - These laws must assign responsibility to a limited set of organisations, which must have sufficient human and other resources for taking care of the task
- How to decide what deserves to be preserved, because not everything can be kept?

5 The solution: legal aspects
- In Europe, many countries have revised their legal deposit acts so that they cover a broad array of digital assets
  - This trend started from Norway (I think) in the early 90s
- New legal deposit acts share features, due to co-operation between the law makers
  - E.g. harvesting the national Web space
- Some aspects of preservation relate to the copyright act; it has to be revised as well
  - A digital archive must be able to copy & migrate the documents (and remove copy protection if needed)
  - The EU Copyright Directive makes this possible

6 The solution: organisational aspects
- Digital preservation requires co-operation across traditional organisational borders
  - Libraries, archives, and museums must join forces with other relevant players in the public sector, publishers, book sellers, IT business, etc.: all the money and support we can get will be needed
- At the national level, there nevertheless should be one organisation to co-ordinate the effort
- International co-operation has already shown its usefulness, but we need more of it
  - International Internet Preservation Consortium, IIPC

7 The solution: technical aspects
- A digital archive must guarantee continuous access to and usage of the archived assets. Therefore, it must:
  - Keep the bits, and
  - Migrate the assets to new SW platforms, and/or
  - Emulate the original HW/SW environment in the new technical environment
- Migration and emulation must both be used; the best choice depends on the asset
  - There is no agreement on the best overall strategy
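"Keeping the bits" is commonly verified with fixity checks: a checksum is recorded when an asset enters the archive and recomputed later to detect silent corruption. The slides do not prescribe a method, so the following is only a minimal sketch, assuming SHA-256 as the checksum and using hypothetical helper names (`sha256_of`, `verify_fixity`).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large assets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest
```

A check like this only shows that the bits are unchanged; it says nothing about whether the asset can still be rendered, which is what migration and emulation address.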

8 The solution: technical aspects (2)
- We do not know how often we need to migrate assets and how hard that is going to be, or how often we must build a new emulator and how complicated that will be
  - It is impossible to make an estimate of the cost of digital archiving
- Digital archaeology: develop means of helping the users to access “out-of-date” resources
  - An expensive means of accessing data which has not been preserved properly

9 The solution: technical aspects (3)
- There isn’t much serious research on emulation & migration; among the national libraries, the Koninklijke Bibliotheek is probably unique in its investment in this
- Luckily the results from the KB will most likely be universally applicable
  - Plenty of room for European / global co-operation
- Proving that an archive works for 1000 years will be difficult; much harder than proving with 100 % certainty that it does not…

10 Standards: metadata and other stuff
- A preservation method as such is only the backbone of an operational system; other things are needed, such as:
  - Overall architecture of the digital archive
  - Preservation metadata
- Open Archival Information System (OAIS) gives us a good starting point for the former, although it has been extended in various ways in real-life digital archiving projects like NEDLIB
  - Interestingly, we lack a similar standard for digital asset management systems (and an agreement on who should develop it)
- We still lack proper understanding of preservation metadata
  - ISBD(ER) is definitely not sufficient from the preservation point of view
- New identifiers and identifier systems capable of covering a large extent of digital assets must be designed as well.
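To make the idea of preservation metadata concrete: the OAIS model pairs the content itself with Preservation Description Information covering reference, provenance, context and fixity. The sketch below is illustrative only. The class and field names are assumptions invented for this example, not a schema taken from OAIS, NEDLIB or any of the applications discussed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreservationDescription:
    """OAIS-style Preservation Description Information (illustrative fields)."""
    reference: str          # persistent identifier for the asset
    provenance: List[str]   # history of custody and processing events
    context: str            # relationship to other assets or collections
    fixity: str             # checksum showing the bits are unchanged


@dataclass
class ArchivalPackage:
    """A content file plus the metadata needed to keep it usable."""
    content_path: str
    file_format: str        # needed to plan migration or emulation
    description: PreservationDescription

    def record_event(self, event: str) -> None:
        """Append a processing event (e.g. a migration) to the provenance."""
        self.description.provenance.append(event)


# Hypothetical example values; the identifier uses the reserved "urn:example" namespace.
package = ArchivalPackage(
    content_path="assets/report.pdf",
    file_format="application/pdf",
    description=PreservationDescription(
        reference="urn:example:asset-1",
        provenance=["ingested"],
        context="part of a legal deposit collection",
        fixity="sha256:<digest recorded at ingest>",
    ),
)
package.record_event("migrated to a newer file format")
```

Descriptive cataloguing rules record none of the provenance or fixity fields, which is one way to read the remark that ISBD(ER) is not sufficient for preservation.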

11 Applications
- There is no – and will not be – one digital archive to fit all purposes; instead, there will be domain / content specific modules, coupled with generic preservation tools
- There will be both open source and commercial applications, which can be utilised side by side in a broader digital library environment
- Popularity of open source may slow down development of commercial tools
  - Institutional repositories

12 Web archiving
- Initiated by the Royal Library of Sweden in the mid-90s; work continued in the NEDLIB project and finally in IIPC (International Internet Preservation Consortium)
- Within a decade, both legal and technical aspects of Web archiving have been solved, at least for the time being
- Many legal deposit acts incorporate Web harvesting; it is seen as the only feasible way of extending legal deposit to the Web

13 Web archiving (2)
- The market for Web archiving applications is very small; therefore some (primarily) European national libraries formed IIPC with the Internet Archive
- IIPC has built a Web harvester (Heritrix) and special tools for indexing and accessing the harvested resources
- All tools are available for free, and they are still being developed by the growing consortium

14 Web archiving: practice
- The Internet Archive has harvested 60 billion Web pages globally since ~1996
- Numerous countries have harvested their domestic Web spaces, either selectively or as much of it as possible; there is increasing awareness that this makes sense
  - Europe: about 30-40 % of countries doing something?
- Sustainability of the archives is not fully proven yet; exotic resources and the sheer size of the archive may become a problem eventually
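Harvesting a domestic Web space requires a scope rule that decides which URLs count as national. The sketch below is a hypothetical illustration and not Heritrix configuration: it assumes scope is defined by a country-code top-level domain plus a hand-maintained set of extra hosts, and the function name `in_national_scope` is invented for this example.

```python
from urllib.parse import urlparse


def in_national_scope(url: str, national_tld: str,
                      extra_hosts: frozenset = frozenset()) -> bool:
    """Return True if the URL's host falls inside the assumed national scope.

    A host is in scope if it ends with the national top-level domain, or if it
    is listed explicitly (e.g. a domestic site under a generic domain).
    """
    host = urlparse(url).hostname or ""   # hostname is lowercased, or None
    suffix = "." + national_tld.lower().lstrip(".")
    return host.endswith(suffix) or host in extra_hosts


# Usage with made-up hosts:
in_national_scope("http://www.example.fi/page", "fi")                       # in scope
in_national_scope("http://example.com/", "fi")                              # out of scope
in_national_scope("http://example.com/", "fi", frozenset({"example.com"}))  # listed explicitly
```

A selective harvest would rely mostly on the explicit list, while a harvest of as much as possible would rely mostly on the domain rule.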

15 Institutional repositories
- Not really digital archives, but are used as “short-term substitutes” for them
- There are a few open source software packages available (such as Fedora, EPrints and DSpace), and in recent years they have developed faster than at least some of their (more generic) commercial counterparts
- We do not know yet how archiving-related software modules such as migration tools can be linked to the repository tools

16 Commercial applications
- Two interrelated problems:
  - Target user group
  - Functionality and links to other (library) applications
- Digital archives will not be built (only) for libraries, but (national) libraries will probably be among key customers
  - Will our needs be similar enough to those of e.g. national archives or pharmaceutical companies?
- Digital archive software is expensive to build
  - Will library system vendors be able to make it?

17 Commercial applications: functionality
- Should two national libraries write an RFP for a digital archive application, there would be plenty of disagreement on the details
- Vendors will have problems understanding what the libraries and other customers want; the situation will only improve when our own understanding of what digital archiving is improves

18 Commercial applications: present
- IBM DIAS
  - The only digital archive application available now
  - Architecture based on the OAIS model
  - Functionality tuned to the requirements of the Koninklijke Bibliotheek; their system will contain more than 9 million scientific articles by the end of 2006
  - The only (?) other customer for now is DDB; more are needed to guarantee DIAS survival over the long term
- Others?
  - None, yet, but it is likely that there will be at least a few more (just like Cliff Lynch predicted)
  - What kind of content will they deal with, and how?

19 Conclusion
- The train is moving, but we are not far from the station we left
- The key problem will be (lack of) funding:
  - National libraries have problems with the cost of storing even printed materials (or at least my library has); the cost of storing digital assets will come on top of the cost of traditional storage
  - How do we prove that digital archiving is so important that the additional funding is justified?
  - How much can we cut the costs via further IIPC-like cooperation in developing applications and best practices?

