Netarkivet RESAW seminar, Dec 2-3, 2013, Day 1


1 netarkivet RESAW seminar, Dec 2-3, 2013 Day 1

2 Who are we today
□ Birgit N. Henriksen, head of digital preservation, KB
□ Bjarne Andersen, head of digital preservation, SB
□ Eld Zierau, developer and researcher, KB
□ Ditte Laursen, curator and researcher, SB
□ Henrik Smith-Sivertsen, researcher, KB

3 Organization
□ a virtual center (SB/KB – IT development, IT operation, Collection department)
□ steering committee
□ daily manager
□ editorial advisory board

4 Collection policy
□ Legal deposit law 2005: ”Materials made public via electronic communication network”
□ Danish materials
  ■ Websites on the .dk TLD
  ■ Websites aimed at a Danish audience / written in Danish
  ■ Websites about Danish people (Hans Christian Andersen etc.)
  ■ More or less any site of interest to Denmark
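Of the four criteria above, only the first (.dk TLD) can be decided mechanically from a URL; the others call for curatorial judgment. A minimal sketch of that first check follows. The function names `is_dk_domain` and `scope_hint` are hypothetical, and this is not how NetarchiveSuite itself decides scope.

```python
from urllib.parse import urlsplit


def is_dk_domain(url: str) -> bool:
    """Return True if the URL's host is under the .dk top-level domain.

    Assumption: the URL includes a scheme (http:// or https://);
    without one, urlsplit finds no host and this returns False.
    """
    host = urlsplit(url).hostname or ""
    return host.rstrip(".").endswith(".dk")


def scope_hint(url: str) -> str:
    """Hypothetical triage label: automatic for .dk, manual for the rest."""
    if is_dk_domain(url):
        return "in scope: .dk TLD"
    # Danish audience, Danish language, Danish people, general interest:
    # none of these can be read off the URL alone.
    return "needs curatorial review"


if __name__ == "__main__":
    for candidate in ("http://kriseinfo.dk/", "https://wikileaks.org/"):
        print(candidate, "->", scope_hint(candidate))
```

Sites outside .dk fall through to manual review in this sketch, which mirrors the policy's reliance on judgment for the remaining criteria.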

5 Collection strategies
□ 4 strategies
  ■ 4 annual snapshots (KB)
    □ ensure the broad picture
  ■ Selective harvesting of 80 domains (SB)
    □ capture frequently updated websites
  ■ Event harvesting of 2-3 national events per year (KB/SB)
    □ 2013: Teachers’ lockout, International Melodi Grand Prix, Danish local elections, Election of the pope (IIPC) …
  ■ Special harvests (KB/SB), e.g. wikileaks, kriseinfo.dk, nyalliance.dk …

6 Collection strategies
[Diagram: the four strategies (snapshot, selective, event, special) plotted on axes labelled coverage and time]

7 Access
□ The archive contains sensitive personal data, so the entire archive is considered sensitive
  ■ only researchers, including PhD students, can be granted access
    □ if the research concerns sensitive personal data, the Data Protection Agency assesses the application
    □ if not, the library assesses the application
    □ the Copyright Act defines research as being from PhD level and up
    □ the Privacy Act defines research as something with a ‘scientific purpose’
□ Netarkivet is working on wider access
  ■ for students and for the general public
  ■ small corpus

8 Use of the archive
□ Only a handful of active researchers
  ■ no user-friendly way of accessing the archive
  ■ lack of knowledge about the archive
  ■ new kind of data source
□ Research projects – examples
  ■ dr.dk’s history 1996-2006
  ■ the history of internet newspapers
  ■ the mediation of art in the network society
  ■ the digital music revolution – the case of Sys Bjerre
  ■ Danish parliamentary elections 2007-2011 …

9 Technical setup
□ NetarchiveSuite (open source)
□ 44 servers, 260 running Java applications
□ Wayback Machine
□ Batch jobs
□ Full-text indexing experiments
□ ARC/WARC

10 Some numbers
□ Total: 414 TB – 13 billion objects
  ■ Snapshots: 353 TB
  ■ Selective: 47 TB
  ■ Events: 13 TB
□ One snapshot: approx. 30 TB (2006: 9 TB)
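A quick back-of-the-envelope reading of these figures, as a sketch only: it assumes decimal terabytes (1 TB = 10^12 bytes) and takes the rounded slide figures at face value. The three listed parts add up to 413 TB rather than the stated 414 TB, presumably because of rounding or holdings not itemised here, so the shares below are computed against the stated total.

```python
TB = 10**12  # assumption: decimal terabytes

# Figures as given on the slide (rounded).
holdings_tb = {"snapshots": 353, "selective": 47, "events": 13}
total_tb = 414
total_objects = 13_000_000_000


def shares(parts: dict, total: float) -> dict:
    """Percentage of the stated total held by each strategy, to one decimal."""
    return {name: round(100 * size / total, 1) for name, size in parts.items()}


def mean_object_kb(size_tb: float, n_objects: int) -> float:
    """Average object size in kilobytes (1 kB = 1000 bytes)."""
    return size_tb * TB / n_objects / 1000


if __name__ == "__main__":
    print(shares(holdings_tb, total_tb))
    print(f"mean object size: {mean_object_kb(total_tb, total_objects):.1f} kB")
    print(f"snapshot growth since 2006: {30 / 9:.1f}x")
```

On these numbers the snapshots account for roughly 85% of the archive, and the average archived object is on the order of 30 kB.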

11 Current challenges
□ wider access
□ better access (free text search)
□ inclusion of older net collections
□ collection of websites with restricted access
□ advanced websites, e.g. with sound/video/live interaction (chat, virtual worlds …)
□ electronic communication networks ≠ the web
□ long-term preservation
□ documentation

12 2013-2014
□ Tools
  ■ search - free text indexes
  ■ harvesting - the use of Heritrix3 and Live Archiving proxy
□ Infrastructure
  ■ web archives as part of a research infrastructure
  ■ access to archived material using Persistent Identifiers
□ Archiving methods
  ■ capturing online games
  ■ automatic methods to locate relevant Danish web materials outside the Danish TLD (.dk)

13 Ongoing activities related to RESAW’s topics
□ API improvement / so-called service layer
□ corpus building
□ documentation
□ full-text search
□ statistics
□ legal aspects (e.g. broader access, data mining policy)

14 What is the RESAW project in 10 years?
□ a very strong partner to IIPC
□ common infrastructure across borders (ERIC / ESFRI status)
□ coordinated European collection building

