László Drótos – Márton Németh
National Széchényi Library, Department of Electronic Library Services
Web archiving: Planning a new pilot project
A brief outlook
  Why do we have to archive the web?
  Web archiving pilot project at the National Széchényi Library
  Plans for the future
Why - 1
  A national reading campaign project with huge spending: you cannot find any details about it nowadays…
Why - 2
  The first Hungarian online news portal lives on only in our memories…
Why - 3
  IWIW: when a whole social network portal disappears from the web…
General Principles
  Mission of public collections: preserving the digital culture in their own domain for the future and making it researchable in the present (national, public and research libraries, museums, archives, audiovisual archives)
  Selective (institution type, genre, topic, event, celebrities etc.)
  Comprehensive, without selection (sub-domain, country domain, national web space, global web space)
  Regulated (legal deposit law, copyright law, personal and business privacy)
  Organized (sharing tasks among public collections, cooperation with content providers, standard solutions, interconnected archives etc.)
  Sustainable (e.g. EU and national funds, research grants, sponsorship, project grants, value-added commercial services)
Global overview
  Early projects: several generation shifts (hardware, software, methodology)
  IIPC consortium for stakeholders in web archiving: http://netpreserve.org
  Internet Archive: http://web.archive.org (since 1996, 286 billion webpages)
  Approximately 40 web archives at the national level
  Small-scale projects: public collections, public administration bodies, research and higher education institutions, private companies
Software background
  Solutions recommended by the IIPC (International Internet Preservation Consortium), mainly on single or multiple Linux servers
  Heritrix harvesting software (crawler, spider, web robot)
  OpenWayback display interface
  NutchWAX search engine
  Web Curator Tool or NetarchiveSuite framework systems
  WARC storage format
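To make the WARC storage format more concrete, the following is a minimal sketch of how a single WARC/1.0 record is laid out: a version line, named header fields, a blank line, the payload, and two closing CRLF pairs. The function name `build_warc_resource_record` and the choice of a simple `resource` record are illustrative assumptions; a crawler such as Heritrix normally writes richer record types (for example full HTTP responses) into compressed files, which this sketch does not attempt to reproduce.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri: str, payload: bytes,
                               content_type: str = "text/html") -> bytes:
    """Serialize one WARC/1.0 'resource' record (hypothetical helper)."""
    headers = [
        ("WARC-Type", "resource"),
        # Record IDs are URIs in angle brackets; a random UUID URN is used here.
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # Capture time as a UTC timestamp.
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # Length of the payload block in bytes.
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # The payload is followed by two CRLF pairs that terminate the record.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = build_warc_resource_record("http://example.org/", b"<html></html>")
    print(record.decode("utf-8"))
```

Records built this way can be concatenated into one file, which is what makes the format convenient for storing many harvested pages together.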
The current situation in Hungary
  No current web archive in Hungary
  First proposal for a Hungarian Internet Archive made in 2006
  Early 2010s: a pilot project at ELTE University, Department of Philosophy of Science and History of Science: approx. 400 webpages, scientific and education institutions, news portals
  National Audiovisual Archives collects news for public media (approx. 2 million records)
NSZL and web archiving
  A general national government project at NSZL, March 2017 - December 2018: updating the complete e-service library infrastructure
  The web archiving pilot project is a segment of the general project
  Testing technologies; gaining international experience and sufficient theoretical and practical skills and competences
  Hardware background provided by the Government Informatics Agency (KIFÜ-NIIF)
  Establishing partnerships with public collections and other stakeholders
Aims and goals
  Establish a Hungarian Web Archive
  Regularly harvesting thousands of Hungarian websites
  Event-based harvesting
  Comprehensive harvest of the Hungarian web space twice a year
  Managing digital legal deposit and other online content
  Long-term preservation and display
  Providing services: general users, content providers, scientific, education, government and business sectors
  Sufficient, proficient staff to manage on-demand small-scale web archiving projects for different stakeholders
  Proper legal environment enabling Hungarian public collections to archive the web and to present the archived content to the general public in accordance with privacy and copyright rules
Activities
  PR
  Education
  Finding partners
  Selecting websites and getting permissions to archive
  Metadata enrichment
  Quality check
  Software testing and other IT tasks
Temporary project website: http://mekosztaly.oszk.hu/mia
Wiki: http://mekosztaly.oszk.hu/mia/MIA_wiki.html
E-mail list: http://mekosztaly.oszk.hu/cgi-bin/mailman/listinfo/mia-l
Pilot harvests from the Budapest Zoo website
Archived website on the OpenWayback interface
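Wayback-style replay interfaces typically address an archived page by combining a 14-digit capture timestamp (YYYYMMDDhhmmss) with the original URL, as the Internet Archive does under https://web.archive.org/web/. The sketch below builds such an address; the helper name `wayback_replay_url` is hypothetical, and the base path of any particular OpenWayback installation depends on its configuration, so the base used here is only an example.

```python
from datetime import datetime


def wayback_replay_url(base: str, captured_at: datetime, original_url: str) -> str:
    """Build a Wayback-style replay URL (hypothetical helper)."""
    # 14-digit timestamp: year, month, day, hour, minute, second.
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    # Avoid a doubled slash if the base already ends with one.
    return f"{base.rstrip('/')}/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_replay_url("https://web.archive.org/web",
                             datetime(2017, 3, 1, 12, 0, 0),
                             "http://example.org/"))
```

Because the timestamp is part of the address, several harvests of the same site can be reached side by side.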
References
  György Kampis, László Gulyás: “Big is small and changes slowly in Hungary”. 2013 IEEE 4th International Conference on Cognitive Infocommunications (CogInfoCom)
  http://mekosztaly.oszk.hu/mia (in Hungarian with English summary)
  http://www.netpreserve.org
  http://web.archive.org
Thank you!
Further information:
  László Drótos, project coordinator, mekdl@iif.hu
  Márton Németh, web librarian, mnemeth@oszk.hu