László Drótos – Márton Németh (National Széchényi Library, Department of Electronic Library Services)
Web archiving: Planning a new pilot project

A brief overview
- Why do we have to archive the web?
- Web archiving pilot project at the National Széchényi Library
- Plans for the future

Why - 1
A national reading campaign with a huge budget: you can no longer find any details about it online…

Why - 2
The first Hungarian online news portal lives on only in our memories…

Why - 3
IWIW: when an entire social network portal disappears from the web…

General principles
Mission of public collections (national, public and research libraries, museums, archives, audiovisual archives): preserving the digital culture of their own domain for the future and making it researchable in the present.
- Selective (by institution type, genre, topic, event, celebrities etc.)
- Comprehensive, without selection (sub-domain, country domain, national web space, global web space)
- Regulated (legal deposit law, copyright law, personal and business privacy)
- Organized (sharing tasks among public collections, cooperation with content providers, standard solutions, interconnected archives etc.)
- Sustainable (e.g. EU and national funds, research grants, sponsorship, project grants, value-added commercial services)

Global overview
- Early projects: several generational shifts (hardware, software, methodology)
- IIPC, a consortium for stakeholders in web archiving: http://netpreserve.org
- Internet Archive: http://web.archive.org (since 1996, 286 billion web pages)
- Approximately 40 web archives at national level
- Small-scale projects: public collections, public administration bodies, research and higher education institutions, private companies
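The Internet Archive address above can be combined with a capture time and an original address to reach an archived copy. The sketch below illustrates the widely used Wayback URL layout (prefix, 14-digit UTC timestamp, original URL). The function name `wayback_url` and the constant `WAYBACK_PREFIX` are made up for this example, and the example address is the project website mentioned later in the slides. Whether a capture exists at that time is not checked.

```python
from datetime import datetime, timezone

# Base address taken from the slide; the /web/<timestamp>/<url> layout is the
# usual Wayback convention (the archive redirects to the closest capture).
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Return a replay URL for `original_url` near the time `when`.

    `when` should be timezone-aware; it is converted to UTC and rendered
    as a 14-digit YYYYMMDDhhmmss timestamp.
    """
    timestamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    moment = datetime(2017, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(wayback_url("http://mekosztaly.oszk.hu/mia", moment))
```

An OpenWayback installation typically follows the same timestamp-plus-URL pattern, but under its own, deployment-specific prefix.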

Software background
Solutions recommended by the IIPC (International Internet Preservation Consortium), mainly on single or multiple Linux servers:
- Heritrix harvesting software (crawler, spider, web robot)
- OpenWayback display interface
- NutchWAX search engine
- Web Curator Tool or NetarchiveSuite framework systems
- WARC storage format
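To make the WARC storage format more concrete, here is a minimal sketch that assembles a single WARC 1.0 record by hand, using only the Python standard library. It is an illustration of the record layout (version line, named header fields, blank line, content block, two trailing CRLFs), not how Heritrix itself writes files. The function name `build_warc_resource_record` is hypothetical, and the simple `resource` record type is chosen here for brevity; crawlers normally store full HTTP exchanges as `request` and `response` records.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(
    target_uri: str,
    payload: bytes,
    content_type: str,
    captured_at: datetime,
) -> bytes:
    """Assemble one uncompressed WARC/1.0 'resource' record as bytes."""
    headers = [
        ("WARC-Type", "resource"),
        # Record IDs must be globally unique URIs; a random UUID URN is common.
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # WARC-Date is a UTC timestamp in ISO 8601 form.
        ("WARC-Date", captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # Content-Length counts the bytes of the block that follows the headers.
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"
    # Every record ends with two CRLF pairs after the content block.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


if __name__ == "__main__":
    record = build_warc_resource_record(
        "http://mekosztaly.oszk.hu/mia",
        b"<html></html>",
        "text/html",
        datetime(2017, 3, 1, 12, 0, 0, tzinfo=timezone.utc),
    )
    print(record.decode("utf-8"))
```

A WARC file is simply a concatenation of such records, usually with each record gzip-compressed separately so that individual captures can be read without unpacking the whole file.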

The current situation in Hungary
- There is currently no operating web archive in Hungary
- First proposal for a Hungarian Internet Archive made in 2006
- Early 2010s: a pilot project at ELTE University, Department of Philosophy of Science and History of Science (approx. 400 webpages: scientific and educational institutions, news portals)
- The National Audiovisual Archives collects news for public media (approx. 2 million records)

NSZL and web archiving
- A general national government project at NSZL (March 2017 - December 2018) to update the complete e-service library infrastructure
- The web archiving pilot project is one segment of the general project
- Testing technologies, gaining international experience and sufficient theoretical and practical skills and competences
- Hardware background provided by the Government Informatics Agency (KIFÜ-NIIF)
- Establishing partnerships with public collections and other stakeholders

Aims and goals
- Establish a Hungarian Web Archive
- Regular harvesting of thousands of Hungarian websites
- Event-based harvesting
- Comprehensive harvest of the Hungarian web space twice a year
- Managing digital legal deposit and other online content
- Long-term preservation and display
- Providing services to general users, content providers, and the scientific, education, government and business sectors
- Sufficient, proficient staff to manage on-demand, small-scale web archiving projects for different stakeholders
- A proper legal environment that enables Hungarian public collections to archive the web and to present the archived content to the general public in accordance with privacy and copyright rules
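A comprehensive harvest needs some rule for deciding which addresses belong to the Hungarian web space. As a rough sketch only, the check below treats the national web space as "every host under the .hu country domain". That is a simplifying assumption made for this example, not the project's stated scope definition: real national archives usually also include relevant sites on other domains. The function name `in_hu_domain` is hypothetical.

```python
from urllib.parse import urlsplit


def in_hu_domain(url: str) -> bool:
    """Return True if the URL's host lies under the .hu country domain.

    Assumption for illustration: scope is defined by the top-level domain
    alone, which under-counts Hungarian content hosted elsewhere.
    """
    host = urlsplit(url).hostname  # lower-cased by urlsplit, None if absent
    if host is None:
        return False
    host = host.rstrip(".")  # tolerate a fully qualified trailing dot
    return host == "hu" or host.endswith(".hu")


if __name__ == "__main__":
    for candidate in (
        "http://mekosztaly.oszk.hu/mia",
        "http://netpreserve.org",
    ):
        print(candidate, in_hu_domain(candidate))
```

A crawler would apply such a test to every discovered link before queueing it, so that the harvest stays within the chosen scope.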

Activities
- PR
- Education
- Finding partners
- Selecting websites and getting permissions to archive
- Metadata enrichment
- Quality check
- Software testing and other IT tasks

Temporary project website: http://mekosztaly.oszk.hu/mia

Wiki: http://mekosztaly.oszk.hu/mia/MIA_wiki.html

E-mail list: http://mekosztaly.oszk.hu/cgi-bin/mailman/listinfo/mia-l

Pilot harvests from the Budapest Zoo website

Archived website on the OpenWayback interface

References
- György Kampis, László Gulyás: “Big is small and changes slowly in Hungary”. 2013 IEEE 4th International Conference on Cognitive Infocommunications (CogInfoCom)
- http://mekosztaly.oszk.hu/mia (in Hungarian with English summary)
- http://www.netpreserve.org
- http://web-archive.org

Thank you!
Further information:
László Drótos, project coordinator, mekdl@iif.hu
Márton Németh, web librarian, mnemeth@oszk.hu