Web preservation initiatives: a current overview. Michael Day, Digital Curation Centre, UKOLN, University of Bath, UK


Web preservation initiatives: a current overview
Michael Day, Digital Curation Centre, UKOLN, University of Bath, UK
Archivo de la Internet española: Webs y archivos personales, Madrid, Spain, 12 December 2005
UKOLN: a centre of expertise in digital information management

Presentation overview
Reasons for collecting and preserving the Web
Main approaches to collection:
– Whole-domain harvesting
– Selective capture or deposit
– Combined approaches
– International Internet Preservation Consortium (IIPC)
Issues:
– Conceptual, legal, technical (size and dynamic nature), preservation and curation

The World Wide Web (1)
Origins in scientific community
– CERN (early 1990s)
– Now part of the common 'cyberinfrastructure' of science and scholarship
– Scientists 'increasingly reliant' on Web for supporting research activities (James Hendler, 2003)
– Helps to promote 'open access' principles (peer-reviewed publications, data resulting from publicly-funded research)
– Other educational roles, e.g. e-learning

The World Wide Web (2)
Scholarly concern with the longevity of Internet references
– Link rot problem
– A study of three leading peer-reviewed journals showed that 13 percent of links were inactive after 3 years (Dellavalle, et al., 2003)
– Same trends demonstrated in biomedicine, computer science, information science, …
– Wallace Koehler's longitudinal studies show that after seven years, just 33.8 percent of a sample of Web pages persisted at their original URL
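The two link rot figures above can be put on a common footing with a little arithmetic. The sketch below assumes that links disappear at a constant yearly rate (simple exponential decay). That assumption is not made in the slides or in the cited studies, and the function names are illustrative only.

```python
import math


def annual_survival(fraction_surviving, years):
    """Yearly survival rate implied by an observed surviving fraction.

    Assumes a constant decay rate, which is an illustrative simplification.
    """
    return fraction_surviving ** (1.0 / years)


def half_life_years(fraction_surviving, years):
    """Years until half of the links are gone, under the same assumption."""
    return math.log(0.5) / math.log(annual_survival(fraction_surviving, years))


# Figures taken from the slide: 13 percent inactive after 3 years
# (so 87 percent surviving), and 33.8 percent persisting after 7 years.
journal_links = (0.87, 3)
koehler_sample = (0.338, 7)

for label, (fraction, years) in [("journal links", journal_links),
                                 ("Koehler sample", koehler_sample)]:
    rate = annual_survival(fraction, years)
    print(f"{label}: {rate:.3f} survive per year, "
          f"half-life {half_life_years(fraction, years):.1f} years")
```

The two samples give quite different implied rates, which is a reminder that they measure different populations (journal references versus a general sample of Web pages).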

The World Wide Web (3)
The Web now widely used across many different communities:
– Commerce, marketing, publishing
– Government information (e-government)
– Personal communication
    – e.g., 44 percent of US Internet users in a 2003 survey had contributed some kind of content to the Internet
– "The information source of first resort for millions of readers" - Peter Lyman (2002)

Why preserve the Web? (1)
Cultural importance
– National Library of Australia noted its responsibility to develop collections of library materials, regardless of format
– Many national libraries have now developed operational or pilot Web archives, e.g. Australia, Austria, China, Czech Republic, Denmark, Finland, France, Iceland, Japan, New Zealand, Norway, Slovenia, UK, USA, etc.
– Some have made changes to legal deposit laws to accommodate Web content

Why preserve the Web? (2)
Cultural importance
– Internet Archive: not-for-profit organisation, based in San Francisco
    – Acquired Web content from Alexa Internet and its own Web crawls, provides access through the Wayback Machine
    – Co-operates with memory institutions on developing special collections, e.g. Library of Congress, The National Archives (UK)
    – Part of International Internet Preservation Consortium
    – Mirror of Wayback Machine at Bibliotheca Alexandrina (Egypt)


Why preserve the Web? (3)
Web content as records and evidence
– National archives guidance for Web managers
– Some collection of Web sites has started
    – The National Archives: UK Government Web Archive, joint project with Internet Archive
    – US National Archives and Records Administration collected snapshot of federal agency Web sites at end of the Clinton Administration
Scholarly interest
– Politics (Archipol), social history (Occasio), Chinese studies (DACHS)

Why preserve the Web? (4)
Joint approaches
– The UK Web Archiving Consortium
    – Led by the British Library
    – Partners include The National Archives, the national libraries of Wales and Scotland, the Joint Information Systems Committee, and the Wellcome Trust
    – Sharing costs, risks and experiences
    – Each partner focuses on sites relevant to their own interests

Approaches (1)
Automatic harvesting
– Web crawler programs
– National libraries tend to focus on national Web domains, e.g. Kulturarw3 (Sweden)
– Harvester is fed a seed set of links; pages are fetched, analysed for further links, and the process repeats
– Internet Archive uses same approach for whole Web; since 1996 has generated >1 petabyte
    – Problems with functionality and country representation (but still a very valuable resource)
– Development of Heritrix crawler program
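The harvesting loop described above (seed links in, pages fetched, links extracted, repeat) can be sketched in a few lines. This is a minimal illustration, not how Heritrix or any national library crawler is actually implemented. The `harvest` function, the `fetch` callback and the three-page `FAKE_WEB` are all hypothetical, and a real crawler would add politeness delays, robots.txt handling and archival storage.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def harvest(seeds, fetch, domain_suffix, max_pages=100):
    """Breadth-first harvest restricted to hosts ending in domain_suffix.

    fetch(url) is a caller-supplied function returning HTML text or None.
    Returns a dict mapping each captured URL to its HTML.
    """
    frontier = deque(seeds)
    seen = set(seeds)
    archive = {}
    while frontier and len(archive) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched, nothing to analyse
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            host = urlparse(link).hostname or ""
            # Stay inside the chosen (national) domain and avoid revisits.
            if host.endswith(domain_suffix) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return archive


# Hypothetical three-page "web" used instead of real network access.
FAKE_WEB = {
    "http://a.example.se/": '<a href="/about">About</a> <a href="http://b.example.com/">Ext</a>',
    "http://a.example.se/about": '<a href="/">Home</a>',
    "http://b.example.com/": "<p>outside the domain</p>",
}

captured = harvest(["http://a.example.se/"], FAKE_WEB.get, ".se")
print(sorted(captured))
```

Restricting the frontier by domain suffix mirrors the national-domain focus mentioned in the slide. It also shows one reason for gaps in country representation: nationally relevant content hosted under other top-level domains is never queued.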

Approaches (2)
Selective capture or deposit
– Pioneered by National Library of Australia (PANDORA)
– Development of selection guidelines, selection of sites, negotiation with site owners, capture using gathering or mirroring tools
– Used by UK Web Archiving Consortium
– Sites can also be captured and deposited by Web site owners, e.g. NARA 2001

Approaches (3)
Combined approaches
– Some selective capture, periodic whole domain harvesting
– Reflects relative strengths of the two approaches
    – Harvesting approach much cheaper per terabyte, enables large collections to be built up
    – More detailed attention can be paid to complex sites, e.g. database driven (deep Web) sites
– Approach pioneered by Bibliothèque nationale de France (BnF)
– Recent Australian whole domain harvest

Approaches (4)
International Internet Preservation Consortium (IIPC)
– Group of national libraries and the Internet Archive, led by BnF
– Co-operation on coverage and access: a global distributed collection
– Development of tools
    – Harvesting: Heritrix, DeepArc
    – Storage: ARC, BAT
    – Search and navigation: NutchWAX, WERA, Xinq
    – Web Archiving Metadata Set

Issues (1)
What is the Web?
– A conceptual problem
– Components of the Web easier to understand than the whole
– What is it that we want to preserve?
    – Content? Easy for HTML pages, more difficult for databases
    – Interfaces?
– Personalisation features

Issues (2)
Legal problems
– Legal environment in many countries does not take Web archives into account (Charlesworth, 2003)
– Problems with:
    – Copyright
    – Archives could be deemed to be the "publishers" of defamatory or otherwise illegal content, or held responsible for breaches of data protection legislation
– Remedies: select content or restrict access

Issues (3)
Scale
– Web is large (and growing)
– Regular snapshots grow even bigger
– Internet Archive: >1 petabyte, growing at >20 terabytes a month
– Differences in Web archive size depending on domain:
    – Finland (2002): 500 gigabytes
    – Portugal (2003): 78 gigabytes
    – Australia (2005): 6.69 terabytes
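To give a feel for the Internet Archive figures above, the sketch below projects storage under a simple linear growth model. It treats the slide's lower bounds (1 petabyte, 20 terabytes a month) as exact values and takes 1 petabyte as 1000 terabytes. Both are assumptions for illustration, and real growth is unlikely to be linear.

```python
def projected_size_tb(start_tb, growth_tb_per_month, months):
    """Archive size after a number of months of constant monthly growth."""
    return start_tb + growth_tb_per_month * months


def months_to_double(start_tb, growth_tb_per_month):
    """Months needed to add as much again as the starting size."""
    return start_tb / growth_tb_per_month


START_TB = 1000   # ">1 petabyte", taken as 1000 TB (assumption)
GROWTH_TB = 20    # ">20 terabytes a month", taken as exactly 20 (assumption)

print(projected_size_tb(START_TB, GROWTH_TB, 12), "TB after one year")
print(months_to_double(START_TB, GROWTH_TB), "months to double")
```

Even on these conservative figures, one month of growth is several times the size of the 2005 Australian domain harvest listed above.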

Issues (4)
Dynamic nature of the Web
– Pages, sites, domains, constantly changing
    – e.g. new top level domains
    – Web content disappearing (link rot)
– Some ad hoc focus on the ephemeral
    – Political elections, sports events, 9/11, Hurricanes Katrina and Rita
– Changes in Web technologies
    – Personalised delivery of content
    – Increased interactivity, Web 2.0, etc.

Issues (5)
Access
– Problem of linking content stored in multiple, distributed archives
– Need for co-operation
– Role for IIPC?
Digital preservation and curation
– What this might mean for the Web has not been explored in detail
– Web archives need to fit into the wider landscape of digital preservation and curation

Conclusions
– The Web is culturally important
– To date, Web archiving initiatives have collected a significant amount of content
– Different capture techniques complement each other
– There has been a major improvement in the tools being used to harvest and manage content, e.g. the IIPC toolkit
– Co-operation: the IIPC provides one venue for this. Are others needed?
– Some significant issues remain to be solved


[Image slide, captioned "From December 1998"]

Thank you / gracias

Acknowledgements
UKOLN is funded by the Museums, Libraries and Archives Council, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC, the European Union and other sources. UKOLN also receives support from the University of Bath, where it is based.
The Digital Curation Centre is funded by the JISC and the UK e-Science Programme.