1999-06-04 Kulturarw³: The Swedish WWW Archive, or Capturing the World Wide Web


Background
Legal deposit, 1661
Latest revision 1993
  – only electronic documents in fixed form
  – CD-ROM, diskettes
New law under way
  – secretary from BIBSAM/KB
  – Proposal: SOU 1998:111
First Swedish web newspaper lost
  – printed newspapers 1645
Kulturarw³ started summer 1996

Goals
All www and gopher pages in Sweden
  – pictures, video etc.
  – .se and generic TLDs
  – suecana
All articles in electronic journals
All Swedish newsgroups / mailing lists
Limitations versus RA and ALB

Organisation
Project group: two persons
Steering group: four persons
Reference group: representatives from ALB, RA, Lund Univ, SUNET etc.
International cooperation
  – NWA - Nordic Web Archive
  – Nat. Libraries

Strategy
Selection?
  – How to know what is important?
  – Labour-intensive
Collect everything using automatic software
  – Gets everything
  – Less labour-intensive
  – Computer memory is cheap

Strategy
Take snapshots of the Swedish web a few times a year.
In the future, take newspapers every day, others every month etc.
What is Sweden?
  – .se
  – .com, .org and .net with Swedish address/telephone number
  – Swedish .nu (Niue)
  – Suecana
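
The "What is Sweden?" rules above can be read as a scoping test for the robot. The following is a minimal sketch under stated assumptions, not the project's actual code: the function name and the three boolean flags are hypothetical, and how a page is recognised as having a Swedish address, being a Swedish .nu site, or being suecana is left to the caller because the slides do not say.

```python
from urllib.parse import urlsplit

# Generic TLDs named on the slide; they count only with a Swedish address/phone.
GENERIC_TLDS = {"com", "org", "net"}

def in_scope(url, has_swedish_contact=False, is_swedish_nu=False, is_suecana=False):
    """Return True if the URL belongs to the Swedish web as the slide defines it.

    All three flags are hypothetical inputs supplied by the caller.
    """
    if is_suecana:
        # Suecana is in scope wherever it is hosted.
        return True
    host = urlsplit(url).hostname or ""  # empty if the URL has no scheme/host
    tld = host.rsplit(".", 1)[-1].lower()
    if tld == "se":
        return True
    if tld in GENERIC_TLDS:
        return has_swedish_contact
    if tld == "nu":
        return is_swedish_nu
    return False
```

For example, `in_scope("http://www.kb.se/")` is accepted on the TLD alone, while a .com address is accepted only when `has_swedish_contact=True` is passed.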

Robot, Software
Modified version of Nordic Web Index's robot software (NetLab, Lund Univ.)
Important! Indexing is not archiving!
Save data in MIME format
Temporary storage, media: DLT tape
  – Data rate, 5 MB/s
  – Capacity, 20 GB
  – Durability, pass
  – Data integrity, error detection and correction
  – Low cost per GB stored
  – Access time too long (?)
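
"Save data in MIME format" could look roughly like the sketch below, which wraps one fetched document in a MIME message using Python's standard `email` package. This is an illustration only: the slides do not describe the actual record layout, and the `X-Archive-URL` and `X-Archive-Date` header names are invented here. The sketch also assumes `content_type` is a bare `type/subtype` without parameters.

```python
from email import message_from_bytes
from email.message import EmailMessage

def wrap_as_mime(url, content_type, body, fetched):
    """Wrap raw document bytes and a little metadata in one MIME message."""
    maintype, _, subtype = content_type.partition("/")
    msg = EmailMessage()
    msg["X-Archive-URL"] = url        # hypothetical header name
    msg["X-Archive-Date"] = fetched   # hypothetical header name
    # Passing bytes makes the library base64-encode the body, so it survives unchanged.
    msg.set_content(body, maintype=maintype, subtype=subtype)
    return msg.as_bytes()

def unwrap_mime(raw):
    """Return (url, content_type, body) from a message made by wrap_as_mime."""
    msg = message_from_bytes(raw)
    return msg["X-Archive-URL"], msg.get_content_type(), msg.get_payload(decode=True)
```

Keeping the original bytes together with their content type in one record is what separates this from an index, which keeps only extracted terms.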

Statistics
15 MURL (million URLs, including duplicates)
240 Gbytes
sites
  – .se
  – .com, .org, .net and .edu
  – ~100 suecana
  – 6 800 Niue
Compare, legal deposit
  – Printed material: 1.7 km/year
  – The web: approx. 50 km on Swedish web.
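
As a rough check, the 240 Gbytes figure can be set against the DLT tape figures from the Robot, Software slide (20 GB capacity, 5 MB/s). The sketch below assumes decimal units (1 GB = 1000 MB), no compression, and a drive that sustains the full data rate; none of these assumptions comes from the slides.

```python
import math

ARCHIVE_GB = 240        # from the Statistics slide
TAPE_CAPACITY_GB = 20   # from the Robot, Software slide
TAPE_RATE_MB_S = 5      # from the Robot, Software slide
MB_PER_GB = 1000        # assumption: decimal units

# Number of tapes needed to hold one snapshot.
tapes_needed = math.ceil(ARCHIVE_GB / TAPE_CAPACITY_GB)

# Time to stream one full tape, and the whole snapshot, at the stated rate.
seconds_per_tape = TAPE_CAPACITY_GB * MB_PER_GB / TAPE_RATE_MB_S
hours_total = ARCHIVE_GB * MB_PER_GB / TAPE_RATE_MB_S / 3600

print(tapes_needed, seconds_per_tape, round(hours_total, 1))
```

Under these assumptions a snapshot fills 12 tapes, and reading one tape end to end takes a little over an hour, which fits the slide's worry about access time.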

Statistics
363 different MIME types found. Many the same, some garbage.
  7.0 M text/html
  4.2 M image/gif
  3.0 M image/jpeg
  0.3 M text/plain
  0.5 M others
text/html + image/gif + image/jpeg + text/plain comprise 97% of the documents.
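
The 97% figure follows from the counts listed above. A short check, using only the numbers on the slide (in millions of documents):

```python
# Document counts per MIME type, in millions, as listed on the slide.
counts_millions = {
    "text/html": 7.0,
    "image/gif": 4.2,
    "image/jpeg": 3.0,
    "text/plain": 0.3,
    "others": 0.5,
}

total = sum(counts_millions.values())
top_four = total - counts_millions["others"]
share = 100 * top_four / total

print(f"{top_four:.1f} M of {total:.1f} M documents = {share:.1f}%")
```

The total also agrees with the 15 MURL reported on the previous slide.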

The Archive
Goals
  – Create copies of the Swedish web at several times (compare index services)
  – Surf the web in space and time
  – Search
  – Accessible in the future → migration

The Archive
Disk
(Optical disk)
Magnetic tape
HSM: Most data on magnetic tapes, staged to disk when needed
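
The HSM idea, with data resting on tape and copied to disk on demand, can be sketched as a toy model. This is not the software the project used: the class name is hypothetical, tape and disk are plain in-memory dictionaries, and the least-recently-used eviction rule is an assumption made for the example.

```python
from collections import OrderedDict

class StagingStore:
    """Toy HSM: all objects live on 'tape'; a bounded 'disk' holds recently read ones."""

    def __init__(self, tape, disk_slots):
        self.tape = tape             # dict standing in for the tape library
        self.disk = OrderedDict()    # staged copies, oldest first
        self.disk_slots = disk_slots
        self.tape_reads = 0          # counts slow-path reads

    def read(self, key):
        if key in self.disk:
            # Fast path: already staged; mark as most recently used.
            self.disk.move_to_end(key)
            return self.disk[key]
        # Slow path: fetch from tape and stage a copy on disk.
        data = self.tape[key]
        self.tape_reads += 1
        self.disk[key] = data
        if len(self.disk) > self.disk_slots:
            # Assumed policy: evict the least recently used staged copy.
            self.disk.popitem(last=False)
        return data
```

Reading the same object twice goes to tape only once; the second read is served from disk until the copy is evicted.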

The Archive
What are we archiving?
  – Magnetic tapes?
  – Bits and bytes?
  – Intellectual content?