Client-Side Preservation Techniques for ORE Aggregations Michael L. Nelson & Sudhir Koneru Old Dominion University, Norfolk VA OAI-ORE Specification Roll-Out.

Presentation transcript:

Client-Side Preservation Techniques for ORE Aggregations
Michael L. Nelson & Sudhir Koneru
Old Dominion University, Norfolk VA
OAI-ORE Specification Roll-Out, Open Repositories 2008
Southampton UK, April 4, 2008
Research Supported by the Andrew Mellon Foundation

Outline
– Background: let the “Web Infrastructure” preserve your information
– Premise: ReMs are critical for preservation purposes
– Client-side vs. server-side approaches to preservation
– Sketch of a possible framework for client-side preservation techniques

Web Infrastructure
(slide from Frank McCown)

preservation = refreshing + migration

Web Repository Contributions
Frank McCown, Joan A. Smith, Michael L. Nelson, Johan Bollen, Lazy Preservation: Reconstructing Websites by Crawling the Crawlers, Proceedings of WIDM 2006, pp

Overlap with Internet Archive
Frank McCown, Michael L. Nelson, Characterization of Search Engine Caches, Proceedings of IS&T Archiving 2007, pp

Warrick -- A Service to Recover Lost Websites warrick.cs.odu.edu

How Much Did We Reconstruct?
[Figure: a “lost” web site with resources A, B, C, D, E, F beside the reconstructed web site with A, B’, C’, G, E. Annotations: “Missing link to D; points to old resource G” and “F can’t be found”.]
Four categories of recovered resources:
1) Identical: A, E
2) Changed: B, C
3) Missing: D, F
4) Added: G
(slide from Frank McCown)
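The four categories can be sketched in code. This is only an illustration, not Warrick's implementation: the function name `classifyRecovery` and the use of content digests to detect change are assumptions, and the sample data simply mirrors the A to G example on the slide.

```javascript
// Hypothetical helper (not part of Warrick): compare a lost site with its
// reconstruction. Each argument maps a resource identifier to a content
// digest; any string that changes when the content changes will do.
function classifyRecovery(original, recovered) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const result = { identical: [], changed: [], missing: [], added: [] };

  for (const [id, digest] of Object.entries(original)) {
    if (!has(recovered, id)) {
      result.missing.push(id);      // nothing was recovered for this resource
    } else if (recovered[id] === digest) {
      result.identical.push(id);    // recovered copy matches the original
    } else {
      result.changed.push(id);      // recovered, but the content differs
    }
  }
  for (const id of Object.keys(recovered)) {
    if (!has(original, id)) {
      result.added.push(id);        // recovered resource not in the original
    }
  }
  return result;
}

// Sample data mirroring the slide: digests are made-up placeholders.
const lostSite = { A: "a1", B: "b1", C: "c1", D: "d1", E: "e1", F: "f1" };
const rebuiltSite = { A: "a1", B: "b2", C: "c2", E: "e1", G: "g1" };
const recoveryReport = classifyRecovery(lostSite, rebuiltSite);
console.log(recoveryReport);
```

With this sample input the report lists A and E as identical, B and C as changed, D and F as missing, and G as added, matching the slide.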

Resource Maps Unambiguously Define an Aggregation
The “manifest” nature of ReMs allows us to know “if we got it all”
– “known knowns”
– “known unknowns”
– “unknown unknowns”
Assuming the ReM is recovered, the implications for preservation are clear:
– defines members of the aggregation
– defines relationships between them

Server-Side Techniques
Repository A uses ReMs for its aggregations.
Repository B harvests ReMs to ensure total coverage of Repository A.
Repository A can use its ReMs to validate transfer to Repository B.
Third parties use ReMs to audit B’s preservation of A.
New ReMs are created to reflect migration and refreshing of aggregations.
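A minimal sketch of the validation and audit steps, assuming a ReM can be reduced to a list of aggregated resource URIs. The object shape, the function name `auditTransfer`, and all `example.org` URIs are hypothetical; a real ReM is an RDF document and would need to be parsed first.

```javascript
// Simplified, hypothetical stand-in for a Resource Map: the aggregation URI
// plus the URIs of the resources it aggregates.
const sampleRem = {
  aggregation: "http://repo-a.example.org/agg/1",
  aggregates: [
    "http://repo-a.example.org/agg/1/paper.pdf",
    "http://repo-a.example.org/agg/1/data.csv",
    "http://repo-a.example.org/agg/1/slides.ppt",
  ],
};

// Report which aggregated resources Repository B holds and which it lacks.
function auditTransfer(resourceMap, heldUris) {
  const held = new Set(heldUris);
  const present = resourceMap.aggregates.filter((uri) => held.has(uri));
  const missing = resourceMap.aggregates.filter((uri) => !held.has(uri));
  return { present, missing, complete: missing.length === 0 };
}

// Hypothetical holdings of Repository B after a partial transfer.
const repoBHoldings = [
  "http://repo-a.example.org/agg/1/paper.pdf",
  "http://repo-a.example.org/agg/1/slides.ppt",
];
const transferAudit = auditTransfer(sampleRem, repoBHoldings);
console.log(transferAudit.complete, transferAudit.missing);
```

The same check serves Repository A validating its own transfer and a third party auditing B, since both only need the ReM and a list of what B holds.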

Can We Involve End-Users in the Preservation Process?
Leverage the actions of end users?
– “people helping robots…”
Make preservation more accessible?
– light-weight and easy like Google Analytics and reCAPTCHA?

… hello world …
resourcemap=” ;
webReposToCheck=”google,yahoo,internetArchive”;
checkMirrors=”yes”;
writeBack=”
<script type=”text/javascript” src=”
…
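One way a client script might read settings like those on the slide is sketched below. The setting names come from the slide; everything else is an assumption: the `normalizeSettings` function is hypothetical, and the two `example.org` URLs are placeholders, since the slide's actual values are not known.

```javascript
// Hypothetical settings object using the variable names from the slide.
// The URLs are placeholders, not the values used in the original deck.
const pageSettings = {
  resourcemap: "http://example.org/rem/1",
  webReposToCheck: "google,yahoo,internetArchive",
  checkMirrors: "yes",
  writeBack: "http://example.org/writeback",
};

// Turn the string-valued settings into a shape that is easier to work with.
function normalizeSettings(raw) {
  return {
    resourceMapUrl: raw.resourcemap,
    // Split the comma-separated repository list and drop empty entries.
    repos: raw.webReposToCheck
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
    // Treat "yes" (in any letter case) as true, anything else as false.
    checkMirrors: raw.checkMirrors.toLowerCase() === "yes",
    writeBackUrl: raw.writeBack,
  };
}

const clientSettings = normalizeSettings(pageSettings);
console.log(clientSettings.repos, clientSettings.checkMirrors);
```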

Client-Side Techniques
Operations on the ReM and Aggregated Resources (ARs)
– validation, HTTP status, ReM visualization, etc.
Interacting with the Web Infrastructure (WI)
– checking for ReM, ARs in Internet Archive, search engine caches, etc.
– reconstructing aggregation for a given time interval
– submitting ReM, ARs to WI
Inter-client communication
– my client updates/repairs ReM -- how to communicate that to other clients and servers?
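Reconstructing an aggregation for a given time interval could work roughly as follows, assuming the client has already collected a list of copies found in the Web Infrastructure. The record shape, the function name `reconstructForInterval`, the rule of preferring the most recent copy inside the interval, and the sample data are all assumptions made for illustration.

```javascript
// Hypothetical records of copies found in the Web Infrastructure.
// `time` is an ISO 8601 date string, so plain string comparison orders
// the dates correctly.
const foundCopies = [
  { resource: "paper.pdf", repo: "internetArchive", time: "2007-03-01" },
  { resource: "paper.pdf", repo: "google", time: "2008-01-15" },
  { resource: "data.csv", repo: "yahoo", time: "2006-11-20" },
  { resource: "data.csv", repo: "internetArchive", time: "2008-02-02" },
  { resource: "slides.ppt", repo: "internetArchive", time: "2005-06-30" },
];

// For each resource, keep the most recent copy dated within [start, end].
// Resources with no copy in the interval are left out of the result.
function reconstructForInterval(copies, start, end) {
  const chosen = {};
  for (const copy of copies) {
    if (copy.time < start || copy.time > end) continue;
    const best = chosen[copy.resource];
    if (best === undefined || copy.time > best.time) {
      chosen[copy.resource] = copy;
    }
  }
  return chosen;
}

const snapshot = reconstructForInterval(foundCopies, "2006-01-01", "2007-12-31");
console.log(snapshot);
```

Comparing the keys of the result against the ReM's member list then shows which ARs have no copy in the chosen interval.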

One Reason Why We Need Humans in the Loop

A Possible Scenario…

resourcemap=”
webReposToCheck=”google,yahoo,internetArchive”;
checkMirrors=”yes”;
writeBack=”
<script type=”text/javascript” src=”

Hosts shown: ore.cs.odu.edu, wiki.somewhere.org

Wikis Would Make a Nice Inter-Client Message Store
Function as a publicly (computers + humans) readable revision control system for ReMs

“Help Preserve This Object”

Current Status
Hierarchical view of ReM
Finds copies of Aggregated Resources in Internet Archive, Google, Yahoo
Next up:
– use Simile timeline software to display ARs in time
– choose a time interval for reconstruction
– send edited ReMs to a wiki or public service
– write a program to read & vet edited ReMs from public store