Archive What I See Now Mat Kelly, Michael L. Nelson, Michele C. Weigle Old Dominion University Web Science and Digital.

Slides:



Advertisements
Similar presentations
1 What is the Internet Archive We are a Digital Library Mission Statement: Universal access to human knowledge Founded in 1996 by Brewster Kahle in San.
Advertisements

Update on the SWORD Protocol & Future Directions.
Integrated Digital Event Web Archive and Library (IDEAL) and Aid for Curators Archive-It Partner Meeting Montgomery, Alabama Mohamed Farag & Prashant Chandrasekar.
Looking Ahead Archive-It Partner Meeting November 18, 2014.
HTML 5 and CSS 3, Illustrated Complete Unit K: Incorporating Video and Audio.
Looking Ahead Archive-It Partner Meeting November 12, 2013.
Georgios Kontaxis, Michalis Polychronakis Angelos D. Keromytis, Evangelos P. Markatos Siddhant Ujjain (2009cs10219) Deepak Sharma (2009cs10185)
The Online Library Environment Projects and Challenges at The University of Alabama Libraries Jason J. Battles Head, Web Services Department.
Building a Digital Library with Fedora International Conference on Developing Digital Institutional Repositories Hong Kong December 9, 2004.
Presentation Outline  Project Aims  Introduction of Digital Video Library  Introduction of Our Work  Considerations and Approach  Design and Implementation.
Kaitlin Moran Software Brief. What is picnik? Picnik is a free program that allows you to create your photos into a master piece, through a variety of.
Page 1 Building Reliable Component-based Systems Chapter 18 - A Framework for Integrating Business Applications Chapter 18 A Framework for Integrating.
Workshop on Implementing Scriblio The Next-Generation Library Catalog 25 June 2008 The Hong Kong University of Science and Technology Library Interface.
11 WARC standard revision workshop Clément Oury IIPC General Assembly open workshops Stanford, April 28th, 2015 IIPC General Assembly – Stanford – April.
Archive-It Architecture Introduction April 18, 2006 Dan Avery Internet Archive 1.
1 Archiving and Preserving the Web Kristine Hanna Internet Archive April 2006.
Feeds Computer Applications to Medicine NSF REU at University of Virginia July 27, 2006 Paul Lee.
1 Archive-It Training University of Maryland July 12, 2007.
1 Advanced Archive-It Application Training: Archiving Social Networking and Social Media Sites.
Web Archiving Life Cycle Model Archive-It Partner Meeting December 3, 2012 Molly Bragg
How To Get Insanely Organized With Evernote Unleash the Power of the Trunk Michelle Lindsey, Technology Instructor Ozark Upper Elementary
Joanne Archer University of Maryland Kate Odell Archive-It Abbie Grotke Library of Congress Tessa Fallon Columbia University Creating and Maintaining Web.
July 25, 2012 Arlington, Virginia Digital Preservation 2012warcreate.com WARCreate Create Wayback-Consumable WARC Files from Any Webpage Mat Kelly, Michele.
2014: A Celebratory Year World Wide Web’s 25 Year Anniversary W3C’s 20 Year Anniversary 10 June
An Extensible Framework for Creating Personal Web Archives of Content Behind Authentication Mat Kelly Director:Michele C. Weigle Committee:Michael L. Nelson.
Tool Academy: Web Archiving Nicholas Digital Cultural Heritage DC Meetup December 20, 2012 “cobwebbed screw driver” by Flickr user Colby.
IIPC GA Curator Tools Fair May 2014 WEB CURATOR TOOL Nicola Bingham Web Archivist.
1 Session 1: Introduction to HTML Spring Today’s Agenda Cover useful terminology for today’s session HTML, browsers, servers, etc. HTML Tags Get.
Extending the Scope of Learning Objects with XML Bill Tait COLMSCT Associate Teaching Fellow The Open University ALT-C Conference Sep 2007.
Metadata Considerations Implementing Administrative and Descriptive Metadata for your digital images 1.
Online Classrooms Katherine Carswell Holly Academy Middle School
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
1 Archive-It: Archiving and Preserving Born Digital Content NDIIPP June 2009 Molly Bragg Partner Specialist Internet Archive.
Was.cdlib.org California Digital Library University of California Rosalie Lack
Topaz : A GridFTP extension to Firefox M. Taufer, R. Zamudio, D. Catarino, K. Bhatia, B. Stearn University of Texas at El Paso San Diego Supercomputer.
Preserving Digital Culture: Tools & Strategies for Building Web Archives : Tools and Strategies for Building Web Archives Internet Librarian 2009 Tracy.
Visualizing Digital Collections at Archive-It Michele C. Weigle, Michael L. Nelson Web Sciences and Digital Libraries (WS-DL) Lab Department of Computer.
Distributed Software Development QR Marks The Spot Beta Prototype Vadym Khatsanovskyy, Nicolas Jacquemoud.
Use & Access 26 March Use “Proof of Concept” Model for General Libraries & IS faculty Model for General Libraries & IS faculty Test bed for DSpace.
Archive What I See Now Mat Kelly, Michael L. Nelson, Michele C. Weigle Old Dominion University Web Science and Digital.
XNAT Workshop 2012 Project Configuration Tim Olsen
1 A Very Large Digital Library Technology Demonstration William Y. Arms Cornell University.
CyberCemetery Preserving At-Risk Government Web Content.
Sakaibrary: Integrating Licensed Library Resources with Sakai 29 November 2006 Steve Smail Mark Notess.
1 Implementing LEAP2A using the Argotic library in.NET Andrew Everson Extensions for Argotic version can be downloaded from:
The New DRS Introduction. What is DRS? Digital repository for preservation and access – Maintains integrity of deposited content – Preserves content for.
Make it, Don’t Fake it Leap Forward with Eyeblaster Workshop™ for Flash December 20,2007.
Archive Facebook Matthew Kelly Old Dominion University.
IRS Tax Map Electronic Research Tool David Brown Internal Revenue Service Media and Publications Division David Brown Internal Revenue Service Media and.
Classical Model: Web Harvesting W/ARC - GET / HTTP/ OK text/css image/gif image/jpg video JavaScript Pull from queue.
Digital Archives You Can Do It! The Collective - March 2016 Paul Kelly - Digital Archivist - The Catholic University of America.
1 PSI/PhUSE Single Day Event – SAS Applications – June 11, 2009 SAS Drug Development from the Inside Magnus Mengelbier Director.
Introduction to Digital Libraries Assignment #1 Old Dominion University Department of Computer Science CS 751/851 Spring 2015 Michael L. Nelson 01/22/15.
Archive Facebook Matthew Kelly Department of Computer Science Old Dominion University Norfolk, Virginia.
 GEETHA P.  Originally coined by Tim O’Reilly Publishing Media  Second generation of services available on www.  Lets people collaborate and share.
Objectives  Encourage education through a new format of utilizing social media within a primary care residency program.  Lead discussions through multiple.
Web Archiving Workshop Mark Phillips Texas Conference on Digital Libraries June 4, 2008.
Scholarly Workflow: Federal Prototype and Preprints
Institution update KB DK
Web 2.0 and Library 2.0 A Brief Overview
“Real Simple Syndication” (RSS)
Creating Web Collections with Archive-It
Michele C. Weigle and Michael L. Nelson
László Drótos – Márton Németh National Széchényi Library Department of Electronic Library Services Web archiving Planning a new pilot project.
VI-SEEM Data Repository
Archivist By Scott Selinger, Charles Gilliam, Dung Mai, Robert Edwards, and Tyler Boswell Stakeholder: Lorraine Richards Advisor: Rosina Weber Scott.
5 things you didn’t know you can BUILD with Microsoft Edge
Visualizing Digital Collections at Archive-It
Introduction to Digital Libraries Assignment #2
Introduction to Digital Libraries Assignment #2
Presentation transcript:

Archive What I See Now Mat Kelly, Michael L. Nelson, Michele C. Weigle Old Dominion University Web Science and Digital Libraries Research Group ws-dl.blogspot.com

Web archives capture a lot but not everything Individuals’ interests may not be captured Timely capture is important Capture capability must be enabled for all What’s the Problem? 2 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting

November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting Timely Capture Is Important Use Case: Capturing Breaking Stories 3

November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting Timely Capture Is Important Use Case: Capturing Breaking Stories 4

November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting 5 Timely Capture Is Important Use Case: Capturing Breaking Stories

November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting 6

Users take ad hoc approaches 1. 2.Screenshots of Pages 3.Other sub-optimal approaches The Amateur Archivist’s Approach to Just-In-Time capture 7 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting

Acknowledge the problem: – THE TOOLS ARE DIFFICULT! Resolve the problem: – Build more accessible tools (make it EASY) – Appeal to standards (e.g., WARC) – Make interoperable Enabling The Amateur Web Archivist 8 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting 28500:2009

Safety of Archives Requires $ Institutions Require Funding Users’ Hard Drives Fail – No Access to Save-As files and Screenshots Hybrid approach needed – Leverage institutional safety, formats, and tech – allow direct user deposits The Institutional Dilemma 9 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting

So we built it! 10 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting WARCreate – Google Chrome extension Create web archives from browser Capture personalized content Preserve on a whim 1.Mat Kelly and Michele C., "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2012). Washington, DC, June 2012, pp Mat Kelly, Michele C. Weigle, Michael Nelson. "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," Digital Preservation 2012, Tools Demo Session: Web Archiving; 2012 Jul 25; Washington, DC.

WARCreate – How it Works 11 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting

Preserving the Original Context Use Case: Capturing Facebook 12 Facebook-Supplied Data Dump Archive created from WARCreate in Wayback November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting Liberated Data Doesn’t Give The Whole Picture

Preserving the Original Context Use Case: Capturing Facebook 13 Using Scraping Tools (e.g. wget) Archive created from WARCreate in Wayback November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting The Target Controls What is Allowed

Preserving the Original Context Use Case: Capturing Facebook 14 A Crawler Has No Context Archive created from WARCreate in Wayback November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting No Credentials  No Entry  No Archiving

Preserving the Original Context Use Case: Capturing Facebook 15 IA/HERITRIX OBEY ROBOTS Archive created from WARCreate in Wayback November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting No Means No, if They Say and you Obey

So we built it! 16 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting WARCreate – Google Chrome extension Create web archives from browser Capture personalized content Preserve on a whim

Users don’t know WHAT TO DO with WARC files Users can now create WARCs! 17 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting WARCreate – Google Chrome extension Create web archives from browser Capture personalized content Preserve on a whim CAVEAT:

So, again, we built it! 18 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting Web Archiving Integration Layer (WAIL) Heritrix, Wayback, etc. packaged for PC GUI front-end allows “One-Click Preservation” Provides means to replay WARCs 1.Mat Kelly, Michele C. Weigle, Michael Nelson. "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving," Personal Digital Archiving 2013, Poster Session; 2013 Feb 21; College Park, MD. 2.Mat Kelly, Michael Nelson and Michele C. Weigle. "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy," Digital Preservation 2013, Workshops and Sessions: Web Archiving; 2013 Jul 24; Alexandria, VA

So, again, we built it! 19 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting Web Archiving Integration Layer (WAIL) Heritrix, Wayback, etc. packaged for PC GUI front-end allows “One-Click Preservation” Provides means to replay WARCs

The Archive What I See Now Project 20 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting

The Archive What I See Now Project: Three Goals 1.Port 2.Add functionality in: … to upload WARCs to: 3.Implement Sequential Archiving 21 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting & &

Disjoint extension/add-on APIs – Little logic can be re-used Problems with HTTP header capture in Chrome are trivial in Firefox – Chrome = highly asynchronous fetching Code to save WARC to PC from browser reusable in Firefox Porting WARCreate to Firefox 22 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting

The Archive What I See Now Project: Three Goals 1.Port 2.Add functionality in: … to upload WARCs to: 3.Implement Sequential Archiving 23 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting & & ✓ In βeta now!

The Archive What I See Now Project: Three Goals 1.Port 2.Add functionality in: … to upload WARCs to: 3.Implement Sequential Archiving 24 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting & &

Working with Archive-It to determine feasibility of user-provided WARCs Consideration of data integrity Should data be merged with A-IT crawled WARCs? – How do we account for your vs. my Privacy? Uploading WARCs: An Open Question 25 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting

The Archive What I See Now Project: Three Goals 1.Port 2.Add functionality in: … to upload WARCs to: 3.Implement Sequential Archiving 26 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting & &

personal streamwallpostsmy tweets global streamnews feedstreamsfollowees’ tweets multimedia-photosphotos N/A multimedia-videosvideos N/A photo collectionalbumsN/A postsnotesN/A friends circlesfollowing Sequential Archiving? 27 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting The Digital Libraries Approach ★ Discovery & Scraping: The Information Retrieval Approach - versus -

Only (and optionally) applied on recognized sites – scraping as fallback for establishing hierarchy Not limited to social media – CNN.com, MSNBC.com, etc have similar hierarchies Lives online, tools allude to and are always updated Standardized spec* prototype is live online Online Hierarchy Definition 28 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting * M. Kelly, An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication, Aug 2012

Firefox WARCreate in Beta – Chrome WARCreate Users Can Currently Archive What They See Now with Sequential Archiving Implemented in Chrome WARCreate, needs porting Next Big Hurdle: Working with Archive-It in WARC upload logistics Summary 29 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting &

Download Our Archiving Tools! Share Your Use Cases for Capturing the Unpreserved and the Unpreservable Help Us Improve Our Tools, Give Feedback! Archive What I See Now 30 November 12, 2013 Salt Lake City, Utah 2013 Archive-It Partner Meeting Web Archiving Integration Layer (WAIL) One-Click Preservation Heritrix, Wayback and Others On Your PC! WARCreate for Chrome Create WARC files form any web page from your browser version in beta Available Soon!