4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 1 Continuing Persistence: The Persistent Archives Test-bed (PAT) Project at SLAC.

Slides:



Advertisements
Similar presentations
A centre of expertise in data curation and preservation DCC Workshop: Curating sApril 24 – 25, 2006 Funded by: This work is licensed under the Creative.
Advertisements

Michigan Department of Education School Improvement Plan SIP February 2011.
Provenance-Aware Storage Systems Margo Seltzer April 29, 2005.
“The Honeywell Web-based Corrective Action Solution”
The Electronic Office Some supplementary information Corporate websites Office automation Company intranet.
Fedora Users’ Conference Rutgers University May 14, 2005 Researching Fedora's Ability to Serve as a Preservation System for Electronic University Records.
Digital Preservation - Its all about the metadata right? “Metadata and Digital Preservation: How Much Do We Really Need?” SAA 2014 Panel Saturday, August.
Metalogix – Confidential Professional Archive Manager for SharePoint.
Washington State Archives February 2010 Presented by: Russell Wood – State Records Manager Managing Public Records of Agency Websites.
Providing Online Access to the HKUST University Archives: EAD to INNOPAC Sintra Tsang and K.T. Lam The Hong Kong University of Science and Technology 7th.
EAD in A2A Bill Stockting, Senior Editor A2A and EAD Working Group: Central Archives of Historical Records, Warsaw, 26 April 2003.
XP Adding Hypertext Links to a Web Page. XP Objectives Create hypertext links between elements within a Web page Create hypertext links between Web pages.
21 ST Century Tools For Archivists: A Progress Report on Persistent Archives at the Stanford Linear Accelerator Center (SLAC) Jean Marie Deken Archivist.
1 Computing for Todays Lecture 22 Yumei Huo Fall 2006.
Introducing Symposia : “ The digital repository that thinks like a librarian”
Persistent Digital Archives and Library System (PeDALS) A Guide for Wisconsin State Agencies.
Basic Electronic Records Management
1 Archive-It Training University of Maryland July 12, 2007.
AIIM Presentation Selecting and Implementing A Records Management System June 5, 2008.
Section 2.1 Compare the Internet and the Web Identify Web browser components Compare Web sites and Web pages Describe types of Web sites Section 2.2 Identify.
DSpace, CyberCemeteries and Other Active Sites for Community Networking Records Maria Esteva and Sue Soy School of Information, UT Austin Austin History.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
Web Capture team Office of strategic initiatives February 27, 2006 Selecting Content from the Web: Challenges and Experiences of the Library of Congress.
Tutorial 1: Getting Started with Adobe Dreamweaver CS4.
Persistent Digital Archives and Library System (PeDALS) SC Department of Archives and History.
Recordkeeping for Good Governance Toolkit Digital Recordkeeping Guidance Funafuti, Tuvalu – June 2013.
The History of Data / Records SLAC Jean Marie Deken, M.A., M.L.I.S., C.A. Archivist and Head SLAC Archives and History Office 2 nd Workshop.
What Agencies Should Know About PDF/A September 20, 2005 Susan J. Sullivan, CRM
 To explain the importance of software configuration management (CM)  To describe key CM activities namely CM planning, change management, version management.
ECHO DEPository Project: Highlight on tools & emerging issues The ECHO DEPository Project is a 3-year digital preservation research and development project.
A disaggregated model for preservation of E-Prints Gareth Knight SHERPA DP Project Arts and Humanities Data Service.
State Records Office of Western Australia.NET Proof of Concept Project Slideshow: Prototype Online Disposal Authority/Recordkeeping Plan System Project.
WHAT IS A SEARCH ENGINE A search engine is not a physical engine, instead its an electronic code or a software programme that searches and indexes millions.
Systematic Alien Verification for Entitlements (SAVE) Program 2013 AAMVA Annual International Conference.
The Real At Risk E-Content: University Web Resources EDUCAUSE Joanne Kaczmarek University of Illinois at Urbana-Champaign Taylor Surface OCLC October 12,
Preserving Digital Culture: Tools & Strategies for Building Web Archives : Tools and Strategies for Building Web Archives Internet Librarian 2009 Tracy.
DISCLAIMER All opinions are my own and do not reflect the opinion of Fold3 (Ancestry.com), the National Archives (NARA), or the Federation of Genealogical.
Implementing the Standard on digital recordkeeping.
What Agencies Should Know About PDF/A-1 April 6, 2006 Mark Giguere
Lifecycle Metadata for Digital Objects September 11, 2002 Major archival and digital library metadata schemes.
“Guidance on the Selection and Appraisal of Geospatial Content of Enduring Value, April 2014 Draft” groups-subcommittees/hdwg/index_html.
+ Information Systems and Databases 2.2 Organisation.
Systems Development Life Cycle
A Technical Guide to ERMS Bill Manago, CRM. What You Need to Plan For Implementing an Electronic Records Management System Out of the Box What you should.
Collecting History: Profiles in Science Alexa T. McCray National Library of Medicine Bethesda, MD Stanford University August 21, 1999.
Website design and structure. A Website is a collection of webpages that are linked together. Webpages contain text, graphics, sound and video clips.
Lifecycle Metadata for Digital Objects October 23, 2006 Creation Metadata.
Legal Holds Department of State Division of Records Management Kevin Callaghan, Director.
Program Assessment User Session Experts (PAUSE) Information Sessions: RSS & Subscription Services October , 2006.
Lifecycle Metadata for Digital Objects November 15, 2004 Preservation Metadata.
A Project of the University Libraries Ball State University Libraries A destination for research, learning, and friends.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
STREAMLINING STORM WATER PERMIT IMPLEMENTATION THROUGH USE OF AN INFORMATION MANAGEMENT SYSTEM April 8, 2004 Presented by John W. Hart 30 th Environmental.
Donald G. Davis Collection 392K Amy Baker, Megan Peck, Zach Vowell.
SAA 2007 Metadata Roundtable Metadata Development for the Persistent Archives Testbed (PAT) Project Jean Deken.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
ImageNow -- An Overview --. What is ImageNow?  Loyola’s document imaging and workflow application  Primary application (web based and desktop) of the.
Digitalcommons.unl.edu Archiving Department Records.
XP Creating Web Pages with Microsoft Office
Basic Electronic Records Management
7th Annual Hong Kong Innovative Users Group Meeting
Internet Made Easy! Make sure all your information is always up to date and instantly available to all your clients.
Software Configuration Management
Chapter 11: Software Configuration Management
Better than it was Finding what works for processing born-digital archives at the Bentley Historical Library Mike Shallcross U-M Bentley Historical Library.
MSC photo:  It was taken some time in the late 1930s, but we don’t have an exact date.  The college was known as MSC from 1925 until 1955 when we became.
Metadata to fit your needs... How much is too much?
Chapter 11: Software Configuration Management
MSC photo:  It was taken some time in the late 1930s, but we don’t have an exact date.  The college was known as MSC from 1925 until 1955 when we became.
Metadata supported full-text search in a web archive
Presentation transcript:

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 1 Continuing Persistence: The Persistent Archives Test-bed (PAT) Project at SLAC in 2005 – 2006: A Progress Report

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 2 Basic records description SLAC Large Detector (SLD) for the SLAC Linear Collider (SLC) Early and prolific user of world-wide web No further need to keep data confidential Many types of electronic documents Meet US Department of Energy (DOE)/National Archives (NARA) criteria for retention

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 3 Basic records description News items and Hypertext News Publications and Technical Notes in a variety of formats Presentations in PowerPoint, PDF, and Postscript Web pages in HTML format Graphics in Postscript, Encapsulated Postscript, GIF and JPEG formats

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 4 Progress in Web Crawl Analysis Metadata Skeleton / Scheme development SLD Collection Arrangement Next Steps

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 5 Web Crawl Analysis ITERATIVE PROCESS Round 1: Difficulties/issues encountered Massive crawl: Mother Lode and Monstrum Ingens Mother Lode Preserved endangered electronic records Serves as a foundation and basis for subsequent work: can be mined as we iterate crawling Absolutely necessary first step

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 6 Web Crawl Analysis Monstrum Ingens Too much information, Too little useful organization Benefit: Made us have to think about what we really want / need… Had: Series descriptions based on archival appraisal of SLD records. Needed: the same information, but  Arranged hierarchically  Linked to NARA/DOE research records control schedule (our target)

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 7 Web Crawl Analysis Made us have to think about what we really want/need…(cont’d) Had: all of the SLD electronic records (maybe?) Needed: a way to know precisely what we had gathered in the crawl Analyzed original crawl, sorted by urls  Were all links captured? Which ones weren’t? why not?  Created a script to parse webpage for URLS and compare the URLs with the crawl result. If the URL isn't in the list, capture the URL along with the file

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 8 Web Crawl Analysis Round 2: Ran a second, tightly targeted crawl Used freeware tool: HTTrack Crawled only one records series in the hierarchy (7: Committee Reports) Uploaded crawl result to SRB at SDSC Now ready to attempt metadata extraction/injection Major epiphany: the crawl is PART of the archival process, not outside of it

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 9 Metadata Development Parallel activity: constructing metadata scheme Compatible with NARA LCDRG (Life-Cycle Data Requirements Guide) Informed by current best practices : Dublin Core Arizona Model PREMIS (PREservation Metadata Implementation Strategies – OCLC) issued 2005—not studied in depth PREMIS Hodge, et al. Discussed metadata attributes with collaborators

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 10 Metadata Development Evolving as the crawl analysis progresses Two levels of metadata Collection level Item level Two main categories of metadata Injected – externally applied, manually or automatically Extracted – automatically pulled out of the content of the electronic records 7.html

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 11 Metadata Development Six sub-categories of metadata 1. slac.gov – NARA/DOE required attributesslac.gov slac.gov.recordgroup slac.gov.agency slac.gov.referenceby slac.gov.schedule slac.gov.series slac.gov.description slac.gov.retention

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 12 Metadata Development Six sub-categories of metadata 2. slac.creator – all flavors of creatorsslac.creator slac.creator.organization slac.creator.division slac.creator.group slac.creator.person slac.creator.owner

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 13 Metadata Development Six sub-categories of metadata 3. slac.descriptionslac.description slac.description.type slac.description.by slac.description.date slac.description.remarks slac.description.local slac.description.webplatform slac.description.format slac.description.filesize

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 14 Metadata Development Six sub-categories of metadata 4. slac.identifier – attributes that identify this copy of the electronic entityslac.identifier slac.identifier.storagelocation slac.identifier.persistent [others may be developed…]

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 15 Metadata Development Six sub-categories of metadata 5. slac.capture – attributes that detail how the electronic entity was gathered for archivingslac.capture slac.capture.tool slac.capture.settings slac.capture.sitemap slac.capture.date slac.capture.contact slac.capture.remarks

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 16 Metadata Development Six sub-categories of metadata 6. slac.date – date of archived entity, rather than of any processing/handling of entityslac.date slac.date.begun slac.date.modified

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 17 SLD Collection Arrangement Arranged descriptions hierarchically: 30 descriptions  5 series: 1B1a Administrative records 1B8 Computer code documentation 1B9a Technical documents 1B10 Supporting technical information 1B13a Evaluated or summarized data Based on relevant DOE Records Control Schedule (RCS) items

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 18 SLD Collection Arrangement Linked to NARA DOE Records Control Schedule for Research and Development Records recsV5.htm Analyzed how electronic records series relate to SLD paper records: Duplicates? Supplements? Entirely new/different content?

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 19 Next steps … Crawling Automate further analysis? Comparing what we have crawled with the records descriptions, to see how completely the crawl captured the desired sites. Part automatic and part manual Why are we taking from a web crawl rather than from the machine? Benefit: will pick up the linked information. Drawback: has limitations/boundaries (dynamic pages)

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 20 Next Steps… Collection Arrangement Trial run on Committee Reports Series Upload to SRB (done) Try out PAWN tool (Producer Archive Workflow Network – UMd) (beginning in April 2006) Transfer electronically to NARA ERA Evaluate results Replicate process with a second SLD records series… and a third series…

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 21 Next Steps… Metadata Trial run on Committee Reports Series Upload to SRB (done) Automate injection of metadata (with GaTech tools – beginning in April 2006) Automate extraction of metadata (“ “ “ “ “) Evaluate results Develop crawl parameters metadata that could possibly be generalized across several crawl tools? Look in-depth at PREMISPREMIS

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 22 Next Steps… Beyond PAT Establish Electronic Records archiving program at SLAC Institutional commitment Financial support Who is an Archival IT professional? What type of background? What kind of position description What sort of pay scale/compensation? How and where recruited?

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 23 Next Steps… Beyond PAT Archival Primer for IT professionals (?) NARA ERM Guidance on the Web ( mgmt/initiatives/erm-guidance.html), Fast-Track Guidance Productshttp:// mgmt/initiatives/erm-guidance.html Preliminary Planning for Electronic Recordkeeping: Checklist for IT Staff Preliminary Planning for Electronic Recordkeeping: Checklist for IT Staff Preliminary Planning for Electronic Recordkeeping: Checklist for RM Staff Preliminary Planning for Electronic Recordkeeping: Checklist for RM Staff

4/19-21/2006J. M. Deken Future Proof II SLAC Archives & History Office 24 Conclusion All SLAC work products for the PAT project are online, at Home page for entire PAT project is My address: