Project Planning Workshop Woods Hole July 11-13, 2005 Multi-Institution Testbed for Scalable Digital Archiving NSF CISE/Library of Congress DIGARCH Award.

1 Project Planning Workshop Woods Hole July 11-13, 2005 Multi-Institution Testbed for Scalable Digital Archiving NSF CISE/Library of Congress DIGARCH Award Stephen Miller Scripps Institution of Oceanography Bob Detrick Woods Hole Oceanographic Institution John Helly San Diego Supercomputer Center

2 Our DIGARCH project website SIO/WHOI/SDSC Plone-driven All members can upload documents

3 Why are NSF and the Library of Congress interested? Digital archiving and preservation

4 A long history of backup and recovery Capitol burned August 24, 1814 Library of Congress offsite recovery Thomas Jefferson’s Library

5 What is the national DIGARCH program? Bill Lefurgy Library of Congress Larry Brandt NSF/CISE 10 new awards “Produce results within 1 year”

6 Alignment: SIO/WHOI needs Match DIGARCH interests

7 3. Cyber-capabilities 2. Barriers to advances 1. Community Goals

8 Broad support Across disciplines And institutions Research And education 1. Community Goals

9 Guarantee long term preservation Gulf of California 1939 Expedition, R/V E W Scripps

10 Need more than data storage Need metadata Enable re-use Also need infrastructure Networked community tools, archives, understanding

11 Why re-use data? New ship time expensive ($22K/day) Use archives for: 1. Regional synthesis projects 2. Support other disciplines 3. Monitor environmental changes through time Before and after Earthquakes, slumps, seeps Volcanoes …

12 2. Barriers to advances

13 Data from a firehose Can we keep up? Shipboard data rates – yes Satellite links – maybe, depends on heading Metadata – yes, but not widely implemented Preservation – maybe Community usage – help needed from cyberinfrastructure Tiffany Houghton, SDSC, on R/V Sproul

14 We can archive from paper documents Track plots Cruise reports Handwritten and printed data

15 But digital preservation is risky business Endangered Species 9-track tapes Exabytes fail Even CDs fail RAIDs fail “Shoe-box” archiving not to be trusted

16 Solution: Active Archiving “Don’t trust any media, person or process” Actively monitor status Migrate to new storage media Mirror on multiple systems daily Backup to independent sites Technology makes this possible, just need to do it
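The "actively monitor status" and "mirror on multiple systems" steps on this slide can be illustrated with a fixity check. This is a minimal sketch, not the project's own tooling: the function names `sha256_of` and `audit_copies`, the choice of SHA-256, and the idea of a recorded reference checksum are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_copies(relative_name: str, expected_sha256: str,
                 mirror_roots: list[Path]) -> dict[str, str]:
    """Check one archived object on every mirror (hypothetical helper).

    Each mirror is reported as 'ok', 'corrupt' (checksum differs from the
    recorded value), or 'missing' (no such file on that mirror).
    """
    report = {}
    for root in mirror_roots:
        candidate = root / relative_name
        if not candidate.is_file():
            report[str(root)] = "missing"
        elif sha256_of(candidate) == expected_sha256:
            report[str(root)] = "ok"
        else:
            report[str(root)] = "corrupt"
    return report
```

Run on a schedule, a report like this would flag which mirror needs to be refreshed from a good copy, without trusting any single medium or site.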

17 3. Emerging Cyber-capabilities SIOExplorer digital library Design for scalability Automate harvesting Collection Builder’s Toolkit for other projects Crossing institutional boundaries Multi-Institution Testbed SIO, WHOI, SDSC

18 SIOExplorer Digital Library Community access Data Images Documents 647 cruises 150,000 objects Multiple federated collections

19 Collection status board Live on web Auto-updated Monitor status of 800 cruises, work in progress 4000 files, 10 GB per cruise Click for individual cruise status

20 Issue for future use: Access to complete cruise collections Current practice hit-or-miss Only selected data streams archived Cyberinfrastructure allows comprehensive solution Auto-harvesting and archiving Alvin and Jason data in context of entire cruise Claim: Very little additional cost to archive everything
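The "archive everything" idea on this slide amounts to walking a whole cruise directory rather than picking selected data streams. The sketch below is only an illustration under assumed conventions: the function name `build_manifest`, the one-directory-per-cruise layout, and the manifest fields are hypothetical and are not taken from SIOExplorer.

```python
from pathlib import Path


def build_manifest(cruise_dir: Path) -> list[dict]:
    """List every file under a cruise directory with its relative path and size.

    Nothing is filtered out, so the manifest covers the complete cruise
    collection instead of a hand-picked subset.
    """
    entries = []
    for path in cruise_dir.rglob("*"):
        if path.is_file():
            entries.append({
                "path": path.relative_to(cruise_dir).as_posix(),
                "bytes": path.stat().st_size,
            })
    # Sort by relative path so repeated harvests produce comparable manifests.
    entries.sort(key=lambda entry: entry["path"])
    return entries
```

A manifest of this kind could then feed the metadata and checksum steps, with each harvest compared against the previous one to spot new or missing files.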

21 Design to Overcome Project Barriers Build scalable digital library Federate independent authorities 4 Operational collections 3 Work-in-progress John Helly, IT Architect, SDSC

22 Multiple access methods Google No interface Just type name of cruise Basic web form Text-based search for experts Java CruiseViewer Full graphical search Web services Computer-to-computer Enable next generation interoperability

23 Don Sutton, SDSC Java CruiseViewer Full graphical search All capabilities Any combination of collections Metadata Oracle or PostgreSQL Data Storage Resource Broker User Graphical search Keyword search Search results for visualization objects Discover content Browse metadata View or download objects

24 Launch visualization experiences Visualization of multibeam seafloor mapping swath sonar data 300 cruises since 1982 20-km wide swaths Sonar quality control Geological research Education Download free viewer

25 Other organizations using mtf technology CUAHSI Consortium of Universities for Advanced Hydrologic Science, Inc. Major technology co-development 95 institutional members WHOI – DIGARCH Multi-Institution Testbed project Bob Detrick CCOM/UNH cruise and multibeam archives Jim Case, Larry Mayer MBARI – Monterey Bay Aquarium Research Institute collection building in progress Dave Caress, Andrew Chase SOEST/HAWAII – April 4-26, 2005 realtime digital library testing R/V Kilo Moana NIWA – Digital-Library-in-a-Box tested on R/V Tangaroa in New Zealand John Helly, Don Robertson Arctic DMS - Data Management System under development Margo Edwards (Hawaii), Dawn Wright (Oregon State)

26 Closely related project – IODP Site Survey Data Bank 6-9 years of support Digital Library Technology Modular metadata tools Webform user interfaces Reliable servers and storage IODP interested in access to SIO and WHOI collections Cruise Alvin Jason

27 Multi-Institution Testbed for Scalable Digital Archiving Extend SIOExplorer approach to WHOI Integrate SIO, SDSC and WHOI tools and data 30 years of WHOI cruise data 4098 Alvin submersible dives Jason ROV surveys (200 DVD per cruise) Results from 1600 NSF awards online

28 WHOI cruises 800 cruises since 1930

29 4098 Alvin dives Since June 26, 1964

30 Project Challenges Auto-harvest data, metadata “Shoe-box archives” only prior to 2002 Build distributed digital library Both institutions Ships and submersibles Extend WHOI data exploration tools Persistent digital library objects Interoperability across institutions

31 Project Facilities UCSD server San Diego Supercomputer Center Dell PowerEdge 2850 server Dell PowerVault 220S SCSI storage (4 TB) Staging and backup area Geological Data Center, SIO Dell PowerEdge 2850 server Dell PowerVault 220S SCSI storage (2 TB) Also Sun workstations 4 RAID systems WHOI server Dell PowerEdge Storage Dru Clark, Uta Peckman at GDC

32 Project Identity Decision Do we maintain separate identities? SIOExplorer WHOIexplorer Or create new integrated system OceanExplorer (or other name) Select collections: SIO or WHOI Future expansion LDEO, UH, UW, NGDC, even IFREMER In either case archives will be distributed and replicated

33 What do we need to accomplish this year? Proof of concept for Library of Congress / NSF Working multi-institution testbed for archiving Define achievable goals Presentations AGU Abstracts due Sept 8, meeting Dec 5-9 (San Francisco) DIGARCH All-PI and digital government conference May 21-24, 2006 (Marina del Rey?) Preparation for continued effort Identify sources of funding

34 Future plans 1 year no-cost extension Complete the prototype testbed New support for Harvesting at-risk legacy data Cruises, Alvin, Jason Harvesting data from new cruises Other ideas? Datasets to add Technology for archiving and display Partnerships
