Project Planning Workshop Woods Hole July 11-13, 2005 Multi-Institution Testbed for Scalable Digital Archiving NSF CISE/Library of Congress DIGARCH Award.


Project Planning Workshop, Woods Hole, July 11-13, 2005. Multi-Institution Testbed for Scalable Digital Archiving, NSF CISE/Library of Congress DIGARCH Award. Stephen Miller, Scripps Institution of Oceanography; Bob Detrick, Woods Hole Oceanographic Institution; John Helly, San Diego Supercomputer Center.

Our DIGARCH project website: SIO/WHOI/SDSC, Plone-driven; all members can upload documents.

Why are NSF and the Library of Congress interested? Digital archiving and preservation.

A long history of backup and recovery: the Capitol, which housed the Library of Congress, burned on August 24, 1814; the Library of Congress made an “offsite recovery” with Thomas Jefferson’s library.

What is the national DIGARCH program? Bill Lefurgy, Library of Congress; Larry Brandt, NSF/CISE; 10 new awards; “Produce results within 1 year.”

Alignment: SIO/WHOI needs match DIGARCH interests.

1. Community Goals; 2. Barriers to advances; 3. Cyber-capabilities

1. Community Goals: broad support across disciplines and institutions, for research and education.

Guarantee long-term preservation: Gulf of California 1939 Expedition, R/V E. W. Scripps.

Need more than data storage: need metadata to enable re-use. Also need infrastructure: networked community tools, archives, understanding.

Why re-use data? New ship time is expensive ($22K/day). Use archives for: 1. regional synthesis projects; 2. support of other disciplines; 3. monitoring environmental changes through time, before and after earthquakes, slumps, seeps, volcanoes …

2. Barriers to advances

Data from a firehose: can we keep up? Shipboard data rates: yes. Satellite links: maybe, depends on heading. Metadata: yes, but not widely implemented. Preservation: maybe. Community usage: help needed from cyberinfrastructure. Tiffany Houghton, SDSC, on R/V Sproul.

We can archive from paper documents: track plots, cruise reports, handwritten and printed data.

But digital preservation is risky business. Endangered species: 9-track tapes. Exabytes fail, even CDs fail, RAIDs fail. “Shoe-box” archiving is not to be trusted.

Solution: Active Archiving. “Don’t trust any media, person or process.” Actively monitor status; migrate to new storage media; mirror on multiple systems daily; back up to independent sites. Technology makes this possible; we just need to do it.
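One way to make “actively monitor status” concrete is a fixity audit: record a checksum when a file is ingested, then periodically recompute it on every mirror and flag copies that are missing or no longer match. The sketch below is a minimal illustration under our own assumptions (SHA-256 as the checksum; replica labels such as "sdsc", "sio" and "whoi" and the helper names `sha256_file` and `find_bad_replicas` are hypothetical); it is not the project's actual tooling.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> Optional[str]:
    """Return the hex SHA-256 of one replica, or None if that copy is missing.

    The file is read in 1 MiB chunks so large cruise files need not fit in memory.
    """
    if not path.is_file():
        return None
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_replicas(reference: str, replica_digests: dict[str, Optional[str]]) -> list[str]:
    """Names of replicas that are missing (None) or whose digest differs from
    the checksum recorded at ingest time (`reference`)."""
    return sorted(name for name, digest in replica_digests.items() if digest != reference)
```

A flagged replica would then be restored from a copy that still matches the ingest checksum, which is where daily mirroring and backups at independent sites pay off.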

3. Emerging Cyber-capabilities: SIOExplorer digital library, designed for scalability; automated harvesting; Collection Builder’s Toolkit for other projects; crossing institutional boundaries with the Multi-Institution Testbed (SIO, WHOI, SDSC).

SIOExplorer Digital Library: community access to data, images and documents; 647 cruises; 150,000 objects; multiple federated collections.

Collection status board: live on the web, auto-updated; monitors the status of 800 cruises (work in progress), about 4000 files and 10 GB per cruise; click for individual cruise status.
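A status board like this reduces to a roll-up of per-cruise progress counts. The following is a hypothetical sketch: the `CruiseStatus` record, its field names, and the three state labels are illustrative assumptions, not the board's real data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CruiseStatus:
    """Hypothetical per-cruise record; field names are illustrative only."""
    cruise_id: str
    files_expected: int
    files_archived: int

    @property
    def state(self) -> str:
        # Classify a cruise by how much of its expected file set is archived.
        if self.files_archived == 0:
            return "not started"
        if self.files_archived < self.files_expected:
            return "in progress"
        return "complete"


def summarize(cruises: list[CruiseStatus]) -> Counter:
    """Count cruises in each state, as a board's headline row might show."""
    return Counter(c.state for c in cruises)
```

Clicking through to an individual cruise would simply show that cruise's record rather than the aggregate counts.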

Issue for future use: access to complete cruise collections. Current practice is hit-or-miss; only selected data streams are archived. Cyberinfrastructure allows a comprehensive solution: auto-harvesting and archiving, with Alvin and Jason data kept in the context of the entire cruise. Claim: very little additional cost to archive everything.
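As a rough, order-of-magnitude check on the scale of “archive everything”, take the per-cruise figures from the collection status board (about 4000 files and 10 GB per cruise, over 800 cruises) and assume they are typical averages:

$$
V_{\text{total}} \approx 800 \times 10\ \text{GB} = 8000\ \text{GB} \approx 8\ \text{TB},
\qquad
N_{\text{files}} \approx 800 \times 4000 = 3.2 \times 10^{6}.
$$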

Design to Overcome Project Barriers: build a scalable digital library; federate independent authorities; 4 operational collections; 3 work-in-progress. John Helly, IT Architect, SDSC.

Multiple access methods: Google (no interface; just type the name of the cruise); basic web form (text-based search for experts); Java CruiseViewer (full graphical search); web services (computer-to-computer, enabling next-generation interoperability).

Don Sutton, SDSC. Java CruiseViewer: full graphical search with all capabilities, over any combination of collections. Metadata: Oracle or PostgreSQL. Data: Storage Resource Broker. User: graphical search, keyword search; search results for visualization objects; discover content, browse metadata, view or download objects.
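The CruiseViewer split between a metadata database (Oracle or PostgreSQL) and a separate data store (the Storage Resource Broker) suggests a two-step pattern: search the metadata first, then fetch the objects it points to. A minimal sketch, using Python’s built-in `sqlite3` as a stand-in for the production database; the `objects` table, its columns, and the storage paths are hypothetical, and no SRB client calls are shown.

```python
import sqlite3


def make_catalog() -> sqlite3.Connection:
    """In-memory stand-in for the metadata database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        " object_id INTEGER PRIMARY KEY,"
        " collection TEXT NOT NULL,"  # e.g. 'SIO' or 'WHOI'
        " cruise TEXT NOT NULL,"
        " keywords TEXT NOT NULL,"
        " storage_path TEXT NOT NULL)"  # where the bytes live in the data store
    )
    return conn


def keyword_search(conn: sqlite3.Connection, keyword: str, collections: list[str]) -> list[tuple]:
    """Return (cruise, storage_path) for objects matching a keyword in any chosen collection."""
    if not collections:
        return []
    placeholders = ",".join("?" for _ in collections)
    sql = (
        "SELECT cruise, storage_path FROM objects "
        f"WHERE collection IN ({placeholders}) AND keywords LIKE ? "
        "ORDER BY cruise"
    )
    # Collection names and the keyword are bound as parameters, never pasted into SQL.
    return conn.execute(sql, [*collections, f"%{keyword}%"]).fetchall()
```

Only the returned storage paths would then be handed to the data store for viewing or download, so bulk data never moves during a search.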

Launch visualization experiences: visualization of multibeam seafloor mapping swath sonar data; 300 cruises since km wide swaths; sonar quality control; geological research; education; download free viewer.

Other organizations using mtf technology: CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.), major technology co-development, 95 institutional members. WHOI: DIGARCH Multi-Institution Testbed project (Bob Detrick). CCOM/UNH: cruise and multibeam archives (Jim Case, Larry Mayer). MBARI (Monterey Bay Aquarium Research Institute): collection building in progress (Dave Caress, Andrew Chase). SOEST/Hawaii: real-time digital library testing on R/V Kilo Moana, April 4-26, 2005. NIWA: Digital-Library-in-a-Box tested on R/V Tangaroa in New Zealand (John Helly, Don Robertson). Arctic DMS: Data Management System under development (Margo Edwards, Hawaii; Dawn Wright, Oregon State).

Closely related project: IODP Site Survey Data Bank, 6-9 years of support. Digital library technology: modular metadata tools, webform user interfaces, reliable servers and storage. IODP is interested in access to SIO and WHOI collections: cruise, Alvin, Jason.

Multi-Institution Testbed for Scalable Digital Archiving: extend the SIOExplorer approach to WHOI; integrate SIO, SDSC and WHOI tools and data: 30 years of WHOI cruise data, 4098 Alvin submersible dives, Jason ROV surveys (200 DVDs per cruise), results from 1600 NSF awards online.

WHOI cruises: 800 cruises since 1930.

4098 Alvin dives since June 26, 1964.

Project Challenges: auto-harvest data and metadata (only “shoe-box archives” exist prior to 2002); build a distributed digital library spanning both institutions, ships and submersibles; extend WHOI data exploration tools; create persistent digital library objects; achieve interoperability across institutions.
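Auto-harvesting usually begins by inventorying what a cruise actually produced. This sketch walks a cruise directory and builds a manifest of relative path, size and SHA-256 checksum for each file; the directory layout, the `build_manifest` name, and the manifest fields are assumptions for illustration, not the SIO or WHOI harvesting format.

```python
import hashlib
from pathlib import Path


def build_manifest(cruise_dir: Path) -> list[dict]:
    """Walk a cruise directory and record path, size and SHA-256 for each file.

    Files are read whole for brevity; a production harvester would hash in chunks.
    """
    manifest = []
    for path in sorted(p for p in cruise_dir.rglob("*") if p.is_file()):
        manifest.append({
            "path": path.relative_to(cruise_dir).as_posix(),  # stable across hosts
            "bytes": path.stat().st_size,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return manifest
```

The same checksums can later serve as the ingest reference for fixity audits, so harvesting and preservation share one record per file.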

Project Facilities: UCSD server at San Diego Supercomputer Center: Dell PowerEdge 2850 server, Dell PowerVault 220S SCSI storage (4 TB), basalt.sdsc.edu, staging and backup area. Geological Data Center, SIO: Dell PowerEdge 2850 server, Dell PowerVault 220S SCSI storage (2 TB), gdcdb.ucsd.edu; also Sun workstations and 4 RAID systems. WHOI server: Dell PowerEdge storage. Dru Clark, Uta Peckman at GDC.

Project Identity Decision: do we maintain separate identities (SIOExplorer, WHOIexplorer), or create a new integrated system, OceanExplorer (or another name), in which users select collections (SIO or WHOI)? Future expansion: LDEO, UH, UW, NGDC, even IFREMER. In either case, archives will be distributed and replicated.

What do we need to accomplish this year? Proof of concept for Library of Congress/NSF: a working multi-institution testbed for archiving; define achievable goals. Presentations: AGU (abstracts due Sept 8, meeting Dec 5-9, San Francisco); DIGARCH All-PI and digital government conference, May (Marina del Rey?). Preparation for continued effort: identify sources of funding.

Future plans: 1-year no-cost extension; complete the prototype testbed. New support for harvesting at-risk legacy data (cruises, Alvin, Jason) and harvesting data from new cruises. Other ideas? Datasets to add, technology for archiving and display, partnerships.