Multi-Institution Testbed for Scalable Digital Archiving NSF CISE/Library of Congress DIGARCH Award Stephen Miller Scripps Institution of Oceanography.

Stephen Miller (Scripps Institution of Oceanography), Bob Detrick (Woods Hole Oceanographic Institution), John Helly (San Diego Supercomputer Center). Atlanta.

Outline
1. Community Goals
2. Barriers to advances
3. Cyber-capabilities

1. Community Goals: broad support across disciplines and institutions, for both research and education.

Guarantee long-term preservation. (Gulf of California Expedition, 1939, R/V E. W. Scripps)

We need more than data storage. We need metadata to enable re-use, and we need infrastructure: networked community tools, archives, and shared understanding.

Why re-use data? New ship time is expensive ($22K/day). Use archives to:
1. Build regional synthesis projects
2. Support other disciplines
3. Monitor environmental changes through time: before and after earthquakes, slumps, seeps, volcanoes …

April 16, 2005: New volcanic cone in the Vailulu'u crater. Growing at a minimum rate of eight inches per day, a new cone has risen inside the crater of Vailulu'u seamount since the last depth soundings by the US Coast Guard vessel Polar Sea in April …. Our survey using the SIMRAD 120 system of the R/V Kilo Moana shows a new volcanic summit at 708 m depth. The volcano was named Nafanua, after the Samoan goddess of war. (Hubert Staudigel, Stan Hart, John Helly, Anthony Koppers, Jasper Konter)
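To put the minimum growth rate in more familiar units, the conversion below turns eight inches per day into metres per day and metres per year. The yearly figure assumes the rate was sustained over a full 365-day year, which the slide does not claim; it gives only a minimum daily rate.

$$
8\ \tfrac{\text{in}}{\text{day}} \times 0.0254\ \tfrac{\text{m}}{\text{in}} = 0.2032\ \tfrac{\text{m}}{\text{day}},
\qquad
0.2032\ \tfrac{\text{m}}{\text{day}} \times 365\ \tfrac{\text{day}}{\text{yr}} \approx 74\ \tfrac{\text{m}}{\text{yr}}
$$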

2. Barriers to advances

Data from a firehose: can we keep up?
Shipboard data rates: yes
Satellite links: maybe, depending on the ship's heading
Metadata: yes, but not widely implemented
Preservation: maybe
Community usage: help needed from cyberinfrastructure
(Tiffany Houghton, SDSC, on R/V Sproul)

We can archive from paper documents: track plots, cruise reports, and handwritten and printed data.

But digital preservation is risky business. Endangered species: 9-track tapes. Exabyte tapes fail, even CDs fail, and RAIDs fail. “Shoe-box” archiving is not to be trusted.

Solution: active archiving. “Don’t trust any media, person or process.” Actively monitor status, migrate to new storage media, mirror on multiple systems daily, and back up to independent sites. The technology makes this possible; we just need to do it.
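One way to "actively monitor status" across mirrors is a fixity audit: record a checksum for every file once, then periodically re-hash each mirror's copy and repair any copy that is missing or has silently changed from a mirror that still verifies. The sketch below is a minimal illustration of that idea, not the project's actual tooling; the manifest layout (relative path mapped to SHA-256 digest) and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_mirrors(manifest: dict[str, str], mirrors: list[Path]) -> dict[str, list[str]]:
    """Check every mirror against a reference manifest.

    `manifest` maps relative file paths to their expected SHA-256 digests.
    Returns, for each mirror, the relative paths that are missing or whose
    contents no longer match the recorded digest.
    """
    problems: dict[str, list[str]] = {}
    for mirror in mirrors:
        bad = []
        for rel_path, expected in manifest.items():
            copy = mirror / rel_path
            if not copy.is_file() or sha256_of(copy) != expected:
                bad.append(rel_path)
        problems[str(mirror)] = bad
    return problems


def repair(mirrors: list[Path], problems: dict[str, list[str]]) -> None:
    """Overwrite each bad copy with one from another mirror that passed the audit.

    If no mirror holds a verified copy, the file is simply left as it is
    (it will be flagged again by the next audit).
    """
    for mirror in mirrors:
        for rel_path in problems[str(mirror)]:
            for source in mirrors:
                if source != mirror and rel_path not in problems[str(source)]:
                    target = mirror / rel_path
                    target.parent.mkdir(parents=True, exist_ok=True)
                    shutil.copy2(source / rel_path, target)
                    break
```

Run on a schedule (daily, to match the mirroring cadence on the slide), an audit followed by a repair turns "don't trust any media" into a routine check rather than a one-time backup.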

An early example of backup: the Capitol burned on August 24, 1814, destroying the Library of Congress. Its “offsite backup” was Thomas Jefferson’s library.

3. Emerging Cyber-capabilities
SIOExplorer digital library: designed for scalability, with automated harvesting and a Collection Builder’s Toolkit for other projects
Crossing institutional boundaries: the Multi-Institution Testbed (SIO, WHOI, SDSC)

SIOExplorer Digital Library: community access to data, images and documents. 647 cruises, 150,000 objects, 500 GB, in multiple federated collections.

Collection status board: live on the web and auto-updated. It monitors the status of 800 cruises (work in progress), at 4,000 files and 10 GB per cruise. Click a cruise to see its individual status.
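The slide does not say how the board computes each cruise's status. A minimal sketch, assuming a cruise is "complete" once every harvested file has been verified in the archive, could look like the following; the `CruiseStatus` fields, the three state names, and the row layout are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CruiseStatus:
    cruise_id: str
    files_expected: int   # files found by the harvester for this cruise
    files_archived: int   # files verified in the archive so far
    gigabytes: float      # volume archived so far

    @property
    def state(self) -> str:
        if self.files_archived == 0:
            return "not started"
        if self.files_archived < self.files_expected:
            return "in progress"
        return "complete"


def board(cruises: list[CruiseStatus]) -> str:
    """Render a plain-text status board: one row per cruise plus a summary line."""
    rows = [
        f"{c.cruise_id:<10} {c.files_archived:>6}/{c.files_expected:<6} "
        f"{c.gigabytes:>7.1f} GB  {c.state}"
        for c in cruises
    ]
    counts = Counter(c.state for c in cruises)
    total_gb = sum(c.gigabytes for c in cruises)
    summary = ", ".join(f"{n} {state}" for state, n in sorted(counts.items()))
    rows.append(f"{len(cruises)} cruises ({summary}); {total_gb:.1f} GB total")
    return "\n".join(rows)
```

For scale: if all 800 cruises reached the quoted 4,000 files and 10 GB each, the board would track 800 × 4,000 = 3.2 million files and 800 × 10 GB = 8,000 GB (about 8 TB).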

Issue for future use: access to complete cruise collections. Current practice is hit-or-miss; only selected data streams are archived. Cyberinfrastructure allows a comprehensive solution: auto-harvesting and archiving of both data and metadata. Claim: archiving everything costs very little extra.
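Auto-harvesting "everything" can start with an inventory: walk a cruise's directory tree and emit one metadata record per file, whether or not anyone selected that data stream. The sketch below assumes cruise data sit under a single directory; the extension-to-stream mapping, record fields, and JSON Lines manifest format are hypothetical choices for illustration, not the SIOExplorer harvester.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical extension-to-stream mapping, for illustration only; a real
# harvester would follow each ship's instrument and file-naming conventions.
STREAM_TYPES = {
    ".nav": "navigation",
    ".mb": "multibeam",
    ".jpg": "image",
    ".pdf": "document",
}


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large sonar files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def harvest_cruise(cruise_dir: Path, cruise_id: str) -> list[dict]:
    """Build one metadata record for every file under cruise_dir."""
    records = []
    for path in sorted(p for p in cruise_dir.rglob("*") if p.is_file()):
        records.append({
            "cruise_id": cruise_id,
            "path": path.relative_to(cruise_dir).as_posix(),
            "stream": STREAM_TYPES.get(path.suffix.lower(), "other"),
            "bytes": path.stat().st_size,
            "sha256": file_sha256(path),
        })
    return records


def write_manifest(records: list[dict], out_path: Path) -> None:
    """Store the records as JSON Lines, one object per file."""
    with out_path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
```

Unrecognized files are still recorded (as "other") rather than skipped, which is the point of the "archive everything" claim: selection can happen later, at query time.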

Design to overcome project barriers: build a scalable digital library and federate independent authorities. 4 operational collections, 3 work in progress. (John Helly, IT Architect, SDSC)

Multiple access methods
Google: no interface; just type the name of the cruise
Basic web form: text-based search for experts
Java CruiseViewer: full graphical search
Web services: computer-to-computer, enabling next-generation interoperability

Java CruiseViewer (Don Sutton, SDSC): full graphical search with all capabilities, across any combination of collections. Metadata are held in Oracle or PostgreSQL; data are held in the Storage Resource Broker. The user runs graphical or keyword searches, receives search results for visualization objects, discovers content, browses metadata, and views or downloads objects.
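The architecture above separates a relational metadata catalog from the storage system holding the objects themselves. The sketch below illustrates that split with Python's built-in `sqlite3` standing in for Oracle or PostgreSQL: a keyword search, optionally restricted to any combination of collections, returns storage locations rather than data. The table layout and the `srb://` URI scheme are hypothetical and are not the CruiseViewer schema.

```python
import sqlite3
from typing import Optional

# Hypothetical catalog schema; objects live elsewhere and are referenced by URI.
SCHEMA = """
CREATE TABLE IF NOT EXISTS objects (
    object_id   INTEGER PRIMARY KEY,
    collection  TEXT NOT NULL,
    cruise_id   TEXT NOT NULL,
    title       TEXT NOT NULL,
    keywords    TEXT NOT NULL,
    storage_uri TEXT NOT NULL
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def keyword_search(conn: sqlite3.Connection, term: str,
                   collections: Optional[list[str]] = None) -> list[tuple]:
    """Return (cruise_id, title, storage_uri) rows whose title or keywords
    contain `term` (ASCII case-insensitive in SQLite), optionally limited to
    the given collections. `term` is not escaped, so % and _ act as wildcards."""
    sql = ("SELECT cruise_id, title, storage_uri FROM objects "
           "WHERE (title LIKE ? OR keywords LIKE ?)")
    params = [f"%{term}%", f"%{term}%"]
    if collections:
        sql += f" AND collection IN ({','.join('?' for _ in collections)})"
        params.extend(collections)
    sql += " ORDER BY cruise_id, title"
    return conn.execute(sql, params).fetchall()
```

A client would then fetch each returned `storage_uri` from the storage layer to view or download the object, keeping the catalog small and fast to search.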

Launch visualization experiences: visualization of multibeam swath sonar seafloor mapping data from 300 cruises since …, with swaths … km wide. Used for sonar quality control, geological research and education. Download the free viewer.

Broader impact with ERESE (Enduring Resources for Earth Science Education): national teachers' workshops, held as two-week summer workshops in 2004 and 2005, to build inquiry-driven learning experiences.

Other organizations using mtf technology
CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.): major technology co-development; 95 institutional members
WHOI: DIGARCH Multi-Institution Testbed project (Bob Detrick)
CCOM/UNH: cruise and multibeam archives (Jim Case, Larry Mayer)
MBARI (Monterey Bay Aquarium Research Institute): collection building in progress (Dave Caress, Andrew Chase)
SOEST/Hawaii: real-time digital library testing on R/V Kilo Moana, April 4-26, 2005
NIWA: Digital-Library-in-a-Box tested on R/V Tangaroa in New Zealand (John Helly, Don Robertson)
Arctic DMS: Data Management System under development (Margo Edwards, Hawaii; Dawn Wright, Oregon State)

Multi-Institution Testbed for Scalable Digital Archiving: extend the SIOExplorer approach to WHOI, and integrate SIO, SDSC and WHOI tools and data, including 30 years of WHOI cruise data, 4,098 Alvin submersible dives, Jason ROV surveys (200 DVDs per cruise), and results from 1,600 NSF awards online.
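The DVD count gives a rough sense of Jason ROV data volumes. Assuming single-layer DVDs (4.7 GB nominal capacity) filled to capacity, an assumption the slide does not state, one cruise amounts to:

$$
200\ \tfrac{\text{DVDs}}{\text{cruise}} \times 4.7\ \tfrac{\text{GB}}{\text{DVD}} = 940\ \tfrac{\text{GB}}{\text{cruise}} \approx 0.94\ \tfrac{\text{TB}}{\text{cruise}}
$$

That is roughly 94 times the 10 GB per cruise tracked on the SIO collection status board.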

Project challenges: auto-harvest data and metadata (only “shoe-box archives” exist prior to 2002); build a distributed digital library spanning both institutions and both ships and submersibles; extend WHOI data exploration tools; create persistent digital library objects; achieve interoperability across institutions.

WHOI cruises: 800 cruises since 1930

4,098 Alvin dives since June 26, 1964