Bridging the Gap between Libraries and Data Archives: Progress Report Roger Revelle, Gulf of California Expedition, 1939 JISC/NSF Digital Libraries Initiative.

Bridging the Gap between Libraries and Data Archives: Progress Report Roger Revelle, Gulf of California Expedition, 1939 JISC/NSF Digital Libraries Initiative All Projects Meeting June 2002, Edinburgh

Two new NSF projects: “Bridging the Gap between Libraries and Data Archives” (NSDL Collections Track) and “SIOExplorer: Web Exploration of Seagoing Archives” (Information Technology Research, ITR). Started October 2001.

Collaborative effort: UCSD Libraries; Scripps Institution of Oceanography; San Diego Supercomputer Center. Advisory Board: NOAA; US Naval Oceanographic Office; private industry; other oceanographic institutions.

Combine … Data: 50 years of digital data, growing 200 GB per year. Images: 99 years of SIO Archives. Documents: reports, publications, books. … into one digital library

Data in the collection …

Bathymetry, magnetics, gravity. Gathered from worldwide sources. 795 SIO cruise legs. Swath bathymetry since 1981. Approx cruise legs online at SIO

Multibeam sonar revolutionizes seafloor understanding. Map a wide swath, not just a single profile: SeaBeam Classic, 16 beams; SeaBeam 2000, 121 beams; SeaBeam 2100, 151 beams; Simrad EM120, 191 beams; 150 degree swath width. Also backscatter: determine bottom type (sediment, lava flow). Realtime swath 20 km across-track.

SIO Swath Mapping Expeditions. 244 swath mapping cruises on vessels, since 1981: Thomas Washington, Melville, Revelle. 600 GB multibeam holdings, adding 200 GB/year.

Deliver sampling information. Sample index, ,000 entries, 500 types: dredged rocks, cores; biological trawls; water samples; CTD. Build on Seamount catalog (Amelia Earhart). Roger Revelle, MidPac, 1950

Images in the collection …

Access Voyages of Discovery. Encourage inquiry: “What’s this?” links from image to data (“What”), instruments (“How”), and other voyages. Dual use: research and education. Naga Expedition (artist’s illustrations from logbook)

R/V Albatross departed SIO 1904 Sigsbee sounding machine

Voyages of Discovery in the Pacific. La Perouse, 1780’s. R/V Revelle “La Perouse Expedition”: departed June 8. R/V Melville “Cook Expedition”: returns July 17. Special Collections, UCSD Library. James Cook, by Nathaniel Dance, 1776

Voyages of Discovery in the Pacific, 1950’s. Ed Hamilton, MidPac, 1950. Samoa, Capricorn, 1952

R/V Spencer F. Baird. L to R, back row: Dick Von Herzen, Roger Revelle, Willard Bascom, Ted Folsom, Alan Jones, Gustaf Arrhenius, Henri Rotschi, Robert Livingston, Russell Raitt. Seated: Dick Blumberg, Ronald Mason, Bob Dill, Art Maxwell, Winter Horton, Walter Munk, Helen Raitt. Capricorn Expedition. Query for ideas and careers, not just data. Track a scientist’s expeditions and publications.

Documents in the collection …

Full text of publications. The Challenger Expedition: 30,000 scanned pages. Anatomy of an Expedition, Bill Menard, 1967 Nova Expedition (link to 1998 Avon Expedition). Exploring the Deep Pacific, Helen Raitt, 1952 Capricorn Expedition.

Cruise reports: 50 years available. Scan older versions. Currently generate .pdf automatically, with a page of swath bathymetry every 6 hours.

Bridging the Gap: Progress Report

The Problem: archives are search-impaired. Content is not a problem: material exists in great abundance, in data archives and historical archives. But it is hard to get. Litany of woes …

Litany of archive woes. Magnetic media at risk; need to migrate to new storage. Local access only. Some online, but sprawling directories. Tapes and CDs in drawers. Inconsistent naming over 30 years. Home-grown software. Pre-database technology. Minimal documentation. Formal metadata non-existent. Creators now retired. What to do? Shipboard archives for one recent cruise

Steps toward a Solution. Seek professional help: computer scientists, Advisory Board (similar problems faced in many fields). Review the problem: seven issues from national workshop. Analyze the dataflow. Build a prototype. Test the prototype: New Zealand – Samoa Expedition.

Review archive problems (NSF/ONR Marine Geology and Geophysics Workshop). Search: metadata rarely exist. Access: automated management. Quality: a challenge. Display: interactive tools. Flexibility: import, export. Scalability: interoperate with large projects. Stability: curation, beyond end of project.

Dataflow. First, create a conceptual data model. Spend time to review with all participants. Design a robust model. Define common categories: 9 basic directories, with specific subdirectories. Controlled design document. Map existing digital objects to categories, both documents and data. Accommodate variations: data types and names over 50 years; valid for future developments. Result: “CCDS”, the Canonical Cruise Data Structure.

Second, organize domain-specific content. Work inside a “Staging Area”. Deal with complexity: extract from 3 archive levels: shipboard (tape, CD); post-processing lab (tape); current online content (not always “best”). Opportunity for data cleanup: apply corrections; weed out intermediate and duplicate versions; gather information for metadata.

Third, load the “CCDS”. Clear transition in activities: domain specialists give final approval, then the IT team takes over. Early mistake: “pushed” content from legacy data directories, which are complex and vary over years; revised to “pull” into the Canonical Structure. IT lesson learned: dataflow needs to be “template-driven”. Template can incorporate rules for automatic loading and adaptive choice among multiple alternatives. Maintain flexibility as project evolves: team members negotiate content of template.
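A minimal sketch of what a “template-driven pull” could look like, not the project's actual loader. The template format, the directory names, and the file patterns are all hypothetical; the only ideas taken from the slide are that the canonical structure pulls from legacy directories and that the template chooses adaptively among alternatives.

```python
import fnmatch

# Hypothetical template: each canonical directory lists glob patterns in
# priority order. Directory names and patterns are illustrative only.
TEMPLATE = {
    "bathymetry": ["*/final/*.mb*", "*/processed/*.mb*", "*/raw/*.mb*"],
    "documents": ["*/reports/*.pdf", "*/docs/*.txt"],
}


def pull(template, legacy_paths):
    """For each canonical directory, pull the files matching the first
    pattern that finds anything (adaptive choice among alternatives)."""
    plan = {}
    for canonical_dir, patterns in template.items():
        for pattern in patterns:
            matches = sorted(
                p for p in legacy_paths if fnmatch.fnmatch(p, pattern)
            )
            if matches:
                plan[canonical_dir] = matches
                break  # stop at the highest-priority pattern that matched
    return plan


legacy = [
    "avon02/raw/line1.mb41",
    "avon02/final/line1.mb41",
    "avon02/reports/cruise.pdf",
]
plan = pull(TEMPLATE, legacy)
```

Because the rules live in the template rather than in the loader, team members can renegotiate the template's content without touching the code.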

Fourth, load the data. Persistent data archive management: use the “Storage Resource Broker”, a San Diego Supercomputer Center product. Fifth, load the metadata. Harvest metadata from data files, automatically. Provide tools for metadata editing. Load into Oracle.
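To illustrate automatic harvesting, here is a small sketch that derives a time range and bounding box from navigation-style records. The “time lat lon” text format and the field names in the result are assumptions made for this example; the slides do not describe the real file formats or harvesting tools.

```python
def harvest_extent(lines):
    """Derive time range and bounding box from 'time lat lon' text records.

    Assumes ISO 8601 timestamps in a single time zone, so that string
    comparison gives chronological order. The simple min/max longitude
    box is not valid for tracks that cross the antimeridian.
    """
    times, lats, lons = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        time, lat, lon = line.split()[:3]
        times.append(time)
        lats.append(float(lat))
        lons.append(float(lon))
    if not times:
        return None  # nothing to harvest
    return {
        "start_time": min(times),
        "end_time": max(times),
        "south": min(lats),
        "north": max(lats),
        "west": min(lons),
        "east": max(lons),
    }


# Illustrative records only, not real cruise data.
sample = [
    "# time lat lon",
    "2000-01-01T00:00:00Z -43.6 172.7",
    "2000-01-02T00:00:00Z -45.0 175.0",
]
extent = harvest_extent(sample)
```

The resulting dictionary is the kind of record that could then be reviewed in a metadata editor before bulk loading into a database.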

Building a Collection Developer’s Toolkit

Collection Developer’s Toolkit. Make it easy to build and maintain, not just for IT experts. Portable and scalable for other projects. Integrate metadata tools, data tools, and an interactive search and display console.

Make use of existing resources. Alexandria Digital Library, geospatial content. OAI-compliant server. Environmental data archive and delivery tools (John Helly, Storage Resource Broker). Domain-specific toolkits: GMT, MB-System, ARC/IMS.

Build metadata tools. Automate: bulk harvesting from data files; bulk loading into Oracle database. Use NSDL community standards: Dublin Core + “ADN” metadata (Alexandria Digital Library (UCSB); DLESE (Digital Library for Earth System Education); NASA). Controlled vocabularies: science themes; geographic names. Embed domain-specific metadata into standards: multibeam, cruise, sampling.
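As a rough illustration of emitting Dublin Core, the sketch below builds a small XML record with Python's standard library. The `record` wrapper element and the field values are hypothetical; only the Dublin Core element namespace is standard. It does not attempt the ADN extensions or the project's own schema.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <record> element with one dc:* child per (name, value) pair.

    The <record> wrapper is an assumption made for this sketch.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Illustrative values only.
record = dublin_core_record([
    ("title", "Example multibeam survey"),
    ("subject", "marine geophysics"),
    ("coverage", "Samoa"),
])
xml_text = ET.tostring(record, encoding="unicode")
```

Repeating a field name, for example several `subject` entries drawn from a controlled vocabulary, simply adds more child elements.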

MOBE: Metadata Object Browser and Editor. Inherit metadata from Dublin Core and cruise. Flexible: expand for projects as needed; generic ASCII metadata interchange format “MIF”; export to XML. Java.
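One possible reading of “inherit metadata” is that an object picks up any field it does not define from its cruise and from project-wide defaults. The sketch below shows that idea only; it is written in Python rather than MOBE's Java, the function name is hypothetical, and the field values are illustrative.

```python
from collections import ChainMap


def effective_metadata(obj_fields, *inherited_levels):
    """Return object metadata with missing fields inherited from outer levels.

    Earlier mappings win, so object-level values override cruise-level
    values, which in turn override project-wide defaults.
    """
    return dict(ChainMap(obj_fields, *inherited_levels))


# Illustrative values only.
defaults = {"language": "en"}
cruise = {"vessel": "Melville", "title": "Cruise-level title"}
sample = {"data_type": "dredge", "title": "Dredged rock sample"}

merged = effective_metadata(sample, cruise, defaults)
```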

Search interface. Design for alternative approaches. Geospatial: lat, lon. Temporal: “ ”. Keyword: region “Samoa”; vessel “Melville”; cruise “AVON02MV”; data type “dredge”; scientist “Staudigel”. Expert-level: research, teacher, student, public. Prototype search interface
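The three search approaches on this slide can be combined in a single filter. The following is a minimal in-memory sketch under assumed field names; the `Record` class and `search` function are hypothetical, and the real system queried a database rather than a Python list.

```python
from dataclasses import dataclass


@dataclass
class Record:
    cruise: str
    vessel: str
    data_type: str
    lat: float
    lon: float
    date: str  # ISO 8601 date, e.g. "2000-01-31"


def search(records, bbox=None, date_range=None, **keywords):
    """Filter by optional bounding box, date range, and exact keyword matches.

    bbox is (south, west, north, east); date_range is (start, end), inclusive.
    """
    hits = []
    for r in records:
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= r.lat <= north and west <= r.lon <= east):
                continue
        if date_range is not None:
            start, end = date_range
            if not (start <= r.date <= end):
                continue
        if any(getattr(r, key) != value for key, value in keywords.items()):
            continue
        hits.append(r)
    return hits


# Illustrative records only.
records = [
    Record("A1", "Melville", "dredge", -14.0, -170.0, "2000-01-10"),
    Record("B2", "Revelle", "multibeam", 10.0, -150.0, "2000-02-10"),
]
dredges = search(records, vessel="Melville", data_type="dredge")
```

Leaving a criterion out disables it, so the same function serves a geospatial-only, temporal-only, or keyword-only query.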

CruiseViewer: interactive browser and query interface. Display tracks and samples. Download library objects. Java.

Manage interfaces for multiple projects Both data and metadata

Lessons learned (so far)…

Make it easier to collaborate. Interactions between groups: not just a technology project; diverse goals, vocabularies and audiences. Interoperate: each domain has its own sphere of responsibility (don’t engineer someone else’s domain). Work through interfaces: re-negotiate as needed; avoid long-term maintenance headaches between domains.

Build tools for collaborative projects. 3 “cultures” in this project: oceanographers, computer scientists, librarians. Example: bridge vocabularies between separate domains. Use metadata “triples,” not “pairs”. Reduce phone calls by including a narrative label.
parameter name: science_themes
value: geochemistry, marine geology, marine geophysics, hot spots, mantle plumes, geochronometry, seamount chains
narrative label: keywords, from controlled vocabulary of science terms, selected from the “SIOExplorer Science Theme” template
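The triple idea can be sketched as a three-field record, where the third field carries the human-readable explanation that a plain name/value pair lacks. The class and function names here are hypothetical; the example values are shortened from the slide.

```python
from typing import NamedTuple


class MetadataTriple(NamedTuple):
    name: str   # parameter name, as used by the software
    value: str  # parameter value
    label: str  # narrative label explaining the parameter to other domains


triple = MetadataTriple(
    name="science_themes",
    value="geochemistry, marine geology, marine geophysics",
    label="keywords, from controlled vocabulary of science terms",
)


def describe(t):
    """Render a triple so that a reader outside the domain can interpret it."""
    return f"{t.name} = {t.value}  ({t.label})"


text = describe(triple)
```

A pair would stop at `science_themes = ...`; the label is what lets a librarian or computer scientist interpret the field without asking the oceanographer.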

Adding new projects to SIOExplorer. Make use of Collection Developer’s Toolkit. NSDL server: metadata interchange, query processing. SDSC: managed storage, web service.

Test the prototype Melville departs Lyttelton harbor

Floating Digital Library Workshop. R/V Melville, March 7-21, New Zealand to Samoa. Realtime acquisition of library objects? Load metadata into swath files at acquisition time. Specify cruise metadata. Sensor documentation database. Load the CCDS. Learn from a common experience.

A good day at 51° S Renewed appreciation for the collection of field data

Common experience: librarians, computer scientists, oceanographers, Royal New Zealand Navy. Melville in Lyttelton. Collaboration between SIO and RNZN.

Floating Digital Library Workshop Librarian at sea Computer scientist in galley Oceanographer holding onto computer

Bollons Gap survey New Zealand Law of the Sea Claim Librarian at sea Visualization of swath bathymetry, looking north

Heading for Samoa Crossing the Louisville Ridge Tonga Trench Osbourn Trough (ancient spreading center) Visualization of Global Topography, looking north

Relate cruise to SIO holdings. Display search results: red, SIO multibeam; black, other cruises; yellow, SIO dredged rock samples; also volcanoes, earthquakes, plate boundaries. Typical research support product. Make it available on web. Select cruises for further study. Export for ArcView (related NSF/ITR project).

Next steps. “Data Publishing Toolkit for Digital Library Interoperability: Integrating the Albatross Cruise Holdings into SIOExplorer”: NSF Division of Biological Infrastructure; collaboration with Smithsonian Institution. “Biogeography and Geology of the Oceans: SIO Collections Gateway for the NSDL”: NSF NSDL Collections Track. Track of the Albatross

SIOExplorer: Expedition Planner. Open research data for student discovery. Leverage Digital Library efforts. Students design a virtual expedition: explore relationships (depth, sediment thickness, crustal age) and more (earthquakes, volcanoes, trenches; wind, waves, currents; climate). Students publish expedition report on the web. Teacher workshops at the Birch Aquarium. Crustal Age. Sediment thickness. Global Topography.

SIO 100th Anniversary, September 26, 2003. SIO, R/V Alexander Agassiz, 1907