Published by Augusta McKinney; modified over 9 years ago
1
Bridging the Gap between Libraries and Data Archives: Progress Report
JISC/NSF Digital Libraries Initiative All Projects Meeting, 24-25 June 2002, Edinburgh
Image caption: Roger Revelle, Gulf of California Expedition, 1939
2
Two new NSF projects …
“Bridging the Gap between Libraries and Data Archives” (NSDL Collections Track)
“SIOExplorer: Web Exploration of Seagoing Archives” (Information Technology Research, ITR)
Started October 2001
3
Collaborative effort
– UCSD Libraries
– Scripps Institution of Oceanography
– San Diego Supercomputer Center
Advisory Board
– NOAA
– US Naval Oceanographic Office
– Private industry
– Other oceanographic institutions
4
Combine …
Data: 50 years of digital data, growing 200 GB per year
Images: 99 years of SIO Archives
Documents: reports, publications, books
… into one digital library
5
Data in the collection …
6
Bathymetry, magnetics, gravity
Gathered from worldwide sources
795 SIO cruise legs
Swath bathymetry since 1981
Approx. 3000 cruise legs online at SIO
7
Multibeam sonar revolutionizes seafloor understanding
Map a wide swath, not just a single profile
– SeaBeam Classic, 1981-1992: 16 beams
– SeaBeam 2000, 1992-: 121 beams
– SeaBeam 2100, 1996-2000: 151 beams
– Simrad EM120, 2001-: 191 beams, 150-degree swath width
Also backscatter, to determine bottom type
– Sediment
– Lava flow
Realtime swath 20 km across-track
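The swath figures on this slide are linked by simple geometry. A minimal Python sketch, assuming a flat seafloor, straight-line sound paths (no refraction) and a fan symmetric about nadir, so that the across-track width is W = 2·d·tan(θ/2) for water depth d and total swath angle θ; the function names and example depths are illustrative, not from the presentation:

```python
import math

def swath_width(depth_m: float, swath_angle_deg: float) -> float:
    """Across-track swath width (m) for a symmetric fan over a flat seafloor.

    Assumes straight-line ray paths (no sound-speed refraction) and a fan
    centered on nadir, so each side spans half of the total angle.
    """
    half_angle = math.radians(swath_angle_deg / 2.0)
    return 2.0 * depth_m * math.tan(half_angle)

def depth_for_width(width_m: float, swath_angle_deg: float) -> float:
    """Invert swath_width: the water depth at which the fan spans width_m."""
    half_angle = math.radians(swath_angle_deg / 2.0)
    return width_m / (2.0 * math.tan(half_angle))

if __name__ == "__main__":
    # Illustrative depths only; the slide does not state a depth.
    for depth in (1000, 2700, 4000):
        print(f"depth {depth} m -> swath {swath_width(depth, 150) / 1000:.1f} km")
    print(f"20 km swath needs about {depth_for_width(20000, 150):.0f} m of water")
```

Under these assumptions a 150-degree fan covers roughly 7.5 times the water depth, so a 20 km swath would correspond to roughly 2.7 km of water; treat this as an idealized upper bound rather than a system specification.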
8
SIO Swath Mapping Expeditions
244 swath mapping cruises since 1981, on R/V Thomas Washington, R/V Melville and R/V Revelle
600 GB of multibeam holdings, adding 200 GB/year
9
Deliver sampling information
Sample index, 1968-: 100,000 entries, 500 types
– Dredged rocks, cores
– Biological trawls
– Water samples
– CTD
Build on www.EarthRef.org
Seamount catalog (Amelia Earhart)
Image caption: Roger Revelle, MidPac, 1950
10
Images in the collection …
11
Access Voyages of Discovery
Encourage inquiry with “What’s this?” links from each image to
– Data (“What”)
– Instruments (“How”)
– Other voyages
Dual use: research and education
Image caption: Naga Expedition, 1959-61 (artist’s illustrations from logbook)
12
R/V Albatross departed SIO, 1904
Image caption: Sigsbee sounding machine
13
Voyages of Discovery in the Pacific
La Perouse, 1780s
R/V Revelle “La Perouse Expedition”: departed June 8
R/V Melville “Cook Expedition”: returns July 17
Image captions: Special Collections, UCSD Library; James Cook, by Nathaniel Dance, 1776
14
Voyages of Discovery in the Pacific, 1950s
Image captions: Ed Hamilton, MidPac, 1950; Samoa, Capricorn, 1952
15
Query for ideas and careers, not just data
Track a scientist’s expeditions and publications
Image caption: R/V Spencer F. Baird, Capricorn Expedition, 1952-53. Left to right, back row: Dick Von Herzen, Roger Revelle, Willard Bascom, Ted Folsom, Alan Jones, Gustaf Arrhenius, Henri Rotschi, Robert Livingston, Russell Raitt. Seated: Dick Blumberg, Ronald Mason, Bob Dill, Art Maxwell, Winter Horton, Walter Munk, Helen Raitt.
16
Documents in the collection …
17
Full text of publications
The Challenger Expedition: 30,000 scanned pages
Anatomy of an Expedition, Bill Menard: 1967 Nova Expedition
– Link to 1998 Avon Expedition
Exploring the Deep Pacific, Helen Raitt: 1952 Capricorn Expedition
18
Cruise reports
50 years available
Scan older versions
Currently generate .pdf automatically, with a page of swath bathymetry every 6 hours
19
Bridging the Gap: Progress Report
20
The Problem: archives are search-impaired
Content is not the problem: material exists in great abundance
– Data archives
– Historical archives
But it is hard to get
Litany of woes …
21
Litany of archive woes
Magnetic media at risk: need to migrate to new storage
Local access only
– Some online, but in sprawling directories
– Tapes and CDs in drawers
Inconsistent naming over 30 years
Home-grown software, pre-database technology
Minimal documentation
– Formal metadata non-existent
– Creators now retired
What to do?
Image caption: Shipboard archives for one recent cruise
22
Steps toward a Solution
Seek professional help: computer scientists, Advisory Board (similar problems are faced in many fields)
Review the problem: seven issues from a national workshop
Analyze the dataflow
Build a prototype
Test the prototype: New Zealand – Samoa Expedition
23
Review archive problems (NSF/ONR Marine Geology and Geophysics Workshop)
Search: metadata rarely exist
Access: automated management
Quality: a challenge
Display: interactive tools
Flexibility: import, export
Scalability: interoperate with large projects
Stability: curation beyond the end of the project
24
First, create a conceptual data model
Spend time to review with all participants
Design a robust model
Define common categories
– 9 basic directories
– Specific subdirectories
Controlled design document
Map existing digital objects to categories, both documents and data
Accommodate variations
– Data types and names over 50 years
– Valid for future developments
Result: “CCDS”, the Canonical Cruise Data Structure
Diagram: Dataflow
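Mapping existing digital objects to CCDS categories can be driven by ordered rules. A minimal Python sketch, assuming filename patterns alone are enough to classify a file; the slide says there are 9 basic directories but does not name them, so the categories, patterns and function names below are hypothetical:

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Tuple

# Hypothetical rules: the real CCDS directory names are not given here.
# Rules are tried in order and the first match wins, so order matters.
RULES = [
    (re.compile(r"\.mb\d+$", re.IGNORECASE), ("data", "multibeam")),
    (re.compile(r"\.nav$", re.IGNORECASE), ("data", "navigation")),
    (re.compile(r"\.(jpe?g|tiff?)$", re.IGNORECASE), ("images", "photos")),
    (re.compile(r"\.(pdf|txt|doc)$", re.IGNORECASE), ("documents", "reports")),
]

def canonical_path(cruise_id: str, legacy_path: str) -> Optional[str]:
    """Return the canonical location for a legacy file, or None if no rule fits."""
    name = PurePosixPath(legacy_path).name
    for pattern, (category, subdir) in RULES:
        if pattern.search(name):
            return f"{cruise_id}/{category}/{subdir}/{name}"
    return None

def map_cruise(cruise_id: str, legacy_paths: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split legacy files into mapped targets and leftovers for manual review."""
    mapped: Dict[str, str] = {}
    unmatched: List[str] = []
    for path in legacy_paths:
        target = canonical_path(cruise_id, path)
        if target is None:
            unmatched.append(path)  # a domain specialist classifies these by hand
        else:
            mapped[path] = target
    return mapped, unmatched

if __name__ == "__main__":
    files = ["/tapes/t3/line012.mb57", "/cd1/TEST01.nav", "/scratch/notes.xyz"]
    mapped, unmatched = map_cruise("TEST01", files)
    for src, dst in mapped.items():
        print(f"{src} -> {dst}")
    print("needs review:", unmatched)
```

Keeping unmatched files in a separate list, rather than forcing them into a category, leaves the final decision with the domain specialists, which fits the review step on this slide.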
25
Second, organize domain-specific content
Work inside a “Staging Area”
Deal with complexity: extract from 3 archive levels
– Shipboard (tape, CD)
– Post-processing lab (tape)
– Current online content (not always “best”)
Opportunity for data cleanup
– Apply corrections
– Weed out intermediate and duplicate versions
– Gather information for metadata
26
Third, load the “CCDS”
Clear transition in activities
– Domain specialists give final approval
– IT team takes over
Early mistake: “pushed” content from legacy data directories
– These are complex and vary over the years
– Revised to “pull” content into the Canonical Structure
IT lesson learned: the dataflow needs to be “template-driven”
The template can incorporate
– Rules for automatic loading
– Adaptive choice among multiple alternatives
Maintain flexibility as the project evolves
– Team members negotiate the content of the template
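A template-driven “pull” can be expressed as data: for each canonical slot, an ordered list of alternative sources, where the first alternative that yields files wins. A minimal Python sketch; the slot names, glob patterns, function name and the preference for post-processed over shipboard files are assumptions for illustration, not the project's actual template:

```python
import fnmatch
from typing import Dict, List, Optional

# Hypothetical template. Because it is plain data, team members can
# renegotiate it without touching the loading code.
TEMPLATE: Dict[str, List[str]] = {
    "data/multibeam": ["postproc/*.mb57", "shipboard/*.mb57", "online/*.mb57"],
    "data/navigation": ["postproc/*.nav", "shipboard/*.nav"],
    "documents/cruise_report": ["reports/*.pdf", "reports/*.txt"],
}

def pull(template: Dict[str, List[str]],
         available: List[str]) -> Dict[str, Optional[List[str]]]:
    """Fill each canonical slot from the first alternative that matches anything.

    'Pull' means starting from the canonical structure and looking for content,
    instead of pushing whatever the legacy directories happen to contain.
    Slots with no match are set to None so gaps stay visible for review.
    """
    result: Dict[str, Optional[List[str]]] = {}
    for slot, alternatives in template.items():
        result[slot] = None
        for pattern in alternatives:
            matches = sorted(fnmatch.filter(available, pattern))
            if matches:
                result[slot] = matches
                break  # adaptive choice: the preferred alternative wins
    return result

if __name__ == "__main__":
    files = ["shipboard/l1.mb57", "shipboard/l2.mb57", "postproc/l1.mb57",
             "shipboard/TEST01.nav", "reports/TEST01.pdf"]
    for slot, chosen in pull(TEMPLATE, files).items():
        print(slot, "->", chosen)
```

In this sketch the choice is made per slot, not per file; a real template might need finer-grained rules, for example falling back to shipboard files only for lines missing from post-processing.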
27
Fourth, load the data
Persistent data archive management
Use the “Storage Resource Broker”, a San Diego Supercomputer Center product
Fifth, load the metadata
Harvest metadata from data files automatically
Provide tools for metadata editing
Load into Oracle
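Automatic harvesting means deriving metadata from the data files themselves. As a minimal sketch of the harvesting half only, the Python below computes a bounding box and time span from navigation records; the record layout and function name are hypothetical, and the Oracle load is not shown. It also handles tracks that cross the 180° meridian, as a New Zealand to Samoa transit does, where a naive min/max longitude would give a box spanning most of the globe:

```python
from datetime import datetime
from typing import Dict, Iterable, List

def harvest_extent(nav_lines: Iterable[str]) -> Dict[str, object]:
    """Derive spatial and temporal extent metadata from navigation records.

    Assumes a hypothetical whitespace-separated layout per record:
        ISO-8601-time  longitude  latitude
    with longitudes in -180..180 degrees. Lines starting with '#' are skipped.
    """
    times: List[datetime] = []
    lons: List[float] = []
    lats: List[float] = []
    for line in nav_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        t, lon, lat = line.split()[:3]
        times.append(datetime.fromisoformat(t))
        lons.append(float(lon))
        lats.append(float(lat))
    if not times:
        raise ValueError("no navigation records found")

    west, east = min(lons), max(lons)
    if east - west > 180:
        # Heuristic: a naive span over 180 degrees is taken to mean the track
        # crosses the antimeridian. Recompute in 0..360 and convert back,
        # so that west > east signals the crossing.
        shifted = [x % 360 for x in lons]
        west, east = min(shifted), max(shifted)
        west = west - 360 if west > 180 else west
        east = east - 360 if east > 180 else east
    return {
        "west": west, "east": east,
        "south": min(lats), "north": max(lats),
        "start": min(times).isoformat(), "end": max(times).isoformat(),
    }
```

The heuristic can misjudge a track that genuinely spans more than 180 degrees without crossing the antimeridian, so harvested extents would still benefit from the metadata-editing step listed on this slide.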
28
Building a Collection Developer’s Toolkit
29
Collection Developer’s Toolkit
Make collections easy to build and maintain, not just for IT experts
Portable and scalable for other projects
Integrate
– Metadata tools
– Data tools
– Interactive search and display console
30
Make use of existing resources
Alexandria Digital Library
– Geospatial content
– OAI-compliant server
Environmental data archive and delivery tools
– John Helly, http://ceed.sdsc.edu/
Storage Resource Broker
– http://www.npaci.edu/DICE/SRB/index.html/
Domain-specific toolkits
– GMT, MB-System, ARC/IMS
31
Build metadata tools
Automate
– Bulk harvesting from data files
– Bulk loading into Oracle database
Use NSDL community standards: Dublin Core + “ADN” metadata
– Alexandria Digital Library (UCSB)
– DLESE (Digital Library for Earth System Education)
– NASA
Controlled vocabularies
– Science themes
– Geographic names
Embed domain-specific metadata (multibeam, cruise, sampling) into standards
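A controlled vocabulary only helps if keywords are checked against it when metadata are entered. A minimal Python sketch, assuming the vocabulary is a simple set of terms (here the science themes quoted in this presentation's science_themes example, not the full list) and using the standard library's `difflib.get_close_matches` to suggest replacements for rejected terms; the function name is hypothetical:

```python
import difflib
from typing import Dict, Iterable, List, Set, Tuple

# Illustrative subset of a science-theme vocabulary, not the complete list.
SCIENCE_THEMES: Set[str] = {
    "geochemistry", "marine geology", "marine geophysics", "hot spots",
    "mantle plumes", "geochronometry", "seamount chains",
}

def check_keywords(keywords: Iterable[str],
                   vocabulary: Set[str] = SCIENCE_THEMES
                   ) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (accepted terms, {rejected keyword: suggested terms})."""
    accepted: List[str] = []
    rejected: Dict[str, List[str]] = {}
    for kw in keywords:
        term = " ".join(kw.lower().split())  # normalize case and spacing
        if term in vocabulary:
            accepted.append(term)
        else:
            # Up to three close spellings from the vocabulary, best first.
            rejected[kw] = difflib.get_close_matches(term, sorted(vocabulary), n=3)
    return accepted, rejected

if __name__ == "__main__":
    ok, bad = check_keywords(["Geochemistry", "hot  spots", "seamount chain"])
    print("accepted:", ok)
    print("needs review:", bad)
```

Suggestions are only offered, never applied silently, so a person still decides whether a near miss such as "seamount chain" should become the controlled term.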
32
MOBE: Metadata Object Browser and Editor (Java)
Inherit metadata from
– Dublin Core
– Cruise
Flexible
– Expand for projects as needed
– Generic ASCII metadata interchange format, “MIF”
– Export to XML
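MOBE itself is a Java tool and the MIF layout is not described on this slide, so the following is only a Python sketch under an assumed layout of one `name = value` pair per line. It illustrates the export-to-XML step: names that are simple Dublin Core elements go into the `dc` namespace, and cruise- or instrument-specific names are kept in a separate domain section. The function names and the `record`/`domain`/`field` element names are hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of simple Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

def parse_mif(text: str) -> List[Tuple[str, str]]:
    """Parse a hypothetical MIF layout: one 'name = value' pair per line."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        name, value = line.split("=", 1)
        pairs.append((name.strip(), value.strip()))
    return pairs

def mif_to_xml(text: str) -> str:
    """Emit Dublin Core names as dc:* elements; keep the rest as domain fields."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    domain = ET.Element("domain")  # e.g. cruise or multibeam parameters
    for name, value in parse_mif(text):
        if name in DC_ELEMENTS:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
        else:
            ET.SubElement(domain, "field", name=name).text = value
    if len(domain):
        root.append(domain)  # only include the section when it has content
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(mif_to_xml("title = Cruise report\ncruise_id = TEST01\nsubject = marine geology\n"))
```

Separating the standard elements from the domain fields mirrors the slide's idea of inheriting Dublin Core and cruise metadata while expanding for individual projects as needed.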
33
Prototype search interface
Design for alternative approaches
– Geospatial: lat, lon
– Temporal: “1995-2000”
– Keyword: region “Samoa”, vessel “Melville”, cruise “AVON02MV”, data type “dredge”, scientist “Staudigel”
– Expert level: research, teacher, student, public
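The geospatial, temporal and keyword approaches on this slide can be combined as independent filters joined by AND. A minimal Python sketch over an in-memory list; the `Record` fields, the case-insensitive substring match and the function names are assumptions rather than the prototype's design, and the expert-level (audience) dimension is not modeled:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Record:
    """Hypothetical catalog entry; field names are illustrative."""
    cruise: str
    vessel: str
    region: str
    data_type: str
    scientists: List[str]
    bbox: Tuple[float, float, float, float]  # west, south, east, north (degrees)
    years: Tuple[int, int]                   # first and last year

def _overlaps_box(r: Record, box: Tuple[float, float, float, float]) -> bool:
    w, s, e, n = box
    rw, rs, re_, rn = r.bbox
    # Simple rectangle test; boxes crossing the antimeridian are not handled.
    return not (rw > e or re_ < w or rs > n or rn < s)

def _overlaps_years(r: Record, span: Tuple[int, int]) -> bool:
    return r.years[0] <= span[1] and r.years[1] >= span[0]

def _matches(r: Record, key: str, wanted: str) -> bool:
    value = getattr(r, key)
    values = value if isinstance(value, list) else [value]
    return any(wanted.lower() in v.lower() for v in values)

def search(records: List[Record],
           bbox: Optional[Tuple[float, float, float, float]] = None,
           years: Optional[Tuple[int, int]] = None,
           **keywords: str) -> List[Record]:
    """Apply every constraint that was given; omitted ones are ignored."""
    return [
        r for r in records
        if (bbox is None or _overlaps_box(r, bbox))
        and (years is None or _overlaps_years(r, years))
        and all(_matches(r, k, v) for k, v in keywords.items())
    ]
```

A query mirroring the slide's examples would look like `search(catalog, years=(1995, 2000), vessel="Melville", scientists="Staudigel")`, where `catalog` is a hypothetical list of `Record` objects.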
34
CruiseViewer (Java)
Interactive browser and query interface
Display tracks and samples
Download library objects
35
Manage interfaces for multiple projects, for both data and metadata
36
Lessons learned (so far)…
37
Make it easier to collaborate
Interactions between groups: not just a technology project
Diverse goals, vocabularies and audiences
Interoperate
– Each domain has its own sphere of responsibility
– Don’t engineer someone else’s domain
Work through interfaces
– Re-negotiate as needed
– Avoid long-term maintenance headaches between domains
38
Build tools for collaborative projects
3 “cultures” in this project: oceanographers, computer scientists, librarians
Example: bridge vocabularies between separate domains
Use metadata “triples,” not “pairs”
Reduce phone calls by including a narrative label
Example triple:
– Parameter name: science_themes
– Value: geochemistry, marine geology, marine geophysics, hot spots, mantle plumes, geochronometry, seamount chains
– Narrative label: keywords, from controlled vocabulary of science terms, selected from the “SIOExplorer Science Theme” template
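One way to hold metadata triples is a small record type whose third field travels with the value, so programs can still drop it to get plain name/value pairs. A minimal Python sketch with hypothetical class and function names; the example triple is the science_themes entry from this slide:

```python
from typing import Dict, List, NamedTuple

class MetadataTriple(NamedTuple):
    name: str       # machine-readable parameter name
    value: str      # the value programs consume
    narrative: str  # plain-language label for readers from other domains

def as_pairs(triples: List[MetadataTriple]) -> Dict[str, str]:
    """Drop the narrative to recover the name/value pairs software needs."""
    return {t.name: t.value for t in triples}

def render(triples: List[MetadataTriple]) -> str:
    """Human-readable listing in which every value carries its explanation."""
    lines = []
    for t in triples:
        lines.append(f"{t.name} = {t.value}")
        lines.append(f"    ({t.narrative})")
    return "\n".join(lines)

example = [
    MetadataTriple(
        name="science_themes",
        value="geochemistry, marine geology, marine geophysics, hot spots, "
              "mantle plumes, geochronometry, seamount chains",
        narrative='keywords, from controlled vocabulary of science terms, '
                  'selected from the "SIOExplorer Science Theme" template',
    )
]

if __name__ == "__main__":
    print(render(example))
```

The narrative costs one extra field per entry, which is the trade the slide argues for: a librarian or computer scientist can read what `science_themes` means without calling the oceanographer who defined it.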
39
Adding new projects to SIOExplorer
Make use of the Collection Developer’s Toolkit
– NSDL server
– Metadata interchange
– Query processing
SDSC
– Managed storage
– Web service
40
Test the prototype
Image caption: Melville departs Lyttelton harbor
41
Floating Digital Library Workshop
R/V Melville, March 7-21, New Zealand to Samoa
Realtime acquisition of library objects?
– Load metadata into swath files at acquisition time
– Specify cruise metadata
– Sensor documentation database
– Load the CCDS
Learn from a common experience
42
A good day at 51° S Renewed appreciation for the collection of field data
43
Common experience: librarians, computer scientists, oceanographers, Royal New Zealand Navy
Image caption: Melville in Lyttelton; collaboration between SIO and RNZN
44
Floating Digital Library Workshop
Image captions: Librarian at sea; Computer scientist in galley; Oceanographer holding onto computer
45
Bollons Gap survey: New Zealand Law of the Sea claim
Image captions: Librarian at sea; Visualization of swath bathymetry, looking north
46
Heading for Samoa
Crossing the Louisville Ridge, Tonga Trench, Osbourn Trough (ancient spreading center)
Image caption: Visualization of global topography, looking north
47
Relate cruise to SIO holdings
Display search results
– Red: SIO multibeam
– Black: other cruises
– Yellow: SIO dredged rock samples
– Also: volcanoes, earthquakes, plate boundaries
Typical research support product
Make it available on the web
Select cruises for further study
Export for ArcView (related NSF/ITR project)
48
Next steps
Data Publishing Toolkit for Digital Library Interoperability: Integrating the Albatross Cruise Holdings into SIOExplorer
– NSF Division of Biological Infrastructure
– Collaboration with Smithsonian Institution
Biogeography and Geology of the Oceans: SIO Collections Gateway for the NSDL
– NSF NSDL Collections Track
Image caption: Track of the Albatross, 1884-1921
49
SIOExplorer: Expedition Planner
Open research data for student discovery
Leverage Digital Library efforts
Students design a virtual expedition
– Explore relationships: depth, sediment thickness, crustal age
– More: earthquakes, volcanoes, trenches; wind, waves, currents; climate
Students publish an expedition report on the web
Teacher workshops at the Birch Aquarium
Image captions: Crustal Age; Sediment thickness; Global Topography
50
SIO 100th Anniversary, September 26, 2003
http://SIOExplorer.ucsd.edu
Image captions: SIO, 1909; R/V Alexander Agassiz, 1907