Accessing Data from Ship Operating Institutions R/V Alexander Agassiz November, 1907 Stephen P. Miller Geological Data Center Scripps Institution of Oceanography.

1 Accessing Data from Ship Operating Institutions R/V Alexander Agassiz November, 1907 Stephen P. Miller Geological Data Center Scripps Institution of Oceanography http://SIOExplorer.ucsd.edu

2 Ship Operating Institutions What do we archive? Example –Geological Data Center, SIO Critical problems Okay in 2 out of 6 areas Resources MG&G (workshop) Other communities

3 Ship Operating Institutions Proposed next steps “SIOExplorer” web access “Bridging the Gap …” Full digital library implementation EarthDBforum Listserver, similar to gmthelp Workshop Follow up activities

4 Geological Data Center (GDC) 440 GB digital holdings 646 SIO cruise legs, 1952- 2400 other cruise legs Bathymetry, magnetics, gravity Multibeam, 1981- 238 SeaBeam cruise legs Currently adding 100 GB/year

5 SIO SeaBeam Expeditions, 1981- Thomas Washington Melville Revelle

6 GDC Archives … Sample index, 1968- 100,000 entries 500 types (all disciplines) Expedition reports, 1971- Starting point for database search –Personnel –Track chart –Profile plots –Sampling –(Scan older reports)

7 SIO Analog Archives All digitally indexed SIO Archives in Library, 1903- Log books, track charts, 1952-

8 GDC Archives Echosounding records, 1950- Includes all DSDP Legs 1-96 Seismic records, 1963- Microfilm Copies of almost all paper records Flow camera for copying long rolls

9 “State-of-the-Art” always changing Instrumentation Navigation Media and format changes Database technology Web access demands Sigsbee Sounding Machine R/V Albatross, March 1904

10 Older data may be valuable R/V E. W. Scripps Monitor changes over time Merge with modern data Resolve navigation Gulf of California Expedition, 1939

11 R/V Thomas Washington, 1965-1992 SeaBeam “Classic” 16 beams 1981-92

12 R/V Melville, 1969- SeaBeam 2000 121 beams 1992- Sample survey Alarcon Spreading Ridge Entrance Gulf of California Christina Massell Peter Lonsdale

13 R/V Roger Revelle, 1996- SeaBeam 2100 151 beams 1996-2000 Simrad EM120 191 beams 2001-

14 Critical Problem Areas 1. Search 2. Access* 3. Quality 4. Display and analysis tools* 5. Flexibility 6. Long-term stability * (Doing well in 2 out of 6 areas)

15 Problem 1 - Search Find what you’re looking for Data Images Documents Discover things you didn’t know about Like browsing in library stacks Distributed archives Not just your own institution

16 Problem 1 - Search … Geospatial, temporal, keyword Expert-level for outreach Create and use metadata Define our domain-specific content Embed into library standards Need to streamline metadata creation Semi-automatic tools Need exchange standards Between institutions

17 Problem 2 - Access* Good support by Internet and storage advances Web download Needed improvements Storage Resource Broker Streamline distributed archive management Access full archives Automate permission Allow owner to grant permission to specific users
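The slide's last point is letting a data owner grant access to specific users. A small access-control sketch illustrates the idea. This is a hypothetical in-memory model, not the Storage Resource Broker's permission system. The names `Dataset`, `Archive`, `grant` and `can_read` are invented for illustration, and so are the user and dataset names.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical archived dataset with one owner and an explicit reader list."""
    name: str
    owner: str
    public: bool = False
    readers: set[str] = field(default_factory=set)


class Archive:
    """In-memory sketch: only a dataset's owner may grant read access."""

    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}

    def add(self, dataset: Dataset) -> None:
        self._datasets[dataset.name] = dataset

    def grant(self, requester: str, name: str, user: str) -> None:
        ds = self._datasets[name]
        if requester != ds.owner:
            raise PermissionError(f"{requester} does not own {name}")
        ds.readers.add(user)

    def can_read(self, user: str, name: str) -> bool:
        ds = self._datasets[name]
        return ds.public or user == ds.owner or user in ds.readers


# Hypothetical users and dataset name.
archive = Archive()
archive.add(Dataset("EXAMPLE01-multibeam", owner="chief_scientist"))
archive.grant("chief_scientist", "EXAMPLE01-multibeam", "collaborator")
print(archive.can_read("collaborator", "EXAMPLE01-multibeam"))  # True
print(archive.can_read("visitor", "EXAMPLE01-multibeam"))       # False
```

Keeping the grant check inside `grant` means a request from anyone but the owner fails loudly rather than silently widening access.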

18 Problem 3 - Quality “End-to-end” scrutiny Shipboard acquisition Processing steps Archival collections Multibeam Beampoint editing Sound velocity Navigation Time Pitch, roll, yaw bias
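The bias terms listed here matter because a small angular error grows with beam angle. The following is a minimal straight-ray sketch that shows the depth change for an outer beam. It assumes a constant sound velocity, so there is no refraction correction. It also assumes a sign convention in which the roll-bias correction is simply added to the beam angle. The function name `beam_position` and the numbers are illustrative only.

```python
import math


def beam_position(slant_range_m: float, beam_angle_deg: float,
                  roll_bias_deg: float = 0.0) -> tuple[float, float]:
    """Return (across_track_m, depth_m) for one beam.

    Straight-ray sketch: assumes constant sound velocity (no ray bending) and
    that the roll-bias correction is added to the beam angle, which is
    measured from vertical and positive to starboard.
    """
    angle = math.radians(beam_angle_deg + roll_bias_deg)
    return slant_range_m * math.sin(angle), slant_range_m * math.cos(angle)


# Outer beam at 45 degrees with a 5000 m slant range: a 0.5 degree roll
# bias shifts the computed depth by roughly 30 m.
_, z_raw = beam_position(5000.0, 45.0)
_, z_fixed = beam_position(5000.0, 45.0, roll_bias_deg=0.5)
print(round(z_raw - z_fixed, 1))
```

At nadir the same 0.5 degree bias changes the depth by only about 0.2 m. This is why roll bias shows up mainly as a cross-track tilt of the outer beams.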

19 Problem 4 - Tools* Display and analysis GMT, MB-System, ArcView Define others in workshop Existing Needed

20 Problem 5 - Flexibility Design for interoperability Support wide range Project sizes (scalability) –Individual PI –Collaborative Data types Display and analysis tools –Commercial and public Computers –Unix, PC, Mac Users Institutions

21 Problem 6 - Stability Long-term planning Beyond end of project Beyond individual career Harvest legacy data Curatorial standards and documentation Provide stable funding mechanism Modest, but reliable

22 Our Resources MG&G Post-workshop listserver, website Help from other communities Space, atmosphere and oceans Digital Library community DLESE, OAI, CDL, Alexandria Computer science Storage Resource Broker

23 Next Steps - Example 1 SIOExplorer: Web Exploration of Seagoing Archives NSF Information Technology proposal SIO and San Diego Supercomputer Center http://SIOExplorer.ucsd.edu

24 SIOExplorer … First, review community practices Academia, Industry, Government Use modern geospatial database tools Data-Intensive Computing Environments National Partnership for Advanced Computational Infrastructure

25 SIOExplorer … Oceanographic institutions face a major challenge in dealing with the rapid growth of data sets output by the latest shipboard instruments and with the potential disappearance of older data sets that do not already reside in a secure database environment. Poor access to data collected in the past can lead to needless and expensive duplication of data acquisition efforts. Many older data sets were collected in remote areas that are seldom visited, and loss of these data could leave permanent holes in data coverage.

These issues were highlighted in the FUMAGES report summarizing the 1996 conference on the Future of Marine Geology and Geophysics (MG&G): “The rapidly growing rate of data and sample acquisition requires more effective and standardized data management and publication. We risk losing vast amounts of data in the files and hard drives of individual investigators. Databases, sample archives and standardized data management are necessary complements to publication of research papers, and are likely to be of even longer-lasting value.”

We intend to open up the shipboard archives of the Scripps Institution of Oceanography (SIO) to web access, with modern geospatial database tools. This will be a multi-step process:
1. Assess community needs and resources
2. Define and create a prototype system using the latest concepts in information technology and metadata management
3. Test and refine the prototype system based on community review
4. Populate the system with all SIO digital holdings and new acquisitions
5. Maintain the data system for more than 10 years

26 SIOExplorer … Year 1: Development of a prototype system. We will plan and begin construction of a prototype system. The design will be based on our existing gmtplus system, adding a metadata catalog and allowing web search and access for underway geophysical data across distributed archives. In addition, we will expand the metadata catalog for our rapidly growing archive of multibeam bathymetry data. This system will be used to provide outside access to SIO holdings as well as to track the levels of in-house editing and processing. We will also take advantage of recent image compression developments to make multibeam map imagery more accessible. A web interface to the 1600 GB holdings of the SIO Seismic Reflection Archive will be built for data retrieval and display.

Year 2: Testing, outreach, and expansion. The prototype system for distribution of underway, multibeam, imagery and seismic holdings will be tested with key users in the academic community. The results will be assimilated, interfaces revised, and plans made for future developments. We expect to add the capability to combine data from diverse sources at various resolutions. In addition, a public outreach portal will be installed at the Birch Aquarium at Scripps, which sees about 350,000 visitors a year. A hands-on review workshop will be held at SIO.

Year 3: Build out the databases and add functionality. Following the recommendations of the review workshop, we will install all of the SIO digital holdings in the system and add functionality as appropriate. Possibilities include an interface to outside holdings, the addition of new data types, the interactive display of multiple layers, and the capability to monitor the time dependence of geospatial data.

27 SIOExplorer … Extract across distributed archives Raw Data Underway geophysical data Multibeam (440 GB) Seismic data (2000 GB) Derived data Grids, map images, profiles Documents Processing notes Reports and publications Participation by other institutions Academia, Industry, Government

28 SIOExplorer … Facilitate data synthesis projects Combine bathymetry from multiple sources Overlay other data types Create prototype web delivery system Deliver compressed images across the web MrSID, ECW, jpeg2000 Use IGPP Digital Library for storage Robotic access to 20 TB
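One simple way to combine bathymetry from multiple sources is to bin all soundings into cells and keep a robust statistic for each cell. The sketch below uses a block median, in the spirit of GMT's blockmedian but not its implementation. The function name, the cell size and the soundings are made up. A real synthesis would also weight each source by its quality and resolution.

```python
from collections import defaultdict
from statistics import median


def block_median(soundings, west, south, cell_deg):
    """Merge (lon, lat, depth_m, source) soundings into square cells.

    Each occupied cell gets the median depth of every sounding that falls in
    it, whatever survey it came from, plus the sorted list of sources used.
    Cell keys are (column, row) counted from the (west, south) corner.
    """
    depths = defaultdict(list)
    sources = defaultdict(set)
    for lon, lat, depth, source in soundings:
        key = (int((lon - west) // cell_deg), int((lat - south) // cell_deg))
        depths[key].append(depth)
        sources[key].add(source)
    return {key: (median(d), sorted(sources[key])) for key, d in depths.items()}


# Made-up soundings from two hypothetical surveys.
soundings = [
    (-109.02, 23.51, 3010.0, "multibeam"),
    (-109.03, 23.52, 3030.0, "multibeam"),
    (-109.04, 23.53, 2950.0, "single-beam"),
    (-108.96, 23.51, 2800.0, "single-beam"),
]
grid = block_median(soundings, west=-109.1, south=23.5, cell_deg=0.05)
for key, (depth, used) in sorted(grid.items()):
    print(key, depth, used)
```

A median is used rather than a mean so that a single bad sounding from one source does not drag the merged cell value.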

29 SIOExplorer … “gmtplus” interface Underway data 3000 cruises Gravity Magnetics Topography Search engine Specify lat, lon area Select type of data Extend Global Topography efforts http://topex.ucsd.edu/cgi-bin/cruise_data.cgi
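The gmtplus search described here picks a lat, lon area and a data type, which reduces to a filter over a cruise catalog. This is a minimal sketch that assumes each cruise is stored with its data types and a list of navigation fixes. The `Cruise` record, `select_cruises` and the cruise ids are hypothetical, not the gmtplus code. The sketch only tests whether a fix falls inside the box. A track that crosses the box between fixes, or a box that spans the dateline, would need extra handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cruise:
    """Hypothetical catalog entry: cruise id, data types and navigation fixes."""
    cruise_id: str
    data_types: frozenset[str]
    track: tuple[tuple[float, float], ...]  # (lon, lat) fixes


def select_cruises(cruises, west, east, south, north, data_type):
    """Return ids of cruises that carry data_type and have a fix in the box."""
    hits = []
    for cruise in cruises:
        if data_type not in cruise.data_types:
            continue
        if any(west <= lon <= east and south <= lat <= north
               for lon, lat in cruise.track):
            hits.append(cruise.cruise_id)
    return hits


# Invented cruises; only the data-type names come from the slide.
catalog = [
    Cruise("CRUISE-A", frozenset({"gravity", "magnetics"}),
           ((-117.5, 32.7), (-118.0, 31.5))),
    Cruise("CRUISE-B", frozenset({"topography"}),
           ((-109.0, 23.5), (-108.5, 23.0))),
]
print(select_cruises(catalog, -120, -115, 30, 35, "magnetics"))  # ['CRUISE-A']
```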

30 SIOExplorer … Access distributed archives Wessel and Smith Lamont SIO NGDC Allow selection of specific cruises

31 SIOExplorer … PIs Steve Miller, Steve Cande, Paul Henkart, Dave Sandwell (SIO) Ilya Zaslavsky, SDSC Cheryl Peach, Birch Aquarium Advisory Committee Mike Carron, NAVO Jim Case, SAIC Art Green, ExxonMobil Cathy Manduca, Carleton College, DLESE Brice Pryszo, MaxSea Bob Spiegelthal, GlobalPhoton Khalid Soofi, Conoco

32 Next Steps - Example 2 SIO Digital Library “Bridging the Gap between Libraries and Research Archives” Proposal to NSF National Digital Library Program http://earthref.org/PACER/btg.html

33 SIO Digital Library … Combine the resources of 3 institutions UCSD Library Scripps Institution of Oceanography San Diego Supercomputer Center

34 SIO Digital Library … Libraries generally do an excellent job of distributing information to the public. Scientific data archives, by contrast, generally serve the interests of a narrow group of expert users. We propose to bridge the gap by opening archives to library search, with a team of UCSD researchers from the Library, the Scripps Institution of Oceanography, and the San Diego Supercomputer Center (SDSC). Many scientific archives have evolved over the decades in an incremental fashion, with ad hoc solutions added for each new type of data. The retirement of key personnel and the cost of maintaining custom software can leave important archives in jeopardy. We propose to take advantage of recent developments in database and library technology to create a model to rescue scientific archives, place them under cost-effective modern database control, and make them available to a wider set of users. As a prototype, we intend to apply the model to the archives of the Geological Data Center at the Scripps Institution of Oceanography, which contain nearly fifty years and 440 GB of digital cruise data.

35 SIO Digital Library … Metadata will be the key to bridging the gap. We will embed the domain-specific metadata catalog into a more general catalog, conforming to the digital library standards of DLESE and OAI. Access tools will allow a combination of geospatial, temporal, and expert-level searching. For example, K-12 teachers and students will be able to select appropriate levels for their inquiries, so that they can discover essential features without being overwhelmed by detailed information requiring special tools and understanding. Long-term sustainability will be based on the SDSC Storage Resource Broker toolkit, which will manage the migration of the distributed archives through various media, such as CD, DAT, Exabyte, DLT, and various operating systems, and also maintain backups and monitor proprietary holds and copyright permission. The prototype digital library will contain more than shipboard data. We will establish a searchable infrastructure to facilitate a wide range of the tasks needed to write proposals, conduct research, or create lesson plans. Links will be made for literature searches, including the “gray literature” of technical reports and Master’s theses. As an integral part of the geospatial search engine, links will be provided to other archives, such as the Ridge PETDB Petrological Database of the Ocean Floor.

36 SIO Digital Library … Modern database search engine Geospatial Lat, lon Temporal “1995-2000” Keyword Region “East Pacific Rise” Vessel “Melville” Cruise “COOK01MV” Expert-level Research, teacher, student, public
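The four search facets on this slide can be combined as successive filters. Several things in this sketch are assumptions made for illustration: the record fields, the `search` function, and the ordering of expert levels (public < student < teacher < research). The catalog entries are invented. Only the facet names and the example keywords come from the slide.

```python
from dataclasses import dataclass

# Assumed ordering of expert levels, least to most detailed.
LEVELS = ("public", "student", "teacher", "research")


@dataclass(frozen=True)
class Record:
    """Hypothetical catalog record; field names are illustrative only."""
    title: str
    west: float
    east: float
    south: float
    north: float
    start_year: int
    end_year: int
    keywords: frozenset[str]
    level: str  # one of LEVELS


def search(records, bbox=None, years=None, keywords=(), max_level="research"):
    """Combine geospatial, temporal, keyword and expert-level filters."""
    allowed = set(LEVELS[: LEVELS.index(max_level) + 1])
    results = []
    for r in records:
        if bbox is not None:
            w, e, s, n = bbox
            # Boxes overlap unless one lies entirely beside the other.
            if r.east < w or r.west > e or r.north < s or r.south > n:
                continue
        if years is not None:
            y0, y1 = years
            # Time ranges overlap unless one ends before the other starts.
            if r.end_year < y0 or r.start_year > y1:
                continue
        if not set(keywords) <= r.keywords:
            continue
        if r.level not in allowed:
            continue
        results.append(r.title)
    return results


records = [
    Record("EPR multibeam grid", -105.0, -103.0, 8.0, 10.0, 1995, 1996,
           frozenset({"East Pacific Rise", "Melville"}), "research"),
    Record("EPR overview map", -110.0, -100.0, 0.0, 20.0, 1995, 2000,
           frozenset({"East Pacific Rise"}), "student"),
]
print(search(records, bbox=(-106, -102, 7, 11), years=(1995, 2000),
             keywords=["East Pacific Rise"], max_level="student"))
# ['EPR overview map']
```

Treating the expert level as a ceiling lets a student-level query skip the detailed research products while still finding an overview of the same region and period.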

37 SIO Digital Library … Single search approach for all information Apply equally to Digital data Scanned documents Publications Citations or full text –Depending on user’s institutional copyright permission
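The rule on this slide is to return a citation or the full text, depending on the user's institutional permission. It can be written as a small decision function. Everything here is hypothetical: the item fields, the `licenses` table keyed by institution, and the open-access flag for material such as scanned in-house reports.

```python
def deliver(item: dict, institution: str, licenses: dict[str, set[str]]) -> dict:
    """Return a citation, plus a full-text link when the user may see it.

    Hypothetical rule: full text is shown for open material, or when the
    user's institution holds a license for the item's publisher.
    """
    result = {"citation": item["citation"]}
    licensed = item.get("publisher") in licenses.get(institution, set())
    if item.get("open_access", False) or licensed:
        result["full_text"] = item["url"]
    return result


licenses = {"UCSD": {"Publisher X"}}  # institution -> licensed publishers
paper = {"citation": "Author, A. (2000). Example article.",
         "publisher": "Publisher X", "url": "https://example.org/article.pdf"}
report = {"citation": "Scanned cruise report (hypothetical).",
          "open_access": True, "url": "https://example.org/report.pdf"}

print(deliver(paper, "UCSD", licenses))
print(deliver(paper, "Other University", licenses))
print(deliver(report, "Other University", licenses))
```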

38 Keep track of publications R/V Spencer F. Baird L to R back row: Dick Von Herzen, Roger Revelle, Willard Bascom, Ted Folsom, Alan Jones, Gustaf Arrhenius, Henri Rotschi, Robert Livingston, Russell Raitt. Seated: Dick Blumberg, Ronald Mason, Bob Dill, Art Maxwell, Winter Horton, Walter Munk, Helen Raitt Capricorn Expedition, 1952-53 Monitor ideas and careers Not just raw data

39 SIO Digital Library … Sampling Build on EarthRef.org capabilities –Seamount catalog –Embed multibeam into Global Topography –Analysis tools

40 SIO Digital Library … Build a modern metadata catalog Key to successful searching Take advantage of community standards and tools OAI – Open Archives Initiative DLESE – Digital Library for Earth System Education ADEPT – Alexandria Digital Earth Prototype Make it easy to maintain Automatic tools to harvest metadata from data files Embed domain-specific metadata into standard catalog
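Embedding domain-specific metadata in a standard catalog usually means writing a crosswalk. OAI-PMH requires every repository to offer unqualified Dublin Core (`oai_dc`). A sketch can therefore map multibeam fields onto generic Dublin Core elements. The bounding box goes into `dc:coverage`, written in the style of the DCMI Box encoding, and the sonar model goes into `dc:subject`. The input dictionary, its keys and the `to_dublin_core` function are assumptions. A full multibeam record would be offered as an additional metadata format alongside `oai_dc`.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_dublin_core(meta: dict) -> str:
    """Crosswalk a hypothetical multibeam metadata dict to an oai_dc record."""
    root = ET.Element(f"{{{OAI_DC}}}dc")

    def add(tag: str, text: str) -> None:
        ET.SubElement(root, f"{{{DC}}}{tag}").text = text

    add("title", f"Multibeam bathymetry, cruise {meta['cruise']}")
    add("type", "Dataset")
    add("date", meta["start_date"])
    # Bounding box flattened into generic coverage text (DCMI Box style).
    add("coverage", f"westlimit={meta['west']}; eastlimit={meta['east']}; "
                    f"southlimit={meta['south']}; northlimit={meta['north']}")
    for keyword in meta["keywords"]:
        add("subject", keyword)
    add("subject", f"Sonar: {meta['sonar']}")  # domain-specific key as a subject
    return ET.tostring(root, encoding="unicode")


# Hypothetical cruise and dates.
record = to_dublin_core({
    "cruise": "EXAMPLE01", "start_date": "2000-01-15",
    "west": -105.2, "east": -103.85, "south": 8.4, "north": 10.1,
    "keywords": ["Bathymetry", "East Pacific Rise"], "sonar": "SeaBeam 2100",
})
print(record)
```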

41 SIO Digital Library … Example: auto-harvest multibeam metadata mbinfo creates ASCII file Format, lat, lon, depth, time bounds, quality Quality, processing pedigree New script “mbmcat” Extract metadata from mbinfo Load metadata catalog (Oracle database) –Use appropriate DLESE community keywords –Embed multibeam-specific keys Now ready for search Geospatial/temporal/keyword
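The harvesting step on this slide can be sketched end to end: scrape bounds from an ASCII summary, load them into a catalog table, then run a geospatial query. This is a hypothetical stand-in for “mbmcat”. The summary text uses a simplified `Key: value` layout rather than the exact mbinfo output, and the field names are assumed. Python's built-in sqlite3 takes the place of the Oracle database.

```python
import sqlite3

# Hypothetical, simplified summary text. Real mbinfo output is laid out
# differently; the sketch only shows scraping bounds from an ASCII summary.
SUMMARY = """\
Minimum Longitude: -105.20
Maximum Longitude: -103.85
Minimum Latitude: 8.40
Maximum Latitude: 10.10
Minimum Depth: 2410.0
Maximum Depth: 3325.0
"""

# Summary key -> catalog column (assumed names).
FIELDS = {
    "Minimum Longitude": "west", "Maximum Longitude": "east",
    "Minimum Latitude": "south", "Maximum Latitude": "north",
    "Minimum Depth": "min_depth", "Maximum Depth": "max_depth",
}


def parse_summary(text: str) -> dict:
    """Pick the known 'Key: value' lines out of a summary and cast to float."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in FIELDS:
            record[FIELDS[key.strip()]] = float(value)
    return record


conn = sqlite3.connect(":memory:")  # stands in for the Oracle catalog
conn.execute("CREATE TABLE swath (file TEXT PRIMARY KEY, west REAL, east REAL,"
             " south REAL, north REAL, min_depth REAL, max_depth REAL)")
conn.execute("INSERT INTO swath VALUES (:file, :west, :east, :south, :north,"
             " :min_depth, :max_depth)",
             {"file": "EXAMPLE01_line0001", **parse_summary(SUMMARY)})

# Geospatial search: files whose bounding box overlaps the query box.
query = {"w": -104.5, "e": -104.0, "s": 8.5, "n": 9.0}
rows = conn.execute("SELECT file FROM swath WHERE west <= :e AND east >= :w"
                    " AND south <= :n AND north >= :s", query).fetchall()
print(rows)  # [('EXAMPLE01_line0001',)]
```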

42 SIO Digital Library … Track evolving data, seamlessly From acquisition to publication Corrections, editing Maintain relationships with other evolving data Various media DAT, EXB, CD, local file, robotic storage Various locations Distributed archives Monitor backup status Synchronize mirrored sites
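Monitoring backups and synchronizing mirrored sites can be reduced to comparing checksum manifests. This is a minimal sketch that assumes each site can list its files with a SHA-256 digest. The names `manifest` and `sync_plan`, and the file names, are hypothetical. In-memory byte strings stand in for files on tape or disk.

```python
import hashlib


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the SHA-256 hex digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def sync_plan(primary: dict[str, str], mirror: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and list what the mirror needs to match the primary."""
    return {
        "copy": sorted(p for p in primary if p not in mirror),
        "update": sorted(p for p in primary if p in mirror and mirror[p] != primary[p]),
        "stale": sorted(p for p in mirror if p not in primary),
    }


# Hypothetical holdings: the edited file has changed since the mirror last synced.
primary = manifest({"EXAMPLE01/raw.dat": b"pings",
                    "EXAMPLE01/edited.dat": b"edit v2",
                    "EXAMPLE01/grid.grd": b"grid"})
mirror = manifest({"EXAMPLE01/raw.dat": b"pings",
                   "EXAMPLE01/edited.dat": b"edit v1",
                   "old/scratch.dat": b"tmp"})
print(sync_plan(primary, mirror))
```

Files flagged as stale are only reported. Whether to delete them from the mirror is left as a curatorial decision.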

43 SIO Digital Library … Take advantage of the “Storage Resource Broker” Facilitate archive operations Simple enough for small projects Designed for petabyte-scale (1000 TB) databases Easily manages hundreds of millions of files Metadata catalog Manage distributed sites Automatically extract files from tape Robotic DLT storage, HPSS and other systems Hold on disk farm while active Transfer to user’s computer www.npaci.edu/online/v5.4/srb118.html
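The “hold on disk farm while active” behaviour amounts to a staging cache in front of slow tape. The sketch below is a least-recently-used cache. It illustrates the idea only and is not how the Storage Resource Broker is implemented. `DiskCache`, the capacity and the tape contents are hypothetical, and a dictionary stands in for the tape robot.

```python
from collections import OrderedDict


class DiskCache:
    """Hypothetical LRU staging cache in front of a slow tape store."""

    def __init__(self, capacity_bytes, fetch_from_tape):
        self.capacity = capacity_bytes
        self.fetch = fetch_from_tape  # callable: path -> bytes
        self.files = OrderedDict()    # path -> bytes, least recently used first
        self.used = 0
        self.tape_reads = 0

    def get(self, path):
        if path in self.files:
            self.files.move_to_end(path)  # served from disk; mark as recently used
            return self.files[path]
        data = self.fetch(path)           # slow path: stage the file from tape
        self.tape_reads += 1
        self.files[path] = data
        self.used += len(data)
        # Evict least-recently-used files until the disk farm fits again,
        # always keeping the file just requested.
        while self.used > self.capacity and len(self.files) > 1:
            _, old = self.files.popitem(last=False)
            self.used -= len(old)
        return data


tape = {"t1.dat": b"x" * 60, "t2.dat": b"y" * 60, "t3.dat": b"z" * 60}
cache = DiskCache(capacity_bytes=130, fetch_from_tape=tape.__getitem__)
cache.get("t1.dat")
cache.get("t2.dat")   # 120 bytes now on disk
cache.get("t1.dat")   # disk hit, no tape read
cache.get("t3.dat")   # over capacity: evicts t2.dat, the least recently used
print(cache.tape_reads, list(cache.files))  # 3 ['t1.dat', 't3.dat']
```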

44 Storage Resource Broker

45 SDSC Storage Resource Broker Tames Flood of Online Information The size and online availability of data sets is increasing exponentially in the Information Age, across all sectors of society. With the promise of limitless information at researchers’ fingertips, however, there are new challenges, and this flood of information is threatening to overwhelm existing methods of accessing, organizing, and archiving such data. One problem researchers face in handling scientific information as it travels from data sensors to computer applications to mass storage systems and digital libraries is that data sets must be moved and reorganized numerous times. Thus, researchers need, during this life cycle, an infrastructure tool capable of managing large data sets in such uses as massive data analysis systems, scientific data publication systems, and persistent digital archives. SDSC’s Data-Intensive Computing Environments (DICE) group has developed the SDSC Storage Resource Broker (SRB), a software tool for just this purpose. Initially released some three years ago, the SDSC SRB has proven highly useful, and there are now more than 200 registered users at more than 50 sites. "Digital data collections are becoming indispensable to the advance of science," said Reagan Moore, associate director for Data-Intensive Computing at SDSC and adjunct professor in UCSD’s Computer Science and Engineering Department. "SDSC’s DICE group is working to create a scalable data-handling and information discovery environment." This environment integrates distributed persistent digital archives, data grids for remote data access, and digital library services. The core components include the SDSC SRB and MCAT metadata catalog developed at SDSC and the Mediation of Information using XML (MIX) tools developed in collaboration with the UCSD Database Lab. These systems are finding applications in many data-intensive scientific disciplines and national-scale data centers. 
For further details, see http://www.npaci.edu/online/v5.4/srb118.html

46 SIO Digital Library … Create custom maps on demand Select lat, lon Select global database: –Global topography –Sediment thickness –Crustal age Overlay earthquakes, volcanoes, etc. Create multibeam maps on demand? MB-System running on SDSC computers 400 GB swath data storage
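Creating a map on demand starts by cutting the requested lat, lon window out of a global grid. This is a minimal sketch for a regularly spaced grid. It assumes nodes at `lat0 + i*spacing` and `lon0 + j*spacing`, and a box that does not cross the grid's longitude seam. The synthetic grid stands in for global topography, sediment thickness or crustal age. In practice a tool such as GMT's grdcut would do this step.

```python
import math


def subset(grid, lat0, lon0, spacing, west, east, south, north):
    """Cut a lat, lon window out of a regularly spaced grid.

    grid[i][j] is the value at latitude lat0 + i*spacing and longitude
    lon0 + j*spacing. Returns (lats, lons, values) for nodes inside the box,
    clipped to the grid edges. A box crossing the longitude seam is not handled.
    """
    i0 = max(math.ceil((south - lat0) / spacing), 0)
    i1 = min(math.floor((north - lat0) / spacing), len(grid) - 1)
    j0 = max(math.ceil((west - lon0) / spacing), 0)
    j1 = min(math.floor((east - lon0) / spacing), len(grid[0]) - 1)
    lats = [lat0 + i * spacing for i in range(i0, i1 + 1)]
    lons = [lon0 + j * spacing for j in range(j0, j1 + 1)]
    values = [row[j0:j1 + 1] for row in grid[i0:i1 + 1]]
    return lats, lons, values


# Synthetic 1-degree global grid; each value encodes its own (row, column).
grid = [[i * 1000 + j for j in range(360)] for i in range(181)]
lats, lons, values = subset(grid, lat0=-90.0, lon0=-180.0, spacing=1.0,
                            west=-110.0, east=-108.0, south=22.0, north=24.0)
print(lats, lons, values[0])
```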

47 SIO Digital Library … PI Brian Schottlaender, University Librarian, UCSD Co-PIs Steve Miller, Hubert Staudigel, Catherine Johnson (SIO) John Helly, SDSC Advisory Board Bob Ballard, IFE, Mystic Aquarium Dave Caress, MBARI Molly Hoffman, The Children’s School Cathy Manduca, Carleton College, DLESE Dave Sandwell, SIO George Sharman, NGDC Terry Smith, Alexandria Digital Library, UCSB Lisa Tauxe, SIO

48 SIO 100th Anniversary September 26, 2003 SIO, 1909 SIO, 1926 http://SIOExplorer.ucsd.edu

