MBARI Data Management Initiatives
John Graybeal, Information Applications Group Lead
Monterey Bay Aquarium Research Institute (MBARI)
- Established in 1987
- David and Lucile Packard Foundation
MBARI Location (map labels: Santa Cruz, Monterey, Monterey Canyon, MBARI)
Monterey Ocean Observing System (MOOS)
- Suitable for deep ocean or coastal studies
  - low-power, long-term moorings and benthic nodes
  - low-bandwidth communication links to shore
- Configurable, re-deployable instruments and platforms (using ships and ROVs)
- Smart nodes on deployed platforms
  - some on-board data processing
  - facilitate autonomous event detection
  - perform on-board calculations/detections
  - handle responses from shore
MOOS Concept of Operations (diagram labels: Benthic Node, Mooring, Autonomous Underwater Vehicle (AUV), MBARI)
Data Stream Challenge (MUSE)
Data Management Challenge
- Large number of data sources
- Large variety of data sources
- Dynamic systems
  - Data sources may appear and disappear
  - Devices & platforms reconfigured often
  - Interactions from shore and ship
- No standard data format
  - Data can be instrument ‘native’
  - New sources coming on-line all the time
  - Streams or files, automated or manual
Example: Samples Database
Example: Video and Images
- 14 years, up to 300 dives/year
- video tapes, hours
- frame grabs… => 900,000 annotations
- How to manage this valuable repository?
  - Advanced annotation system
  - Detailed knowledge base of concepts
  - Easy-to-use querying tool
Video Annotation and Reference System (VARS)
Notes About SSDS: The Shore Side Data System
- A MOOS Development Project
  - Goals: low cost, flexible, expandable, reliable
  - Future systems beyond MOOS (e.g., MARS)
  - Now in 3rd year, deploying initial elements
- Key Tenets of SSDS Development
  - Iterative development: improve it as we go
  - Test with real data: new and archival
  - Build for change: use modular interfaces
Shore Side Data System: Requirements Overview
- Ingest data in any described format and save it
- Capture, publish data descriptions (metadata)
- Provide standards-based access to data
  - Raw data, and other common digital formats
  - APIs for common visualization and analysis tools
  - User-oriented web interfaces, quick-look plots
- Merge data (different sources & time intervals)
- Support data visualization & quality control
- Provide data access security as needed
Shore Side Data System: User Requirements
- Raw data via device ID pages? (sort of limited)
- Standard plots like the OASIS quality-controlled ones?
- Access data from applications via DODS URLs?
  - Matlab, Ingrid, Live Access Server, Excel, IDV, Ferret
  - And hopefully, Ocean Data View
- Access data via returned data files (e.g., ASCII CSV w/headers) opened within desktop applications?
  - Excel, ArcView, Ocean Data View
- Delivery of data directly into an application?
- Ability to subset data, for example by time window?
- Ability to merge data from different data sets?
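The last two user requirements (subsetting by time window and merging data sets) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SSDS code: the (timestamp, value) record layout, the function names, and the sample temperature and salinity values are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def subset_by_time(records, start, end):
    """Return the records whose timestamp t satisfies start <= t <= end.

    `records` is a list of (timestamp, value) tuples sorted by timestamp.
    """
    times = [t for t, _ in records]
    lo = bisect_left(times, start)
    hi = bisect_right(times, end)
    return records[lo:hi]


def merge_by_time(left, right):
    """Merge two data sets on exactly matching timestamps.

    Returns (timestamp, left_value, right_value) tuples; a value is None
    where one data set has no sample at that timestamp.
    """
    left_map = dict(left)
    right_map = dict(right)
    all_times = sorted(set(left_map) | set(right_map))
    return [(t, left_map.get(t), right_map.get(t)) for t in all_times]


# Hypothetical samples: (seconds since start, measured value).
temperature = [(0, 12.1), (60, 12.3), (120, 12.2), (180, 12.0)]
salinity = [(60, 33.5), (120, 33.6), (240, 33.4)]

window = subset_by_time(temperature, 60, 120)
merged = merge_by_time(window, subset_by_time(salinity, 60, 120))
```

Matching on exact timestamps keeps the sketch short; real sources sampled at different rates would more likely need interpolation or nearest-neighbor matching.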
Data Management at MBARI: SSDS Efforts
- Infrastructure/model development
- Ontologies
- Metadata schema
- Metadata entry/correction/annotation
- User interfaces
- Data processing
- Visualizations
- Federated access to MBARI data/metadata
More MBARI SSDS Tasks
- Legacy data migration
  - OASIS, expd etc., Samples, Waypoints, ?
- New data sources
  - MTM II, AUV Sonar, CIMT, …
- Outreach (integrating non-SSDS projects)
- Documentation
- NEPTUNE
- Education
- Operational support
MOOS/SSDS Architecture (shows data flow)
(diagram labels: Ocean Side, Shore Side, Devices, Deployed Platform, Portal, Shore Side Data System, Communications, Data Tracking, Archiving, Data Presentation, Applications/Interfaces, User Applications (User Tools))
SSDS Elements
(diagram labels: Arriving Data, Ingest, Archiving, Data Tracker, Data Catalog, Metadata, Shared Descriptions, External Data Stores, Data For Analysis, (Re)Processed and New Data Sets, Data Presentation, Applications, Web I/F, < Requests, Data >; legend: Automated Data Flow, Internal Interfaces, On-Demand Interactions)
Example SIAM to SSDS Data Flow (diagram labels: Device, Mooring, Portal, SSDS)
Example SIAM to SSDS Data Flow: A device is connected to a platform, such as a mooring.
Example SIAM to SSDS Data Flow: The mooring retrieves the metadata from the device. Sample metadata:
    <RecordVariable name="time" columnIndex="1" format="double" longName="Time(GMT)" units="milliseconds since Jan 01, 1970"/>
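As a rough illustration of how shore-side software might read such a metadata element, the sketch below parses the RecordVariable sample with Python's standard library. The attribute names come from the slide; the function names and the returned dictionary layout are assumptions made for this example, not the SIAM or SSDS API.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The RecordVariable element shown on the slide.
METADATA_XML = (
    '<RecordVariable name="time" columnIndex="1" format="double" '
    'longName="Time(GMT)" units="milliseconds since Jan 01, 1970"/>'
)


def parse_record_variable(xml_text):
    """Turn one RecordVariable element into a plain dictionary."""
    elem = ET.fromstring(xml_text)
    return {
        "name": elem.get("name"),
        "column_index": int(elem.get("columnIndex")),
        "format": elem.get("format"),
        "long_name": elem.get("longName"),
        "units": elem.get("units"),
    }


def millis_to_utc(millis):
    """Convert 'milliseconds since Jan 01, 1970' to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000.0, tz=timezone.utc)


variable = parse_record_variable(METADATA_XML)
one_day_later = millis_to_utc(86_400_000)
```

Because the units string identifies the epoch and scale, a parser can convert raw time values without instrument-specific code, which is the point of shipping the description with the device.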
Example SIAM to SSDS Data Flow: The metadata is packaged into a metadata packet and sent to a portal on shore before any data is sent to shore.
Example SIAM to SSDS Data Flow: The portal forwards the metadata packet to SSDS.
Example SIAM to SSDS Data Flow: SSDS stores the metadata in a database (DB). This allows applications to query for and use data.
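To make the "store, then query" step concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. The slides do not say which database SSDS uses, so the engine, the table layout, and the function names below are all assumptions; only the sample values come from the RecordVariable element and the instrument ID shown elsewhere in the deck.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog with one (hypothetical) metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record_variable ("
        " device_id TEXT, name TEXT, column_index INTEGER,"
        " format TEXT, long_name TEXT, units TEXT)"
    )
    return conn


def store_variable(conn, device_id, variable):
    """Save one variable description for a device."""
    conn.execute(
        "INSERT INTO record_variable VALUES (?, ?, ?, ?, ?, ?)",
        (
            device_id,
            variable["name"],
            variable["column_index"],
            variable["format"],
            variable["long_name"],
            variable["units"],
        ),
    )
    conn.commit()


def variables_for_device(conn, device_id):
    """Return (name, units) pairs for a device, in column order."""
    rows = conn.execute(
        "SELECT name, units FROM record_variable"
        " WHERE device_id = ? ORDER BY column_index",
        (device_id,),
    )
    return rows.fetchall()


catalog = open_catalog()
store_variable(
    catalog,
    "3699",
    {
        "name": "time",
        "column_index": 1,
        "format": "double",
        "long_name": "Time(GMT)",
        "units": "milliseconds since Jan 01, 1970",
    },
)
found = variables_for_device(catalog, "3699")
```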
Example SIAM to SSDS Data Flow: The device produces a data record (34,56.234,0.0023,...).
Example SIAM to SSDS Data Flow: The data record is packaged into a data packet and sent to SSDS.
Example SIAM to SSDS Data Flow: SSDS uses information in the packet to sort and store the data in a ‘raw’ (serialized) format. Packet fields: VersionID, DeviceID, MetadataID, RecordType, PlatformID, SystemTime, SequenceNumber, DataBuffer(34,56.234,0.0023,…)
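The packet fields listed above map naturally onto a small record type. The following sketch is an assumption-laden illustration rather than the SSDS wire format: the field types, the idea of grouping by device and ordering by sequence number, the comma-separated ASCII payload, and the second device ID (1001) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPacket:
    """One data packet; field names follow the slide, types are assumed."""
    version_id: int
    device_id: int
    metadata_id: int
    record_type: int
    platform_id: int
    system_time: int      # assumed: milliseconds since the Unix epoch
    sequence_number: int
    data_buffer: bytes    # instrument-native payload, stored unparsed


def store_raw(packets):
    """Group packets by device and order each group by sequence number."""
    by_device = defaultdict(list)
    for packet in packets:
        by_device[packet.device_id].append(packet)
    for group in by_device.values():
        group.sort(key=lambda p: p.sequence_number)
    return dict(by_device)


def parse_buffer(packet):
    """Split a comma-separated ASCII buffer into floats (assumed layout)."""
    text = packet.data_buffer.decode("ascii")
    return [float(item) for item in text.split(",")]


packets = [
    DataPacket(1, 3699, 7, 1, 42, 2000, 2, b"35,56.301,0.0021"),
    DataPacket(1, 1001, 9, 1, 42, 1500, 1, b"12.5"),
    DataPacket(1, 3699, 7, 1, 42, 1000, 1, b"34,56.234,0.0023"),
]
raw_store = store_raw(packets)
first_record = parse_buffer(raw_store[3699][0])
```

Keeping the buffer as opaque bytes until a later parsing step mirrors the slide's distinction between the ‘raw’ store and the parsed netCDF output.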
Example SIAM to SSDS Data Flow: The ‘raw’ data is parsed and stored as netCDF for easier access.
Example SIAM to SSDS Data Flow: MBARI software applications allow users to discover and obtain data in formats useful to the typical MBARI user (netCDF, text, etc.). Sample netCDF header:
    netcdf parosci {
    dimensions:
        time = UNLIMITED ; // (17761 currently)
    variables:
        double time(time) ;
            time:long_name = "Time (GMT)" ;
            time:units = "seconds since :00:00" ;
        double depth(time) ;
            depth:long_name = "depth" ;
            depth:units = "UNKNOWN" ;
    // global attributes:
        :title = "AUV data" ;
        :created = " T23:34:58Z" ;
        :history0 = ": Deployment information for parosci.log" ;
        :deploymentName = " " ;
        :instrumentId = "3699" ;
    }
Example SIAM to SSDS Data Flow: Software applications also provide simple visual representations of data.
Example SIAM to SSDS Data Flow: Web pages provide internet access.
Example SIAM to SSDS Data Flow: Save development time by using existing software applications (existing netCDF software).
SSDS Data Mgt Sequence
AUV Data Sequence Diagram
Metadata Approach (Credit: Dan Davis)
- XML suitable for MOOS metadata
- Enables use of many other tools/software
- But, it looks a little bit user-unfriendly
- Use XML-driven GUI technology to build forms for creating and displaying metadata
- Users don’t have to directly read XML
- It’s there and easy to access if they want it
- Bind XML metadata to each device through its puck
Metadata Stored in Puck Interface
- During pre-deployment instrument configuration and test, the sensor driver and associated metadata are stored in compact flash memory in the puck
(diagram labels: Sensor, Puck, serial interface, to host computer)
Metadata Schema Design
Metadata User Form Design
- User interface designer uses the schema to build a form for creation, display, and access of metadata instances
- There may be different forms for different users (e.g., scientific, system, and operational) to create and display metadata of interest
Metadata Form Design
Instrument Configuration
- Metadata forms are used during device configuration to create metadata that is entered into the device puck
- Similarly, metadata forms are used during configuration of other system elements, such as platforms and even communication links. This metadata is maintained in system nodes.
Metadata Form Layout
SSDS Metadata (Object View)
SSDS Metadata (Device)
The data source. SSDS tracks:
- Software or hardware source
- Unique identifier
- Manufacturer information
- References to documentation
SSDS Metadata (Deployment)
‘Deployment’ information. SSDS tracks:
- Where the data was collected
- When it was collected
- What other data was used
- Relation to other deployments
SSDS Metadata (DataContainer)
References to the data. SSDS tracks:
- The data storage location
- How to access this data
- The deployment that produced this data
SSDS Metadata (Records)
Format and contents of a DataContainer. SSDS tracks:
- The contents of a data set
- The data format (to allow parsing by software)
- Descriptive info like units, scale, …
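The four metadata objects described on these slides (Device, Deployment, DataContainer, Records) could be modeled roughly as follows. This is a sketch under stated assumptions: the class names follow the slide titles, but every attribute name, the producer_of helper, and the sample values (apart from the instrument ID 3699 and the depth variable from the netCDF sample) are hypothetical and do not reflect the actual SSDS schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    """The data source: a hardware instrument or a software process."""
    identifier: str
    is_hardware: bool
    manufacturer: str = ""
    documentation: List[str] = field(default_factory=list)


@dataclass
class RecordVariable:
    """Describes one variable in a data set (part of the Records view)."""
    name: str
    units: str
    data_format: str


@dataclass
class DataContainer:
    """A reference to stored data: where it lives and how to read it."""
    location: str
    access_method: str
    variables: List[RecordVariable] = field(default_factory=list)


@dataclass
class Deployment:
    """Where and when a device collected data, and what data it used."""
    device: Device
    where: str
    start_time: str
    inputs: List[DataContainer] = field(default_factory=list)
    outputs: List[DataContainer] = field(default_factory=list)
    parent: Optional["Deployment"] = None


def producer_of(container, deployments):
    """Find the deployment that produced a given data container."""
    for deployment in deployments:
        if any(c is container for c in deployment.outputs):
            return deployment
    return None


# Hypothetical instances for illustration only.
sensor = Device(identifier="3699", is_hardware=True)
depth_file = DataContainer(
    location="parosci.nc",
    access_method="file",
    variables=[RecordVariable("depth", "UNKNOWN", "double")],
)
mission = Deployment(
    device=sensor,
    where="Monterey Bay",
    start_time="unknown",
    outputs=[depth_file],
)
```

Linking containers back to deployments, and deployments to devices, is what lets a catalog answer "where did this file come from?" without inspecting the data itself.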
Metadata and Access: Catalogs and Repositories
View From the Shore
- Many data registries and models
  - GDC, OBIS, EarthRef, NVODS, …
- Many standards
  - Communications protocols: SOAP, OPeNDAP, OBIS, …
  - Metadata formats (MIF, XML, NGDC, NetCDF…)
- Metadata ontologies and efforts
  - NGDC, MarineXML, ESRI, Metadata Wranglers
- Conclusion: Watch, Learn, Try (Iterate)
SSDS Data Access
- Desktop Application: HOOVES
- Data File Service
- Quick Look
- Metadata Access (and Validation)
- Metadata Editing
- Networked API: Servlet / JSP Pages
- Application API (NetCDF): OPeNDAP
- Web Access (NetCDF): Live Access Server
- Archived Files: Direct Access (?)
HOOVES Help
HOOVES Mission View
HOOVES Mission Outputs View
HOOVES Mission Resources: Overview
HOOVES Mission Resources: Vehicle
HOOVES Instrument View
SSDS Schedule
Prime Areas for Collaboration
- Infrastructure/model development
- Ontologies
- Metadata schema
- Metadata entry/correction/annotation
- User interfaces
- Data processing
- Visualizations
- Federated access to data/metadata
- Documentation
IAG Team
- Kevin Gomes
- John Graybeal
- Mike McCann
- Brian Schlining
- Rich Schramm
- And, a Mystery Guest (To Be Determined)
Science Representative to SSDS
- John Ryan