Oceanographic Data Provenance Tracking with the Shore Side Data System
Mike McCann, Kevin Gomes
International Provenance and Annotation Workshop
June 18, 2008
Outline
Motivation
Monterey Bay Aquarium Research Institute (MBARI) Projects:
–Monterey Ocean Observing System (MOOS)
–Shore-Side Data System (SSDS)
Data Model
Application framework
Operational details
–Instrument configuration
–Data processing software
MUSE data
Diversity of platforms & sensors
Post-experiment organization
Document for later use => FGDC
Motivation for a better design
Identifying Requirements
Configuring Instruments
–Many different instruments
–Many different manufacturers
–Varied hardware, communication and metadata interfaces
–However, all must interact with infrastructure
Instruments Have Lifecycles
–Changed for normal maintenance, cleaning, failure
–Configuration can also change depending on science goal/experiment
–In-situ re-configuration
–Instrument-infrastructure relationship must be kept intact and, in fact, tracked
Identifying Requirements
Metadata Must Tie To Data
–Huge variation in data formats that users must handle
–Metadata traditionally added on after the fact
–Not scalable and error prone
Identifying Requirements
Instruments Can Cross Observatories
–Some instrument supplies are limited
–Experiment configuration
Metadata and Data Can Cross Observatories
–Example: data processing for instruments should not have to be re-written
Software Middleware for MOOS
[Diagram labels: Shore network; MOOS moored network; Shore Side Data System; TCP/IP via satellite; Surface, Benthic-1, Benthic-2; Instrument services (x3); Instrument GUI; Telemetry retriever]
SSDS: Metadata and Data Management
Requirements for SSDS (partial list)
–Capture observatory and instrument lifecycle data
–Return instrument data in its native (“raw”) format
–Simple analysis tools for viewing data
–Capture and archive processed data products and associated metadata, maintaining known relationships between data sets
–Convert data to common formats
SSDS: Metadata and Data Management
[Diagram labels: Wet Side; Shore Side; Instrument Packets (+ infrastructure metadata); Ingest; XML; Data; Metadata; SSDS Domain Logic API; Data Access Services; Metadata Access Services; Aggregate HTTP-based Services (SOA)]
SSDS Data Model
Data Container & Data Producer attributes
Recording…
–What
–Where
–When
–Relations
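The what/where/when/relations attributes can be pictured with a minimal Python sketch. This is not the SSDS schema: the class names follow the slide, but every field name, the `location` text field, and the example URIs are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DataContainer:
    """Something that holds data, e.g. a packet stream or a file."""
    name: str                          # what
    uri: str                           # where the data lives (hypothetical field)
    start: Optional[datetime] = None   # when: first record
    end: Optional[datetime] = None     # when: last record


@dataclass
class DataProducer:
    """Something that produces data, e.g. a deployment or a processing run."""
    name: str                          # what
    location: Optional[str] = None     # where (hypothetical free-text field)
    start: Optional[datetime] = None   # when: producer became active
    end: Optional[datetime] = None     # when: producer stopped
    inputs: List[DataContainer] = field(default_factory=list)   # relations
    outputs: List[DataContainer] = field(default_factory=list)  # relations


# Illustrative records only; names and URIs are made up.
stream = DataContainer("ctd-packets", "ssds://example/streams/ctd")
netcdf = DataContainer("ctd.nc", "file:///example/ctd.nc")
run = DataProducer("ctd-to-netcdf", start=datetime(2008, 6, 18),
                   inputs=[stream], outputs=[netcdf])
print(run.outputs[0].name)
```

Keeping the relations on the producer means a data set can be traced back to whatever made it without a separate lookup table.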
Satisfying Requirements: Configuration
[Diagram: SSDS architecture figure repeated from the "SSDS: Metadata and Data Management" slide, here labeled "SSDS Business Logic API"]
Satisfying Requirements: Dynamic Lifecycle
[Diagram: SSDS architecture figure repeated from the "SSDS: Metadata and Data Management" slide, here labeled "SSDS Business Logic API"]
Satisfying Requirements: Resource Management
[Diagram: SSDS architecture figure repeated from the "SSDS: Metadata and Data Management" slide, here labeled "SSDS Business Logic API"]
Satisfying Requirements: Health Monitoring
[Diagram: SSDS architecture figure repeated from the "SSDS: Metadata and Data Management" slide, here labeled "SSDS Business Logic API"]
Example alert e-mail:
Subject: SSDS: No recent data stream update from instruments: 1441
A problem has been encountered while checking on the status of the following data streams that SSDS is monitoring:
Device ID :: Last update time (in hours) :: Device Name
MSE Surface Node :: 2.9 :: Medusa Card
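A staleness check of the kind that could produce such an alert might look like the following sketch. It is an assumption about the general approach, not the SSDS implementation: the function names, the two-hour threshold, and the device records are all hypothetical; only the column header is taken from the e-mail.

```python
from datetime import datetime, timedelta


def find_stale_streams(last_updates, now, max_age_hours):
    """Return (device_id, hours_since_update, name) for overdue streams.

    last_updates maps device_id -> (device_name, time_of_last_packet).
    """
    stale = []
    for device_id, (name, last_seen) in last_updates.items():
        age_hours = (now - last_seen) / timedelta(hours=1)
        if age_hours > max_age_hours:
            stale.append((device_id, round(age_hours, 1), name))
    return sorted(stale)


def format_alert(stale):
    """Render the stale streams using the column layout of the alert e-mail."""
    lines = ["Device ID :: Last update time (in hours) :: Device Name"]
    lines += [f"{dev} :: {age} :: {name}" for dev, age, name in stale]
    return "\n".join(lines)


# Hypothetical monitoring state: one overdue stream, one healthy stream.
now = datetime(2008, 6, 18, 12, 0)
last_updates = {
    101: ("Example CTD", now - timedelta(hours=2, minutes=54)),
    102: ("Example fluorometer", now - timedelta(minutes=20)),
}
stale = find_stale_streams(last_updates, now, max_age_hours=2.0)
print(format_alert(stale))
```

In practice the threshold would likely differ per instrument, since sampling and telemetry intervals vary.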
Satisfying Requirements: Metadata/Data
[Diagram: SSDS architecture figure repeated from the "SSDS: Metadata and Data Management" slide, here labeled "SSDS Business Logic API"]
Application Programming Interface
- Perl
Application Programming Interface
- Matlab (works with R2008a)
    % Import SSDS package
    import moos.ssds.services.metadata.*
    % Get Home interface
    home = moos.ssds.services.metadata.DataProducerAccessUtil.getHome();
    % Get Access object
    dpAccess = home.create();
    % Call methods on the Access object
    dList = dpAccess.findByName('Back', logical(0), 'id','ascending', logical(1));
    it = dList.iterator;
    d = it.next;
    d.getDevice.getMfgSerialNumber
    ans = WL
SSDS data life cycle
Instrument is defined by creating Device record.
Instrument is configured for deployment by writing Deployment, DataContainer, RecordDescription, RecordVariable XML.
Instrument is deployed. XML metadata is ingested by SSDS, data packets flow into the Instrument Packets database.
Automated DataStream processing software consumes the data packets producing a NetCDF file for each instrument’s data. A DataProducer record is created linking the input DataStream to the output DataFile.
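To make the configuration step concrete, here is a rough sketch of generating such a metadata document with Python's standard library. The element names Deployment, DataContainer, RecordDescription and RecordVariable come from the slide; the nesting, the attribute names (`deviceId`, `name`, `units`), and the sample values are assumptions and should not be read as the actual SSDS XML schema.

```python
import xml.etree.ElementTree as ET


def build_deployment_xml(device_id, stream_name, variables):
    """Build a Deployment element describing one output stream.

    variables is a list of (variable_name, units) pairs.
    Attribute names here are hypothetical, not the SSDS schema.
    """
    deployment = ET.Element("Deployment", {"deviceId": str(device_id)})
    container = ET.SubElement(deployment, "DataContainer", {"name": stream_name})
    description = ET.SubElement(container, "RecordDescription")
    for name, units in variables:
        ET.SubElement(description, "RecordVariable",
                      {"name": name, "units": units})
    return deployment


# Illustrative device and variables only.
root = build_deployment_xml(
    101, "ctd-stream", [("temperature", "degC"), ("pressure", "dbar")]
)
print(ET.tostring(root, encoding="unicode"))
```

Describing each variable before deployment is what lets downstream software parse the packets without instrument-specific code.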
SSDS data life cycle (cont.)
Follow-on data processing runs consume instrument NetCDF DataContainers producing combined data sets and graphical products. Metadata from SSDS is extracted as needed to fully describe data in all the NetCDF data sets.
User uses the data with all the needed information to assess its suitability for a particular use.
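Because each processing step records which containers it consumed and produced, the history of any product can be recovered by walking those links backwards. The sketch below illustrates the idea with plain dictionaries; the record layout, the function name `trace_lineage`, and the sample producers are assumptions, not the SSDS API.

```python
def trace_lineage(target, producers):
    """Walk back from a data product to every recorded upstream step.

    producers is a list of dicts with 'name', 'inputs' and 'outputs' keys.
    Returns (producer_name, product) pairs, starting at the target.
    """
    # Index each output by the producer that created it.
    made_by = {out: p for p in producers for out in p["outputs"]}
    steps = []
    seen = set()
    todo = [target]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        producer = made_by.get(current)
        if producer is None:
            continue  # raw source: no recorded producer
        steps.append((producer["name"], current))
        todo.extend(producer["inputs"])
    return steps


# Hypothetical processing history for two instruments and a combined product.
producers = [
    {"name": "ctd-to-netcdf", "inputs": ["ctd-stream"], "outputs": ["ctd.nc"]},
    {"name": "adcp-to-netcdf", "inputs": ["adcp-stream"], "outputs": ["adcp.nc"]},
    {"name": "combine", "inputs": ["ctd.nc", "adcp.nc"], "outputs": ["combined.nc"]},
]
for producer_name, product in trace_lineage("combined.nc", producers):
    print(f"{product} <- {producer_name}")
```

A raw stream returns an empty list, which marks it as an original source rather than a derived product.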
SSDS Explorer web application
Drill down deployment tree
Drill down processing tree
Used mainly by developers
Lessons Learned
Solid Interface Definitions Key
Web Services Great – Not Always SOAP
Policies Are As Important As Interfaces
Identify The Critical Metadata
Consume The Critical Metadata Early And Often
Testing Station/Simulator
Acknowledgements
SSDS funded by the David and Lucile Packard Foundation
MOOS Leads: Mark Chaffey, Kent Headley
Operations: Paul Coenen, Ken Heller, Hans Thomas, Duane Thompson
Science: Jim Barry, Francisco Chavez, Charlie Paull, Erich Rienecker, John Ryan
SSDS Development Team: Andrew Chase, Mike McCann, Brian Schlining, Rich Schramm