Physical Oceanography Distributed Active Archive Center
GHRSST GDAC and EOSDIS PO.DAAC Integration
9th GHRSST-PP Science Team Meeting, June 9-13, 2008
Presenters: Patricia Liggett, Thomas Huang
Jet Propulsion Laboratory, California Institute of Technology
GHRSST GDAC Integration
An automated system:
- Uses virtual file system technology for local and remote data product discovery
- Translates MMR to Submission Information Package (SIP) on the fly
- Publishes the SIP for data product ingestion, metadata inventory, and product archive
Current System Architecture
- Need multi-mission ingestion capability
- Need multi-mission metadata inventory
- Need single point of access to inventory
Architecture Approach
Conforms to the CCSDS Reference Model for an Open Archival Information System (OAIS) (CCSDS B-1) in determination of program sets, their components, and terminology.
- SIP - Submission Information Package
- AIP - Archival Information Package
- DIP - Dissemination Information Package
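To make the three OAIS package types concrete, here is a minimal sketch of how a submission might move through them. The class fields and the `archive` and `disseminate` helpers are hypothetical names chosen for illustration; they are not taken from the PO.DAAC design or from the OAIS document itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SIP:
    """Submission Information Package: what a data provider hands over."""
    product_name: str
    file_uris: tuple
    metadata: dict


@dataclass(frozen=True)
class AIP:
    """Archival Information Package: what the archive stores and preserves."""
    archive_id: str
    sip: SIP
    archived_at: datetime


@dataclass(frozen=True)
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    archive_id: str
    file_uris: tuple
    metadata: dict


def archive(sip, archive_id):
    # Hypothetical step: wrap the submission with archive-side bookkeeping.
    return AIP(archive_id=archive_id, sip=sip,
               archived_at=datetime.now(timezone.utc))


def disseminate(aip, wanted_keys):
    # Hypothetical step: hand out only the metadata fields the consumer asked for.
    subset = {k: v for k, v in aip.sip.metadata.items() if k in wanted_keys}
    return DIP(archive_id=aip.archive_id, file_uris=aip.sip.file_uris,
               metadata=subset)
```

The point of the sketch is only the separation of roles: the SIP is provider-facing, the AIP carries what the archive adds, and the DIP is derived from the AIP on request.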
New System Architecture
- Multi-mission ingest engine and metadata inventory
- Extensible metadata model
- Spatial data queries
- Usability
- Reliability
- Adaptability
- Scalability
- Manageability
Core Service Components
- Ingest: Retrieves and receives data from external data providers to be ingested into the system. It is the point of entry to PO.DAAC for data coming from external data providers.
- Inventory: Provides the services and functions for populating, maintaining, and accessing both descriptive information, which identifies and documents archive holdings, and administrative data used to manage the archive.
- Archive: Provides the services and functions for the storage, maintenance, and retrieval of data from the archive repository.
- Generate: Produces value-added data products, whether they are requested by an external user via a PO.DAAC web page or internally from another subsystem.
Ingest
- Client/server architecture
- Federated architecture: multiple Ingest Server instances running in parallel for load sharing and failover
- Supports TCP and Java™ Message Service communication methods
- Parallel request processing and data ingestion
- Multiple data ingestion protocols: FILE, FTP, SFTP, HTTP…
- Pooled ingestion model to reduce connection creation time and overall resource usage
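The combination of protocol-specific ingestion and parallel, pooled request processing can be sketched as a fetcher registry keyed by URI scheme plus a fixed-size worker pool. This is an illustration only: `FETCHERS`, `ingest_one`, and `ingest_all` are hypothetical names, only the FILE protocol is implemented here, and the checksum in the result is an assumption rather than something the slides specify.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def fetch_file(uri):
    """Read a local file referenced by a file:// URI."""
    return Path(url2pathname(urlparse(uri).path)).read_bytes()


# Hypothetical registry: one fetch function per supported URI scheme.
# FTP, SFTP, and HTTP fetchers would be registered the same way.
FETCHERS = {"file": fetch_file}


def ingest_one(uri):
    """Fetch one product and return a small record describing it."""
    scheme = urlparse(uri).scheme
    fetcher = FETCHERS.get(scheme)
    if fetcher is None:
        raise ValueError(f"no fetcher registered for scheme {scheme!r}")
    data = fetcher(uri)
    return {"uri": uri, "size": len(data),
            "sha256": hashlib.sha256(data).hexdigest()}


def ingest_all(uris, workers=4):
    """Ingest several products in parallel using a fixed pool of workers."""
    # The pool is created once per batch, so workers are reused across
    # requests instead of being created per product.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ingest_one, uris))
```

Results come back in the same order as the input URIs, which keeps the caller's bookkeeping simple even though the fetches run concurrently.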
Inventory
- Reusable component that manages and provides access to inventory, catalog, event log, and all other information stored in the backend database
- Pooled connections to the backend database reduce connection creation time and overall resource usage, giving higher performance
- Leverages an existing high-performance object/relational persistence and query service to eliminate development complexity and improve manageability of SQL operations
- Enables mapping to external data models for NASA services (e.g., ECHO, GCMD)
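The pooled-connection idea can be shown with a small sketch. It uses SQLite from the Python standard library purely as a stand-in backend; the slides do not name the database or the persistence service, and `ConnectionPool` is a hypothetical class written for this example.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Keep a fixed set of open connections and lend them out on demand."""

    def __init__(self, database, size=2):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # Connections are opened once, up front, and then reused.
            # check_same_thread=False lets a pooled connection be borrowed
            # by whichever thread asks for it next.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()
```

A caller borrows a connection with `with pool.connection() as conn:` and the connection goes back to the pool when the block exits, so the cost of opening it is paid only once.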
Archive
- Manages and provides access to granule data files stored in the PO.DAAC repository based on policy. For example, all GHRSST products will be stored in the PO.DAAC repository and available for distribution for 30 days, then moved out to NOAA LTSRF for long-term archive.
- Performs routine integrity checking of the repository against the granule inventory in the database
- Facilitates the backing up of data to the Deep Archive and the Long Term Archive
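A rough sketch of the two archive behaviors described above: checking repository files against the inventory, and deciding when a granule has passed the 30-day local retention window. The slides do not say how integrity is verified, so the use of SHA-256 checksums is an assumption, as are the function names and the shape of the `inventory` mapping (relative path to expected checksum).

```python
import hashlib
from datetime import datetime, timedelta
from pathlib import Path

# 30 days, taken from the GHRSST example policy on the slide.
LOCAL_RETENTION = timedelta(days=30)


def sha256_of(path):
    """Checksum a file in 1 MiB chunks so large granules fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(inventory, repo_root):
    """Compare inventory entries with files on disk.

    Returns two sorted lists: paths missing from the repository and
    paths whose checksum differs from the inventory record.
    """
    missing, corrupt = [], []
    for rel_path, expected in sorted(inventory.items()):
        path = Path(repo_root) / rel_path
        if not path.is_file():
            missing.append(rel_path)
        elif sha256_of(path) != expected:
            corrupt.append(rel_path)
    return missing, corrupt


def due_for_handoff(ingested_at, now):
    """True once a granule has been held locally for the retention period."""
    return now - ingested_at >= LOCAL_RETENTION
```

A scheduled job could call `check_integrity` routinely and report both lists, while `due_for_handoff` would mark granules ready to leave the local repository.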
Generate
- Provides a service for producing value-added data products
- Value-added products are any data products that must be generated upon request (e.g., subsets, binned data, co-located data)
- Products can be requested on demand by an external user via a PO.DAAC web page, or internally from another subsystem as part of the ingestion process
- Scheduling and prioritization of value-added product requests will be managed by a Workflow Manager component
- Resource utilization and request execution will be managed by a Resource Manager component
- We are currently looking at existing packages for providing the Workflow and Resource Manager capabilities
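To illustrate what scheduling and prioritization of product requests means in practice, here is a minimal priority-queue sketch. It is not the Workflow Manager, which per the slide is expected to come from an existing package; `RequestQueue` and the convention that a lower number means higher priority are assumptions made for this example.

```python
import heapq
import itertools


class RequestQueue:
    """Hand out product requests by priority, oldest first within a priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter: breaks ties so equal-priority requests
        # are served in submission order.
        self._counter = itertools.count()

    def submit(self, priority, request):
        # Lower numbers are served first (assumed convention).
        heapq.heappush(self._heap, (priority, next(self._counter), request))

    def next_request(self):
        """Return the most urgent request, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

For example, an internal request submitted during ingestion could be given a more urgent priority than an on-demand web request, and the queue would serve it first without disturbing the order of the remaining requests.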
Timeline
- Winter/Spring 2008: Complete development of the core system services with incremental deliveries to Integration & Test. Side-by-side testing with current operations.
- Summer 2008: Deploy system to operations fully supporting GHRSST data streams.
- Fall 2008: Migrate other PO.DAAC active data streams to the new system. Continue development of non-core system services.
- FY 2009: Focus on interface (GUI), data casting, and subsetter integration.
- FY 2010: Focus on value-added processing, remote processing, and data mining capabilities.
Q&A