Published by Arnold Black. Modified over 9 years ago.
2
Thomas Huang
PO.DAAC Software System Engineer
Jet Propulsion Laboratory, California Institute of Technology
These activities were carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. © 2010 California Institute of Technology. Government sponsorship acknowledged.
3
About me…
- PO.DAAC Software System Engineer and Architect of its Data Management and Archive System
- Background in planetary data management and secure near-real-time distribution systems
Huang - 01062010
4
Outline
- Pattern for data ingestion to distribution
- Our legacy data system
- The new PO.DAAC Data Management and Archive System
- Conclusion
- Q&A
5
Simple Pattern
6
Can All These Broken Pieces Fit?
7
Legacy Data Systems … It Works!?
- Three different data systems, each following the simple pattern
- Deployed in multiple instances
- Mostly consist of one-off scripts
- Limited reusability
- Limited portability
- Scalability? Reliability?
8
Stovepipe Legacy Data Systems
9
Our New Data Management and Archive System
10
Software Development Process
11
Technologies and Standards
12
Documents
13
Architecture
- A system of RESTful services
- Standardized message exchange between services
- Unified data model
- Distributed data ingestion services
- Standardized event tracking and notification service
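The architecture calls for standardized message exchange between services. A shared envelope could look like the sketch below, which assumes JSON is the wire format. The class name `ServiceMessage` and its fields are hypothetical and are not taken from the actual DMAS message schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ServiceMessage:
    """Hypothetical envelope shared by all services (not the real DMAS schema)."""
    message_type: str   # e.g. "INGEST_REQUEST", "ARCHIVE_COMPLETE" (illustrative values)
    product_type: str   # dataset/product identifier
    granule: str        # granule the message refers to
    payload: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # sort_keys yields a stable wire format that is easy to compare and log
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ServiceMessage":
        return cls(**json.loads(text))


msg = ServiceMessage("INGEST_REQUEST", "GHRSST_L2P", "granule-001.nc", {"size": 1024})
wire = msg.to_json()
```

Because every service parses the same envelope, a new service only has to understand `message_type` values it cares about.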
14
Manager Webservice (RESTful)
- Transaction-oriented
- Load-balanced job assignment
- On-the-fly deployment of engines
- Dynamic support of new data products
- State-driven product management
- Resource management
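One way to combine load-balanced job assignment with on-the-fly engine deployment is sketched below. In this sketch the manager sends each job to the registered engine with the most spare capacity. The `Engine`/`Manager` names and the "most spare slots" policy are assumptions for illustration, not the documented DMAS algorithm.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Engine:
    """Hypothetical record the manager keeps for each registered engine."""
    name: str
    max_jobs: int
    active_jobs: int = 0

    @property
    def spare(self) -> int:
        return self.max_jobs - self.active_jobs


class Manager:
    """Assigns each job to the registered engine with the most spare capacity."""

    def __init__(self) -> None:
        self.engines: Dict[str, Engine] = {}

    def register(self, engine: Engine) -> None:
        # Engines may be added while the manager is running.
        self.engines[engine.name] = engine

    def assign(self) -> Optional[str]:
        candidates = [e for e in self.engines.values() if e.spare > 0]
        if not candidates:
            return None  # all engines are saturated; the caller should retry later
        best = max(candidates, key=lambda e: e.spare)  # ties go to the earliest registered
        best.active_jobs += 1
        return best.name

    def complete(self, name: str) -> None:
        self.engines[name].active_jobs -= 1


manager = Manager()
manager.register(Engine("ingest-1", max_jobs=2))
manager.register(Engine("ingest-2", max_jobs=3))
assignments = [manager.assign() for _ in range(5)]
```

"Most spare slots" is only one policy. Round-robin, or weighting by recent job latency, would fit the same interface.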
15
File Management Engines (RESTful)
- Lightweight RESTful file service
- Supports typical file operations (add, move, delete, etc.)
- A single instance can carry out multiple granule operations in parallel
- Supports various file protocols (FTP, SFTP, FILE, HTTP, etc.)
- Tracks and limits the number of jobs it can handle
- Tracks and limits the number of outbound communications
- Typical instances: ingest, archive, and purge
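An engine that "tracks and limits the number of jobs it can handle" can be approximated with a bounded semaphore that refuses work when full. Refusing, rather than queueing, lets the manager route the job elsewhere. `JobLimiter` is a hypothetical name, and the non-blocking refusal is an assumed design choice.

```python
import threading


class JobLimiter:
    """Hypothetical per-engine cap on concurrent granule operations."""

    def __init__(self, max_jobs: int) -> None:
        self._slots = threading.BoundedSemaphore(max_jobs)
        self._lock = threading.Lock()
        self.active = 0  # number of jobs currently running

    def try_start(self) -> bool:
        # Non-blocking acquire: report "busy" instead of waiting for a free slot.
        if not self._slots.acquire(blocking=False):
            return False
        with self._lock:
            self.active += 1
        return True

    def finish(self) -> None:
        with self._lock:
            self.active -= 1
        # BoundedSemaphore raises ValueError if released more times than acquired.
        self._slots.release()
```

A second `JobLimiter` instance could cap outbound connections in the same way.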
16
Product Inventory
- Unified metadata data model
- References applicable models (e.g. ISO 19115, DIF, ECHO, GCMD…)
- Extensible to capture collection-, dataset-, and granule-specific data attributes
- Supports geospatial data
- Supports project-specific data archive and distribution policies
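The inventory slide describes a unified model that is both extensible and geospatially aware. One common way to get both is fixed core fields plus an open-ended attribute map, as in the sketch below. The `Granule` fields and the simple bounding-box test are illustrative assumptions, not the actual DMAS schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Granule:
    """Hypothetical inventory record: fixed core fields plus open-ended attributes."""
    dataset: str
    name: str
    # (west, south, east, north) in degrees; None if the granule has no footprint
    bbox: Optional[Tuple[float, float, float, float]] = None
    # dataset-specific extensions that the core model does not define
    attributes: Dict[str, str] = field(default_factory=dict)

    def intersects(self, west: float, south: float, east: float, north: float) -> bool:
        """Axis-aligned bounding-box overlap test (ignores date-line crossing)."""
        if self.bbox is None:
            return False
        w, s, e, n = self.bbox
        return not (e < west or w > east or n < south or s > north)
```

A production model would more likely store footprints in a spatial database than test them in application code.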
17
Data Handlers
- An application framework
- Plugin interface for product-specific metadata handling and validation
- Transforms product metadata into the internal Submission Information Package (SIP)
- Data discovery
- Local caching of data products
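The plugin interface for product-specific validation and SIP translation can be sketched as an abstract base class plus a registry keyed by product type. `DataHandler`, `ExampleHandler`, the required fields, and the SIP dictionary layout are all hypothetical. The real SIP format is not described on the slide.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DataHandler(ABC):
    """Hypothetical plugin contract: one subclass per product type."""

    @abstractmethod
    def validate(self, metadata: Dict[str, str]) -> List[str]:
        """Return validation errors (an empty list means the granule is acceptable)."""

    @abstractmethod
    def to_sip(self, metadata: Dict[str, str]) -> Dict[str, object]:
        """Translate product-specific metadata into an internal SIP-like dict."""


HANDLERS: Dict[str, DataHandler] = {}


def register(product_type: str, handler: DataHandler) -> None:
    HANDLERS[product_type] = handler


class ExampleHandler(DataHandler):
    REQUIRED = ("granule_name", "start_time", "checksum")

    def validate(self, metadata: Dict[str, str]) -> List[str]:
        return [f"missing {key}" for key in self.REQUIRED if key not in metadata]

    def to_sip(self, metadata: Dict[str, str]) -> Dict[str, object]:
        return {
            "granule": metadata["granule_name"],
            "temporal": {"start": metadata["start_time"]},
            "files": [{"name": metadata["granule_name"], "checksum": metadata["checksum"]}],
        }


def ingest(product_type: str, metadata: Dict[str, str]) -> Dict[str, object]:
    # The framework code never changes; only the registered plugin differs per product.
    handler = HANDLERS[product_type]
    errors = handler.validate(metadata)
    if errors:
        raise ValueError("; ".join(errors))
    return handler.to_sip(metadata)
```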
18
Data Handlers - GHRSST
Adaptation
- MMR validation and translation
- Data file validation
- Scans local/remote locations for new data
- Integration with back-end RDAC cluster
Inventory
- Full migration from existing MySQL database
- Port to use the new data model
- FGDC and index generators
- Website
The Group for High-Resolution Sea Surface Temperature (GHRSST)
- Ingest and maintain interfaces to 52 GHRSST L2P/L3P/L4 datastreams from 10 Regional Data Assembly Centers (RDACs)
- ~25 GB/day, >5000 granules/day
- Real-time quality checking of data and metadata granules
- Create Federal Geographic Data Committee (FGDC) metadata for daily collection granules
- Distribution via FTP/OPeNDAP/POET
- Maintain interfaces to the LTSRF for exchange of 30-day-old data and metadata
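The "scans local/remote locations for new data" step could look like the sketch below for the local case: list candidate files and skip any name already seen. The function name, the `*.nc` pattern, and the in-memory `seen` set are assumptions. A real handler would likely persist what it has seen and handle remote (FTP/SFTP) listings separately.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def scan_for_new(directory: Path, seen: Set[str], pattern: str = "*.nc") -> List[Path]:
    """Return files matching `pattern` not seen in earlier scans, sorted by name."""
    new_files = sorted(p for p in directory.glob(pattern) if p.is_file() and p.name not in seen)
    seen.update(p.name for p in new_files)  # remember them so the next scan skips them
    return new_files


# Usage: simulate a landing directory with two granules and one non-granule file.
with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp)
    for name in ("a.nc", "b.nc", "notes.txt"):
        (landing / name).write_text("x")
    seen: Set[str] = set()
    first = [p.name for p in scan_for_new(landing, seen)]   # ['a.nc', 'b.nc']
    second = [p.name for p in scan_for_new(landing, seen)]  # [] (nothing new)
```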
19
Data Handlers - ASCAT
Adaptation
- Metadata validation and translation
- Data file validation
- Scans remote locations for new data
- Dataset definition and policies
The Advanced SCATterometer (ASCAT)
- Ingest and maintain interfaces to 2 L2 datastreams from KNMI
- ~57 MB/day, ~21 GB/year
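As a quick consistency check on the ASCAT volume figures, assuming decimal units (1 GB = 1,000 MB):

$$57\ \text{MB/day} \times 365\ \text{days/year} = 20{,}805\ \text{MB/year} \approx 20.8\ \text{GB/year}$$

This rounds to the ~21 GB/year quoted above. With binary units (1 GB = 1,024 MB) the total would be about 20.3 GB/year.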
20
Significant Event WS
21
Significant Event Web
22
DAAC in a Box?
23
“Premature optimization is the root of all evil.” Donald Knuth, “Structured Programming with go to Statements” (1974)
24
Sample Performance
- Ingest3 (36 parallel jobs), Archive3 (36 parallel jobs), Purge2 (20 parallel jobs)
- 21,254 granules/day, 4 seconds/granule
- Implementation optimization
- Database performance tuning
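The two throughput figures agree if "4 seconds/granule" is read as the average interval between completed granules across all parallel jobs. This is an interpretation; the slide does not define the figure.

$$\frac{86{,}400\ \text{s/day}}{21{,}254\ \text{granules/day}} \approx 4.07\ \text{s/granule}$$

Under this reading, an individual job may take considerably longer than 4 seconds per granule, because up to 36 jobs run concurrently.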
25
Conclusion
PO.DAAC DMAS
- A system of RESTful webservices
- Scalable
- Portable
- Extensible
- Operationally supports GHRSST and ASCAT
Future work
- New products: Aquarius
- GHRSST GDS 2.0 metadata model
- Migration
- Data subscription
- Administration tools
27
BACKUP SLIDES
28
FY ’09 Highlights
- Webservice architecture
- Data Ingestion and Archive WS
- Distributed Ingestion/Archive Engines
- Load balancing
- Service monitoring
- Significant Event WS
- Suite of reusable components
- ECHO publication: dataset and granule metadata
- GHRSST
- ASCAT (L2 ASCAT)
Huang - 09022009
29
Product Subscription
- Enables implementation of value-added services
30
Archive Tools Metadata Distribution
31
Our Challenge
… can we build a data system with all these characteristics?
- Scalable
- Simple
- Speed
- Standardize
32
DMAS – Ingestion and Archive Service
- Load-balanced
- Transaction-oriented
- On-the-fly deployment of engines
- Dynamic support of new data products
- Scalable
- State-driven job management
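State-driven job management means a job can only move between explicitly allowed states. If each transition is recorded as a transaction, a restarted manager can resume from the last known state. The state names and transition table below are hypothetical, since the slide does not list the actual DMAS states.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    ASSIGNED = "assigned"
    INGESTED = "ingested"
    ARCHIVED = "archived"
    FAILED = "failed"


# Hypothetical transition table; the real DMAS states are not given on the slide.
ALLOWED = {
    JobState.PENDING: {JobState.ASSIGNED},
    JobState.ASSIGNED: {JobState.INGESTED, JobState.FAILED},
    JobState.INGESTED: {JobState.ARCHIVED, JobState.FAILED},
    JobState.ARCHIVED: set(),
    JobState.FAILED: {JobState.PENDING},  # a failed job may be retried
}


class Job:
    def __init__(self, granule: str) -> None:
        self.granule = granule
        self.state = JobState.PENDING
        self.history = [self.state]  # audit trail; a real system would persist this

    def advance(self, new_state: JobState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} is not allowed")
        self.state = new_state
        self.history.append(new_state)
```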
33
DMAS – Significant Event Service
34
DMAS – Data Subscriber Integration with NAIAD
Swath Tiler / Metadata Submission
- Dataset subscriber
- Triggered by newly archived granules
- Dispatches the swath tiling program
- Submits tiling metadata to the NAIAD WS
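The dataset subscriber follows a publish/subscribe pattern: archiving a granule notifies every callback registered for that dataset. The in-process sketch below stands in for what in DMAS is a web service. `ArchiveEvents`, `swath_tiler`, and the dataset names are hypothetical, and the tiler here only records the request instead of dispatching a real program.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]


class ArchiveEvents:
    """Hypothetical in-process stand-in for an 'archived granule' notification channel."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, dataset: str, callback: Callback) -> None:
        self._subscribers.setdefault(dataset, []).append(callback)

    def granule_archived(self, dataset: str, granule: str) -> None:
        # Notify only the subscribers registered for this dataset.
        for callback in self._subscribers.get(dataset, []):
            callback(dataset, granule)


tiling_requests: List[str] = []


def swath_tiler(dataset: str, granule: str) -> None:
    # A real subscriber would dispatch the tiling program and submit tiling metadata;
    # this sketch only records that a request was made.
    tiling_requests.append(f"{dataset}/{granule}")


events = ArchiveEvents()
events.subscribe("ASCAT_L2", swath_tiler)
events.granule_archived("ASCAT_L2", "g1.nc")
events.granule_archived("GHRSST_L4", "g2.nc")  # no subscriber for this dataset
```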
35
DMAS Goals
- Service tools: administration, product rollout, contact management
- New data subscription capability: making DMAS the data hub (RSS feed, automatic delivery of new granules, thumbnail generation, etc.)
- New dataset search capability: evaluating the VODC – ACCESS program
- New data products
- Legacy migration support
FY ’10 Planning
- 4 DMAS releases
- 2 system releases (DMAS + T&S)
36
Configuration Management
- How to manage versions of third-party software
  - Dependency matrix
  - Upgrades to one or more third-party packages
- Standard development process between development teams
  - Change management
  - Software packaging
  - Dependency management
- Standard build and deployment process
- FY ’10 CM?