Conductor at HiROC. Bradford Castalia, 19 September 2007, PSI-2007. Designing and Implementing Processing Pipelines with Conductor: The HiROC Experience.


1 Designing and Implementing Processing Pipelines with Conductor: The HiROC Experience

Bradford Castalia
Systems Analyst
Planetary Image Research Laboratory
HiRISE Operations Center
University of Arizona
Tucson, Arizona

2 Pipeline Processing

Conductor is a Java application for managing queues of source files to be processed by sequences of procedures.

Procedures
- Defined in a database table by sequence number
- Data processing procedure
- Success criteria
  - A procedure must be successful for the next to run
- On-failure (branch) procedure

Sources
- Defined in a database table by source number
- Source file pathname
- Log file pathname
- Procedure status values
- Will be processed by one and only one Conductor
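The procedure rules above (run in sequence-number order, require success before the next step, branch on failure) can be sketched as follows. This is a minimal illustration in Python, not Conductor's Java implementation: the `Procedure` class, the `process_source` function, and the use of an integer status as the success criterion are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Procedure:
    # Hypothetical stand-in for one row of a Procedures table.
    sequence: int                      # sequence number of the procedure
    run: Callable[[str], int]          # processing step; returns a status
    success_status: int = 0            # assumed success criterion
    on_failure: Optional[Callable[[str], int]] = None  # branch procedure


def process_source(procedures: List[Procedure], source_pathname: str) -> List[int]:
    """Run procedures in sequence order; stop at the first failure."""
    statuses = []
    for procedure in sorted(procedures, key=lambda p: p.sequence):
        status = procedure.run(source_pathname)
        statuses.append(status)
        if status != procedure.success_status:
            # The failed procedure's branch runs; later procedures do not.
            if procedure.on_failure is not None:
                procedure.on_failure(source_pathname)
            break
    return statuses


notified = []


def notify_operators(path: str) -> int:
    notified.append(path)
    return 0


procedures = [
    Procedure(1, lambda path: 0),
    Procedure(2, lambda path: 3, on_failure=notify_operators),
    Procedure(3, lambda path: 0),  # never reached in this example
]
statuses = process_source(procedures, "/tmp/example.img")
print(statuses)   # [0, 3]
print(notified)   # ['/tmp/example.img']
```

The returned list plays the role of the per-source "procedure status values": one entry for each procedure that actually ran.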

3 Pipeline Processing

Database
- Procedures and Sources tables are paired
- Multiple Conductor instances use the same database
- Multiple Conductor instances for the same pipeline

Configuration
- Based on ISO standard PVL
- Configuration files may be shared
- Configuration files may be included (e.g. site config) in other configuration files
- Environment variables are included
- Conductor maintained parameters

Reference Resolving
- Configuration parameter references
- Database field references
- Nested references
- Expression evaluation
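Nested reference resolving can be illustrated with a small sketch. It assumes a `${Name}` reference syntax (the form that appears in the `${Continue_Status}` example elsewhere in this talk) and a plain dictionary of configuration parameters; the parameter names are hypothetical, and database field references and expression evaluation are left out. It is not Conductor's actual resolver.

```python
import re

# Matches an innermost reference: "${" + name + "}" with no nested reference.
_REFERENCE = re.compile(r"\$\{([^${}]+)\}")


def resolve(text, parameters, max_substitutions=50):
    """Replace ${Name} references, innermost first, until none remain."""
    for _ in range(max_substitutions):
        match = _REFERENCE.search(text)
        if match is None:
            return text
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"unresolved reference: {name}")
        text = text[:match.start()] + str(parameters[name]) + text[match.end():]
    raise ValueError("too many substitutions (possible circular reference)")


# Hypothetical parameters; none of these names come from HiROC configurations.
parameters = {
    "Site": "HiROC",
    "HiROC_Data_Dir": "/data/hiroc",
    "Source_ID": "OBS_0001",
}
resolved = resolve("${${Site}_Data_Dir}/${Source_ID}.img", parameters)
print(resolved)  # /data/hiroc/OBS_0001.img
```

Resolving innermost references first lets one parameter select another (`${Site}` picks `HiROC_Data_Dir`), and the substitution limit keeps a circular definition from looping forever.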

4 Data Flow

[Diagram. Labels, in extraction order: Science Teams and Ops Staff; HiWeb; Public; HiCat; Pipelines; Host OS Environment; ISIS; HiSPICE; Eng; DOM; PDS Products; HiVali; HiArch; RDRgen; EDRgen; HiReport; HiEST; RSDS; HiDOG; Conductor; Downlink]

5 [Diagram of the pipeline network. Labels, in extraction order: HiDog Pipeline; EDRgen Pipeline; EDR_Stats Pipeline; RSDS Raw Data Repository; WatchDog (Check data availability); HiStitch Pipeline; HiccdStitch Pipeline; RedGeom Pipeline; ColorGeom Pipeline; ColorMosaic Pipeline; RDRgen (JPEG2000) Pipeline; Internal Products (JPEG2000); EDR Table; HiCal Pipeline; RedMosaic Pipeline; Full-Res Color RDR; Full-Res Red RDR Table; EDR Geometry Table; HiCat Database; Standard Data Products; HiGeomInit Pipeline; NAIF Node SPICE Repository; HiSPICE SPICE Validation; SPICE Pause; Validation & Release]

6 Initiate and Data Download

FEI_Watchdog
- Poll the data delivery server (RSDS)
- Register the download file (Pipeline_Source)

Fetch and prep the data file
- Download the file from the server
- Notify operators on failure
  - Only continue if configured to do so
    perl -e 'exit ${Continue_Status};'
- Move the file and update the Source_Pathname
- Register the file in the next pipeline
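The `perl -e 'exit ${Continue_Status};'` step turns a configuration value into a procedure exit status, so the pipeline's ordinary success check decides whether processing continues. A rough sketch of that idea follows; it uses the Python interpreter as a stand-in for perl, and the function name and the assumption that status 0 means "continue" are illustrative only.

```python
import subprocess
import sys


def continue_if_configured(continue_status, success_status=0):
    """Run a tiny command whose exit status is the configured value.

    Stand-in for: perl -e 'exit ${Continue_Status};' after the
    ${Continue_Status} reference has been resolved.
    """
    command = [
        sys.executable,
        "-c",
        f"import sys; sys.exit({int(continue_status)})",
    ]
    completed = subprocess.run(command)
    # The procedure "succeeds" only if the status meets the success criterion.
    return completed.returncode == success_status


print(continue_if_configured(0))  # True: the next procedure may run
print(continue_if_configured(1))  # False: the pipeline stops here
```

The design point is that no special "stop here" mechanism is needed: a one-line procedure plus the normal success criterion is enough to make continuation configurable.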

7 EDR Production and Metadata Collection

- Check for multi-channel data file
  - Break out channel files and register new sources
- Generate EDR product file
- PVL_to_DB map of PDS label parameters to HiCat EDR_Products record field values
  - Replace existing record if configured to do so

RDR and Extras Production

- Photometric processing
- Geometric processing
- Collect all channel files for the observation before registering them in the next pipeline
- Use multiple systems in parallel for compute-intensive processing
- Reprocessing
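The "collect all channel files for the observation before registering them" step is a gather point: nothing moves on until the set is complete. One way to sketch it is shown below. The `ObservationGate` class, the observation ID, the file names, and the idea of passing in the expected channel count are all hypothetical; the talk does not say how HiROC tracks completeness.

```python
from collections import defaultdict


class ObservationGate:
    """Hold channel files until every expected file of an observation arrives."""

    def __init__(self):
        self._arrived = defaultdict(set)

    def add(self, observation_id, channel_file, expected_count):
        """Record one channel file.

        Returns the sorted list of files once the observation is complete,
        otherwise None.
        """
        files = self._arrived[observation_id]
        files.add(channel_file)
        if len(files) < expected_count:
            return None
        del self._arrived[observation_id]
        return sorted(files)


gate = ObservationGate()
first = gate.add("OBS_A", "OBS_A_ch0.img", expected_count=2)
second = gate.add("OBS_A", "OBS_A_ch1.img", expected_count=2)
print(first)   # None
print(second)  # ['OBS_A_ch0.img', 'OBS_A_ch1.img']
```

Only the call that completes the set returns the file list, so exactly one caller ends up responsible for registering the observation in the next pipeline.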

8 Management Issues

Incremental pipeline development
- The ability to grow the network of pipelines without inherent ripple effects is very important.
- Splitting and merging pipeline segments can be done at will.
- Testing of pipeline segments or portions of a network can be done in sandbox environments, including individual developer or user contexts, separate from the production environment, without the need for a complete production configuration yet exactly mirroring the production configuration and operations.

Adaptable to the level of demand
- Conductor instantiations can be added or removed from pipeline processing at any time.

Error tolerant
- Each Conductor acts independently.
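Adding or removing Conductors at any time works only if independent instances sharing one database never process the same source twice. A common way to get that guarantee is a conditional update that claims a row, sketched here with an in-memory SQLite database. The table layout, the `Conductor_ID` column, and the function names are assumptions for illustration (only `Source_Pathname` appears in the talk), and SQLite merely stands in for the production database.

```python
import sqlite3


def create_sources(connection):
    # Hypothetical, simplified Sources table.
    connection.execute(
        "CREATE TABLE Sources ("
        " Source_Number INTEGER PRIMARY KEY,"
        " Source_Pathname TEXT NOT NULL,"
        " Conductor_ID TEXT)"
    )


def claim_next_source(connection, conductor_id):
    """Claim the lowest-numbered unclaimed source, or return None."""
    while True:
        row = connection.execute(
            "SELECT Source_Number, Source_Pathname FROM Sources"
            " WHERE Conductor_ID IS NULL ORDER BY Source_Number LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The claim succeeds only if nobody else claimed the row meanwhile.
        cursor = connection.execute(
            "UPDATE Sources SET Conductor_ID = ?"
            " WHERE Source_Number = ? AND Conductor_ID IS NULL",
            (conductor_id, row[0]),
        )
        connection.commit()
        if cursor.rowcount == 1:
            return row


connection = sqlite3.connect(":memory:")
create_sources(connection)
connection.executemany(
    "INSERT INTO Sources (Source_Pathname) VALUES (?)",
    [("/data/a.img",), ("/data/b.img",)],
)
connection.commit()

print(claim_next_source(connection, "conductor-1"))  # (1, '/data/a.img')
print(claim_next_source(connection, "conductor-2"))  # (2, '/data/b.img')
print(claim_next_source(connection, "conductor-1"))  # None
```

Because the `UPDATE` carries the `Conductor_ID IS NULL` condition, a Conductor that loses a race sees a row count of zero and simply tries the next unclaimed source.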

9 Hardware System Design Issues

Network bandwidth
- Consider all possible sources
- Network overload can cause hardware switches to fail
- Foundation (generally not incremental)

CPUs
- Services: database, web, e-mail
- Compute engines: add as needed

Data storage
- Fast, local space; especially /tmp
- Bulk, shared space: add as needed
- NFS latencies

10 Future Development

PostgreSQL
- New Data_Port being integrated for distribution

Composer
- Interactive Procedures table definition
- Add, remove and reorder procedures
- Edit procedure definition fields
- Test reference resolving

Maestro
- Manage multiple Conductors
- Local or remote
- Start, suspend/resume, stop
- Monitor logging streams
- Report throughput and backlogs of Sources
- Accumulate resource utilization metrics

