CAA 20th Cross Calibration Meeting, MPS, Göttingen, 16th Oct 2014

Pipeline Automation
Chris Perry

Motivation
- The CAA undertakes a number of data processing tasks:
  - Auxiliary products such as spacecraft position, telemetry mode and timing information
  - Value-added products such as conversion of existing products to ISR2, EFW L3, etc.
  - Format conversion, e.g. WBD format conversion, CSDS to CAA
- Currently many pipelines are run semi-automatically or manually
- Keeping track of updates, particularly re-deliveries of source products, is a major challenge
  - Significant risk of products becoming out of date
- Increased automation is required:
  - To identify intervals in need of reprocessing
  - To automatically prioritize and schedule tasks across CAA resources

Design
- The task can be broken down into a number of distinct components:
  - A generic system for identifying intervals in need of processing
  - A pipeline scheduling system to issue jobs across the CAA machines
  - A standard wrapper and support routines for executing pipelines
  - A common logging, pre-validation and submission system
- Streamline interval detection by using only file time-span information
  - Record-level comparison (e.g. Tomasz's detailed checking) is important from a QA perspective, but is more intensive and requires manual interpretation
  - The automated system only requires access to DB information, so it can check the entire mission quickly
  - Non-DB dependencies can easily be accommodated
  - Whole files need to be re-processed and re-submitted
- This could be provided as a service to instrument teams

Design
[Flow diagram; box labels: DB, Non-DB Time Lists, Dataset ID, Dependencies, Check Availability, Time Intervals, Merge, Identify Intervals, Dep Intervals, Proc Intervals]
- Comparison based on ingestion date
- Require all dependencies to exist
- Require any dependency to be newer
- Consolidate the result
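
The four rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CAA implementation: the `FileSpan` record, the `intervals_to_process` function and the idea of passing file lists in memory (rather than querying the DB) are all hypothetical. Time is cut at every file boundary, each piece is tested against the rules, and adjacent pieces that pass are merged.

```python
from datetime import datetime
from typing import NamedTuple


class FileSpan(NamedTuple):
    """Hypothetical record: time span of one file plus its ingestion date."""
    start: datetime
    end: datetime
    ingested: datetime


def covering(files, t0, t1):
    """Return the most recently ingested file covering [t0, t1), or None."""
    hits = [f for f in files if f.start <= t0 and t1 <= f.end]
    return max(hits, key=lambda f: f.ingested) if hits else None


def intervals_to_process(product_files, dependency_files):
    """dependency_files maps a dataset ID to its list of FileSpan records."""
    # Cut the time axis at every file boundary so that each piece is either
    # fully inside or fully outside any given file.
    edges = sorted({t
                    for files in [product_files, *dependency_files.values()]
                    for f in files
                    for t in (f.start, f.end)})
    result = []
    for t0, t1 in zip(edges, edges[1:]):
        deps = [covering(files, t0, t1) for files in dependency_files.values()]
        if any(d is None for d in deps):
            continue  # require all dependencies to exist
        prod = covering(product_files, t0, t1)
        if prod is not None and all(d.ingested <= prod.ingested for d in deps):
            continue  # no dependency is newer than the existing product
        # Consolidate: extend the previous interval if this piece touches it.
        if result and result[-1][1] == t0:
            result[-1] = (result[-1][0], t1)
        else:
            result.append((t0, t1))
    return result
```

A product piece is flagged either when no product file exists yet or when at least one dependency was ingested after the product, which matches the "any dependency newer" rule above.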

Design
Simple configuration table
[Table flattened in extraction; column layout and numeric fields not recoverable. Surviving entries: C1_CT_AUX_TMMODE to C4_CT_AUX_TMMODE; C1_CP_AUX_SPIN_AXIS to C4_CP_AUX_SPIN_AXIS, each listed with CL_SP_AUX; date fields "yesterday" and "yesterday-20days"]

Design
Find recently delivered intervals
[Command names and calendar dates were lost in extraction; times are shown as they survive]

# Check FGM since given date
yesterday
# C1_CP_FGM_FULL,
# Check if dataset C1_CP_RAP_EPITCH needs updating
yesterday C1_CP_RAP_EPITCH C1_CP_FGM_FULL,

C1_CP_FGM_FULL
# T18:45:42Z/ T03:48:59Z
# T03:00:11Z/ T12:35:48Z
# T02:40:53Z/ T08:57:25Z
# T00:09:44Z/ T09:47:42Z
# T04:31:36Z/ T06:05:48Z

Output:
C1_CP_RAP_EPITCH T18:45:42Z/ T03:48:59Z
C1_CP_RAP_EPITCH T03:00:11Z/ T12:35:48Z
C1_CP_RAP_EPITCH T00:00:00Z/ T09:47:42Z
C1_CP_RAP_EPITCH T23:59:59Z/ T06:05:48Z

Design
- Output can optionally be given as intervals split/aligned, e.g. by day
[Example listing: 16 day-aligned intervals, each of the form T00:00:00Z/ T00:00:00Z; calendar dates lost in extraction]
- An option is also provided to give the next available version number for each interval
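
Splitting and aligning an interval by day can be illustrated as follows. This is a minimal sketch, assuming naive UTC `datetime` values and half-open intervals. The function name `split_by_day` and the `align` flag are hypothetical and are not taken from the CAA tool.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def midnight(t):
    """Return 00:00:00 on the day containing t."""
    return datetime(t.year, t.month, t.day)


def split_by_day(start, end, align=False):
    """Split [start, end) at day boundaries.

    With align=True the interval is first widened to whole days, so every
    piece runs from one midnight to the next.
    """
    if align:
        start = midnight(start)
        if end != midnight(end):
            end = midnight(end) + DAY
    pieces = []
    cursor = start
    while cursor < end:
        stop = min(midnight(cursor) + DAY, end)
        pieces.append((cursor, stop))
        cursor = stop
    return pieces


if __name__ == "__main__":
    # Illustrative dates only; the dates on the slide did not survive.
    for a, b in split_by_day(datetime(2014, 10, 1, 18, 45, 42),
                             datetime(2014, 10, 3, 3, 48, 59), align=True):
        print(f"{a:%Y-%m-%dT%H:%M:%SZ}/{b:%Y-%m-%dT%H:%M:%SZ}")
```

Aligning to whole days suits pipelines that produce one file per day, since each output interval then corresponds to exactly one file to regenerate.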

Design
Can also be used to find missing intervals for a dataset
[Command names and calendar dates were lost in extraction; times are shown as they survive]

# Find missing FGM_FULL files
yesterday
yesterday

Output:
C1_CP_FGM_FULL T00:00:00Z/ T00:10:02Z
C1_CP_FGM_FULL T12:10:14Z/ T21:16:07Z
C1_CP_FGM_FULL T19:32:41Z/ T04:39:09Z
C1_CP_FGM_FULL T06:05:48Z/ T00:00:00Z
C3_CP_FGM_FULL T00:00:00Z/ T00:10:02Z
C3_CP_FGM_FULL T05:10:27Z/ T14:18:08Z
C3_CP_FGM_FULL T07:30:17Z/ T16:35:23Z
C3_CP_FGM_FULL T15:34:55Z/ T00:40:56Z
C3_CP_FGM_FULL T12:17:51Z/ T21:23:44Z
C3_CP_FGM_FULL T00:38:55Z/ T09:47:08Z
C3_CP_FGM_FULL T01:32:09Z/ T10:38:51Z
C3_CP_FGM_FULL T06:05:48Z/ T00:00:00Z
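
Finding missing intervals amounts to listing the gaps that a dataset's file time spans leave inside a requested period. The sketch below shows one way to do this. The function name `missing_intervals` and the plain `(start, end)` tuples are assumptions for illustration only.

```python
from datetime import datetime


def missing_intervals(files, start, end):
    """Return the gaps in [start, end) not covered by any file.

    files is a list of (file_start, file_end) tuples; overlapping or
    unsorted files are handled by sweeping a cursor through time.
    """
    gaps = []
    cursor = start
    for f_start, f_end in sorted(files):
        if f_end <= cursor:
            continue  # file lies entirely before the part still unchecked
        if f_start >= end:
            break     # file lies entirely after the requested period
        if f_start > cursor:
            gaps.append((cursor, f_start))
        cursor = max(cursor, f_end)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


if __name__ == "__main__":
    # Illustrative file spans only; not real C1_CP_FGM_FULL coverage.
    files = [
        (datetime(2014, 10, 1, 0), datetime(2014, 10, 1, 12)),
        (datetime(2014, 10, 2, 6), datetime(2014, 10, 3, 0)),
    ]
    for a, b in missing_intervals(files, datetime(2014, 10, 1),
                                  datetime(2014, 10, 4)):
        print(f"C1_CP_FGM_FULL {a:%Y-%m-%dT%H:%M:%SZ}/{b:%Y-%m-%dT%H:%M:%SZ}")
```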

Design
Or even for the raw data

# Find missing Raw Data intervals

Output:
@RAW_DATA T00:00:00Z/ T00:00:00Z

- The system is used for internal CAA automation but, if thought useful, could be provided as a web service for access by instrument teams

Running the Pipelines
[Flow diagram; box labels: DB, Host Config, Create Intervals, Process Intervals, Pipeline Config, Identify Version, Create Job, Processing Jobs, Check Load, Submit Job, Execute Job, CEFpass & Submit, Output Dropzone]

Running the Pipelines
- Standard I/F for pipeline wrapper script
- Generic pipeline configuration allows properties to be specified:
  - Job priority
  - Which machines, max concurrent jobs in total and on any one machine
[Numeric fields 2 to 4 of each configuration row were lost in extraction]

# 1) The dataset identifier
# 2) The relative priority: 0 (least) to 9 (most) likely to be run
# 3) The max number of jobs (if negative then means for all spacecraft)
# 4) The max number of jobs on a single machine (if negative...as above)
# 5) The location of the pipeline script which conforms to the standard interface
# 6) [Optional] hosts that can be used (default: any) e.g. caa[47]
C1_CP_AUX_POSGSE_1M /home/caa_ops/PIPELINE/AUX_POSGSE_1M/bdgp_aux_posgse.sh
C2_CP_AUX_POSGSE_1M /home/caa_ops/PIPELINE/AUX_POSGSE_1M/bdgp_aux_posgse.sh
C3_CP_AUX_POSGSE_1M /home/caa_ops/PIPELINE/AUX_POSGSE_1M/bdgp_aux_posgse.sh
C4_CP_AUX_POSGSE_1M /home/caa_ops/PIPELINE/AUX_POSGSE_1M/bdgp_aux_posgse.sh
C1_CT_AUX_TMMODE /home/caa_ops/PIPELINE/AUX_TMMODE/bdgp_aux_tmmode.sh
C2_CT_AUX_TMMODE /home/caa_ops/PIPELINE/AUX_TMMODE/bdgp_aux_tmmode.sh
C3_CT_AUX_TMMODE /home/caa_ops/PIPELINE/AUX_TMMODE/bdgp_aux_tmmode.sh
C4_CT_AUX_TMMODE /home/caa_ops/PIPELINE/AUX_TMMODE/bdgp_aux_tmmode.sh
C1_CP_AUX_SPIN_AXIS /home/caa_ops/PIPELINE/AUX_SPIN_AXIS/bdgp_aux_spin_axis.sh
C2_CP_AUX_SPIN_AXIS /home/caa_ops/PIPELINE/AUX_SPIN_AXIS/bdgp_aux_spin_axis.sh
C3_CP_AUX_SPIN_AXIS /home/caa_ops/PIPELINE/AUX_SPIN_AXIS/bdgp_aux_spin_axis.sh
C4_CP_AUX_SPIN_AXIS /home/caa_ops/PIPELINE/AUX_SPIN_AXIS/bdgp_aux_spin_axis.sh
C1_CT_AUX_TIME_CHK /home/caa_ops/PIPELINE/SUNREF_INFO/bdgp_aux_time_chk.sh
C2_CT_AUX_TIME_CHK /home/caa_ops/PIPELINE/SUNREF_INFO/bdgp_aux_time_chk.sh
C3_CT_AUX_TIME_CHK /home/caa_ops/PIPELINE/SUNREF_INFO/bdgp_aux_time_chk.sh
C4_CT_AUX_TIME_CHK /home/caa_ops/PIPELINE/SUNREF_INFO/bdgp_aux_time_chk.sh
C1_CP_AUX_SPIN_TIME /home/caa_ops/PIPELINE/SUNREF_INFO/bdgp_aux_spin_time.sh
C2_CP_AUX_SPIN_TIME /home/caa_ops/PIPELINE/SUNREF_INFO/bdgp_aux_spin_time.sh
C3_CP_AUX_SPIN_TIME /home/caa_ops/PIPELINE/SUNREF_INFO/bdgp_aux_spin_time.sh
C4_CP_AUX_SPIN_TIME /home/caa_ops/PIPELINE/SUNREF_INFO/bdgp_aux_spin_time.sh
CSDS_SYNC /home/caa_ops/PIPELINE/CSDS_SYNC/csds_sync.sh caa7
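
The six configuration fields suggest how a scheduler might choose the next job. The sketch below is one possible reading, not the CAA scheduler, and makes several assumptions. A row is taken to be whitespace-separated in the documented field order. A negative limit is taken to be shared by the same product across all spacecraft (the `C1_` to `C4_` prefix is ignored). The hosts field is treated as a shell-style pattern, so `caa[47]` matches `caa4` and `caa7`. All class and function names are hypothetical, and the numeric values in the usage example are made up because the real ones did not survive on the slide.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class PipelineConfig:
    dataset: str
    priority: int          # 0 (least) to 9 (most) likely to be run
    max_jobs: int          # negative: limit shared by all spacecraft (assumed)
    max_per_host: int      # negative: as above
    script: str
    hosts: Optional[str] = None  # None means any host


def parse_config_line(line):
    """Parse one whitespace-separated configuration row."""
    f = line.split()
    if len(f) < 5:
        raise ValueError(f"expected at least 5 fields: {line!r}")
    return PipelineConfig(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                          f[5] if len(f) > 5 else None)


def _product(dataset):
    """Drop a leading spacecraft prefix such as 'C1_' (assumed convention)."""
    return re.sub(r"^C[1-4]_", "", dataset)


def _count(running, dataset, shared, host=None):
    """Count running (dataset, host) jobs that count against a limit."""
    def same(ds):
        return _product(ds) == _product(dataset) if shared else ds == dataset
    return sum(1 for ds, h in running
               if same(ds) and (host is None or h == host))


def can_start(cfg, host, running):
    """Check the total and per-host job limits for cfg on host."""
    if cfg.hosts is not None and not fnmatchcase(host, cfg.hosts):
        return False
    total = _count(running, cfg.dataset, cfg.max_jobs < 0)
    on_host = _count(running, cfg.dataset, cfg.max_per_host < 0, host)
    return total < abs(cfg.max_jobs) and on_host < abs(cfg.max_per_host)


def next_job(pending, hosts, running):
    """Return (config, host) for the highest-priority startable job, or None."""
    for cfg in sorted(pending, key=lambda c: -c.priority):
        for host in hosts:
            if can_start(cfg, host, running):
                return cfg, host
    return None


if __name__ == "__main__":
    # Numeric fields below are illustrative placeholders, not CAA values.
    cfg = parse_config_line(
        "CSDS_SYNC 9 1 1 /home/caa_ops/PIPELINE/CSDS_SYNC/csds_sync.sh caa7")
    print(next_job([cfg], ["caa4", "caa7"], running=[]))
```

This sketch leaves out the "Check Load" step shown in the pipeline diagram. A fuller version would also skip hosts whose current load is too high.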

Summary / Status
- A system has been implemented to ensure that CAA-produced products remain up to date
- Currently operating for AUX products (POSGSE, SPIN_TIME, TIME_CHK, SPIN_AXIS, CSDS conversion)
- Next step is to incorporate other existing CAA pipeline tasks (EFW L3, ISR2 conversion, WBD conversion)
  - In most cases this just needs the existing wrapper script to be adapted and configured
  - Testing and QA to ensure pipelines are operating as expected
- If useful, some parts of the system (e.g. the interval detection system) could be accessed by teams to assist with their production
- Caveat: there can be a few days' delay between when a file is submitted and when it appears in the DB