Status Report of EDI on the CAA
CAA Public website
  All pages are fully accessible to anyone.
CAA Test Area
  Services are currently accessible only by ESTEC/ESAC.
  Individual instrument team members will be given access on a need basis.
  Some of the CAA command-line services are available to all.
EDI Datasets for Downloading
CAA EDI Graphics
Dataset Inventory Analysis
  There are several similar datasets where the main difference is that data vectors or matrices are converted into another system; for instance, scientific data are given in different scientific units or coordinate systems.
  The consistency of such datasets has never been investigated, although there is some evidence that errors can occur.
  Such similar datasets can often be expected to have the same number of records.
    Exception: if a dataset is given in raw units, the corresponding datasets in scientific units may have fewer records if poor records were deleted.
    However, if poor data have been replaced with FILLVAL, the same number of records should exist.
  In addition, such datasets should have similar values for:
    Version numbers
    Generation and ingestion dates
  Note: these metadata can differ significantly, so there is no automatic
  CAA has developed a tool that collects such metadata and writes them to a text file.
Inventory Output
  The beta version of the inventory tool is (currently) available at http://caa.estec.esa.int/caa_stage/st-tk_inventory.xml
  When the tool is ready, the inventory will be executed at a frequency still to be decided (TBD), and only for newly ingested files.
Example: RAPID - ESPCT6
  Inventory analysis for C1 "Electron, omni-directional distribution" (C1_CP_RAP_ESPCT6).
  Compared datasets:
    1: C1_CP_RAP_ESPCT6
    2: C1_CP_RAP_ESPCT6_R
  Generation date: 2014-10-08T21:34:36Z
  Analysis is based on the database content at 2014-10-08T11:31:06Z
  Data coverage: 2000-12-07T00:00:00Z/2013-12-31T23:59:59Z
  Columns description:
    Date: YYMMDD
    OK?: OK: comparison OK, ERR: error
    ERR: if ERR, the reason for the error:
      F: not all files exist
      R: numbers of records don't match
      T: timestamps don't match
      V: versions don't match
    Rx: number of records in file x
    Vx: version of file x
    Gx: generation time
    Ix: ingestion time
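The column legend above can be illustrated with a minimal Python sketch of a per-day comparison between two datasets. This is not the CAA inventory tool: the `FileInfo` structure, the `compare_day` function, the choice to compare only the first and last timestamps, and the treatment of a day with no file in either dataset as OK are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FileInfo:
    """Metadata of one daily file (hypothetical structure, not the CAA schema)."""
    records: int      # Rx: number of records in the file
    version: int      # Vx: version of the file
    first_time: str   # first timestamp in the file (assumed ISO 8601 string)
    last_time: str    # last timestamp in the file (assumed ISO 8601 string)


def compare_day(f1: Optional[FileInfo], f2: Optional[FileInfo]) -> Tuple[str, str]:
    """Return (status, reasons), where reasons uses the F/R/T/V codes."""
    if f1 is None and f2 is None:
        return "OK", ""        # assumption: nothing to compare counts as OK
    if f1 is None or f2 is None:
        return "ERR", "F"      # F: not all files exist
    reasons = ""
    if f1.records != f2.records:
        reasons += "R"         # R: numbers of records don't match
    if (f1.first_time, f1.last_time) != (f2.first_time, f2.last_time):
        reasons += "T"         # T: timestamps don't match
    if f1.version != f2.version:
        reasons += "V"         # V: versions don't match
    return ("ERR", reasons) if reasons else ("OK", "")
```

A real comparison would probably check every timestamp rather than only the first and last one; the sketch keeps the check coarse to stay short.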
Example: RAPID ESPCT6
Example of Timing Errors
  UT time stamps can differ by up to 3 milliseconds.
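If offsets of this size were to be tolerated when comparing timestamps between two datasets, the check could look like the Python sketch below. The 3 ms tolerance is taken from the slide; the function names and the idea of applying a tolerance at all are assumptions, not a description of the CAA tool.

```python
from datetime import datetime, timedelta

# Largest offset reported on the slide; using it as a tolerance is an assumption.
TOLERANCE = timedelta(milliseconds=3)


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2001-01-01T00:00:00.003Z."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def timestamps_match(t1: str, t2: str, tolerance: timedelta = TOLERANCE) -> bool:
    """True if the two timestamps differ by no more than the tolerance."""
    return abs(parse_utc(t1) - parse_utc(t2)) <= tolerance
```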
Automation Tool
  Purpose of the tool:
    Keep track of dataset updates that may cause re-deliveries of other products
    Avoid the risk of having products that are out of date
  The tool consists of a number of distinct components:
    identification of intervals in need of (re-)processing
    scheduling pipeline, to issue jobs across the CAA machines
    standard wrapper and support routines for execution of pipelines
    common logging, pre-validation and submission system
  Instrument teams may benefit from this service, particularly the first part, which identifies the intervals in need of (re-)processing.
Automation Tool: Example
  # Check FGM since given date
  2001-01-01 yesterday #2014-05-01 C1_CP_FGM_FULL,2014-05-01
  Output: C1_CP_FGM_FULL
  #2014-05-01 2002-05-09T18:45:42Z/2002-05-12T03:48:59Z
  #2014-05-01 2009-12-03T03:00:11Z/2010-01-01T12:35:48Z
  #2014-05-01 2011-02-08T02:40:53Z/2011-02-10T08:57:25Z
  #2014-05-01 2012-03-03T00:09:44Z/2012-04-01T09:47:42Z
  #2014-05-01 2014-03-01T04:31:36Z/2014-05-01T06:05:48Z
  The check is made on all data from 2001-01-01 to the most recent day.
  Primary dataset = time specification -> ingestion date of 2014-05-01 for the whole mission.
  Dependent dataset = C1_CP_FGM_FULL. A minimum ingestion date is specified but is not really needed in this case, since it is the same as that of the primary dataset. It was included to avoid picking up FGM data that were re-ingested with detached headers: the data themselves were unchanged, so a reprocessing of the entire mission should not be triggered.
  Result = intervals where the dependent dataset has been ingested more recently than the primary; in this case, all intervals where C1_CP_FGM_FULL has been ingested since 2014-05-01.
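The selection described above (intervals of a dataset ingested since a given date) can be sketched in Python as follows. This is only an illustration under stated assumptions: the list of `(start, end, ingested)` tuples is a made-up stand-in for whatever the CAA database returns, and merging adjacent files into one interval is an assumed behaviour.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:MM:SSZ' into an aware UTC datetime."""
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def format_utc(moment):
    """Format a datetime in the interval notation used on the slides."""
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def intervals_ingested_since(files, cutoff):
    """Return merged 'start/end' intervals of files ingested at or after cutoff."""
    cutoff_time = parse_utc(cutoff)
    selected = sorted(
        (parse_utc(start), parse_utc(end))
        for start, end, ingested in files
        if parse_utc(ingested) >= cutoff_time
    )
    merged = []
    for start, end in selected:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching files are joined into one interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [f"{format_utc(a)}/{format_utc(b)}" for a, b in merged]


# Hypothetical file records: (data start, data end, ingestion time).
files = [
    ("2002-05-09T18:45:42Z", "2002-05-12T03:48:59Z", "2014-06-01T00:00:00Z"),
    ("2002-05-12T03:48:59Z", "2002-05-14T12:00:00Z", "2013-01-01T00:00:00Z"),
    ("2009-12-03T03:00:11Z", "2009-12-05T12:00:00Z", "2014-07-01T00:00:00Z"),
    ("2009-12-05T12:00:00Z", "2010-01-01T12:35:48Z", "2014-07-02T00:00:00Z"),
]
recent = intervals_ingested_since(files, "2014-05-01")
```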
Automation Tool: Example, cont …
  If the primary and dependent specifications are swapped, the tool lists all C1_CP_FGM_FULL intervals that have not been ingested since 2014-05-01.
  # Check FGM not ingested since given date
  2001-01-01 yesterday C1_CP_FGM_FULL #2014-05-01
  Output:
  C1_CP_FGM_FULL 2001-01-01T00:00:00Z/2002-05-09T18:45:42Z
  C1_CP_FGM_FULL 2002-05-12T03:48:59Z/2009-12-03T03:00:11Z
  C1_CP_FGM_FULL 2010-01-01T12:35:48Z/2011-02-08T02:40:53Z
  C1_CP_FGM_FULL 2011-02-10T08:57:25Z/2012-03-03T00:09:44Z
  C1_CP_FGM_FULL 2012-04-01T09:47:42Z/2014-03-01T04:31:36Z
  C1_CP_FGM_FULL 2014-05-01T06:05:48Z/2014-10-21T00:00:00Z
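The two outputs are consistent with the second list being the complement of the first within the checked range (2001-01-01 to 2014-10-21). The Python sketch below reproduces that relationship; it is not the tool's actual implementation, and the `complement` function is a hypothetical helper. It relies on all timestamps sharing the same fixed-width `YYYY-MM-DDTHH:MM:SSZ` format, so that plain string comparison orders them correctly.

```python
def complement(intervals, range_start, range_end):
    """Return the gaps between the given (start, end) intervals within a range.

    Timestamps are fixed-width ISO 8601 strings, so string order equals time order.
    """
    gaps = []
    cursor = range_start
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < range_end:
        gaps.append((cursor, range_end))
    return gaps


# Intervals ingested since 2014-05-01, copied from the first example output.
ingested_since = [
    ("2002-05-09T18:45:42Z", "2002-05-12T03:48:59Z"),
    ("2009-12-03T03:00:11Z", "2010-01-01T12:35:48Z"),
    ("2011-02-08T02:40:53Z", "2011-02-10T08:57:25Z"),
    ("2012-03-03T00:09:44Z", "2012-04-01T09:47:42Z"),
    ("2014-03-01T04:31:36Z", "2014-05-01T06:05:48Z"),
]
not_ingested_since = complement(
    ingested_since, "2001-01-01T00:00:00Z", "2014-10-21T00:00:00Z"
)
for start, end in not_ingested_since:
    print(f"C1_CP_FGM_FULL {start}/{end}")
```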
RAPID Example
  # Check if dataset C1_CP_RAP_EPITCH needs updating
  2001-01-01 yesterday C1_CP_RAP_EPITCH C1_CP_FGM_FULL,2014-05-01
  C1_CP_RAP_EPITCH 2002-05-09T18:45:42Z/2002-05-12T03:48:59Z
  C1_CP_RAP_EPITCH 2009-12-03T03:00:11Z/2010-01-01T12:35:48Z
  C1_CP_RAP_EPITCH 2012-04-01T00:00:00Z/2012-04-01T09:47:42Z
  C1_CP_RAP_EPITCH 2013-12-31T23:59:59Z/2014-05-01T06:05:48Z
RAPID Example, cont …
  Output can optionally be given as intervals split/aligned, e.g. by day:
  2002-05-09T00:00:00Z/2002-05-10T00:00:00Z
  2002-05-10T00:00:00Z/2002-05-11T00:00:00Z
  2002-05-11T00:00:00Z/2002-05-12T00:00:00Z
  2002-05-12T00:00:00Z/2002-05-13T00:00:00Z
  2009-12-03T00:00:00Z/2009-12-04T00:00:00Z
  2009-12-04T00:00:00Z/2009-12-05T00:00:00Z
  2009-12-05T00:00:00Z/2009-12-06T00:00:00Z
  2009-12-06T00:00:00Z/2009-12-07T00:00:00Z
  ...
  2009-12-26T00:00:00Z/2009-12-27T00:00:00Z
  2009-12-27T00:00:00Z/2009-12-28T00:00:00Z
  2009-12-28T00:00:00Z/2009-12-29T00:00:00Z
  2009-12-29T00:00:00Z/2009-12-30T00:00:00Z
  2009-12-30T00:00:00Z/2009-12-31T00:00:00Z
  2009-12-31T00:00:00Z/2010-01-01T00:00:00Z
  2010-01-01T00:00:00Z/2010-01-02T00:00:00Z
  2012-04-01T00:00:00Z/2012-04-02T00:00:00Z
  An option is also provided to give the next available version number for each interval.
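The day-aligned splitting shown above can be reproduced with a short Python sketch. The function name `split_by_day` is hypothetical, and the rule used here (every UTC day touched by the interval becomes one full-day interval) is inferred from the example output rather than taken from the tool itself.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def split_by_day(interval):
    """Split 'start/end' into full UTC-day intervals covering every day touched."""
    start_text, end_text = interval.split("/")
    start = datetime.strptime(start_text, STAMP_FORMAT)
    end = datetime.strptime(end_text, STAMP_FORMAT)
    # Align the first day boundary to midnight at or before the start time.
    day = start.replace(hour=0, minute=0, second=0)
    days = []
    while day < end:
        next_day = day + timedelta(days=1)
        days.append(f"{day.strftime(STAMP_FORMAT)}/{next_day.strftime(STAMP_FORMAT)}")
        day = next_day
    return days


for line in split_by_day("2002-05-09T18:45:42Z/2002-05-12T03:48:59Z"):
    print(line)
```

An interval that ends exactly at midnight does not produce an extra day, because the loop stops as soon as the day boundary reaches the end time.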
Search of Missing Files
  # Find missing FGM_FULL files
  2001-01-01 yesterday C1_CP_FGM_FULL @MISSION
  2001-01-01 yesterday C3_CP_FGM_FULL @MISSION
  Output:
  C1_CP_FGM_FULL 2001-01-01T00:00:00Z/2001-01-07T00:10:02Z
  C1_CP_FGM_FULL 2001-07-04T12:10:14Z/2001-07-06T21:16:07Z
  C1_CP_FGM_FULL 2009-10-28T19:32:41Z/2009-10-31T04:39:09Z
  C1_CP_FGM_FULL 2014-05-01T06:05:48Z/2014-10-14T00:00:00Z
  C3_CP_FGM_FULL 2001-01-01T00:00:00Z/2001-01-07T00:10:02Z
  C3_CP_FGM_FULL 2001-12-06T05:10:27Z/2001-12-08T14:18:08Z
  C3_CP_FGM_FULL 2005-10-05T07:30:17Z/2005-10-07T16:35:23Z
  C3_CP_FGM_FULL 2006-05-02T15:34:55Z/2006-05-05T00:40:56Z
  C3_CP_FGM_FULL 2006-07-03T12:17:51Z/2006-07-05T21:23:44Z
  C3_CP_FGM_FULL 2006-07-13T00:38:55Z/2006-07-15T09:47:08Z
  C3_CP_FGM_FULL 2009-01-14T01:32:09Z/2009-01-16T10:38:51Z
  C3_CP_FGM_FULL 2014-05-01T06:05:48Z/2014-10-14T00:00:00Z
EDI Delivery/Ingestion Activity
  The plots are regenerated daily around midnight.
  Monthly and 6-month plots
  The top two panels are taken from the database:
    Top: number of files ingested into the database
    2nd from top: average time needed to validate one file and add it to the database
  The bottom five panels show the instantaneous situation at the time of plot production:
    3rd: number of files that failed validation, e.g. because of a wrong version number
    4th and 5th: number of CEF and non-CEF files in the delivery area
    6th and 7th: number of CEF and non-CEF files waiting for validation
Status of File Transfer to CSA http://caa.estec.esa.int/caa/csa_stats.xml
EDI inventory
  Notes:
    If EGD exists, there is a chance for PP/SPIN/MP.
    If EGD does not exist, there is no chance for PP/SPIN/MP.
    QZC and CRF should always exist in EF-mode, so they should have the same coverage as CLIST/EF-mode.
    PP and SPIN should have identical coverage.
    MP should have a wider coverage than PP/SPIN.
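These coverage rules lend themselves to a simple automated check. The Python sketch below assumes that coverage is available as a set of days per product; the dictionary keys (including `CLIST_EF` for the EF-mode entries of CLIST), the day granularity, and the reading of "wider coverage" as "superset" are all assumptions made for illustration.

```python
def check_edi_coverage(coverage):
    """Check the EDI inventory rules; coverage maps product name -> set of days."""
    problems = []
    # PP, SPIN and MP can only exist where EGD exists.
    for product in ("PP", "SPIN", "MP"):
        if not coverage[product] <= coverage["EGD"]:
            problems.append(f"{product} has data on days without EGD")
    # QZC and CRF should match the EF-mode coverage of CLIST.
    for product in ("QZC", "CRF"):
        if coverage[product] != coverage["CLIST_EF"]:
            problems.append(f"{product} coverage differs from CLIST/EF-mode")
    # PP and SPIN should cover the same days.
    if coverage["PP"] != coverage["SPIN"]:
        problems.append("PP and SPIN coverage differ")
    # MP should cover at least the days covered by PP and SPIN.
    if not coverage["MP"] >= (coverage["PP"] | coverage["SPIN"]):
        problems.append("MP coverage is narrower than PP/SPIN")
    return problems


# Hypothetical coverage, given as sets of YYMMDD day labels.
example = {
    "EGD": {"010201", "010202", "010203"},
    "PP": {"010201", "010202"},
    "SPIN": {"010201", "010202"},
    "MP": {"010201", "010202", "010203"},
    "QZC": {"010204"},
    "CRF": {"010204"},
    "CLIST_EF": {"010204"},
}
print(check_edi_coverage(example))
```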
EDI inventory
  Inventory plots are shown in annex 2.