Pipeline Basics. Jared Crossley, NRAO
What is a data pipeline? One or more programs that perform a task with reduced user interaction. May be developed as an extension of a more general and more interactive software system.
Why use it? Saves time, especially with large (repetitive) data sets; interactive data reduction may take a lot of time (even for an expert). Consistency. Increased accessibility of a data reduction system: you don’t have to be an “expert” to use a pipeline. A good learning tool, when well documented.
Building a Pipeline: Start simple. Build a pipeline in layers. The lowest level of the pipeline should still be interactive. For example: Level 1: allow the user to specify the input parameters needed by the following tasks. Level 2: find the best default parameter values for most data sets. Given these default values, most data can be processed with little interaction. Focus on a subset of input data.
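The two levels above can be sketched in Python as follows. This is a minimal illustration, not the VLA pipeline's implementation: the parameter names (`flag_threshold`, `solution_interval_s`, `image_size_px`), their values, and the placeholder processing step are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineParams:
    """Hypothetical parameters; names and values are illustrative only."""
    flag_threshold: float = 5.0
    solution_interval_s: float = 60.0
    image_size_px: int = 512


# Level 2: defaults chosen to work for most data sets.
DEFAULTS = PipelineParams()


def resolve_params(**overrides):
    """Level 1: the user may still set any parameter explicitly.

    Unknown parameter names raise TypeError, which catches typos early.
    """
    return replace(DEFAULTS, **overrides)


def run_pipeline(data, params=DEFAULTS):
    """Placeholder standing in for real editing, calibration, and imaging."""
    kept = [x for x in data if abs(x) <= params.flag_threshold]
    return {"n_input": len(data), "n_kept": len(kept), "params": params}
```

Calling `run_pipeline(data)` uses the defaults throughout, while `run_pipeline(data, resolve_params(flag_threshold=10.0))` overrides a single value and leaves the rest at their defaults.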
Building a Pipeline: continued. The pipeline will evolve with time. Parameter dependencies will reveal themselves. Data processing algorithms will become apparent to the user; when an algorithm is well defined, add it to the pipeline. Acquire metadata when possible; it can be used to initialize parameters.
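As a rough sketch of initializing parameters from metadata, the Python function below starts from default values and overrides them only where metadata is present. The metadata keys (`n_channels`, `frequency_ghz`), the parameter names, and both rules are hypothetical and chosen purely for illustration.

```python
def params_from_metadata(metadata, defaults):
    """Start from defaults, then let metadata override where available."""
    params = dict(defaults)

    # Hypothetical rule: more than one channel implies spectral-line mode.
    n_chan = metadata.get("n_channels")
    if n_chan is not None:
        params["mode"] = "line" if n_chan > 1 else "continuum"

    # Hypothetical rule: scale the image cell size inversely with
    # observing frequency, relative to a reference frequency.
    freq = metadata.get("frequency_ghz")
    if freq is not None:
        params["cell_arcsec"] = (
            defaults["cell_arcsec"] * defaults["ref_frequency_ghz"] / freq
        )
    return params
```

When a metadata field is missing, the corresponding default is left untouched, so the pipeline can still run on data with incomplete headers.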
Areas of concern 1. How much control should the user be given? Depends on the target audience: experts want more control than novices. A compromise is to offer many controls, with most of them pre-set to good initial values.
Areas of concern 2. How many output diagnostics should the pipeline produce? Varies by processing goal and user preference. If possible, include a pipeline parameter that determines the amount of diagnostics.
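One way to expose such a parameter is a single integer level, sketched below in Python. The level meanings (0, 1, 2), the function names, and the two toy steps are assumptions for the example, not part of any existing pipeline.

```python
def run_with_diagnostics(steps, diag_level=1):
    """Run named steps in order, collecting diagnostics up to diag_level.

    0 = no diagnostics, 1 = one summary line per step,
    2 = summary plus the details each step reports.
    Each step takes the previous result and returns (result, details).
    """
    diagnostics = []
    result = None
    for name, func in steps:
        result, details = func(result)
        if diag_level >= 1:
            diagnostics.append(f"{name}: done")
        if diag_level >= 2:
            diagnostics.append(f"{name}: {details}")
    return result, diagnostics


def load(_):
    """Toy step: produce some samples."""
    data = [1.0, 2.0, 50.0]
    return data, f"{len(data)} samples"


def edit(data):
    """Toy step: drop samples at or above an arbitrary threshold."""
    kept = [x for x in data if x < 10.0]
    return kept, f"flagged {len(data) - len(kept)} samples"


STEPS = [("load", load), ("edit", edit)]
```

The processing result is the same at every level; only the amount of reporting changes, so a user can rerun with a higher level when something looks wrong.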
More on Output. In addition to the primary output product, consider outputting calibrated data and log files. This allows advanced users to build upon what the pipeline has done, and it allows for quick “upgrades” to data products.
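A minimal Python sketch of writing all three outputs side by side is shown below. The file names and the use of JSON are assumptions made to keep the example self-contained; a real pipeline would write its own image and calibrated-data formats.

```python
import json
from pathlib import Path


def write_outputs(out_dir, image, calibrated, messages):
    """Write the primary product, the calibrated data, and a log file.

    File names and JSON encoding are illustrative stand-ins only.
    Returns the sorted names of the files in the output directory.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "image.json").write_text(json.dumps(image))
    (out / "calibrated.json").write_text(json.dumps(calibrated))
    (out / "pipeline.log").write_text("\n".join(messages) + "\n")
    return sorted(p.name for p in out.iterdir())
```

Keeping the calibrated data next to the final image means an improved imaging step can be rerun later without repeating calibration.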
Validating Output. This job is necessarily interactive. However, a pipeline can simplify the process by providing an easy way to view output, including diagnostics, and an easy way to delete (or flag) unacceptable output.
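The flagging half of this can be as simple as a record of rejected outputs and reasons, sketched below in Python. The class and method names are hypothetical; the inspection itself is still done by a person.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationLog:
    """Records which outputs a human reviewer has flagged, and why."""
    flagged: dict = field(default_factory=dict)

    def flag(self, name, reason):
        """Mark one output as unacceptable, keeping the reviewer's reason."""
        self.flagged[name] = reason

    def accepted(self, outputs):
        """Return the outputs that have not been flagged, in input order."""
        return [o for o in outputs if o not in self.flagged]
```

Flagging rather than deleting keeps the reviewer's reason available, which helps when the same problem shows up in later runs.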
The VLA (AIPS) Pipeline
Description: The pipeline is a script (AIPS run file) that automates editing, calibration, and imaging of VLA continuum data. May also process spectral line data. Emulates an AIPS task: takes input parameters; outputs images and calibration plots. Suggested default parameters are given in an AIPS memo.
Demo: VLA (AIPS) Pipeline. To use the AIPS pipeline: load data into AIPS; split out different frequencies.
Demo: VLA (AIPS) Pipeline. Set the VLARUN input parameters. (Screenshot labels: flagging control; pause during calibration; diagnostic plots; imaging control; self-cal (fragile).)
Demo: VLA (AIPS) Pipeline. Image output by the pipeline (axes and wedge added).
Demo of VLA Pipeline System (Imaging the VLA Archive)
Description: The VLA Pipeline System is an extension of the AIPS pipeline. It includes: 1. Data acquisition and preparation for processing 2. Data processing (AIPS pipeline) 3. Image finalization and export 4. Archiving 5. Easy interactive data validation
Demo: VLA Pipeline. At a high level of pipeline automation, initial user interaction takes place only on the command line. The user can query the raw data archive via a Perl script.
Demo: VLA Pipeline. Next, select data files for download and filling. (Screenshot labels: select files; download.)
Demo: VLA Pipeline. A Unix shell script waits to be called by cron; it then starts AIPS and executes the AIPS pipeline.
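The cron-driven pattern can be sketched as a function that runs once per invocation, skips the run if a previous one is still active, and processes whatever jobs are waiting. The sketch is in Python rather than shell, and the queue directory layout, the `.job` and `.done` suffixes, the lock file name, and the `process` callback are all assumptions for illustration; starting AIPS itself is left to the caller-supplied `process`.

```python
import os
from pathlib import Path


def run_once(queue_dir, process, lock_name=".pipeline.lock"):
    """Process every waiting job once; meant to be invoked by a scheduler.

    Returns the names of the jobs processed, or an empty list if another
    run holds the lock or no jobs are waiting.
    """
    queue = Path(queue_dir)
    lock = queue / lock_name
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return []  # a previous run is still active
    os.close(fd)
    try:
        done = []
        for job in sorted(queue.glob("*.job")):
            process(job)  # hypothetical hook, e.g. launch the pipeline
            job.rename(job.with_suffix(".done"))
            done.append(job.name)
        return done
    finally:
        lock.unlink()  # always release the lock, even if a job fails
```

The lock file matters because a scheduler fires on a fixed interval regardless of whether the previous run has finished.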
Demo: VLA Pipeline. After processing, the output is archived via scripts invoked by cron. The data is now available online. The final step is image validation…
Demo: VLA Pipeline. A web-based tool is used for image validation.
Demo: VLA Pipeline. Images and diagnostics can be viewed together and flagged for removal.
For more info. About AIPS Pipeline (VLARUN): AIPS Memo 112, by L. Sjouwerman; VLARUN “online” documentation (from the AIPS prompt, type explain VLARUN). About Pipeline System and NVAS: see the NVAS web page; for data acquisition scripts, see J. Crossley’s web page. About pipeline basics: see notes on J. Crossley’s web page.