Published by Gabriel Bishop. Modified over 9 years ago.
1
Pipeline Basics. Jared Crossley, NRAO
2
What is a data pipeline? One or more programs that perform a task with reduced user interaction. May be developed as an extension of a more general and more interactive software system.
3
Why use it? Saves time, especially with large (repetitive) data sets; interactive data reduction may take a lot of time (even for an expert). Consistency. Increased accessibility of a data reduction system: you don’t have to be an “expert” to use a pipeline. A good learning tool, given good documentation.
4
Building a Pipeline: Start simple. Build a pipeline in layers. The lowest level of the pipeline should still be interactive. For example: Level 1: allow the user to specify the input parameters needed by the following tasks. Level 2: find the best default parameter values for most data sets. Given these default values, most data can be processed with little interaction. Focus on a subset of input data.
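The two levels described on this slide can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the parameter names (flag_threshold, image_size, self_cal) and their values are hypothetical and not taken from any real pipeline.

```python
# Hypothetical parameter names and values, chosen only for illustration.
DEFAULTS = {"flag_threshold": 5.0, "image_size": 512, "self_cal": False}


def resolve_params(user_params=None):
    """Level 2 supplies defaults; Level 1 lets the user override any of them."""
    params = dict(DEFAULTS)  # copy, so the shared defaults are never modified
    for key, value in (user_params or {}).items():
        if key not in DEFAULTS:
            raise KeyError(f"unknown parameter: {key}")
        params[key] = value
    return params
```

Calling resolve_params() with no arguments gives the fully automatic behavior, while resolve_params({"image_size": 1024}) keeps the run interactive for the one value the user cares about.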
5
Building a Pipeline: continued. The pipeline will evolve with time. Parameter dependencies will reveal themselves. Data processing algorithms will become apparent to the user; when an algorithm is well defined, add it to the pipeline. Acquire metadata when possible. This can be used to initialize parameters.
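One way metadata might initialize parameters is sketched below. The metadata key ("band"), the table BAND_DEFAULTS, and every value in it are assumptions made up for this example; the lookup rule is a placeholder for whatever dependency a real pipeline discovers.

```python
# Hypothetical per-band settings; the keys and values are invented.
BAND_DEFAULTS = {"L": {"image_size": 1024}, "X": {"image_size": 512}}


def init_params(metadata, base_defaults):
    """Start from general defaults, then refine them using observation metadata."""
    params = dict(base_defaults)
    band = metadata.get("band")  # None when the metadata is unavailable
    params.update(BAND_DEFAULTS.get(band, {}))
    return params
```

When the metadata is missing, the function falls back to the general defaults, so the pipeline still runs without it.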
6
Areas of concern. 1. How much control should the user be given? Depends on the target audience. Experts want more control than novices. A compromise is lots of controls, but most of them pre-set to good initial conditions.
7
Areas of concern. 2. How many output diagnostics should the pipeline produce? Varies by processing goal and user preference. If possible, include a pipeline parameter that determines the amount of diagnostics.
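A diagnostics parameter of the kind suggested here could look like the sketch below. The name diag_level and the three levels are assumptions for illustration; the strings stand in for real output files.

```python
def run_step(name, diag_level=1):
    """Return the outputs of one pipeline step for a given diagnostic level.

    diag_level is a hypothetical parameter: 0 = primary result only,
    1 = add a summary log, 2 = add diagnostic plots as well.
    """
    outputs = [f"{name}: result"]
    if diag_level >= 1:
        outputs.append(f"{name}: summary log")
    if diag_level >= 2:
        outputs.append(f"{name}: diagnostic plot")
    return outputs
```

A single integer keeps the interface simple for novices while still letting experts request everything.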
8
More on Output. In addition to the primary output product, consider outputting calibrated data and log files. This allows advanced users to build upon what the pipeline has done, and it allows for quick “upgrades” to data products.
9
Validating Output. This job is necessarily interactive. However, a pipeline can simplify the process by providing an easy way to view output, including diagnostics, and an easy way to delete (or flag) unacceptable output.
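The bookkeeping behind such a validation step can be sketched as below. The Product class, its status values, and the function names are hypothetical; a real system would store this state alongside the archived output rather than in memory.

```python
from dataclasses import dataclass


@dataclass
class Product:
    """One pipeline output awaiting human review (hypothetical structure)."""
    name: str
    status: str = "unreviewed"  # "unreviewed", "accepted", or "flagged"


def review(products, name, accept):
    """Record the reviewer's decision for the named product."""
    for product in products:
        if product.name == name:
            product.status = "accepted" if accept else "flagged"
            return product
    raise KeyError(name)


def accepted(products):
    """Names of the products that passed validation."""
    return [p.name for p in products if p.status == "accepted"]
```

Flagging rather than deleting keeps the decision reversible, which matches the slide's "delete (or flag)" option.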
10
The VLA (AIPS) Pipeline
11
Description. The pipeline is a script (AIPS run file) that automates editing, calibration, and imaging of VLA continuum data. May also process spectral line data. Emulates an AIPS task: takes input parameters; outputs images and calibration plots. Suggested default parameters are contained in the AIPS memo.
12
Demo: VLA (AIPS) Pipeline. To use the AIPS pipeline: load data into AIPS; split out different frequencies.
13
Demo: VLA (AIPS) Pipeline. Set the VLARUN input parameters. Labeled on the slide: flagging control, pause during calibration, diagnostic plots, imaging control, self-cal (fragile).
14
Demo: VLA (AIPS) Pipeline. Image output by pipeline (axes and wedge added).
15
Demo of VLA Pipeline System (Imaging the VLA Archive)
16
Description. The VLA Pipeline System is an extension of the AIPS pipeline. Includes: 1. Data acquisition and preparation for processing. 2. Data processing (AIPS pipeline). 3. Image finalization and export. 4. Archiving. 5. Easy interactive data validation.
17
Demo: VLA Pipeline. At a high level of pipeline automation, initial user interaction takes place only on the command line. The user can query the raw data archive via a Perl script:
18
Demo: VLA Pipeline. Next, select data files for download and filling. Labeled on the slide: select files, download.
19
Demo: VLA Pipeline. A Unix shell script waits to be called by cron. Labeled on the slide: start AIPS, execute AIPS Pipeline.
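The slides do not show the shell script itself. As a rough sketch of the same pattern (a job that a scheduler such as cron calls periodically to pick up waiting work), the Python below scans a queue directory and marks finished jobs. The ".job" and ".done" file conventions and the run_job callback are assumptions for illustration; they are not how the VLA Pipeline System is known to work.

```python
from pathlib import Path


def pending_jobs(queue_dir):
    """List job files that have no matching '.done' marker yet."""
    queue = Path(queue_dir)
    return sorted(
        p for p in queue.glob("*.job") if not p.with_suffix(".done").exists()
    )


def process_queue(queue_dir, run_job):
    """Run every pending job once; intended to be invoked by a scheduler."""
    processed = []
    for job in pending_jobs(queue_dir):
        run_job(job)  # e.g. start the processing system on this data set
        job.with_suffix(".done").touch()  # marker prevents reprocessing
        processed.append(job.name)
    return processed
```

Because completed jobs leave a marker file, repeated scheduler invocations are harmless: a run with nothing pending simply returns an empty list.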
20
Demo: VLA Pipeline. After processing, the output is archived via scripts invoked by cron. The data is now available online. The final step is image validation…
21
Demo: VLA Pipeline. Image validation is done with a web-based tool.
22
Demo: VLA Pipeline. Images and diagnostics can be viewed together and flagged for removal.
23
For more info
About AIPS Pipeline (VLARUN): AIPS Memo 112, by L. Sjouwerman. http://www.aips.nrao.edu/aipsmemo.html
VLARUN “online” documentation: from the AIPS prompt, type explain VLARUN
About Pipeline System and NVAS: see the NVAS web page. http://www.aoc.nrao.edu/~vlbacald
For data acquisition scripts, see J. Crossley’s web page. http://www.aoc.nrao.edu/~jcrossle/
About pipeline basics: see notes on J. Crossley’s web page.