Published by Lauren Robbins. Modified over 9 years ago.
1
The Pipeline Processing Framework
LSST Applications Meeting, IPAC, Feb. 19, 2008
Raymond Plante
National Center for Supercomputing Applications
2
LSST Applications Meeting February 19-20, 2008
Overview
Pipeline Framework provides
– a container for hosting science algorithms
– a mechanism for applying algorithms in parallel
Data-Parallel Processing Model
– algorithm implemented as a “stage” of the pipeline
– a stage can have optional serial sections
– the parallel section is applied to one data-parallel unit of data, e.g.
    one CCD amplifier
    one section of sky
– algorithm implementations usually avoid doing I/O
    I/O is handled in separate steps
    the stage is handed the data it is supposed to work on
    exception: database access
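The data-parallel model can be illustrated with a minimal Python sketch: an image is split into independent units and the same I/O-free function is applied to each. The strip-per-amplifier layout, the function names, and the mean-subtraction algorithm are illustrative assumptions, not part of the framework.

```python
def split_into_amplifiers(image, n_amps):
    """Split a CCD image (a list of rows) into n_amps vertical strips.

    Assumes, for illustration only, that each amplifier reads out an
    equal-width column strip of the CCD.
    """
    width = len(image[0])
    if width % n_amps:
        raise ValueError("image width must divide evenly among amplifiers")
    step = width // n_amps
    return [[row[i * step:(i + 1) * step] for row in image] for i in range(n_amps)]


def subtract_mean(unit):
    """Per-unit algorithm: performs no I/O and sees only the data it is handed."""
    flat = [p for row in unit for p in row]
    mean = sum(flat) / len(flat)
    return [[p - mean for p in row] for row in unit]


image = [[1, 2, 10, 20],
         [3, 4, 30, 40]]
units = split_into_amplifiers(image, 2)
# Each unit could go to a different slice; here they are processed in turn.
results = [subtract_mean(u) for u in units]
print(results)  # → [[[-1.5, -0.5], [0.5, 1.5]], [[-15.0, -5.0], [5.0, 15.0]]]
```

Because `subtract_mean` depends only on its argument, the units could be processed in any order or concurrently without changing the result.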
7
Pipeline Concepts
Pipeline = a sequence of processing Stages
Each stage can be distributed across multiple processors.
– Each stage starts and ends with synchronized serial steps
Slice = the parts of the stages that work on the same portion of data.
– Can reside in one address space on a single machine
8
[Figure: DC2 Pipeline Harness. A single Pipeline process handles serial processing and controls several parallel Slice worker processes; within each Slice, Stages are linked by Queues.]
Pipeline Process: executes serial processing and controls the parallel slice workers.
Slice Worker Processes: each processes one data-parallel portion of the data (e.g. a CCD).
9
Pipeline Execution
Pipeline Harness manages parallel processing on HPC platforms
– Message Passing Interface
    MPI-2 functionality via MPICH2
    explicit process spawning and control
– Coordination of Serial & Parallel Processing
    Pipeline is a sequence of Stages
    “Slices” serve as data-parallel worker threads
    Pipeline manager instructs Slices in the execution of Stages
    Pipeline Slices communicate via MPI
Pipeline Harness interface hides complexity
– Application Stage developers implement the Stage API
    process(): parallel processing
    preprocess(), postprocess(): serial processing
– Python as Stage “glue”
    Stage developer writes algorithm code in C++
    Python interface is generated
    algorithm code is stitched together into a Stage using Python
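The division of labour between preprocess(), process() and postprocess() can be sketched with a toy harness in Python. Only the three method names come from the slide; the `Stage` base class, the `shared`/`unit`/`units` arguments, `run_stage`, and the example stage are hypothetical, and threads stand in for the MPI slice processes used by the real harness.

```python
from concurrent.futures import ThreadPoolExecutor


class Stage:
    """Hypothetical base class mirroring the three-method Stage API."""

    def preprocess(self, shared):
        """Serial step, run once in the pipeline (master) process."""

    def process(self, shared, unit):
        """Parallel step, run once per slice on its data-parallel unit."""

    def postprocess(self, shared, units):
        """Serial step, run once after every slice has finished."""


def run_stage(stage, shared, units):
    """Toy harness: serial preprocess, parallel process, serial postprocess."""
    stage.preprocess(shared)
    with ThreadPoolExecutor(max_workers=max(1, len(units))) as pool:
        # list() waits for every slice and re-raises any slice exception.
        list(pool.map(lambda unit: stage.process(shared, unit), units))
    stage.postprocess(shared, units)


class CountBrightPixels(Stage):
    """Example stage: count pixels above a threshold in each CCD."""

    def preprocess(self, shared):
        shared["threshold"] = 10

    def process(self, shared, unit):
        unit["nbright"] = sum(1 for p in unit["pixels"] if p > shared["threshold"])

    def postprocess(self, shared, units):
        shared["total_bright"] = sum(u["nbright"] for u in units)


shared = {}
ccds = [{"pixels": [5, 12, 20]}, {"pixels": [11, 1]}]
run_stage(CountBrightPixels(), shared, ccds)
print(shared["total_bright"])  # → 3
```

The serial steps bracket the parallel one, matching the “synchronized serial steps” at the start and end of each stage; the stage author writes only the three methods and never touches the parallel machinery.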
10
Pipeline Dataflow
Data flows through Stages via Queues.
A stage can add data products to its output Queue.
Products can be persisted at any point in the chain.
[Figure: a Pipeline Manager drives a Pipeline of Stages linked by Queues; New Input Data enters the first Queue and Output Products leave the last.]
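A minimal Python sketch of queue-linked stages follows, assuming stages are plain generator functions and that persistence is a callback invoked after each stage. The names `run_pipeline`, `calibrate`, `detect`, and the bias and threshold values are hypothetical.

```python
from collections import deque


def calibrate(exposure):
    """Example stage: subtract a constant bias (value is illustrative)."""
    yield [p - 100 for p in exposure]


def detect(calibrated):
    """Example stage: emit one product per pixel above 50 (illustrative)."""
    for p in calibrated:
        if p > 50:
            yield p


def run_pipeline(stages, new_input, persist=None):
    """Push data through stages linked by FIFO queues.

    Each stage takes items from its input queue and appends the products it
    yields to its output queue, which becomes the next stage's input.
    The optional persist callback saves products after any stage.
    """
    queue = deque(new_input)
    for stage in stages:
        out = deque()
        while queue:
            out.extend(stage(queue.popleft()))
        if persist is not None:
            persist(stage.__name__, list(out))
        queue = out
    return list(queue)


saved = {}
products = run_pipeline([calibrate, detect], [[120, 180, 240]],
                        persist=lambda name, items: saved.setdefault(name, items))
print(products)  # → [80, 140]
```

Because a stage only reads its input queue and writes its output queue, intermediate products can be captured (here in `saved`) without the stages knowing about it.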
11
Coupling Pipelines via the Event Framework
[Figure: three pipelines (Image/Detection, Object Association, Moving Objects), each run by its own Pipeline Manager as a chain of Stages and Queues, coupled through the Event System by events such as “New Detections available” and “New Moving Object Candidates Available”.]
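The coupling can be sketched as publish/subscribe in Python, assuming each pipeline triggers the next in the order shown in the figure. The `EventSystem` class, its `subscribe`/`publish` methods, and the handler functions are hypothetical stand-ins, not the LSST event API; only the event names come from the slide.

```python
from collections import defaultdict


class EventSystem:
    """Toy publish/subscribe bus (hypothetical; not the LSST event API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        for callback in self._subscribers[topic]:
            callback(payload)


events = EventSystem()
received = []


def association_pipeline(detections):
    received.append(("association", detections))
    # Illustrative only: pass every detection on as a moving-object candidate.
    events.publish("New Moving Object Candidates Available", detections)


def moving_objects_pipeline(candidates):
    received.append(("moving objects", candidates))


events.subscribe("New Detections available", association_pipeline)
events.subscribe("New Moving Object Candidates Available", moving_objects_pipeline)

# The image/detection pipeline announces its results when it finishes.
events.publish("New Detections available", ["src1", "src2"])
print(received)
```

Each pipeline knows only the event names it publishes and subscribes to, so pipelines can be added or replaced without changing the others.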
12
Tools for Stage Implementations
Configuring a stage with Policies
– Policy: a set of data properties as name-value pairs
– provided to the stage implementation when the stage is configured
Recording messages: Logging
– Messages have an associated “loudness”
    “DEBUG” = soft; “WARN” = louder
– Messages are sent to a named topic
    topics have an associated loudness threshold
    messages louder than the threshold are recorded
– Messages can have data properties associated with them
    all messages are automatically timestamped, which can be used to time sub-portions of an implementation
    the caller can attach other arbitrary properties
– The framework handles the destination of messages
    outside of the pipeline harness, messages are printed to the screen
    inside a parallel pipeline, messages are sent out through the event system and recorded in the database
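Python's standard `logging` module implements the same ideas (named topics, level thresholds, automatic timestamps, attached properties), so it serves here as an analogy; this is not the LSST logging API. The policy is sketched as a plain dict of name-value pairs, and `DetectStage`, `ListHandler`, and the topic name are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Keeps emitted records in memory so recorded messages can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


class DetectStage:
    """Hypothetical stage configured from a policy of name-value pairs."""

    def __init__(self, policy):
        self.threshold = policy["threshold"]
        # Messages go to a named topic; the name here is illustrative.
        self.log = logging.getLogger("pipeline.detect")

    def process(self, pixels):
        self.log.debug("starting detection")  # soft: below the topic threshold
        found = [p for p in pixels if p > self.threshold]
        if not found:
            # Louder message carrying an attached data property.
            self.log.warning("no sources found", extra={"npixels": len(pixels)})
        return found


# Configure the topic: record WARNING and louder, drop DEBUG.
handler = ListHandler()
topic = logging.getLogger("pipeline.detect")
topic.setLevel(logging.WARNING)
topic.addHandler(handler)
topic.propagate = False

stage = DetectStage({"threshold": 10})
stage.process([1, 2, 3])
record = handler.records[0]
print(record.levelname, record.npixels)  # → WARNING 3
```

One difference from the slide's wording: Python records messages at or above the threshold level, not strictly louder. Each record's `created` attribute holds its automatic timestamp.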
13
Possible Variations
Fine control over inter-slice communication
– normal communication is between the master and the slices
– a stage could have direct access to other slices via MPI commands
Custom pipeline
– managed by the pipeline orchestration layer for monitoring
– external communication via events
14
Building the Stack
Basic installation instructions: http://dev.lsstcorp.org/pkgs/GettingStarted.html
    setenv LSST_HOME $PWD/stack
    mkdir $LSST_HOME; cd $LSST_HOME
    curl -o newinstall.sh http://dev.lsstcorp.org/pkgs/newinstall.sh
    sh ./newinstall.sh
    source loadLSST.sh
    lsstpkg fetch LSSTPipe
Best supported platform:
– Linux, gcc v3.4.6
Alternatives to building the stack
– logging into the LSST cluster @ NCSA
– running a virtual machine with the stack pre-installed
15
Working with code from the repository
The GettingStarted document contains an SVN survival guide.
LSST software is organized into packages
– packages are separately versioned
– usually one person is in charge of tracking a package's state
Building from SVN:
    setenv LSST_DC2 svn+ssh://svn.lsstcorp.org/DC2
    svn co $LSST_DC2/fw/trunk fw-trunk   # check out the package
    cd fw-trunk
    setup -r .                           # load required environment
    scons                                # build it in place
    scons install                        # install it into the stack
16
Testing Your Code
Outside the framework
– create classes that can apply your algorithm to arbitrary data
– classes should not depend on the pipeline framework
– create unit tests (in the tests subdirectory) or examples (in the examples subdirectory) that exercise the class
– testing can occur in C++, Python, or both
Inside the framework
– create a Python implementation of a Stage class
– create a policy file for configuring the stage
– create a simple pipeline using policy files
– use the launchDC2.py script from the dc2pipe package to run it, providing an identifying name for the run (run ID) as input
The process will likely change somewhat for DC3.
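Testing outside the framework might look like the following Python sketch: an algorithm class with no framework imports, plus a standard `unittest` test case of the kind that would sit in the package's tests subdirectory. `BackgroundSubtractor`, its median-subtraction algorithm, and the file name `tests/testBackground.py` are hypothetical.

```python
import statistics
import unittest


class BackgroundSubtractor:
    """Algorithm class with no pipeline-framework dependencies (hypothetical)."""

    def apply(self, pixels):
        """Return the pixels with their median level removed."""
        background = statistics.median(pixels)
        return [p - background for p in pixels]


class BackgroundSubtractorTest(unittest.TestCase):
    """Unit tests that exercise the class on small, arbitrary inputs."""

    def test_removes_median(self):
        self.assertEqual(BackgroundSubtractor().apply([1, 2, 3]), [-1, 0, 1])

    def test_flat_image_becomes_zero(self):
        self.assertEqual(BackgroundSubtractor().apply([7, 7, 7, 7]), [0, 0, 0, 0])
```

Saved under the assumed name `tests/testBackground.py`, the tests can be run with `python -m unittest discover -s tests`. Because the class takes plain data and does no I/O, the same object can later be wrapped by a Stage's process() without change.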
17
Running on the cluster
The LSST cluster @ NCSA
– up-to-date software stack
– input data organized and ready for use
– standard pipelines configured to write output in an organized tree
    /lsst/DC2root contains a directory for each run ID
    each run ID directory has a subdirectory named after each pipeline that was run
    each pipeline directory contains:
      input: the input data processed
      output: the output image products
      work: the pipeline's working directory; contains a copy of all input policy files and a log capturing stdout and stderr from the master process
– output database products
    saved in a MySQL database on lsst10
    database named after the run ID
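The output layout can be summarized with a small Python helper that maps a run ID and pipeline name to the locations listed above. The helper `run_layout`, the dictionary keys, and the example run and pipeline names are hypothetical; only the directory structure and the database-naming rule come from the slide.

```python
from pathlib import PurePosixPath

DC2_ROOT = PurePosixPath("/lsst/DC2root")


def run_layout(run_id, pipeline):
    """Map the per-pipeline output locations for one run (sketch)."""
    base = DC2_ROOT / run_id / pipeline
    return {
        "input": base / "input",    # the input data processed
        "output": base / "output",  # output image products
        "work": base / "work",      # policy copies and master-process log
        "database": run_id,         # MySQL database on lsst10, named after the run ID
    }


layout = run_layout("run0042", "detection")
print(layout["work"])  # → /lsst/DC2root/run0042/detection/work
```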
18
Dealing with bugs
Bugs, issues and milestones are tracked using Trac
– life as a Trac “ticket”
Life cycle:
– a ticket is created and assigned to a developer
– the developer creates a copy of the relevant package under the package's tickets subdirectory in SVN. Example: ticket #350 for a change to fw:
    svn copy -m "addressing #350" $LSST_DC2/fw/trunk $LSST_DC2/fw/tickets/350
– changes are implemented, tested, and checked into the ticket branch
– the developer requests a code review, which checks the code for compliance with the coding standards
– the reviewed code is merged into the trunk
Some refinement of the process is expected.