The Pipeline Processing Framework
LSST Applications Meeting, IPAC, Feb. 19, 2008
Raymond Plante, National Center for Supercomputing Applications



LSST Applications Meeting February 19-20

Overview

Pipeline Framework provides
  – a container for hosting science algorithms
  – a mechanism for applying algorithm in parallel
Data-Parallel Processing Model
  – algorithm implemented as “stage” of the pipeline
  – stage can have optional serial sections
  – parallel section applied to one data-parallel unit of data
      one CCD amplifier
      one section of sky
  – algorithm implementation usually avoids doing I/O
      I/O handled in separate steps
      stage is handed data it is supposed to work on
      exception: database access

Pipeline Concepts

Pipeline = a sequence of processing Stages
Each stage can be distributed across multiple processors.
  – Each stage starts and ends with synchronized serial steps
Slice = Parts of the stages working on the same portion of data.
  – Can reside in one address space on a single machine

DC2 Pipeline Harness

[Diagram: one Pipeline process (serial processing) and several Slice processes (parallel processing), each shown as a chain of Stages and Queues]

Pipeline Process
  – executes serial processing
  – controls the parallel slice workers
Slice Worker Processes
  – processes one data-parallel portion of the data (e.g. a CCD)

Pipeline Execution

Pipeline Harness manages parallel processing on HPC platforms
  – Message Passing Interface
      MPI-2 functionality via MPICH2
      Explicit process spawning, control
  – Coordination of Serial & Parallel Processing
      Pipeline is a sequence of Stages
      “Slices” serve as data parallel worker threads
      Pipeline manager instructs Slices in execution of Stages
      Pipeline Slices communicate via MPI
Pipeline Harness interface hides complexity
  – Application Stage developers implement Stage API
      process(): Parallel processing
      preprocess(), postprocess(): Serial processing
  – Python as Stage “glue”
      Stage developer writes algorithm code in C++
      Python interface is generated
      Stitches algorithm code together to create a Stage using Python
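
The three-method Stage API can be illustrated with a small Python sketch. Only the method names preprocess(), process() and postprocess() come from the slide. The Stage base class, the dictionary passed between methods, BackgroundSubtractStage and run_stage are hypothetical names invented for illustration, and the slices are simulated with a plain loop rather than MPI worker processes.

```python
class Stage:
    """Hypothetical base class; the real harness API may differ."""

    def __init__(self, policy=None):
        # Policy is treated here as a plain dict of name-value pairs.
        self.policy = dict(policy or {})

    def preprocess(self, data):
        """Serial step run once before the parallel section."""

    def process(self, data):
        """Parallel step run once per slice."""

    def postprocess(self, data):
        """Serial step run once after the parallel section."""


class BackgroundSubtractStage(Stage):
    """Toy stage: subtract a constant offset from every pixel."""

    def preprocess(self, data):
        data["offset"] = self.policy.get("offset", 0.0)

    def process(self, data):
        offset = data["offset"]
        data["pixels"] = [p - offset for p in data["pixels"]]

    def postprocess(self, data):
        data["done"] = True


def run_stage(stage, shared, slice_inputs):
    """Simulate the harness: serial, then one process() per slice, then serial."""
    stage.preprocess(shared)
    results = []
    for pixels in slice_inputs:
        # Each slice sees the shared settings plus its own portion of data.
        slice_data = dict(shared, pixels=list(pixels))
        stage.process(slice_data)
        results.append(slice_data["pixels"])
    stage.postprocess(shared)
    return results
```

Calling run_stage(BackgroundSubtractStage({"offset": 1.0}), {}, [[1.0, 2.0], [3.0]]) applies process() to each of the two simulated slices in turn. In the real harness those calls would run concurrently in separate slice processes.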

Pipeline Dataflow

Data flows through Stages via Queues
A stage can add data products to its output Queue.
Products can be persisted at any point in the chain.

[Diagram: Pipeline Manager; Pipeline of Stages and Queues; New Input Data in, Output Products out]
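
The queue-based dataflow can be sketched roughly as follows. This is a single-process illustration and not the harness implementation: run_chain, detect and count are hypothetical names, each stage is reduced to a plain function, and the queues are ordinary deques.

```python
from collections import deque


def run_chain(stages, inputs):
    """Move every input through the stages in order, with a queue between stages."""
    queues = [deque() for _ in range(len(stages) + 1)]
    queues[0].extend(inputs)
    for i, stage in enumerate(stages):
        in_queue, out_queue = queues[i], queues[i + 1]
        while in_queue:
            item = in_queue.popleft()
            # A stage may add new data products before passing the item on.
            out_queue.append(stage(item))
    return list(queues[-1])


def detect(item):
    """Toy stage: add a 'detections' product."""
    return dict(item, detections=[v for v in item["pixels"] if v > 5])


def count(item):
    """Toy stage: add a product derived from an earlier stage's output."""
    return dict(item, n_detections=len(item["detections"]))
```

Because every item carries its accumulated products with it, anything on a queue could be written out at that point, which is one way to read "persisted at any point in the chain".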

Coupling Pipelines via the Event Framework

[Diagram: three pipelines, each with its own Pipeline Manager and chain of Stages and Queues, connected through the Event System]
  – Image/Detection Pipeline
  – Object Association Pipeline
  – Moving Objects Pipeline
Events shown on the Event System
  – “New Detections available”
  – “New Moving Object Candidates Available”
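
Event-based coupling can be sketched with a minimal in-process publish/subscribe class. The EventSystem class, the payload contents and the choice of which pipeline publishes which event are assumptions made for illustration. Only the two event names and the three pipeline names come from the slide, and the real event system is a separate messaging service.

```python
from collections import defaultdict


class EventSystem:
    """Hypothetical stand-in for the event framework: topic -> callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        for callback in list(self._subscribers[topic]):
            callback(payload)


events = EventSystem()
handled = []


def association_pipeline(payload):
    """Assumed consumer of new detections; announces candidates downstream."""
    handled.append(("association", payload["visit"]))
    events.publish("New Moving Object Candidates Available",
                   {"visit": payload["visit"]})


def moving_objects_pipeline(payload):
    """Assumed consumer of moving object candidates."""
    handled.append(("moving_objects", payload["visit"]))


events.subscribe("New Detections available", association_pipeline)
events.subscribe("New Moving Object Candidates Available",
                 moving_objects_pipeline)

# The image/detection pipeline is assumed to publish this when it finishes.
events.publish("New Detections available", {"visit": 42})
```

The pipelines never call each other directly. Each one knows only the event names it publishes or listens for, so a pipeline can be replaced without changing the others.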

Tools for Stage Implementations

Configuring a stage with Policies
  – Policy: a set of data properties as name-value pairs
  – Provided to stage implementation when stage is configured
Recording messages: Logging
  – Messages have an associated “loudness”
      “DEBUG” = soft; “WARN” = louder
  – Messages sent to a named topic
      topics have an associated loudness threshold
      messages louder than the threshold will be recorded
  – Messages can have data properties associated with them
      all messages automatically timestamped
        – can be used to time sub-portions of implementation
      caller can attach other arbitrary properties
  – Framework handles destination of messages
      outside of pipeline harness, messages printed to screen
      inside a parallel pipeline, messages sent out through event system, recorded in database
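
A rough sketch of topic-based logging with a loudness threshold follows. The TopicLog class, the numeric loudness values and the policy key logThreshold are hypothetical. The sketch also assumes that a message exactly at the threshold is recorded, which the slide does not specify.

```python
import time

# Hypothetical loudness values; only the ordering DEBUG < WARN is from the slide.
DEBUG, INFO, WARN = -10, 0, 10


class TopicLog:
    """Collects messages for one named topic, filtered by loudness."""

    def __init__(self, topic, threshold=INFO):
        self.topic = topic
        self.threshold = threshold
        self.records = []

    def log(self, loudness, message, **properties):
        """Record the message if it is loud enough; return whether it was kept."""
        if loudness < self.threshold:
            return False
        record = {
            "topic": self.topic,
            "loudness": loudness,
            "message": message,
            "timestamp": time.time(),  # every message is timestamped
        }
        record.update(properties)  # caller-supplied data properties
        self.records.append(record)
        return True


# A policy is modelled as plain name-value pairs handed to the stage.
policy = {"logThreshold": INFO}
stage_log = TopicLog("stage.detect", threshold=policy["logThreshold"])
kept_debug = stage_log.log(DEBUG, "entering inner loop")
kept_warn = stage_log.log(WARN, "saturated pixels found", ccd=3)
```

Since each record carries a timestamp, the difference between two records' timestamps gives a crude timing for the code that ran between them.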

Possible Variations

Fine control over inter-slice communication
  – normal communication between master and slices
  – stage could have direct access to other slices via MPI commands
Custom pipeline
  – managed by pipeline orchestration layer for monitoring
  – external communication via events

Building the Stack

Basic Installation instructions:
    setenv LSST_HOME $PWD/stack
    mkdir $LSST_HOME; cd $LSST_HOME
    curl -o newinstall.sh
    sh ./newinstall.sh
    source loadLSST.sh
    lsstpkg fetch LSSTPipe
Best supported platform:
  – Linux, gcc v3.4.6
Alternatives to building the stack
  – Logging into LSST NCSA
  – Running Virtual Machine with stack pre-installed

Working with code from the repository

GettingStarted document contains SVN survival guide
LSST software organized into packages
  – packages are separately versioned
  – usually one person is in charge of tracking each package's state
Building from SVN
    setenv LSST_DC2 svn+ssh://svn.lsstcorp.org/DC2
    svn co $LSST_DC2/fw/trunk fw-trunk   # check out the package
    cd fw-trunk
    setup -r .                           # load required environment
    scons                                # build it in place
    scons install                        # install it into the stack

Testing Your Code

Outside the framework
  – Create classes that can apply your algorithm to arbitrary data
  – Classes should not depend on pipeline framework
  – Create unit tests (in tests subdir.) or examples (in examples subdir.) that exercise the class
  – testing can occur in C++, Python or both
Inside the framework
  – Create python implementation of a Stage class
  – Create a policy file for configuring stage
  – Create a simple pipeline using policy files
  – Use the launchDC2.py script from the dc2pipe package to run
      provide identifying name for run (run ID) as input
Process will likely change somewhat for DC3
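
Testing outside the framework might look like the sketch below, which uses Python's standard unittest module. ThresholdDetector is a hypothetical algorithm class invented for this example. The point is that neither the class nor its tests import anything from the pipeline framework.

```python
import unittest


class ThresholdDetector:
    """Hypothetical algorithm class with no dependency on the pipeline framework."""

    def __init__(self, threshold):
        self.threshold = threshold

    def detect(self, pixels):
        """Return the indices of pixels strictly above the threshold."""
        return [i for i, value in enumerate(pixels) if value > self.threshold]


class ThresholdDetectorTest(unittest.TestCase):
    def test_finds_pixels_above_threshold(self):
        self.assertEqual(ThresholdDetector(5).detect([1, 9, 5, 7]), [1, 3])

    def test_empty_input(self):
        self.assertEqual(ThresholdDetector(5).detect([]), [])


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ThresholdDetectorTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A file like this would sit in the package's tests subdirectory. The same class can later be wrapped by a Stage implementation without changing the tests.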

Running on the cluster

The LSST NCSA
  – up to date software stack
  – input data organized and ready for use
  – standard pipelines configured to write output in organized tree
      /lsst/DC2root contains a directory for each run ID
      each run ID has a subdirectory named for each pipeline that was run
      each pipeline contains...
        – input: the input data processed
        – output: the output image products
        – work: the pipeline's working directory; contains copies of all input policy files and a log capturing stdout and stderr from the master process
  – output database products
      saved in MySQL database on lsst10
      database named after run ID
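
The output tree layout can be expressed as a small path helper. This is only a sketch of the directory convention described on the slide. The function name run_dirs and the example run ID and pipeline name are made up.

```python
from pathlib import PurePosixPath

# Root of the organized output tree, as given on the slide.
DC2_ROOT = PurePosixPath("/lsst/DC2root")


def run_dirs(run_id, pipeline):
    """Return the input, output and work directories for one pipeline of one run."""
    base = DC2_ROOT / run_id / pipeline
    return {name: base / name for name in ("input", "output", "work")}


# Hypothetical run ID and pipeline name, for illustration only.
dirs = run_dirs("test001", "examplePipe")
```

Here dirs["work"] points at the directory holding the policy file copies and the master process log for that run.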

Dealing with bugs

Bugs, issues and milestones are tracked using trac
  – life as a trac “ticket”
Life Cycle:
  – ticket is created and assigned to a developer
  – developer creates copy of relevant package under the package's tickets subdirectory in svn. Example: ticket #350 for change to fw
      svn copy -m "addressing #350" $LSST_DC2/fw/trunk $LSST_DC2/fw/tickets/350
  – changes are implemented, tested, checked into ticket branch
  – request code review: checked for compliance against coding standards
  – reviewed code merged into trunk
some refinement of process is expected