Pipeline Execution Environment Laboratory of NeuroImaging UCLA
Motivation You have the algorithms Possibly located on different platforms, different machines You have the data Possibly located on different machines, in different formats How do you get your work done?
What is the Pipeline? It is a data flow execution environment It is useful for… Any task where you can draw the steps in a flowchart Any task where you need to write instructions for someone We use it for neuroimaging analysis To be extended in the near future to bioinformatics and other computational biology subdomains
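To make "any task where you can draw the steps in a flowchart" concrete, here is a minimal sketch of data flow execution in Python. It is not the Pipeline's implementation: `run_flow`, the step names, and the toy numbers are all hypothetical. It only illustrates running each step once the steps it depends on have produced their output.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_flow(steps, depends_on):
    """Run every step after the steps it depends on (hypothetical helper).

    steps: maps a step name to a function that takes a list of upstream results.
    depends_on: maps a step name to the names of the steps whose output it consumes.
    """
    results = {}
    # static_order() yields each step only after all of its predecessors.
    for name in TopologicalSorter(depends_on).static_order():
        inputs = [results[dep] for dep in depends_on.get(name, ())]
        results[name] = steps[name](inputs)
    return results


# Hypothetical three-box flowchart: load -> smooth -> summarize.
steps = {
    "load": lambda inputs: [3.0, 5.0, 10.0],
    "smooth": lambda inputs: [value * 0.5 for value in inputs[0]],
    "summarize": lambda inputs: sum(inputs[0]),
}
depends_on = {"load": [], "smooth": ["load"], "summarize": ["smooth"]}

flow_results = run_flow(steps, depends_on)
```

The flowchart is plain data (`depends_on`), so the same description could be drawn, sent to a collaborator, or executed.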
Why you might find it useful… Integrate your program as part of an analytic process Send a collaborator an analytic process Parallel batch processing with error handling
Sample useful pipeline… Download all ITK/VTK/Slicer sources Run CMake for configuration Execute makefile Downloaded all examples Compiled and tested all examples Available on the NAMIC Wiki page
LONI/NAMIC collaboration Pipeline’s modular architecture makes it easy to integrate external tools Users want to see the data as it is being processed Visualization component Users want to make use of new algorithms implemented Wrap command line executables as pipeline modules ITK Filter integration (vs. command line executables) Users want to utilize Pipeline resources Open specification for communications with Pipeline server
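The idea of wrapping a command line executable as a module can be sketched roughly as follows. The slides do not show the Pipeline's module format, so the `CommandLineModule` class and its `run` method are assumptions made for illustration. The Python interpreter stands in for an external tool so the sketch runs anywhere.

```python
import subprocess
import sys


class CommandLineModule:
    """Hypothetical wrapper that exposes one executable as a named step."""

    def __init__(self, name, argv_prefix):
        self.name = name
        self.argv_prefix = list(argv_prefix)

    def run(self, *args):
        """Run the executable with the given arguments and return its stdout."""
        completed = subprocess.run(
            self.argv_prefix + list(args),
            capture_output=True,
            text=True,
        )
        if completed.returncode != 0:
            # Surface the failure so a caller can apply its own error handling.
            raise RuntimeError(f"{self.name} failed: {completed.stderr.strip()}")
        return completed.stdout


# Stand-in "tool": the running interpreter upper-casing its first argument.
uppercase = CommandLineModule(
    "uppercase",
    [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"],
)
output = uppercase.run("brain.img")
```

Keeping the executable behind a uniform `run` call is what lets a flow treat very different tools as interchangeable steps.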
Pipeline features Platform independent Modular, extensible architecture Components Events Modules- user defined High grade encrypted communications Automatic & transparent resource management Shell script output Feature rich GUI
Pipeline features (cont.) Distributed computing Cluster computing Local clusters (e.g. PVM) Intranet grid computing Grid engine Internet/overlay grid computing In development
A live example (bear with me while I try this)
Integrating additional functionality into the Pipeline As modules Typically programs that do some sort of processing Rigidly structured User oriented As components Typically programs that support the processing and/or improve the usability (e.g. visualization) More flexibility Developer oriented
Developing a component Extend AbstractComponent class Override processEvent(Event) method Place into extensions directory to be loaded at startup Visualization component is <200 lines
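The extend-and-override pattern described above might look like the sketch below. It uses Python stand-ins rather than the real Pipeline classes: `Event`, `AbstractComponent`, and `process_event` only mirror the names on the slide, and the `kind` and `payload` fields, the event names, and `ProgressLogComponent` are assumptions. Loading from the extensions directory is not shown.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in event; the real Event class is not described in the slides."""
    kind: str
    payload: object = None


class AbstractComponent(ABC):
    """Stand-in base class mirroring the name used on the slide."""

    @abstractmethod
    def process_event(self, event):
        """Called for every event the environment delivers to the component."""


class ProgressLogComponent(AbstractComponent):
    """Hypothetical component that records which modules have finished."""

    def __init__(self):
        self.finished = []

    def process_event(self, event):
        # React only to the events this component cares about; ignore the rest.
        if event.kind == "module_finished":
            self.finished.append(event.payload)


component = ProgressLogComponent()
for event in [Event("module_started", "align"), Event("module_finished", "align")]:
    component.process_event(event)
```

A single overridden handler keeps a component small, which is consistent with the visualization component fitting in under 200 lines.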
Roadmap 6/2005 12/2005 Beta to be released Data-centric GUI Fault tolerance Overlay grid implementation
Roadmap (cont.) 6/2006 Ongoing ITK/VTK integration SCIRun integration Provenance Data validation Information recovery Ongoing Ontologies Self learning system Automated analysis Natural language interface
LONI Resources CVS Mantis Wiki Compute grid Data grid
For more information… Michael Pan mjpan@loni.ucla.edu http://pipeline.loni.ucla.edu
Software license Permission is granted to use this software without charge for non-commercial research purposes only.
NIH software terms This is in compliance with NIH requirements, i.e. the software should be freely available to biomedical researchers and educators in the non-profit sector, such as institutions of education, research institutes, and government laboratories. The terms of software availability should permit the commercialization of enhanced or customized versions of the software, or incorporation of the software or pieces of it into other software packages.