The ARCS Data Analysis Software
Michael Aivazis, California Institute of Technology


2 Fractals in software
“Drip programming”
– may generate aesthetically interesting flow charts
– but it is not a desirable practice
Advanced technology may actually complicate matters
– complex data structures
– objects
– user interfaces
– multiple platforms
– distributed computing
– high performance computing
– security
– …
– the Grid
Pollock’s “Autumn Rhythm” … or Michael’s framework?

3 Software Roadmap

4 Data reductions
– Account for incident flux
– Remove background
– Convert from time to energy
– Correct for detector efficiency
– Bin into rings of constant scattering angle
– Convert from angle to momentum
– Subtract multi-phonon and multiple scattering
– Correct for absorption
[Diagram labels: C++, Python]
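The stages listed above transform a spectrum and its error bars one after another. The sketch below shows that chaining under stated assumptions: spectra are plain Python lists, every stage is a function mapping (counts, errors) to (counts, errors), and the two concrete stages use standard propagation for uncorrelated errors. The function names are hypothetical, not the ARCS API; the slide's C++/Python labels suggest the numerical kernels themselves would be compiled code driven from Python.

```python
import math

def subtract_background(counts, errors, background, background_errors):
    """Bin-by-bin background subtraction; uncorrelated errors add in quadrature."""
    values = [c - b for c, b in zip(counts, background)]
    sigmas = [math.hypot(e, be) for e, be in zip(errors, background_errors)]
    return values, sigmas

def correct_efficiency(counts, errors, efficiency):
    """Divide by detector efficiency, treating the efficiency itself as exact."""
    values = [c / eff for c, eff in zip(counts, efficiency)]
    sigmas = [e / eff for e, eff in zip(errors, efficiency)]
    return values, sigmas

def run_reduction(counts, errors, stages):
    """Apply each stage in order; every stage maps (counts, errors) -> (counts, errors)."""
    for stage in stages:
        counts, errors = stage(counts, errors)
    return counts, errors
```

For example, binding the extra arguments with `functools.partial` (background `[4, 4]` with errors `[4.0, 3.0]`, then efficiency `[0.5, 0.5]`) and running on counts `[10, 20]` with errors `[3.0, 4.0]` gives `([12.0, 32.0], [10.0, 10.0])`. Keeping every stage to the same signature is what lets a script reorder or insert stages freely.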

5 From TOF to energy
[Figure: data-flow diagram of the TOF-to-energy conversion. Modules: Read HDF file, Subtract background, Rebin, Sq. rt errs, Write HDF file. Data: raw counts, times, errors, errors², energies, counts in energy, Spect. Info. Parameters: num_e, e_min, e_max, e_i, t_min, t_max]
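The conversion step maps each time-of-flight bin to a neutron energy and rebins onto `num_e` energy bins between `e_min` and `e_max`; the "errors²" and "Sq. rt errs" boxes suggest that squared errors are carried through the rebinning and the square root is taken at the end. A minimal sketch, assuming times in microseconds measured over a known flight path in metres, energies in meV, and nearest-bin assignment rather than the proportional (fractional-overlap) rebinning a production code would likely use. The function names are hypothetical, and forming the energy transfer from the incident energy `e_i` is left out.

```python
import math

NEUTRON_MASS_KG = 1.67492749804e-27   # CODATA 2018 neutron mass
JOULES_PER_MEV = 1.602176634e-22      # 1 meV in joules (exact since SI 2019)

def tof_to_energy(tof_us, flight_path_m):
    """Kinetic energy in meV of a neutron covering flight_path_m in tof_us microseconds."""
    velocity = flight_path_m / (tof_us * 1e-6)          # m/s
    return 0.5 * NEUTRON_MASS_KG * velocity ** 2 / JOULES_PER_MEV

def rebin_to_energy(tof_us, counts, errors, flight_path_m, e_min, e_max, num_e):
    """Histogram TOF-binned counts onto num_e equal energy bins in [e_min, e_max).

    Each TOF bin goes wholly into the energy bin containing its energy;
    variances (squared errors) are summed and square-rooted at the end.
    """
    width = (e_max - e_min) / num_e
    binned = [0.0] * num_e
    variances = [0.0] * num_e
    for t, c, err in zip(tof_us, counts, errors):
        index = math.floor((tof_to_energy(t, flight_path_m) - e_min) / width)
        if 0 <= index < num_e:          # drop bins outside the energy window
            binned[index] += c
            variances[index] += err ** 2
    return binned, [math.sqrt(v) for v in variances]
```

As a check, a neutron that takes 1000 µs to cover 1 m travels at 1 km/s and carries about 5.227 meV; longer flight times map to lower energies.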

6 Data flow for TOF to Energy conversion

7 Design directions
Integrate analysis modules using scripting
– Python
Data flow paradigm
– Well understood
– Easy to implement and document
Meta-data in XML
– Fully reproducible description of the data analysis pipeline
– Tag and archive data
– Record the version number of each module used in the analysis
Enable distributed computing
– XMLRPC, SOAP, …
File formats: NeXus + XML meta-data
– Reuse, reuse, reuse
– Augment, contribute
– HDF5!
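Recording the pipeline as XML, including each module's version and parameters, is what makes an analysis reproducible after the fact. A minimal sketch with Python's standard `xml.etree.ElementTree`; the element and attribute names (`pipeline`, `module`, `parameter`) are hypothetical, not a schema defined by ARCS or NeXus.

```python
import xml.etree.ElementTree as ET

def describe_pipeline(steps):
    """Serialize an analysis pipeline to XML.

    steps: ordered list of (module_name, version, parameters) tuples,
    where parameters is a dict of scalar settings.
    """
    root = ET.Element("pipeline")
    for name, version, parameters in steps:
        module = ET.SubElement(root, "module", name=name, version=version)
        for key, value in parameters.items():
            ET.SubElement(module, "parameter", name=key, value=str(value))
    return ET.tostring(root, encoding="unicode")

def read_versions(xml_text):
    """Recover {module_name: version} from a stored pipeline description."""
    root = ET.fromstring(xml_text)
    return {m.get("name"): m.get("version") for m in root.iter("module")}
```

Storing this string alongside the reduced data (for example as a NeXus attribute) would let a later run check that the same module versions are still in use.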

8 Flexibility through the use of scripting
Scripting enables us to
– Organize the large number of parameters
– Allow the analysis environment to discover new capabilities without the need for recompilation or relinking
The Python interpreter
– The interpreter: modern object-oriented language; robust, portable, mature, well supported, well documented; easily extensible; rapid application development
– Support for parallel programming: trivial embedding of the interpreter in an MPI-compliant manner; a Python interpreter on each compute node; MPI is fully integrated (bindings + OO layer)
– No measurable impact on either performance or scalability
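One way scripting lets the environment discover new capabilities without recompilation or relinking is to load components by name at run time. A minimal sketch using Python's standard `importlib`; the `"module:attribute"` naming convention is an assumption made for illustration, not pyre's actual mechanism.

```python
import importlib

def load_component(spec):
    """Import and return the object named by "package.module:attribute".

    A new analysis module only has to be importable (on sys.path); nothing
    already built needs to be recompiled or relinked to pick it up.
    """
    module_name, separator, attribute = spec.partition(":")
    if not separator or not attribute:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)
```

The spec string could come from a configuration file, so adding a capability becomes a matter of installing a module and naming it.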

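9 Writing Python bindings
Given a “low level” routine, such as

    namespace arcs {
        double add(double a, double b);
    }

and a wrapper

    #include <Python.h>

    // Python-callable wrapper: unpack two floats, call arcs::add, box the result
    PyObject * arcs_add(PyObject *, PyObject * args) {
        double a, b;
        // "dd": exactly two arguments, each converted to a C double
        int ok = PyArg_ParseTuple(args, "dd", &a, &b);
        if (!ok) {
            return 0; // a Python exception has already been set
        }
        double result = arcs::add(a, b);
        return Py_BuildValue("d", result);
    }

then, once the wrapper is registered in the extension module's method table under the name add, one can place the result of the routine in a Python variable:

    c = arcs.add(2, 2)

The general case is not much more complicated than this.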

10 Pyre Architecture
[Diagram: framework, services and components; each component has Python bindings to a FORTRAN/C/C++ engine or library. Labels: infrastructure, abstract class, specialization, package]
The integration framework is a set of co-operating abstract services.
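The idea of an integration framework built from abstract services, with concrete components as specializations, can be sketched with Python's `abc` module. All names below are hypothetical; in pyre a specialization would typically forward to a FORTRAN/C/C++ engine through bindings, which this sketch replaces with pure Python.

```python
from abc import ABC, abstractmethod

class Component(ABC):
    """Abstract service: the framework depends only on this interface."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def execute(self, data):
        """Process data and return the result."""

class Scale(Component):
    """A specialization; a real component would call compiled code via bindings."""

    def __init__(self, factor):
        super().__init__("scale")
        self.factor = factor

    def execute(self, data):
        return [self.factor * x for x in data]

class Framework:
    """Holds co-operating components and dispatches work to them by name."""

    def __init__(self):
        self.components = {}

    def register(self, component):
        self.components[component.name] = component

    def run(self, name, data):
        return self.components[name].execute(data)
```

Because `Framework` never refers to `Scale` directly, a new specialization can be registered without changing the framework code.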

11 Pyre services
journal
– flexible control over the generation and delivery of simulation diagnostics from the compute nodes to the workstation
monitor
– a distributed service for low-bandwidth, on-the-fly visualizations
– currently used mostly for status monitoring and debugging
timer
weaver
– a general source code generation facility
– support for many languages: FORTRAN, C, C++, Python, HTML, XML; from makefiles to optimized C++ sources
– automatic web page creation for CGI scripts
– supports user authentication: passwords, soon user SSL certificates
blade
– a toolkit-independent UI generator
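The journal's central idea is that diagnostics are grouped into named channels that can be switched on and off without touching the code that emits them. A minimal sketch of that idea; the class and method names are hypothetical, not pyre's `journal` API, and the sink is an in-memory list standing in for delivery from the compute nodes to the workstation.

```python
class Journal:
    """Named diagnostic channels with per-channel on/off control."""

    def __init__(self, sink):
        self.sink = sink          # anything with append(); stands in for network delivery
        self.enabled = set()

    def activate(self, channel):
        self.enabled.add(channel)

    def deactivate(self, channel):
        self.enabled.discard(channel)

    def write(self, channel, message, node=0):
        """Record the message only if its channel is active, tagged with the node rank."""
        if channel in self.enabled:
            self.sink.append(f"[node {node}] {channel}: {message}")
```

Python's standard `logging` module offers similar per-logger control; the sketch only makes the channel switch explicit.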

12 Distributed services
[Diagram: workstation services and compute nodes. Labels: analysis, journal, monitor, component1, component2]

13 IRIS Explorer

14 Visual Programming Environment
Data flow paradigm appears natural
– usability problems are focused on knowledge of what is possible
– used by many commercial and open source tools
Improvements
– decouple UI from diagram logic
– interface: use OpenGL!; collaborative; interesting and relevant research
– diagram logic: thin, reusable component; scripting; multi-layered control
– development: can use existing solutions as a guide of what not to do; many modules already available in pyre; enable distributed programming
Target for prototype: early 2004

15 XMLRPC: Enabling distributed computing
[Diagram: Client, Remote Server, Database Server, Beowulf Cluster]
An open standard for remote procedure calls
Allows us to perform the computation
– where the data lives
– independently of the local computing capacity
Security is an issue
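Python's standard library ships both halves of XML-RPC (`xmlrpc.server` and `xmlrpc.client`), so exporting an analysis routine to a remote client takes only a few lines. The sketch serves a hypothetical `add` routine on localhost in a background thread. It has no authentication or encryption, which is exactly the security issue noted above.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    """Stand-in for an analysis routine that runs where the data lives."""
    return a + b

def start_server(host="127.0.0.1", port=0):
    """Serve `add` over XML-RPC in a daemon thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def call_add(server, a, b):
    """Client side: the remote call reads like a local function call."""
    host, port = server.server_address[:2]
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.add(a, b)
```

In a real deployment the client would know the server's URL rather than its object, and the transport would need to be secured.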

16 Prototype User Interface
Application capabilities
– depend on the remote server
– are exported to the client
Boxes represent
– data sources
– computational modules
Wires represent
– data flows
– control
Boxes have input and output ports where wires can be attached
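Decoupling the diagram logic from the UI means a wiring diagram can be evaluated with no graphics at all: boxes are functions, and a wire carries one box's output into a named input port of another. A minimal sketch under those assumptions (one output per box, input ports as keyword arguments), using the standard `graphlib.TopologicalSorter` (Python 3.9+) so that each box runs after its upstream boxes. The representation is hypothetical, not the prototype's.

```python
from graphlib import TopologicalSorter

def run_diagram(boxes, wires):
    """Evaluate a wiring diagram in data-flow order.

    boxes: dict mapping box name -> function; the function's keyword
           arguments are the box's input ports, its return value the output.
    wires: list of (source_box, target_box, target_port) tuples.
    Returns a dict mapping box name -> output value.
    """
    upstream = {name: set() for name in boxes}
    for source, target, _ in wires:
        upstream[target].add(source)
    results = {}
    for name in TopologicalSorter(upstream).static_order():
        # Gather the values arriving on this box's input ports.
        inputs = {port: results[source]
                  for source, target, port in wires if target == name}
        results[name] = boxes[name](**inputs)
    return results
```

A diagram containing a loop makes `static_order()` raise `graphlib.CycleError`, which a UI could report as an invalid wiring.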

17 Data Analysis Execution
– User hits “Run”
– Applet interprets the wiring diagram as XMLRPC commands
– Server receives the commands, arranges a Python script, and data processing commences
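On the server side, the received commands have to become an executable Python script. A minimal sketch of that translation, assuming each command names an output variable, a function, and the variables it consumes; the command format and function names are hypothetical. Emptying `__builtins__` only limits accidents and is not a sandbox, so executing generated code like this is acceptable only for trusted clients.

```python
def commands_to_script(commands):
    """Turn (result, function, arguments) commands into Python source lines."""
    return "\n".join(f"{result} = {function}({', '.join(arguments)})"
                     for result, function, arguments in commands)

def run_script(script, functions):
    """Execute the script with only the given analysis functions in scope."""
    namespace = {"__builtins__": {}}
    namespace.update(functions)
    exec(script, namespace)
    return namespace
```

Keeping the generated script (rather than discarding it after execution) would also give a readable record of what was run.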

18 User interface prototypes - I

19 User interface prototypes - II

20 User interface prototypes - III

21 MATLAB
If you must…
– Fully accessible from Python
– Support involves converting the results of data analysis into MATLAB native arrays

22 Software engineering practices
Version control
– Provides a record of the evolution of the software
– CVS: well supported, open source
Configuration management
– Uniform, portable build procedure
– Automatic, regular builds of the entire software base
– config: a system based on make
– merlin: a Python-based replacement under development
Regression testing
– Test cases that exercise expected behavior and exercise fixes for known bugs
Bug tracking
– Organize the “to do” list, the feature requests … and the known defects
– Gnats: well supported, open source
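A regression suite pins down both kinds of case named above, expected behavior and fixes for known bugs, so that a fixed defect cannot silently return. A minimal sketch with Python's standard `unittest`; the `rebin_sum` routine and the bug it guards against are hypothetical examples, not taken from the ARCS code base.

```python
import unittest

def rebin_sum(values, factor):
    """Hypothetical module under test: merge consecutive groups of `factor` bins.

    A trailing partial group is kept rather than dropped.
    """
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]

class RebinRegressionTest(unittest.TestCase):
    def test_expected_behavior(self):
        self.assertEqual(rebin_sum([1, 2, 3, 4], 2), [3, 7])

    def test_fix_for_known_bug(self):
        # Hypothetical bug: an earlier version silently dropped the partial last group.
        self.assertEqual(rebin_sum([1, 2, 3], 2), [3, 3])
```

Running `python -m unittest` as part of the automatic, regular builds turns these cases into a standing check on every change.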
