DANSE
Distributed Data Analysis for Neutron Scattering Experiments
Michael M. McKerns, Michael A.G. Aivazis, Tim M. Kelley, June Kim, and Brent Fultz
Materials Science and Applied Physics
California Institute of Technology
Abstract
The DANSE system will merge the various computational tasks of neutron scattering into a unified, component-based run-time environment. Standard components will implement data analysis, visualization, modeling, and instrument simulation for all areas of neutron scattering. A core technology of DANSE is an open-source framework that supports the components and mediates their interactions. Within the DANSE environment, users will be able to mix and match different software components without compilation, and execute calculations seamlessly across distributed resources. DANSE will provide tools to help instrument scientists and expert users migrate their existing routines (written in any number of languages) to components. It will also provide an interface that lets new and casual users access a stock set of standard analysis applications or configure their own computing procedures for novel experiments. The modular structure of DANSE parallels the steps of data analysis performed by scientists, which makes it a natural environment for creating flexible computing procedures. DANSE will lower barriers to sharing software, and extend the experimentalist’s toolkit with capabilities of analysis and interpretation such as high-performance simulations (band structure, molecular dynamics, etc.), co-analysis of data from multiple experiments, and real-time feedback for experimental control.
An introduction to DANSE
DANSE is a community organizing project with the potential to provide a unique facility/user interaction:
–a single environment for data analysis, visualization, modeling, and instrument simulation for all areas of neutron scattering
–a collaborative effort between software professionals, neutron scattering scientists, and facilities
–tools for remote collaboration and co-analysis
–support from members of the international community and from the directors of SNS, IPNS, HFIR/CNS, Lujan Center, NCNR
–potentially the software environment for all instruments at the SNS
DANSE provides a unified component-based runtime environment for computational neutron scattering:
–open-source framework provides seamless use of distributed and high-performance resources
–a flexible, extensible, dynamic, interactive, cross-platform, cross-compiler, object-oriented software architecture
–integration of legacy codes and community-standard software
–well suited for the development of new science, standard stock computation, quality and plausibility assessment, and as an educational tool
Tools for each level of user
Beginning student
–user of prepackaged tools and documentation as a learning environment
Visiting scientist
–user of prepackaged and specialized analysis tools
Instrument scientist
–author of prepackaged specialized tools
Analysis expert
–author of analysis, modeling or simulation software
Established researcher
–collaboration coordinator, designer of new analysis procedures
Software integrator
–responsible for extending software with new technology
Framework maintainer
–responsible for maintaining and extending the DANSE infrastructure
Encourages Better Science
More science from experiment execution
–Single crystals on chopper spectrometers
–Feedback control for engineering diffraction
–Alter experiment depending on results:
    visualization of science trends, not data trends (e.g., see structure, not I(Q))
    on-demand modeling, ab-initio calculations
    reality checks against scattering theory
Better science by planning experiments
–Plausibility tests before submitting a proposal
–Assessment of sample plus instrument
–Contingency planning using prior simulations
–Assessments of trends in previous data
Facilitates New Science
New science with better data analysis
–FEM calculations of strains in microstructures
–Monte-Carlo inversions of S(Q,E) to obtain parameters of structure and dynamics models
–Model refinements with multiple data sets
New science by leveraging theory
–VASP, CASTEP, ABINIT are commodities today; use them for assessing structures and dynamics
–Micromechanics – correlations of local strains
–Phase diagrams – thermodynamic functions
–Ab-initio calculations of spin interactions
–Soft matter structure – atomic force fields guided by diffraction
Simulation and plausibility testing on virtual instruments
[Figure panels: Ni, Pd, Pt]
Phonon Partition Function
fcc Ni
    for E, g_E in spectrum:
        Z *= one_osc(E, T) ** g_E
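The loop on this slide can be made runnable with a few assumptions. The sketch below takes one_osc(E, T) to be the partition function of a single quantum harmonic oscillator, energies in eV, and spectrum to be a list of (E, g_E) pairs from a phonon DOS. The three-bin spectrum is made-up illustration data, not fcc Ni, and log_partition_function is a hypothetical helper added because the direct product can underflow for large mode counts.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def one_osc(E, T):
    """Partition function of one quantum harmonic oscillator.

    Assumed form: exp(-E/2kT) / (1 - exp(-E/kT)), with E in eV and T in K.
    """
    x = E / (K_B * T)
    return math.exp(-x / 2.0) / (1.0 - math.exp(-x))

def partition_function(spectrum, T):
    """Direct product over the spectrum, exactly as in the slide's loop."""
    Z = 1.0
    for E, g_E in spectrum:
        Z *= one_osc(E, T) ** g_E
    return Z

def log_partition_function(spectrum, T):
    """Same quantity as ln Z, accumulated as a sum to avoid underflow."""
    return sum(g_E * math.log(one_osc(E, T)) for E, g_E in spectrum)

# Hypothetical three-bin DOS: (energy in eV, weight of that bin).
spectrum = [(0.010, 0.2), (0.025, 0.5), (0.035, 0.3)]
Z = partition_function(spectrum, 300.0)
ln_Z = log_partition_function(spectrum, 300.0)
```

Working with ln Z is usually more convenient, since the free energy follows directly as F = -k_B T ln Z.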
Built on the Pyre integration architecture
Pyre is a robust, stable foundation
–75,000 lines of Python; 30,000 lines of C++
–multiply leveraged DoE ASCI project
Pyre is a software architecture:
–a specification of the organization of the software system
–a description of the crucial structural elements and their interfaces
–a specification for the possible collaborations of these elements
–a strategy for the composition of structural and behavioral elements
Pyre is multi-layered
–flexibility
–complexity management
–robustness under evolutionary pressures
Pyre is a component framework
[Diagram layers: application-general, application-specific, framework, computational engines]
Component architecture
The integration framework is a set of co-operating abstract services
[Diagram labels: component, bindings, library, extension, custom code, core, facility, framework, service, requirement, implementation, package; languages: FORTRAN/C/C++, python]
A Path for Software of Today
Finer-Grained Interoperable Components
[Diagram labels: ANL, LANL, NIST, ISIS; java, F77, IDL, Matlab; ISAW, GSAS, DAVE, Mslice, …]
Component dataflow
Granularity allows reusability of object-oriented components
[Diagram: NeXusReader → Selector → Bckgrnd → Selector → Energy → NeXusWriter; port labels: filename, times, instrument info, raw counts, time interval, energy bins, filename]
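A chain like the one in this diagram can be imitated in a few lines of plain Python. This is only a sketch of the dataflow idea, not the DANSE or Pyre API. Reader, Selector, Background, Writer and connect are hypothetical names, the "files" are entries in an in-memory dictionary rather than NeXus files, and the event values are invented.

```python
class Reader:
    """Stand-in for NeXusReader: looks events up in an in-memory store (no file I/O)."""
    def __init__(self, store):
        self.store = store

    def __call__(self, filename):
        # Copy each event so downstream components never modify the source data.
        return [dict(event) for event in self.store[filename]]


class Selector:
    """Keeps events whose `key` value lies in the half-open range [low, high)."""
    def __init__(self, key, low, high):
        self.key, self.low, self.high = key, low, high

    def __call__(self, events):
        return [e for e in events if self.low <= e[self.key] < self.high]


class Background:
    """Subtracts a constant background level from every event's counts."""
    def __init__(self, level):
        self.level = level

    def __call__(self, events):
        return [dict(e, counts=e["counts"] - self.level) for e in events]


class Writer:
    """Stand-in for NeXusWriter: stores the result under a name in a dictionary."""
    def __init__(self, sink, filename):
        self.sink, self.filename = sink, filename

    def __call__(self, events):
        self.sink[self.filename] = events
        return self.filename


def connect(*components):
    """Chain components so the output of each becomes the input of the next."""
    def procedure(data):
        for component in components:
            data = component(data)
        return data
    return procedure


# Invented example data: four events with a time, an energy and raw counts.
store = {"run.nxs": [
    {"time": 0.5, "energy": 10.0, "counts": 12.0},
    {"time": 1.5, "energy": 20.0, "counts": 30.0},
    {"time": 2.5, "energy": 30.0, "counts": 18.0},
    {"time": 3.5, "energy": 40.0, "counts": 9.0},
]}
sink = {}
procedure = connect(
    Reader(store),
    Selector("time", 1.0, 4.0),      # time interval
    Background(5.0),
    Selector("energy", 15.0, 35.0),  # energy bins
    Writer(sink, "reduced.nxs"),
)
written = procedure("run.nxs")
```

Because every component has the same call shape, the same Selector class serves both the time and the energy cut, which is the reuse the slide attributes to fine granularity.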
Data Flow Paradigm
Component Templates
–Code Place
–Name Place
–Initiate, terminate, error
–properties
Standard Data Streams
–Python objects
–histograms
–tables
–meta-data
Standard communication protocol between components that can reside anywhere
    '''Multiphonon.py
    Calculates the multiphonon scattering, using a phonon DOS...
    '''
    from mpFunctions import *

    def run(All_Inputs_List):
        """Multiphonon.py main loop..."""
        # NOTE: each stage is assumed to consume the previous stage's output
        # check user inputs for validity, get data from disk
        checkUserInput(All_Inputs_List)
        setup_arglist = setupRun(All_Inputs_List)
        # 1-phonon quantities, multiphonon terms
        single_arglist = onePhonon(setup_arglist)
        multi_arglist = multiPhonon(single_arglist)
        # prepare results for output, send to disk, etc.
        output_arglist = prepareResults(multi_arglist)
        outputResults(output_arglist)
        return

    if __name__ == '__main__':
        # Run main loop if launched standalone.
        from mpUserInput import *
        run(All_Inputs_List)

Encapsulation
Abstraction
Launched standalone or Inside Analysis Procedure
Component implementation strategy
Write engine
–custom code, third party libraries
–modularize by providing explicit support for life cycle management
–implement handling of exceptional events
Construct python bindings
–select entry points to expose
Integrate into framework
–construct object oriented veneer
–extend and leverage framework services
Cast as a component
–provide object that implements component interface
–describe user configurable parameters
–provide meta data that specify the IO port characteristics
–code custom conversions from standard data streams into lower level data structures
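The four steps above can be illustrated with a toy example. This is a sketch under stated assumptions, not the Pyre component interface. rebin_engine stands in for a compiled engine, and RebinComponent is a hypothetical class whose attribute names (parameters, inputs, outputs, initialize, finalize, run) were chosen only for illustration.

```python
from array import array


def rebin_engine(values, factor):
    """Stand-in 'engine': sums consecutive groups of `factor` values.

    In a real component this would be custom or third-party compiled code
    reached through Python bindings; here it is plain Python.
    """
    if factor < 1:
        # Handling of exceptional events belongs in the engine layer.
        raise ValueError("factor must be >= 1")
    return array("d", (sum(values[i:i + factor])
                       for i in range(0, len(values), factor)))


class RebinComponent:
    """Hypothetical object-oriented veneer that casts the engine as a component."""

    # User-configurable parameters with their defaults.
    parameters = {"factor": 2}
    # Metadata describing the IO ports.
    inputs = {"histogram": "sequence of numbers"}
    outputs = {"histogram": "list of floats"}

    def __init__(self, **overrides):
        unknown = set(overrides) - set(self.parameters)
        if unknown:
            raise TypeError("unknown parameters: %s" % sorted(unknown))
        self.settings = dict(self.parameters, **overrides)
        self.ready = False

    # Explicit life cycle management.
    def initialize(self):
        self.ready = True

    def finalize(self):
        self.ready = False

    def run(self, histogram):
        # Custom conversion: standard stream (Python sequence) to a
        # lower-level structure (typed array) and back again.
        data = array("d", histogram)
        return list(rebin_engine(data, self.settings["factor"]))


component = RebinComponent(factor=2)
component.initialize()
rebinned = component.run([1, 2, 3, 4, 5])
component.finalize()
```

Keeping the engine free of any framework knowledge means it can still be called directly, while the veneer carries everything the runtime needs to configure and connect it.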
Flexibility through the use of scripting
Scripting enables us to
–organize large numbers of user tunable parameters
–allow the runtime environment to discover new capabilities without the need for recompilation or relinking
–compose computations at runtime
The interpretive environment:
–Python is a modern object oriented language
    robust, portable, mature, well supported, well documented
    easily extensible
    rapid application development
–has been extended to support parallel programming
–has no measurable impact on either performance or scalability
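Discovering capabilities at runtime, without recompiling or relinking, can be shown with Python's standard importlib module. In this sketch the "module:attribute" naming convention, the load_callable helper and the config dictionary are assumptions made for illustration, and standard-library functions stand in for analysis components.

```python
import importlib


def load_callable(spec):
    """Resolve a 'module:attribute' string to a Python object at runtime."""
    module_name, _, attribute = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# A computation described purely as data; editing these strings changes
# what runs, with no compilation step.
config = {
    "transform": "math:sqrt",
    "reduce": "statistics:mean",
}

transform = load_callable(config["transform"])
reduce_step = load_callable(config["reduce"])

# Compose the two steps at runtime: mean of the square roots.
result = reduce_step([transform(x) for x in [1.0, 4.0, 9.0]])
```

The same lookup works for any importable module, so a newly installed package becomes usable by naming it in the configuration.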
Encapsulating critical technologies
Extensibility
–new algorithms and analysis engines
–technologies and infrastructure
High end
–visualization
–easy access to large data sets
    single runs, backgrounds, archived data
    metadata
–distributed computing
–parallel computing
Flexibility:
–interactivity: web, GUI, scripts
–must be able to do almost everything on a laptop
Data Analysis as a Distributed Service
Data analysis is a service controlled by the user
User’s laptop issues commands and receives results
Computation is arranged by your client software
Support for distributed computing
We are migrating the existing support for distributed processing into gsl, a new package that completely encapsulates the middleware
Provide both user-space and grid-enabled solutions
User space:
–ssh, scp
–pyre service factories and component management
Web services
–pyglobus
Advanced features
–dynamic discovery for optimized deployment
–reservation system for computational resources
Strengthening the neutron community
[Diagram labels: Fultz/Aivazis, Billinge, Ustundag, Kienzle, Butler, Fultz/Trouw]
Areas for User and Instrument Support
3 SNS instruments on-line in
[Diagram labels: IDT instruments, PROTONS]
Engineering Diffractometer – BL 9
SANS – BL 6
Cold Neutron Chopper Spectrometer – BL 5
Reflectometers: Magnetism – BL 4a, Liquids – BL 4b
High Pressure Diffractometer – BL 3
Backscattering Spectrometer – BL 2
Disordered Materials Diffractometer – BL 1b
ARCS Spectrometer – BL 18
High Resolution Chopper Spectrometer – BL 17
Single Crystal Diffractometer – BL 12
Fundamental Physics Beamline – BL 13
Powder Diffractometer – BL 11a
Software needs to be on-line to support BL 2, 4a, 4b, 5, 18
[Flowchart: fitting force constants to an experimental DOS]
Initial guess
Crystal model: C1XX, C1XY…
Calculate force constant matrix $\Phi_{\alpha\beta}\binom{0\;l'}{\kappa\;\kappa'}$
Sweep reciprocal space
    Calculate dynamical matrix D(q)
    Diagonalize D(q)
    Update DOS histogram
    End? (n: continue sweep; y: Output DOS)
Compare with experimental DOS
RMS converged? (n: Powell minimize, then update the crystal model; y: Output force constants)
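The fitting loop in this flowchart can be sketched on a deliberately tiny model. Everything here is an assumption made for illustration: a 1D monatomic chain with one nearest-neighbour force constant K replaces the crystal model, the dynamical matrix is 1x1 so "diagonalizing" it is just a square root, the "experimental" DOS is synthetic, and a brute-force scan over candidate K values stands in for the Powell minimizer. In practice a derivative-free minimizer (for example `scipy.optimize.minimize` with `method="Powell"`) would replace the scan.

```python
import math


def dos_histogram(K, mass=1.0, n_q=2000, n_bins=30, w_max=3.0):
    """Normalized phonon DOS histogram of a 1D monatomic chain (lattice constant 1)."""
    hist = [0.0] * n_bins
    for i in range(n_q):
        # Sweep reciprocal space: midpoints of a uniform grid on (0, pi).
        q = math.pi * (i + 0.5) / n_q
        # 1x1 dynamical matrix for nearest-neighbour force constant K.
        D = 2.0 * K / mass * (1.0 - math.cos(q))
        # "Diagonalize": the single eigenvalue of D(q) is omega squared.
        w = math.sqrt(D)
        # Update the DOS histogram.
        b = int(w / w_max * n_bins)
        if b < n_bins:
            hist[b] += 1.0 / n_q
    return hist


def rms(a, b):
    """Root-mean-square difference between two histograms of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Candidate force constants (arbitrary units); the scan replaces Powell's method.
candidates = [0.5 + 0.1 * i for i in range(11)]

# Synthetic "experimental" DOS generated from one of the candidates.
K_true = candidates[5]
experimental = dos_histogram(K_true)

# Compare each calculated DOS with the experimental one and keep the best fit.
K_fit = min(candidates, key=lambda K: rms(dos_histogram(K), experimental))
```

For a real crystal the same loop applies with a 3N x 3N dynamical matrix per q point and several force constants, which is where a proper multidimensional minimizer becomes necessary.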