Published by Suzanna Lewis. Modified over 8 years ago.
1
XRD data analysis software development
2
Outline
- Background
- Reasons for change
- Conversion challenges
- Status
3
X-Ray Diffraction (XRD)
- What is an XRD experiment for?
  - Provides information on the relative positions of atoms in a crystal
  - Allows individual crystalline structures to be identified
  - Detects strains in the crystals as well
- What is XRD data?
  - Digital images collected by a CCD camera while the synchrotron X-ray beam scans a sample area
- Image data sizes are large
  - One 2D image is collected for each scan point
  - 8 MB/image at CLS, 2084 x 2084 pixels/image
  - Hundreds or even thousands of images may be collected in an experiment, depending on
    - the size of the sample area to be scanned
    - the step size used to move the sample during the scan
  - More scan points provide more detailed information for the analysis
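The image-size figures on this slide can be checked with a little arithmetic. The sketch below is not from the presentation. It assumes 16-bit (2-byte) pixels, which the slide does not state but which is consistent with the quoted 8 MB for 2084 x 2084 pixels. The function names and the 30 x 30 scan grid in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one raw detector frame.
   bytes_per_pixel is an assumption (2 = 16-bit pixels); the slides do not state it. */
uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Bytes in a full raster scan: one frame is stored per scan point. */
uint64_t scan_bytes(uint32_t points_x, uint32_t points_y, uint64_t bytes_per_frame)
{
    return (uint64_t)points_x * points_y * bytes_per_frame;
}
```

Under these assumptions, `frame_bytes(2084, 2084, 2)` gives 8,686,112 bytes (about 8.3 MiB), and a hypothetical 30 x 30 point scan would hold 900 frames, roughly 7.8 GB of raw data.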
4
XRD data analysis
- Deals with a large amount of image data
- Several procedures are applied to each image
  - Peak searching: identify regions of interest, including threshold finding, blob searching and 2D curve fitting on each blob
  - Indexing: identify possible/known crystalline structures
  - Strain analysis: detect strains in the material (?)
- Existing XRD data analysis software was written in IDL
  - A proprietary scripting language
  - Carries out processes only sequentially
- It is very time consuming!
  - e.g. processing a whole package of data normally takes days
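The first two peak-searching steps named on this slide, threshold finding and blob searching, might look roughly like the C sketch below. It is an illustration only, not the IDL algorithm, which the slides do not describe. The threshold rule (mean plus a fraction of the mean-to-maximum range), the 4-connected flood fill and all function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed threshold rule: mean + frac * (max - mean). Not taken from the slides. */
double find_threshold(const uint16_t *img, size_t n, double frac)
{
    assert(img != NULL && n > 0);
    double sum = 0.0;
    uint16_t max = 0;
    for (size_t i = 0; i < n; i++) {
        sum += img[i];
        if (img[i] > max) max = img[i];
    }
    double mean = sum / (double)n;
    return mean + frac * ((double)max - mean);
}

/* Label 4-connected blobs of pixels strictly above threshold.
   labels must hold w*h ints; 0 = background, 1..count = blob id.
   Returns the number of blobs, or -1 if memory allocation fails. */
int label_blobs(const uint16_t *img, int w, int h, double threshold, int *labels)
{
    assert(img != NULL && labels != NULL && w > 0 && h > 0);
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    size_t n = (size_t)w * (size_t)h;
    /* Each pixel is pushed at most once, so n entries are enough. */
    size_t *stack = malloc(n * sizeof *stack);
    if (stack == NULL) return -1;
    for (size_t i = 0; i < n; i++) labels[i] = 0;

    int count = 0;
    for (size_t start = 0; start < n; start++) {
        if (img[start] <= threshold || labels[start] != 0) continue;
        count++;                      /* new blob found: flood-fill it */
        size_t top = 0;
        stack[top++] = start;
        labels[start] = count;
        while (top > 0) {
            size_t p = stack[--top];
            int x = (int)(p % (size_t)w);
            int y = (int)(p / (size_t)w);
            for (int d = 0; d < 4; d++) {
                int nx = x + dx[d], ny = y + dy[d];
                if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
                size_t q = (size_t)ny * (size_t)w + (size_t)nx;
                if (img[q] > threshold && labels[q] == 0) {
                    labels[q] = count;
                    stack[top++] = q;
                }
            }
        }
    }
    free(stack);
    return count;
}
```

Each labelled blob would then be handed to the 2D curve-fitting step. An explicit stack is used instead of recursion so that a large blob in a 2084 x 2084 image cannot overflow the call stack.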
5
Reasons for change
- Needed for incorporation into Science Studio
  - The aim of Science Studio (SS) is to give remote users feedback during experimental runs; XRD analysis is one example
  - Existing code is in a scripting language and relies on sequential processing
    - Existing software is written in IDL
    - Peak searching is in IDL
    - Indexing and strain analysis are in IDL, calling external routines in Fortran
- Needed to have versions for streaming data analysis
  - Stream processing: taking a stream of input data, processing it in a series of steps and streaming the results out, achieving real-time or close to real-time performance
- Needed to solve the data storage problem
  - Accumulating a large amount of raw image data over a long time could cause storage problems
  - Only the peaks in each image are useful for the analysis
  - If the peaks can be found in real time during data collection, it might not be necessary to keep the raw images
  - E.g. a typical raw image is 8 MB at CLS, while the peak data for the image is only about 10 KB
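The storage argument on this slide comes down to a simple ratio. The sketch below works it through using the slide's figures of 8 MB per raw image and about 10 KB of peak data. Reading these as binary units (MiB and KiB) is an assumption, and the function names and the 1000-image run in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* How many times smaller the peak data is than the raw image. */
double reduction_factor(uint64_t raw_bytes, uint64_t peak_bytes)
{
    assert(peak_bytes > 0);
    return (double)raw_bytes / (double)peak_bytes;
}

/* Storage needed for n_images when per_image_bytes are kept for each image. */
uint64_t storage_bytes(uint64_t n_images, uint64_t per_image_bytes)
{
    return n_images * per_image_bytes;
}
```

With 8 MiB of raw data and 10 KiB of peak data per image, `reduction_factor` returns about 819, i.e. roughly an 800-fold saving. For a hypothetical run of 1000 images that is about 8.4 GB of raw frames against about 10 MB of peak data.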
6
How to make the change
- Our development target
  - Port the existing XRD data analysis software to a Cell system at SHARCNET to achieve stream processing for XRD data analysis
- SHARCNET's Cell system
  - 8 Cell blades (QS22) with 2 Cell processor chips on each blade, i.e. 16 Cell processors in total
- Cell processor: a heterogeneous multi-core architecture
  - Two types of cores optimized for different tasks
    - 1 Power Processing Element (PPE) and 8 Synergistic Processing Elements (SPEs)
    - PPE: PowerPC architecture; acts as a controller and performs control-intensive tasks
    - SPEs: simpler cores that devote more of their resources to computation; perform computation-intensive tasks
  - The Cell processor can be programmed to achieve stream processing
7
Basic Cell Programming Model
[Figure: XRD data analysis procedures. Labels: Diffraction pattern, Orientation, Strain, Resultant Maps]
8
Challenges
- Cell runs only Linux and compiled C/C++ code
  - The PPE and the SPEs execute different instruction sets
  - Code for the PPE and for the SPEs is compiled with different compilers
- Existing software is written in IDL
  - Peak searching is in IDL
  - Indexing and strain analysis are in IDL, calling external routines in Fortran
- Challenges
  - No algorithm description provided: the code must be rewritten in C using only the IDL source code
  - Programming on Cell is new and challenging because of Cell's special architecture
  - Requires knowledge of assembly-level programming
  - Limited function libraries are available for Cell's SPEs
9
Development plan
- Rewrite code in C
- Validate the results produced by the C code
  - Compare with results from the existing software
- Make the code run on Cell's PPE
- Design for parallel processing on Cell
  - Identify a strategy for parallel computation
  - Identify what should be executed on Cell's SPEs
- Implement the design
- Validate the results produced by Cell
- Performance measurement
10
Progress Report
- Peak searching and indexing procedures have been rewritten in C
  - Results produced by the C code for both procedures have been validated, at least with our limited data set
- Peak searching has been ported to Cell successfully
  - Threshold finding and blob searching are carried out by the PPE
  - 2D curve fitting (Lorentz fitting) for each blob is carried out by the SPUs
  - The typical number of blobs found in each image is about 100 to 200, depending on the threshold setting
- Some preliminary performance measurements have been made on the Cell system for the peak searching procedure
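The per-blob Lorentz fitting mentioned on this slide is the computation-heavy part handed to the SPEs. The slides do not give the model or the fitting method, so the C sketch below only illustrates what such a fit would evaluate. It assumes an axis-aligned 2D Lorentzian with a constant background. The struct, its fields and the function names are all hypothetical. A fitter would adjust the parameters to minimise the returned sum of squared residuals.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model: background + amplitude / (1 + u^2 + v^2),
   with u = (x - x0) / gamma_x and v = (y - y0) / gamma_y. */
typedef struct {
    double amplitude;   /* peak height above the background */
    double x0, y0;      /* peak centre, in pixels */
    double gamma_x;     /* half width at half maximum along x */
    double gamma_y;     /* half width at half maximum along y */
    double background;  /* constant offset */
} lorentz2d_t;

/* Model value at pixel position (x, y). */
double lorentz2d_eval(const lorentz2d_t *p, double x, double y)
{
    assert(p != NULL && p->gamma_x > 0.0 && p->gamma_y > 0.0);
    double u = (x - p->x0) / p->gamma_x;
    double v = (y - p->y0) / p->gamma_y;
    return p->background + p->amplitude / (1.0 + u * u + v * v);
}

/* Sum of squared residuals between the model and a w x h blob window
   whose top-left pixel sits at image coordinates (ox, oy). */
double lorentz2d_sse(const lorentz2d_t *p, const double *window,
                     int w, int h, int ox, int oy)
{
    assert(window != NULL && w > 0 && h > 0);
    double sse = 0.0;
    for (int j = 0; j < h; j++) {
        for (int i = 0; i < w; i++) {
            double r = window[(size_t)j * (size_t)w + (size_t)i]
                     - lorentz2d_eval(p, ox + i, oy + j);
            sse += r * r;
        }
    }
    return sse;
}
```

Because each blob is evaluated independently of the others, the 100 to 200 blobs per image can be fitted in parallel, which is consistent with handing this step to the SPEs.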
11
Some preliminary performance measurement (2)
Peak searching on CLS XRD data: 8 MB/image, 2084 x 2084 pixels/image. Desktop speed: 9.34 s/image.

Columns:
  Images     total number of images (breakdown in parentheses)
  Blades     total number of blades used (2 Cell chips per blade)
  Par/blade  parallel images per blade (16 SPEs per blade)
  SPEs/img   number of SPEs per image
  Total (s)  total time in seconds
  s/img      Cell speed: time per image in seconds (total time / total images)
  Speedup    desktop speed / Cell speed

Images        Blades  Par/blade  SPEs/img  Total (s)  s/img  Speedup
32 (32X1X1)   1       1          16        107.29     3.35   2.78
32 (16X1X2)   1       2          8         65.65      2.05   4.55
32 (8X1X4)    1       4          4         61.49      1.92   4.86
32 (4X1X8)    1       8          2         56.82      1.77   5.26
32 (2X2X8)    2       8          2         31.48      0.98   9.49
32 (1X4X8)    4       8          2         17.24      0.54   17.33
64 (1X8X8)    8       8          2         17.23      0.27   34.69
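The derived columns in these measurements follow directly from the slide's own definitions: Cell speed is total time divided by total images, and speedup is desktop speed divided by Cell speed. The C sketch below reproduces them. The function names are hypothetical. Reading the parenthesised breakdown as sequential rounds x blades x parallel images per blade is an inference from the numbers, not something the slide states.

```c
#include <assert.h>

/* Cell speed: seconds per image = total time / total images. */
double time_per_image(double total_seconds, int total_images)
{
    assert(total_images > 0);
    return total_seconds / total_images;
}

/* Speedup = desktop seconds per image / Cell seconds per image. */
double speedup(double desktop_seconds_per_image, double total_seconds, int total_images)
{
    return desktop_seconds_per_image / time_per_image(total_seconds, total_images);
}

/* SPEs available to each image when a blade's SPEs are shared equally. */
int spes_per_image(int spes_per_blade, int parallel_images_per_blade)
{
    assert(parallel_images_per_blade > 0);
    return spes_per_blade / parallel_images_per_blade;
}

/* Assumed reading of the breakdown: rounds x blades x parallel images per blade. */
int total_images(int rounds, int blades, int parallel_images_per_blade)
{
    return rounds * blades * parallel_images_per_blade;
}
```

For the last measurement (64 images on 8 blades in 17.23 s), `speedup(9.34, 17.23, 64)` comes out at about 34.7, in line with the reported 34.69. The reported values appear to be rounded or truncated to two decimals.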
12
More work to do
- Continue rewriting the strain analysis code for XRD data in C
- Port the indexing and strain analysis procedures to Cell
- Design a programming model for Cell to achieve stream processing for all procedures in XRD data analysis
- Implement the design
- Integrate stream processing of XRD data with Science Studio