
1 Report on PaNdata Work Package 8 Scalability
Acknowledgements: Mark Basham, Nick Rees, Tobias Richter, Ulrik Pedersen, Heiner Billich et al. (PSI/SLS), Frank Schuenzen, Thorsten Kracht et al. (DESY)

2 WP8 Scalability – The List of Deliverables and Timescales
WP8 is led by Diamond with involvement principally from DESY Hamburg and the Swiss Light Source (SLS). The WP8 package timeline:
- D8.1: Definition of pHDF5-capable NeXus implementation (June 2012) – Report D8.1 delivered
- D8.2: Evaluation of parallel file systems and MPI I/O implementations (June 2012) – Report D8.2 delivered
- November 2012 – 1st year report
- D8.3: Implementation of parallel NeXus (pNeXus) and MPI I/O on parallel file systems (June 2013) – Prototype software delivery
- D8.5: Examination of distributed parallel file systems (June 2013) – Report
- D8.6: Demonstrate capabilities on selected applications (June 2013) – Demonstrator application
- D8.7: Evaluation of coupling of prototype to multi-core architectures (March 2014) – Report

3 Meetings and Events Oct 2011 – Oct 2012
Date | Meeting and Venue | Brief Focus
03-04 Nov 2011 | 1st PaNdata ODI meeting, STFC | Kick-off
08 Dec 2011 | PaNdata & CRISP harmonisation meeting | Discussion and planning
27-28 Feb 2012 | Joint HDRI & 2nd PaNdata workshop, DESY, Hamburg | Initial evaluation of file systems, high-performance processors, and the NeXus format and metadata
30-31 May 2012 | Workshop at the Swiss Light Source on high-performance file systems and HDF5 implementations | Detailed evaluation of file systems (Lustre and GPFS)
13 Jun 2012 | 3rd PaNdata & CRISP harmonisation meeting, Zurich Airport | Coordination with SLS and DESY
27-28 Jun 2012 | 3rd PaNdata ODI meeting, Trieste | Presentation of results so far
24-26 Sep 2012 | NOBUGS 2012, 4th PaNdata ODI meeting | NeXus International Advisory Committee, ICAT workshop, scalability communications

4 D8.1 Definition of pHDF5-capable NeXus implementation
This work involved an analysis of the requirements for the NeXus library as specified in report D5.1, and of the current state of deployment of NeXus at Diamond and DESY.
a) A significant number of beamlines write NeXus files, including NCD, Spectroscopy and Tomography; new beamlines write NeXus by default.
b) The metadata derived from the experience gained in a) have been submitted to the NeXus International Advisory Committee (NIAC), including NeXus file links.
c) Analysis tools that read and write NeXus format files are now available from both Diamond and DESY. Examples: DAWN Science (Diamond/ESRF/EMBL), DPDAK (DESY).
d) A common understanding of metadata has been reached between NeXus metadata and imgCIF/CBF (an IUCr standard), making it possible for scientific applications to be modified to use either format. For example, Diamond has 17 beamlines that write NeXus files and 7 that write CBF (primarily crystallography).
e) There is a continuing dialogue between the collaborating facilities and the NeXus International Advisory Committee to add the metadata definitions found to be necessary from experience in the scientific disciplines in a) above.

5 Mapping between HDF5, NeXus and imgCIF/CBF
Data Unit | HDF5 | NeXus | imgCIF/CBF
Top-level container | File | File | File
2nd-level unit | Unique Root Group | - | -
3rd-level unit | Group, Dataset or DataType | NXentry Data Group | Data Block
4th-level unit | Group, Dataset or Named DataType | Data Group or Data Field | Category
5th-level unit | Group, Dataset or Named DataType | Data Group or Data Field | Column
6th-level unit | Group, Dataset or Named DataType | Data Group or Data Field | Value
Agreed between the NeXus International Advisory Committee and Herbert Bernstein.

6 NeXus-enabled software: the DAWN Science Data Analysis suite. Integrating archived data, descriptive metadata, image views and analysis in a consistent and coordinated view for Non-Crystalline Diffraction. [Screenshot in original slide: SDA image viewing, NeXus data elements, direct read from ICAT, NeXus metadata, DAWN peak profile.]

7 NeXus-enabled software: the DAWN Science Data Analysis suite. Integrating archived data, descriptive metadata, image views and analysis in a consistent and coordinated view for I22 (Non-Crystalline Diffraction). [Screenshot in original slide: SDA view with NeXus data elements.]

8 Some current NeXus file software library implementations
The NeXus consortium produces a reference implementation for most platforms.
DESY provides an optimized library for writing NeXus files directly:
- libpniutils - http://sourceforge.net/projects/libpniutils/
- libpninx - http://sourceforge.net/projects/libpninx/
- libpninx-python - http://sourceforge.net/projects/libpninxpython
A single-process HDF5 writer is already available as part of the EPICS areaDetector distribution since version 1-7:
- It runs natively on Linux and Windows as an areaDetector plug-in.
- The performance is adequate for many low to medium data rate detectors.
- NeXus file I/O is implemented directly over HDF5.

9 D8.2 Evaluation of parallel file systems and MPI I/O implementations
The D8.2 report has been completed. This work analysed the data acquisition systems at Diamond, DESY and PSI, including some performance testing of data rates for GPFS and Lustre, and an analysis of the requirements for reading and writing pHDF5.
a) The first part of the work investigated the processing of macromolecular crystallography and tomography data to establish the overall requirements for a software library to read and write pHDF5 files.
b) Specifications for short-term improvements have now been written, to be reviewed by our collaborators; these should lead to substantial gains, particularly in the logistics and flexibility of the future software.
c) Implementation of beta libraries for pHDF5 is now underway and in test use at Diamond; these currently concentrate on synchronous input/output, pending an improvement requested of the core HDF5 developers.
d) A review of the currently available high-performance file systems was performed jointly by Diamond, SLS and DESY, the latter being very active in this area.
e) The initial focus of these studies has been to establish a set of benchmarks to measure the performance of candidates. Significant effort by the collaborating system administration teams has been devoted to finding current solutions, normally at the above-mentioned workshops. This will continue, since this is an area that is continually evolving.

10 Current HDF5 capabilities – why use it?
Advantages: HDF5 has been available for some time in the HPC community. In addition to the basic portable, hierarchical data format, it has a rich feature set: it can define types, lay data out on disk in a non-linear way, and filter data on reading and writing (to perform compression or change the stored data type). It has a variety of file storage backends, including one that can enable coordinated parallel I/O between machines and processes using the MPI library.
Problems: Due to its history the code is non-reentrant and single-threaded, with a global lock to protect against multi-threaded use. Changing this is very difficult (an estimated 6 person-years for a good programmer), so any parallelisation it currently offers must be process-based, not thread-based. It also uses the MPI libraries for communication, which are likewise process-based. A further complexity is that parallel writing and compression are currently mutually exclusive, since parallel writing relies on the on-disk layout being predictable.
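
As an illustration of the MPI-based backend mentioned above, here is a minimal sketch (not part of the report) of coordinated parallel writing through HDF5's MPI-IO driver; the file and dataset names are invented for the example:

```c
/* Minimal sketch: each MPI rank writes one row of a shared 2D dataset
 * through HDF5's MPI-IO file driver. Build against a parallel HDF5,
 * e.g.:  mpicc phdf5_open.c -lhdf5 */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* File access property list selecting the MPI-IO backend */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* All ranks open the same file collectively */
    hid_t file = H5Fcreate("parallel.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    hsize_t dims[2] = { (hsize_t)nprocs, 1024 };
    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "frames", H5T_NATIVE_INT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Select this rank's row in the file dataspace */
    hsize_t offset[2] = { (hsize_t)rank, 0 };
    hsize_t count[2]  = { 1, 1024 };
    H5Sselect_hyperslab(space, H5S_SELECT_SET, offset, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    int row[1024];
    for (int i = 0; i < 1024; i++) row[i] = rank;

    /* Collective write: all ranks participate in one I/O operation */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_INT, memspace, space, dxpl, row);

    H5Pclose(dxpl); H5Sclose(memspace); H5Dclose(dset);
    H5Sclose(space); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```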

11 Example high-performance detectors
Specification | PCO4000 | PCO-Edge | Pilatus6M | Excalibur
Frame | 2D | 2D | 2D | 2-3D
Scan size | 1D | 1D | 1-3D | 1-2D
Data rate | 100 MB/s | 700 MB/s | 300-1200 MB/s | 600 MB/s
Status | Complete | In development | Complete | In development
The support for these detectors provides important top-level use cases for the work with parallel HDF5, particularly in the fields of macromolecular crystallography and tomography:
a) The file storage must be able to support at least these write speeds.
b) The technology must also enable read speeds sufficient for near-simultaneous data evaluation and analysis.
c) Detectors are on the horizon, such as Eiger and a Pilatus6M running at 100 Hz, that promise to double the above rates.

12 pHDF5 writer short-term requirements
- Single Writer Multiple Reader (SWMR) is being developed at the moment, with the requirement that readers can see and act on a data file being written elsewhere, without any other message path between processes. This is important for sites that need to read and write in parallel.
- Scalable to support future detectors with increasing data rates: pHDF5 runs as MPI jobs, so it scales easily to multiple processes.
- Focus on optimizing performance of the parallel file system:
  - Optimize for the underlying parallel file system's performance.
  - Use best practices in the use of the HDF5 libraries.
  - Optimize the file-writing software with respect to the file system (e.g. automatic or user-configurable data chunk sizes and boundaries).
  - Store N-dimensional datasets: 2-3 dimensions per frame plus M scan dimensions.
- Implement and use extendible datasets (see the sketch below):
  - Grow the dataset as new data arrives.
  - Handle dropped data frames by filling in a blank frame.
- Fix HDF5 bugs.
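
A minimal sketch of the extendible-dataset pattern listed above, using the standard HDF5 chunked-storage API; the frame geometry and names are illustrative only:

```c
/* Sketch: an extendible (chunked) HDF5 dataset that grows by one
 * frame as each new detector image arrives. */
#include <hdf5.h>
#include <string.h>

#define NX 512
#define NY 512

int main(void)
{
    hsize_t dims[3]    = { 0, NY, NX };               /* start empty     */
    hsize_t maxdims[3] = { H5S_UNLIMITED, NY, NX };   /* grow in frames  */
    hsize_t chunk[3]   = { 1, NY, NX };               /* one frame/chunk */

    hid_t file = H5Fcreate("scan.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 3, chunk);

    hid_t space = H5Screate_simple(3, dims, maxdims);
    hid_t dset  = H5Dcreate2(file, "data", H5T_NATIVE_USHORT, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);

    unsigned short frame[NY][NX];
    for (hsize_t n = 0; n < 10; n++) {        /* pretend 10 frames arrive */
        memset(frame, (int)n, sizeof frame);  /* placeholder pixel data   */

        /* Extend by one frame, then write into the new slab */
        hsize_t newdims[3] = { n + 1, NY, NX };
        H5Dset_extent(dset, newdims);

        hid_t fspace = H5Dget_space(dset);
        hsize_t offset[3] = { n, 0, 0 };
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, offset, NULL, chunk, NULL);
        hid_t mspace = H5Screate_simple(3, chunk, NULL);
        H5Dwrite(dset, H5T_NATIVE_USHORT, mspace, fspace, H5P_DEFAULT, frame);
        H5Sclose(mspace); H5Sclose(fspace);
    }

    H5Pclose(dcpl); H5Sclose(space); H5Dclose(dset); H5Fclose(file);
    return 0;
}
```

A dropped frame can be handled by extending past it without writing: chunks that are never written read back as the fill value, which serves as the blank frame.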

13 Challenge of non-Linux platform detectors
It has been necessary to develop pHDF5 I/O routines for Windows platforms, to support the PCO4000 and PCO-Edge for example.
- A new standard Windows platform is now available: Windows Server 2008 R2 64-bit; tool chain: MSVC 2010.
- Lab results with a Windows-Linux 10Gb network are promising: around 95% of bandwidth on simple TCP data transfer between servers, and up to 300 MB/s HDF5 file writing (single process, Samba share).

14 Example experiment using direct-mode pHDF5 at DLS: NeXus file links to simple, optimized HDF5 files. [Diagram in original slide.]

15 Example metadata capture using the NeXus API to HDF5. [Code example in original slide.]
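
The original slide showed the example as an image. As a stand-in, here is a hedged sketch of what metadata capture through the NeXus C API (napi) onto HDF5 might look like; the group layout follows common NeXus base classes, and the values are invented:

```c
/* Sketch: write a beam-energy metadata item via the NeXus C API,
 * using the HDF5 backend (NXACC_CREATE5). */
#include <napi.h>

int main(void)
{
    NXhandle fh;
    double energy = 12.4;             /* example value, keV */
    int rank = 1, dims[1] = { 1 };

    NXopen("metadata.nxs", NXACC_CREATE5, &fh);

    NXmakegroup(fh, "entry", "NXentry");
    NXopengroup(fh, "entry", "NXentry");

    NXmakegroup(fh, "instrument", "NXinstrument");
    NXopengroup(fh, "instrument", "NXinstrument");

    /* Beam energy as a 1-element float dataset with a units attribute */
    NXmakedata(fh, "incident_energy", NX_FLOAT64, rank, dims);
    NXopendata(fh, "incident_energy");
    NXputdata(fh, &energy);
    NXputattr(fh, "units", "keV", 3, NX_CHAR);
    NXclosedata(fh);

    NXclosegroup(fh);   /* NXinstrument */
    NXclosegroup(fh);   /* NXentry */
    NXclose(&fh);
    return 0;
}
```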

16 Example metadata capture using libpniutils from DESY. [Code example in original slide.]

17 Parallel file systems compared
Currently there are a number of candidates, including:
a) GPFS (http://www-03.ibm.com/systems/software/gpfs)
b) Lustre (http://wiki.whamcloud.com/display/PUB/Wiki+Front+Page)
c) Fraunhofer FhGFS (http://www.fhgfs.com/cms/)
d) dCache (https://www.gridpp.ac.uk/wiki/DCache)
Much recent work evaluating GPFS and Lustre has been performed at Lawrence Livermore (LLNL); the results are provided at the following links:
http://www.pdsi-scidac.org/events/PDSW10/resources/posters/parallelNASFSs.pdf
https://e-reports-ext.llnl.gov/pdf/457620.pdf
See also DESY: https://indico.cern.ch/materialDisplay.py?contribId=4&sessionId=4&materialId=slides&confId=212228

18 Evaluation of parallel file systems
Evaluation of the available file systems depends on a number of considerations, including:
- Basic hardware components and network interconnect
- File system and low-level strategy
- Adaptation to file size (large files are often more efficient)
- Implementation at the site concerned
- Number of parallel I/O streams
It is important to define stable, well-chosen benchmarks that are portable enough to take to supplier sites. A chosen system must obviously fulfil the basic requirements specified above, offer the possibility to expand to accommodate new demands, and fit the budget.

19 Benchmark system
The benchmark system is in the design stage:
- Measure pHDF5 writer application performance.
- Measure parallel file system performance.
- Must be simple for a sysadmin to re-run benchmarks to verify performance when making updates or tweaks to the file system or cluster.
Simulate new detector systems (see the sketch after this list):
- Produce simulated data at configurable resolution and rate.
- Determine requirements for computing resources when investigating or purchasing new detector systems.
- Verify file system performance in relation to a new detector system.
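
A sketch of the detector-simulation idea in its simplest form, assuming synthetic frames of configurable size and count written to an HDF5 file with the sustained rate reported afterwards. This is illustrative only, not the benchmark suite under design:

```c
/* Sketch: write nf synthetic ny-by-nx frames to HDF5 and report MB/s.
 * Usage (illustrative): ./bench 2048 2048 100   Build: cc bench.c -lhdf5 */
#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(int argc, char **argv)
{
    /* Configurable resolution and frame count */
    hsize_t ny = argc > 1 ? (hsize_t)atoi(argv[1]) : 2048;
    hsize_t nx = argc > 2 ? (hsize_t)atoi(argv[2]) : 2048;
    hsize_t nf = argc > 3 ? (hsize_t)atoi(argv[3]) : 100;

    unsigned short *frame = malloc(ny * nx * sizeof *frame);
    memset(frame, 0x5a, ny * nx * sizeof *frame);   /* synthetic pixels */

    hsize_t dims[3] = { nf, ny, nx };
    hid_t file  = H5Fcreate("bench.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(3, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "frames", H5T_NATIVE_USHORT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t one[3] = { 1, ny, nx };
    hid_t mspace = H5Screate_simple(3, one, NULL);

    double t0 = now();
    for (hsize_t n = 0; n < nf; n++) {
        hsize_t off[3] = { n, 0, 0 };
        H5Sselect_hyperslab(space, H5S_SELECT_SET, off, NULL, one, NULL);
        H5Dwrite(dset, H5T_NATIVE_USHORT, mspace, space, H5P_DEFAULT, frame);
    }
    H5Fflush(file, H5F_SCOPE_GLOBAL);
    double dt = now() - t0;
    double mb = (double)(nf * ny * nx * sizeof *frame) / 1e6;
    printf("%.1f MB in %.2f s = %.1f MB/s\n", mb, dt, mb / dt);

    H5Sclose(mspace); H5Dclose(dset); H5Sclose(space); H5Fclose(file);
    free(frame);
    return 0;
}
```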

20 Two useful benchmarks
1. Basic hardware test
Name: mdtest
Version: 1.8.3
URL: http://sourceforge.net/projects/mdtest/
Description: mdtest is an MPI-coordinated metadata benchmark that performs open/stat/close operations on files and directories.
OS: Scientific Linux 6.2
MPI: openmpi 1.5.3
Platform: DESY-HPC 2011
2. Practical use test: Pilatus 6M detector simulation
- Currently the most demanding detector running at synchrotron and EuroFEL light sources.
- Can operate at ~25-100 Hz.
- Data format either raw (TIFF) or compressed (CBF).
- Data rates at 25 Hz: 1 Gb/s for CBF, twice as much for TIFF.
- Multiple beamlines are equipped with a Pilatus 6M, so up to 4 parallel/concurrent streams; most systems start to suffer.
- Execution: pssh -t 0 -H "host1 host2" pilatus.sh

21 Current status
Single-process HDF5 writer already available:
- Part of the EPICS areaDetector distribution since version 1-7.
- Runs natively on Linux and Windows as an areaDetector plug-in.
- Performance is reasonable for a number of detectors in use at DLS.
Parallel HDF5 file writer application in development:
- Initial design and some development done.
- Initial prototype is running but still lacking some features.
Integration of Windows-based detector systems:
- TCP transfer of N-dimensional arrays with attached metadata.
- Currently using a prototype implementation of the new EPICS CA4 protocol.

22 Potential short-term improvements
1. Allow the user to create their own dynamically loadable filters to improve compression performance for special applications (e.g. shareable libraries); see the filter sketch after this list.
2. Allow the user to provide pre-compressed data chunks to be written by the library. A chunk is an n-dimensional subset of a dataset and represents how the file is physically laid out on disk (i.e. the layout is not necessarily linear).
3. Implement a metadata server to reduce the constraints on parallel I/O. This would allow the disk layout to be more flexible, at the cost of having to access metadata through a server message. Currently metadata is kept in memory and only flushed to disk infrequently. This would allow parallel writers to write compressed chunks.
4. The other approach to compressed parallel file writing is to use sparse files and pre-allocate the data chunks at their uncompressed size, but only write the data at the compressed size (this would need sparse-file support in the file utilities).
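
For item 1, the filter interface such loadable filters would implement already exists in HDF5 (H5Zregister); what is missing is loading them dynamically at run time. A minimal sketch of that interface, with a trivial pass-through standing in for real compression:

```c
/* Sketch of a user-defined HDF5 filter. A real filter would compress
 * when H5Z_FLAG_REVERSE is unset and decompress when it is set; this
 * one passes the buffer through unchanged. Filter ID 256 is from the
 * range HDF5 reserves for testing. */
#include <hdf5.h>

static size_t passthrough(unsigned flags, size_t cd_nelmts,
                          const unsigned cd_values[], size_t nbytes,
                          size_t *buf_size, void **buf)
{
    (void)flags; (void)cd_nelmts; (void)cd_values; (void)buf_size; (void)buf;
    return nbytes;   /* bytes in the output buffer; 0 signals failure */
}

const H5Z_class2_t EXAMPLE_FILTER = {
    H5Z_CLASS_T_VERS,        /* struct version          */
    (H5Z_filter_t)256,       /* filter identifier       */
    1, 1,                    /* encoder/decoder present */
    "passthrough example",   /* name for error messages */
    NULL, NULL,              /* can_apply / set_local   */
    passthrough              /* the filter function     */
};

/* Usage: register, then attach to a chunked dataset's creation list */
void attach_filter(hid_t dcpl)
{
    H5Zregister(&EXAMPLE_FILTER);
    H5Pset_filter(dcpl, 256, H5Z_FLAG_MANDATORY, 0, NULL);
}
```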

23 Next year
Each of the three partners active in WP8 now has significant resources working on the implementation of the prototypes that will form D8.3 and then contribute to future deliverables. It is now the intention to hold a further hands-on workshop in Q1 2013 to harmonize the work where practical and profit from best practice. Currently suggested dates: Mar 4th 9:00 to Mar 5th 12:00, associated with the HDRI steering committee at DESY. In addition, industrial concerns such as Dectris, who supply very high data rate detectors, are being included in discussions to benefit from any effort they can provide and to ensure as smooth a rollout of the technology as possible.

24 Thank You

25 Platforms
- Linux (RHEL) is the preferred platform, with a native Lustre client.
- Several detectors are only vendor-supported with Windows binary drivers, PCO cameras in particular.
- EPICS areaDetector can run on Windows.
- The currently supported Windows platforms are Windows XP and Server 2003:
  - The 32-bit platform is limiting.
  - Poor network performance on a modern 10Gb network.
  - The out-of-date platform and tool chain cause problems building and linking recent libraries.

26 pHDF5 writer requirements
Configurable definition of the file structure:
- An XML configuration description allows either a simple raw dataset or a more complex NeXus-compatible layout (see the sketch below).
- Attach metadata collected during the detector scan.
Must work with the EPICS areaDetector framework:
- Most if not all DLS imaging detectors are controlled through EPICS areaDetector.
- However, the coupling must be loose to allow integration with other control frameworks (Tango/Lima).
Linux-based MPI application:
- Must integrate well with Windows-based detector systems.
- Runs as a network service.
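
A hypothetical sketch of what such an XML layout description might look like, mapping the detector stream and scan metadata into a NeXus-compatible hierarchy. The element names (hdf5_layout, group, dataset, attribute) and the source/ndattribute vocabulary are assumptions for illustration, not a defined schema:

```xml
<!-- Hypothetical layout description for the pHDF5 writer. -->
<hdf5_layout>
  <group name="entry">
    <attribute name="NX_class" source="constant" value="NXentry" type="string"/>
    <group name="instrument">
      <attribute name="NX_class" source="constant" value="NXinstrument" type="string"/>
      <group name="detector">
        <attribute name="NX_class" source="constant" value="NXdetector" type="string"/>
        <!-- Frames streamed from the detector plug-in -->
        <dataset name="data" source="detector"/>
        <!-- Metadata collected during the scan, attached by name -->
        <dataset name="exposure_time" source="ndattribute" ndattribute="ExposureTime"/>
      </group>
    </group>
  </group>
</hdf5_layout>
```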

27 NCD Calibration and Reduction

