HDF5 format at European XFEL


1 HDF5 format at European XFEL
Djelloul Boukhelef, Data processing workshop, Eu-XFEL, Feb. 2016

2 The Big Picture: the HDF5 format to be used at European XFEL

3 Data repository structure
- Reflects the metadata models for raw and calibration data.
- Data retention policy: the lifetime of data differs depending on the data type.
- Time period: a facility cycle (campaign) is written as [yyyyNN], i.e. year + cycle number; year + 00, i.e. [yyyy00], is used for data not related to a facility cycle.
- Instrument (SPB, SQS, FXE, HED, MID, SCS, TestBed1, …).
- Data types: raw / intermediate / resultant.
- The raw data repository structure per instrument reflects the metadata model: BeamTime + Experiment + Run.
- The calibration repository structure per instrument reflects the calibration DB model.
Possible raw data repository structure (example): Facility XFEL, BeamTime bt0110, Experiment Id exp01, Run r0023, Type raw, TimePeriod 201601, Instrument FXE.
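The time-period rule and the example components above can be sketched in a few lines. This is a minimal illustration, not the facility's implementation: the function names are hypothetical, and because the slide lists the path components without saying how they nest, the level order used in `raw_run_path` is an assumption.

```python
from typing import Optional


def time_period(year: int, cycle: Optional[int] = None) -> str:
    """Return the [yyyyNN] time-period code.

    A facility cycle (campaign) is encoded as year + two-digit cycle number;
    data not related to a facility cycle gets cycle 00, i.e. [yyyy00].
    """
    nn = 0 if cycle is None else cycle
    if not 0 <= nn <= 99:
        raise ValueError(f"cycle must fit in two digits, got {cycle}")
    return f"{year:04d}{nn:02d}"


def raw_run_path(facility: str, period: str, instrument: str, data_type: str,
                 beamtime: str, experiment: str, run: str) -> str:
    # Hypothetical level order: the slide lists the components but not
    # their nesting, so this ordering is an assumption for illustration.
    return "/" + "/".join([facility, period, instrument, data_type,
                           beamtime, experiment, run])


print(time_period(2016, 1))   # 201601 (facility cycle 01 of 2016)
print(time_period(2016))      # 201600 (not related to a facility cycle)
print(raw_run_path("XFEL", time_period(2016, 1), "FXE", "raw",
                   "bt0110", "exp01", "r0023"))
# /XFEL/201601/FXE/raw/bt0110/exp01/r0023
```

Keeping the time period as a fixed-width string makes folders sort chronologically, which suits a retention policy that expires data by period.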

4 Data categories and Naming convention
Naming convention: /Domain/Type/Instance/property, e.g. /detlab_det_lpd/fem/1/femVoltage0
- Control data: FEM Ctrl device, PC layer device, Digitizer device
- Instrument data: header, images, descriptor; detector data: raw, processed
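A name following the /Domain/Type/Instance/property convention splits cleanly into its four parts. The sketch below uses hypothetical helper names and only checks that exactly four non-empty components are present; it does not validate domains or types against any registry.

```python
from typing import NamedTuple


class PropertyName(NamedTuple):
    domain: str         # e.g. detlab_det_lpd
    type_name: str      # e.g. fem
    instance: str       # e.g. 1
    property_name: str  # e.g. femVoltage0


def parse_property_name(path: str) -> PropertyName:
    """Split a /Domain/Type/Instance/property name into its four parts."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected /Domain/Type/Instance/property, got {path!r}")
    return PropertyName(*parts)


def format_property_name(name: PropertyName) -> str:
    """Inverse of parse_property_name."""
    return "/" + "/".join(name)


name = parse_property_name("/detlab_det_lpd/fem/1/femVoltage0")
print(name.domain, name.type_name, name.instance, name.property_name)
# detlab_det_lpd fem 1 femVoltage0
```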

5 File structure: RAW data structure within file
- facility: description of facility, beamline and instrument
- user: description of users (PI, operator, …)
- control: slow data
- instrument: detector data
User data (example): address, affiliation, facilityUserId, faxNumber, name, role, telephoneNumber
Instrument data:
- UniqueDataSourceId: type of the data source and its instance, e.g. /instrument/detlab_det_lpd/2d/lpd/
- DetectorDataFormat: generalized 2D detector data format, e.g.
  /instrument/detlab_det_lpd/2d/lpd/var_pulse_data/data
  /instrument/detlab_det_lpd/2d/lpd/var_pulse_data/pulseId
  /instrument/detlab_det_lpd/2d/lpd/pulse_data/cellId
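The dataset paths of the generalized 2D detector format follow directly from the data source id. The sketch below only builds path strings (it does not open an HDF5 file); the helper names are hypothetical, and the three dataset names are the ones listed on the slide.

```python
from typing import Dict

# Top-level groups of a RAW data file, as listed on the slide.
TOP_LEVEL_GROUPS = ("facility", "user", "control", "instrument")


def detector_dataset_paths(source_id: str) -> Dict[str, str]:
    """Dataset paths of the generalized 2D detector format for one source.

    source_id is the UniqueDataSourceId part, e.g. "detlab_det_lpd/2d/lpd".
    """
    base = "/instrument/" + source_id.strip("/")
    return {
        "data": base + "/var_pulse_data/data",        # image data
        "pulseId": base + "/var_pulse_data/pulseId",  # pulse ids
        "cellId": base + "/pulse_data/cellId",        # cell ids
    }


def top_level_group(path: str) -> str:
    """Return the top-level group that a dataset path belongs to."""
    group = path.strip("/").split("/", 1)[0]
    if group not in TOP_LEVEL_GROUPS:
        raise ValueError(f"{path!r} is outside the RAW file structure")
    return group


for key, path in detector_dataset_paths("detlab_det_lpd/2d/lpd/").items():
    print(key, path, top_level_group(path))
```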

6 Physical HDF5 file structure
Sequencing: possibility of storing a sequence of data records.
- Unique sequence id at XFEL: (trainId, pulseId)
- Pulse data: descriptors, image
- Train data: header, trailer, detector-specific data, monitor data
[Example tables: pulse data with columns Idx train, Train Id, Pulse Id, Image, cellId; train data with columns Train Id, A, B, C, D, e.g. train 234: 1003, 11.85, 44.34, 12.8; train 250: 1005, 12.16.]
Issue: what to do with removed images?
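One way to navigate from a train to its pulse records is a per-train index of (first row, number of rows) into a pulse table grouped by trainId. This is a minimal sketch assuming that the "Idx train" column plays such a role; the function name and the table values are illustrative, not taken from the slide.

```python
from typing import Dict, List, Tuple


def build_train_index(pulse_train_ids: List[int]) -> Dict[int, Tuple[int, int]]:
    """Map each trainId to (first_row, n_rows) in a pulse table grouped by train."""
    index: Dict[int, Tuple[int, int]] = {}
    for row, train_id in enumerate(pulse_train_ids):
        if train_id not in index:
            index[train_id] = (row, 1)
            continue
        first, count = index[train_id]
        if first + count != row:
            raise ValueError(f"rows of train {train_id} are not contiguous")
        index[train_id] = (first, count + 1)
    return index


# Illustrative pulse table (not the values from the slide).
train_ids = [234, 234, 234, 250, 250]
pulse_ids = [0, 1, 2, 0, 1]

index = build_train_index(train_ids)
first, count = index[250]
print(index)                           # {234: (0, 3), 250: (3, 2)}
print(pulse_ids[first:first + count])  # [0, 1]
```

Storing offsets and counts instead of repeating train data per pulse keeps the train table small and turns the lookup of one train's pulses into a single slice.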

7 Internal HDF5 file structure
Sequencing: follow a relational database model? How far can we go?
- Train data (header, trailer, detector-specific data, control data)
- Pulse data (descriptors)
- Variable pulse data (images)
Optimize for size and create indexes to navigate between the different entities.
[Example tables: train data with columns Train Id, A, B, C, D (train 234: 1003, 11.85, 44.34, 12.8; train 250: 1005, 12.16); pulse data and variable pulse data linked through the indexes Idx train data, Idx pulse data and Idx variable pulse data, with columns Train Id, Pulse Id, cellId, status (values include -1) and Image.]
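The relational split also gives one possible answer to the removed-images question: keep every pulse descriptor, store images in a separate gap-free table, and let each descriptor point to its image row. The sketch assumes a hypothetical convention in which -1 means "no image stored"; the slide's status column shows a -1 value, but its exact meaning is not stated, and the class and method names are illustrative.

```python
from typing import List, Optional


class PulseTable:
    """Pulse descriptors plus a separate table holding only the stored images."""

    NO_IMAGE = -1  # hypothetical marker for a pulse whose image was removed

    def __init__(self) -> None:
        self.train_ids: List[int] = []
        self.pulse_ids: List[int] = []
        self.image_row: List[int] = []  # row in self.images, or NO_IMAGE
        self.images: List[bytes] = []   # variable pulse data, no gaps

    def append(self, train_id: int, pulse_id: int, image: Optional[bytes]) -> None:
        self.train_ids.append(train_id)
        self.pulse_ids.append(pulse_id)
        if image is None:
            self.image_row.append(self.NO_IMAGE)
        else:
            self.image_row.append(len(self.images))
            self.images.append(image)

    def image_for(self, train_id: int, pulse_id: int) -> Optional[bytes]:
        """Return the image of (trainId, pulseId), or None if it was removed."""
        for t, p, row in zip(self.train_ids, self.pulse_ids, self.image_row):
            if (t, p) == (train_id, pulse_id):
                return None if row == self.NO_IMAGE else self.images[row]
        raise KeyError((train_id, pulse_id))


table = PulseTable()
table.append(234, 0, b"image-a")
table.append(234, 1, None)      # image removed, descriptor kept
table.append(250, 0, b"image-b")
print(table.image_row)          # [0, -1, 1]
print(table.image_for(250, 0))  # b'image-b'
```

The linear scan in `image_for` is only for brevity; a file-based version would combine it with a per-train index so the lookup does not touch every row.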

8 Data acquisition and aggregation at PC layer
Metadata catalog: registers data files and metadata (list of files, run number).
Per-run folders: /r0001/r0001.h5, /pcl0, /pcl1, /pcl2, /pcl3, /agg0, written by the PC layer nodes PCL0, PCL1, PCL2, PCL3 and the aggregator AGG0.
Components: Reader and Dispatcher; full images are handled module-wise (Cal0, Cal1, Cal2, Cal3).
Dispatcher:
- Match train id to filename using Seq(T0, NC, NTF).
- Match filename to node/folder using an inverted index:
  - using symbolic links
  - using the reverse formula (discovered from the data)
  - using a dictionary (filename → location)
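The dictionary option (filename → location) is an inverted index built from the per-folder file lists. A minimal sketch follows; the helper name and the assignment of files to folders are hypothetical and serve only to show the lookup.

```python
from typing import Dict, Iterable


def invert_locations(files_by_location: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Build a filename -> location dictionary from location -> filenames."""
    location_of: Dict[str, str] = {}
    for location, filenames in files_by_location.items():
        for name in filenames:
            if name in location_of:
                raise ValueError(f"{name} found in both {location_of[name]} and {location}")
            location_of[name] = location
    return location_of


# Hypothetical assignment of files to folders, for illustration only.
folders = {
    "pcl0": ["r0001-lpd-s0000.h5", "r0001-lpd-s0004.h5"],
    "pcl1": ["r0001-lpd-s0001.h5", "r0001-lpd-s0005.h5"],
    "agg0": ["r0001-agg0-s0000.h5", "r0001-agg0-s0001.h5"],
}
location_of = invert_locations(folders)
print("/r0001/" + location_of["r0001-lpd-s0005.h5"] + "/r0001-lpd-s0005.h5")
# /r0001/pcl1/r0001-lpd-s0005.h5
```

Unlike the reverse-formula option, the dictionary also works when files were distributed irregularly, at the cost of storing one entry per file.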

9 Correlation of instrument and control data
The sequence number can be calculated assuming a statically configured distribution of train data over multiple channels:
- T: train id
- T0: first train id within the run
- Nc: number of data acquisition channels (channels are ordered)
- N: number of train blocks per file
- c: channel id
- i: record id for train-based data within a table, i ∈ {0, …, N−1}
- m: file sequence number within the channel
- s: file sequence number within the run
For consecutive train ids:
$$c(T) = (T - T_0) \bmod N_c$$
$$m(T) = \left\lfloor \frac{T - T_0}{N_c \times N} \right\rfloor$$
$$i(T) = \left\lfloor \frac{(T - T_0) \bmod (N_c \times N)}{N_c} \right\rfloor$$
$$s(T) = m(T, N_c, N) \times N_c + c(T, N_c)$$
This can be generalized for periodic train patterns; if random trains are delivered (not likely), use a tabular index.
Example: T0 = 0, Nc = 3, N = 2. Where is train T = 10? c = 1, m = 1, i = 1, s = 4.
These parameters are stored as group attributes in the main file.
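The four formulas translate directly into integer arithmetic, and because i < N and c < Nc the mapping can be inverted exactly. The sketch below uses hypothetical function names; the inverse `train_at` is an addition for checking, derived from T − T0 = m·Nc·N + i·Nc + c.

```python
from typing import NamedTuple


class TrainLocation(NamedTuple):
    c: int  # channel id
    m: int  # file sequence number within the channel
    i: int  # record id within the file's train table
    s: int  # file sequence number within the run


def locate_train(T: int, T0: int, Nc: int, N: int) -> TrainLocation:
    """Apply the formulas for c(T), m(T), i(T) and s(T)."""
    d = T - T0
    if d < 0:
        raise ValueError("train id precedes the first train of the run")
    c = d % Nc
    m = d // (Nc * N)
    i = (d % (Nc * N)) // Nc
    return TrainLocation(c=c, m=m, i=i, s=m * Nc + c)


def train_at(s: int, i: int, T0: int, Nc: int, N: int) -> int:
    """Inverse mapping: train id stored in record i of run file s."""
    m, c = divmod(s, Nc)
    return T0 + m * Nc * N + i * Nc + c


print(locate_train(10, T0=0, Nc=3, N=2))  # TrainLocation(c=1, m=1, i=1, s=4)
print(train_at(4, 1, T0=0, Nc=3, N=2))    # 10
```

The output reproduces the slide's example for T = 10.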

10 Run data files (aka data group)
Data files store datasets; each may contain data from one or multiple data sources or properties.
Stub file:
- unique per data group or run
- points to the actual files that store the datasets
- indicates the formula (e.g. as HDF5 attributes) to calculate the actual set of files that store a given dataset
Datasets in different file groups might be indexed using different parameters.
Example data sources: /detlab_det_lpd/2d/lpd/images/, /detlab_det_lpd/fem/fem1/…, /detlab_det_alas1/vac/…, /detlab_det_alas1/motor/…
Example run with stub file R0001.h5:
- AGG1: r0001-agg1-s0000.h5, r0001-agg1-s0001.h5, r0001-agg1-s0002.h5, r0001-agg1-s0003.h5
- AGG0: r0001-agg0-s0000.h5, r0001-agg0-s0001.h5
- PCL0, PCL1, PCL2, PCL3: r0001-lpd-s0000.h5, r0001-lpd-s0001.h5, r0001-lpd-s0002.h5, r0001-lpd-s0003.h5, r0001-lpd-s0004.h5, r0001-lpd-s0005.h5
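Given the attributes stored in the stub file (infix and formula parameters), a reader can compute which data file holds a train without listing directories. This sketch assumes that the sequence number in names such as r0001-lpd-s0004.h5 is the run-wide file sequence number s(T); the attribute container and function name are hypothetical, and the parameters reuse the T0 = 0, Nc = 3, N = 2 example.

```python
from typing import NamedTuple


class StubAttributes(NamedTuple):
    """Hypothetical stub-file attributes for one data group."""
    run: int
    infix: str  # identifies the group of datasets, e.g. "lpd", "agg0"
    T0: int     # first train id within the run
    Nc: int     # number of data acquisition channels
    N: int      # number of train blocks per file


def data_file_for_train(stub: StubAttributes, T: int) -> str:
    """Name of the data file holding train T, using s(T) = m(T) * Nc + c(T)."""
    d = T - stub.T0
    if d < 0:
        raise ValueError("train id precedes the first train of the run")
    c = d % stub.Nc
    m = d // (stub.Nc * stub.N)
    s = m * stub.Nc + c
    return f"r{stub.run:04d}-{stub.infix}-s{s:04d}.h5"


stub = StubAttributes(run=1, infix="lpd", T0=0, Nc=3, N=2)
print(data_file_for_train(stub, 10))  # r0001-lpd-s0004.h5
```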

11 Full path to data
The path to data contains four parts:
- Filesystem path (prefix): the same for one data group.
- File location and filename:
  - For technical reasons, PC layer nodes or aggregators store data from the same run in separate folders (e.g. local folders in the HS setup).
  - The file location is a configuration parameter, e.g. PCL0, PCL1, AGG0, …
  - File names are calculated using Seq(T0, NC, NTF).
- Data-source id:
  - Experiment-wide unique identifier of a Karabo data source.
  - Hierarchical: location/data-source/property.
  - Data files contain the full data-source id.
File names contain (as infix) the location/instance id, e.g. r0001-lpd-s0001.h5, r0001-agg0-s0000.h5.
Infix: configuration parameter that identifies the group of datasets; it is stored in the stub file along with the formula.
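Putting the parts together, a reader can resolve a train to a file path and an in-file dataset path. This is a sketch under stated assumptions: channel c writes into folder `locations[c]` (so Nc is the number of locations), NTF in Seq(T0, NC, NTF) is taken to be N, and the prefix `/data/r0001` is a hypothetical placeholder.

```python
import posixpath
from typing import Sequence, Tuple


def full_data_path(prefix: str, locations: Sequence[str], run: int, infix: str,
                   T: int, T0: int, N: int, source_id: str) -> Tuple[str, str]:
    """Return (file path, dataset path inside the file) for train T.

    Assumes channel c writes into folder locations[c], so Nc = len(locations).
    """
    Nc = len(locations)
    d = T - T0
    c = d % Nc                    # channel, selects the file location
    s = (d // (Nc * N)) * Nc + c  # file sequence number within the run
    filename = f"r{run:04d}-{infix}-s{s:04d}.h5"
    return posixpath.join(prefix, locations[c], filename), source_id


file_path, dataset = full_data_path(
    prefix="/data/r0001",  # hypothetical prefix
    locations=["PCL0", "PCL1", "PCL2"],
    run=1, infix="lpd", T=10, T0=0, N=2,
    source_id="/detlab_det_lpd/2d/lpd/images",
)
print(file_path)  # /data/r0001/PCL1/r0001-lpd-s0004.h5
print(dataset)    # /detlab_det_lpd/2d/lpd/images
```

Because location and filename both derive from the same formula, only the prefix, the location list and the infix need to be configured per data group.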
