HDF5 format at European XFEL


HDF5 format at European XFEL Djelloul Boukhelef Data processing workshop Eu-XFEL, 03-04 Feb. 2016

The Big Picture: the HDF5 format to be used at European XFEL

Data repository structure
- Reflects the metadata models for raw and calibration data
- Data retention policy: the lifetime of data depends on the data type
- Time period: facility cycle (campaign) [yyyyNN]; year + 00 [yyyy00] for data not related to a facility cycle
- Instrument (SPB, SQS, FXE, HED, MID, SCS, TestBed1, …)
- Data types: raw / intermediate / resultant
- The raw data repository structure per instrument reflects the metadata model: BeamTime + Experiment + Run
- The calibration repository structure per instrument reflects the calibration DB model
- Possible raw data repository structure (labels and example values as shown on the slide):
    Facility       XFEL
    BeamTime       bt0110
    Experiment Id  exp01
    Run            r0023
    Type           raw
    TimePeriod     201601
    Instrument     FXE

Data categories and naming convention
- Two data categories: control data and instrument data
- Example property path: /detlab_det_lpd/fem/1/femVoltage0
- Path components: Domain / Type / Instance / property
[Diagram labels: FEM, Ctrl device, header, images, PC layer device, descriptor, detector, raw, Digitizer device, processed]
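The naming convention above can be illustrated with a short sketch. It assumes that the four labels on the slide (Domain, Type, Instance, property) map one-to-one, in order, onto the four components of the example path; the class and function names are hypothetical and not part of any XFEL or Karabo API.

```python
from typing import NamedTuple


class PropertyPath(NamedTuple):
    """Components of a control-data path (field names are illustrative)."""
    domain: str
    device_type: str
    instance: str
    prop: str


def parse_property_path(path: str) -> PropertyPath:
    """Split '/domain/type/instance/property' into its four components.

    Assumption: the path always has exactly four components, as in the
    example from the slide.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 path components, got {len(parts)}: {path!r}")
    return PropertyPath(*parts)


parsed = parse_property_path("/detlab_det_lpd/fem/1/femVoltage0")
print(parsed.domain, parsed.device_type, parsed.instance, parsed.prop)
```

Paths with a different depth, such as the instrument-data paths under /instrument/, would need their own rule; this sketch rejects them rather than guessing.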

File structure
RAW data structure within a file:
- facility: description of facility, beamline and instrument
- user: description of user (PI, operator, …)
- control: slow data
- instrument: detector data
User data (example): address, affiliation, email, facilityUserId, faxNumber, name, role, telephoneNumber
Instrument data:
- UniqueDataSourceId: type of the data source and its instance
    /instrument/detlab_det_lpd/2d/lpd/
- DetectorDataFormat: generalized 2D detector data format
    /instrument/detlab_det_lpd/2d/lpd/var_pulse_data/data
    /instrument/detlab_det_lpd/2d/lpd/var_pulse_data/pulseId
    /instrument/detlab_det_lpd/2d/lpd/pulse_data/cellId

Physical HDF5 file structure: Sequencing
- Possibility of storing a sequence of data records
- Unique sequence id at XFEL: (trainId, pulseId)
- Pulse data (descriptors, image)
- Train data (header, trailer, detector-specific data, monitor data)
- Issue: what to do with removed images?
[Figure: two example tables. Pulse data with columns Idx train, Train Id, Pulse Id, Image, cellId; train data with columns Train Id, A, B, C, D. The example rows refer to trains 234 and 250; the individual cell values are not recoverable from the transcript.]

Internal HDF5 file structure: Sequencing
- Follow the relational database model? How far can we go?
- Optimize for size and create indexes to navigate between different entities
- Train data (header, trailer, detector-specific data, control data)
- Pulse data (descriptors)
- Variable pulse data (images)
[Figure: three linked example tables. Train data with columns Train Id, A, B, C, D. Pulse data with columns Idx variable pulse data, Idx train data, Train Id, Pulse Id, cellId, status. Variable pulse data with columns Idx pulse data, Train Id, Pulse Id, Image. The example rows refer to trains 234 and 250, and the value -1 appears in the Idx variable pulse data column; the remaining cell values are not recoverable from the transcript.]
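The idea of linking tables through index columns can be sketched in plain Python. All values below are made up for illustration, the field names are hypothetical, and the use of -1 to mark a pulse whose image is not stored is an assumption suggested by the figure, not something the slide states.

```python
# Train table: one row per train.
train_data = [
    {"trainId": 234},
    {"trainId": 250},
]

# Pulse table: one row per pulse, with indexes into the other two tables.
# idx_image == -1 is assumed here to mean "no image stored for this pulse".
pulse_data = [
    {"idx_image": -1, "idx_train": 0, "pulseId": 1},
    {"idx_image": 0, "idx_train": 0, "pulseId": 2},
    {"idx_image": 1, "idx_train": 1, "pulseId": 1},
]

# Variable pulse table: only pulses that actually have an image,
# each pointing back to its row in the pulse table.
variable_pulse_data = [
    {"idx_pulse": 1, "image": "image-a"},
    {"idx_pulse": 2, "image": "image-b"},
]


def train_for_pulse(pulse_idx):
    """Follow the pulse -> train index."""
    return train_data[pulse_data[pulse_idx]["idx_train"]]


def image_for_pulse(pulse_idx):
    """Follow the pulse -> image index; None if no image is stored."""
    idx = pulse_data[pulse_idx]["idx_image"]
    return None if idx < 0 else variable_pulse_data[idx]["image"]


def pulse_for_image(image_idx):
    """Follow the image -> pulse index (the reverse direction)."""
    return pulse_data[variable_pulse_data[image_idx]["idx_pulse"]]
```

With this layout the image table stays dense while the pulse descriptors keep one row per pulse, which is one way to read the "optimize for size" remark on the slide.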

Data acquisition and aggregation at PC layer
- Metadata Catalog: register data files and metadata (list of files, run number)
- Match train-id to filename using Seq(T0,NC,NTF)
- Match filename to node/folder using an inverted index:
    - Using symbolic links
    - Using reverse formula (discovered from data)
    - Using dictionary (filename → location)
[Diagram labels: PCL0, PCL1, PCL2, PCL3, AGG0; Reader; /r0001/r0001.h5 with groups /pcl0, /pcl1, /pcl2, /pcl3, /agg0; Full images; Dispatcher; Module-wise; Cal0, Cal1, Cal2, Cal3]

Correlation of instrument and control data
The sequence number can be calculated assuming a statically configured distribution of train data over multiple channels.
    T    Train Id
    T0   First train Id within the run
    Nc   Number of data acquisition channels
    N    Number of train blocks per file
    c    Channel id. Channels are ordered.
    i    Record id for train-based data within a table {i = 0, …, N-1}
    m    File sequence number within the channel
    s    File sequence number within the run
For consecutive train ids:
    $c(T) = (T - T_0) \bmod N_c$
    $m(T) = \left\lfloor \dfrac{T - T_0}{N_c \times N} \right\rfloor$
    $i(T) = \left\lfloor \dfrac{(T - T_0) \bmod (N_c \times N)}{N_c} \right\rfloor$
    $s(T) = m(T, N_c, N) \times N_c + c(T, N_c)$
- Can be generalized for periodic train patterns
- If random trains are delivered, use a tabular index (not likely)
Example: T0 = 0, Nc = 3, N = 2. Where is train T = 10? c = 1, m = 1, i = 1, s = 4
These parameters (T0, Nc, N) are stored as group attributes in the main file.
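The four formulas can be checked with a small Python sketch. The function and type names are hypothetical; the integer divisions are written as floor divisions, which is the reading that reproduces the worked example on the slide.

```python
from typing import NamedTuple


class TrainLocation(NamedTuple):
    c: int  # channel id
    m: int  # file sequence number within the channel
    i: int  # record id within the file
    s: int  # file sequence number within the run


def locate_train(T: int, T0: int, Nc: int, N: int) -> TrainLocation:
    """Locate train T for a static round-robin distribution over Nc channels.

    Assumes consecutive train ids starting at T0, with N train blocks per file.
    """
    offset = T - T0
    if offset < 0:
        raise ValueError("train id precedes the first train of the run")
    c = offset % Nc
    m = offset // (Nc * N)
    i = (offset % (Nc * N)) // Nc
    s = m * Nc + c
    return TrainLocation(c, m, i, s)


# Worked example from the slide: T0 = 0, Nc = 3, N = 2, T = 10
print(locate_train(10, T0=0, Nc=3, N=2))
# TrainLocation(c=1, m=1, i=1, s=4)
```

Within one block of Nc × N consecutive trains, every (c, i) pair is hit exactly once, so each file receives N records before the next file in that channel is started.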

Run data files (aka data group)
- Data files store datasets
    - Each may contain data from one or multiple data sources or properties
- Stub file
    - Unique per data group or run
    - Points to the actual files that store the datasets
    - Indicates the formula (e.g. as HDF5 attributes) to calculate the actual set of files that store a given dataset
    - Datasets in different file groups might be indexed using different parameters
[Diagram: the stub file R0001.h5 lists the datasets /detlab_det_lpd/2d/lpd/images/, /detlab_det_lpd/fem/fem1/…, /detlab_det_alas1/vac/… and /detlab_det_alas1/motor/… The data files shown are:
    AGG1: r0001-agg1-s0000.h5, r0001-agg1-s0001.h5, r0001-agg1-s0002.h5, r0001-agg1-s0003.h5
    AGG0: r0001-agg0-s0000.h5, r0001-agg0-s0001.h5
    PCL0, PCL1, PCL2, PCL3: r0001-lpd-s0000.h5, r0001-lpd-s0001.h5, r0001-lpd-s0002.h5, r0001-lpd-s0003.h5, r0001-lpd-s0004.h5, r0001-lpd-s0005.h5]

Full path to data
The path to data contains four parts:
- Filesystem path (prefix): this is the same for one data group
- File location and filename
    - For technical reasons, PCLayer nodes or aggregators store data from the same run in separate folders (e.g. local folders in HS setup)
    - File location is a configuration parameter: e.g. PCL0, PCL1, AGG0, …
    - File names are calculated using Seq(T0,NC,NTF)
- Data-source id
    - Experiment-wide unique identifier of a Karabo data source
    - Hierarchical: location/data-source/property
    - Data files contain the full data-source id
- Names of files contain (as an infix) the location/instance-id
    - Example: r0001-lpd-s0001.h5, r0001-agg0-s0000.h5
    - Infix: configuration parameter that identifies the group of datasets
    - Stored in the stub file along with the formula
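As a rough illustration of how these parts combine, the sketch below builds a file name in the pattern of the examples (r0001-lpd-s0001.h5) and joins it with a prefix and a location folder. The function names and the prefix "/data/prefix" are hypothetical, and the slides do not say whether the number after "s" is the per-channel or the per-run file sequence number, so it is passed in as a plain integer.

```python
import posixpath


def data_file_name(run: int, infix: str, seq: int) -> str:
    """Build a name such as 'r0001-lpd-s0001.h5'.

    The four-digit zero padding is inferred from the example file names.
    """
    return f"r{run:04d}-{infix}-s{seq:04d}.h5"


def data_file_path(prefix: str, location: str, run: int, infix: str, seq: int) -> str:
    """Join filesystem prefix, file location (e.g. 'PCL0') and file name."""
    return posixpath.join(prefix, location, data_file_name(run, infix, seq))


print(data_file_name(1, "lpd", 1))    # r0001-lpd-s0001.h5
print(data_file_name(1, "agg0", 0))   # r0001-agg0-s0000.h5
print(data_file_path("/data/prefix", "PCL0", 1, "lpd", 1))
```

The data-source id (location/data-source/property) would then address the dataset inside the file found this way.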