Euratom-ENEA Association

Commonalities and differences between MDSplus and HDF5 data systems
G. Manduchi
Consorzio RFX, Euratom-ENEA Association, corso Stati Uniti 4, 35127 Padova, Italy
Seventh IAEA Technical Meeting on Control, Data Acquisition, and Remote Participation for Fusion Research
15-19 June 2009, Aix-en-Provence, France

Abstract
MDSplus is a data acquisition system widely used in nuclear fusion experiments. It defines a file format for pulse files and provides a set of tools for data acquisition and management. The whole MDSplus package is used in several fusion experiments to set up and supervise the data acquisition process. Other experiments use only the data management layer of MDSplus to provide a common format for data exchange between plasma fusion laboratories. HDF5 is a file format and a data access library used by a larger community, mainly outside fusion; it is used, for example, in earth science research, defense applications and weather services. HDF5 allows managing large and complex data sets and provides a common data format among heterogeneous applications. Both MDSplus and HDF5 support a rich set of data types and a hierarchical data organization, as well as multi-language data access libraries. There are, however, several significant differences between the two system architectures, making each system better suited to different application contexts. The paper provides a brief overview of the data architectures of MDSplus and HDF5 and analyzes in more detail the peculiar aspects of the two systems.

CONCLUSIONS
HDF5 provides a very complete and complex data interface, defining some hundreds of data access routines in its library, while MDSplus defines approximately 40 data classes with very few generic methods. In practice, there is no need to know the details of the data classes, and users deal with three MDSplus classes: Tree, TreeNode and Data.
This is a consequence of the underlying philosophy in data management: strongly typed, C-like, in HDF5 and loosely typed, Python-like, in MDSplus. The two intended targets are different: data storage management for data sets produced by analysis programs in HDF5, data acquisition in MDSplus. Requirements may overlap when MDSplus databases are later used for off-line analysis; in this case, the applicability of either system depends on the specific requirements. For example, if an analysis program deals with huge datasets during its execution, possibly using a parallel file system, and such data are not required to be shared in real time with other applications, HDF5 is preferable. On the other hand, the flexibility of MDSplus expressions and the native support for remote data access make MDSplus preferable for sharing common experimental databases among laboratories. Not surprisingly, this is the current "de facto" situation in the usage of the two systems in the plasma research community.

Hierarchical Data Organization

MDSplus
A data item is referred to as a tree node. All data descend from a root node named TOP. The path defines nodes and members. It is possible to associate a unique tag with each data item.

HDF5
A data item is referred to as a dataset. The hierarchical data organization is similar to the UNIX file system: directories are represented by groups. A single data item can be declared as belonging to different groups via the link operation.

Data Types

MDSplus
Supports commonly used scalars and multidimensional arrays. In addition, it defines data types which are specific to data acquisition, such as signals. Every data item in MDSplus is considered an expression. At any time expressions can be evaluated, possibly involving the on-the-fly activation of user-supplied code. A new data item is created by instantiating the corresponding data class. Complex expressions are represented by a hierarchy of linked data instances.
HDF5
Provides support for a variety of data types, ranging from scalars to multidimensional arrays, with fine tuning of many properties such as size, precision, padding and byte order. It is possible to define references to other data items, but no generic expressions are allowed. Every dataset defines a datatype and a dataspace component: the datatype describes the type of the atomic data; the dataspace describes the number of dimensions and, for each dimension, the maximum and current size.

Extending Arrays

MDSplus
Appending new chunks of data is achieved via segments. A segment is characterized by a start and end time, a dimension description, and the actual data array. The start and end times are used for the efficient retrieval of portions of large stored signals.

HDF5
Extending an array is achieved by defining a hyperslab. Hyperslabs define the mapping between the data in memory and the data stored on disk. Hyperslabs are also used to select portions of arrays to be read into memory. It is possible to define arbitrary mappings over all the dimensions of the array.

Physical I/O

MDSplus
In its simplest configuration, the MDSplus physical storage is represented by a triplet of files. MDSplus allows portions of the main tree (subtrees) to be mapped onto different file triplets. Even if there is no API for the configuration of file parameters, there is full control over the physical destination of data, i.e. over which data items get stored in each file triplet. A TCP/IP based protocol has been defined and is implemented by the lower levels of the data access interface. MDSplus provides two main remote data access configurations: the "Thin Client" and the "Thick Client". In the Thin Client configuration, the client requests the evaluation of expressions from the remote data server. In the Thick Client configuration most operations are performed at the client side, except for the actual data readout, which is handled by a remote data server.
MDSplus supports concurrent read and write access.

HDF5
In HDF5 the underlying physical storage can be: a single file in a standard file system; multiple files in a standard file system; multiple files in a parallel file system; or a block of memory within the application memory space. HDF5 allows selecting the low-level file driver as well as configuring it via a property list. Using property lists it is possible to achieve a fine tuning of the physical storage parameters, but a detailed knowledge of the underlying system is required. A recent HDF5 project aims at integrating the Storage Resource Broker (SRB) in the HDF5 physical I/O layer. SRB is a data management middleware that provides users with a global virtual file system for accessing remote heterogeneous storage resources across the network. HDF5 does not support concurrent write access.

Backward Compatibility

MDSplus
MDSplus provides full backward compatibility both in the data format and in the data access interface. The new Object Oriented interface of MDSplus still allows reading old pulse files. The previous interface will be maintained in the next releases.

HDF5
HDF5 ensures backward compatibility for data formats, not for library interfaces. Compiler flags for backward compatibility are provided; however, compatibility flags are maintained only for the releases associated with the immediately following minor version number. User code therefore has to be changed prior to a newer release.

#include <mdsobjects.h>
using namespace MDSplus;

int main(int argc, char *argv[])
{
    // The data array
    int array[6] = {1, 2, 3, 4, 5, 6};
    // The dimension array
    int dims[2] = {3, 2};
    // Open pulse file MY_DATA, shot 100.
    // This operation is carried out by the
    // instantiation of a Tree object.
    Tree *tree = new Tree("MY_DATA", 100);
    // Get the data item in the database.
    // Tree items are represented by a TreeNode instance.
    TreeNode *node = tree->getNode("group1:data1");
    // Create an initialized 3x2 integer array
    Int32Array *data = new Int32Array(array, 2, dims);
    // Write the array in the database
    node->putData(data);
    // Free dynamically allocated objects
    delete data;
    delete node;
    delete tree;
    return 0;
}

Writing an integer 3x2 array: MDSplus

HDF5 implements a high-level API with C, C++, Fortran 90, and Java interfaces. MDSplus implements C, Java, IDL, MATLAB and Fortran interfaces, and a new Object Oriented interface has been developed, providing a uniform object representation of data in C++, Java and Python. The difference in data management philosophy between HDF5 and MDSplus is similar to that between a typed language, such as C, and an untyped one, such as Python. In C a data container, i.e. a variable, has a type and a dimension. Likewise, in HDF5 a datatype and a dataspace are always associated with a dataset, describing its data type. This is not true in MDSplus, which follows a Python-like approach: data items are generic objects and their evaluation can be deferred until really needed, e.g. for transferring their content into a program variable.

#include <hdf5.h>

int main(int argc, char *argv[])
{
    hid_t file;  // the file identifier
    // other HDF5 identifiers
    hid_t dataset, datatype, dataspace;
    // Row-first 3x2 array data
    int data[6] = {1, 2, 3, 4, 5, 6};
    // Dimensions
    hsize_t dimsf[2];
    herr_t status;
    // Open the HDF5 file named MY_DATA for read/write
    // using the default property list
    file = H5Fopen("MY_DATA", H5F_ACC_RDWR, H5P_DEFAULT);
    // Create the dataspace for a fixed-size 3x2 dataset
    dimsf[0] = 3;
    dimsf[1] = 2;
    dataspace = H5Screate_simple(2, dimsf, NULL);
    // Define a datatype for the data in the dataset
    datatype = H5Tcopy(H5T_NATIVE_INT);
    // Create a new dataset within the file using the
    // defined dataspace and datatype and default
    // dataset creation properties
    dataset = H5Dcreate(file, "/group1/data1", datatype, dataspace,
                        H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    // Write the data in the dataset
    status = H5Dwrite(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                      H5P_DEFAULT, data);
    // Close handles
    H5Sclose(dataspace);
    H5Tclose(datatype);
    H5Dclose(dataset);
    H5Fclose(file);
    return 0;
}

Writing an integer 3x2 array: HDF5