Database structure for the European Integrated Tokamak Modelling Task Force
F. Imbeaux, on behalf of the Data Coordination Project of the Task Force

Goals for the ITM-TF Database
Gather multi-machine experimental data for code benchmarking
– With detailed information about machine characteristics in a standard format
Storage of simulation results: reproducibility and flexibility
– Including detailed information about code parameters (reproducible simulation)
– Suited to all possible kinds of stand-alone or integrated simulation requests
Accessible from several programming languages
– Access layer available in several languages with the same calls
Can ultimately become a database for predictive ITER simulations

Comparison to the ITPA Profile Database
Multi-machine experimental data: not only profiles of physical quantities, but also characteristics of the various tokamak subsystems (magnetic coils, heating systems, diagnostics)
Storage of simulation results: not only general transport simulations, but suited to all possible kinds of simulations (turbulence, MHD, equilibria, heating & current drive, …)
– Includes detailed information about code parameters
Aiming at better consistency of simulations with the original experimental data (less preliminary processing by black-box type codes) and detailed bookkeeping of simulations, hence a higher-quality benchmarking exercise

How to do it?
Full description of a tokamak: physics quantities + subsystem characteristics + diagnostic measurements
Object-oriented data structure
– High degree of organisation: several subtrees corresponding to « Consistent Physical Objects » (avoids flat structures with long lists of parameter names)
Substructures correspond to Consistent Physical Objects:
– Subsystem (e.g. a heating system or a diagnostic): contains structured information on the hardware setup and on the data measured by, or related to, this object
– Code results (e.g. a given plasma equilibrium, or the various source terms and fast particle distribution function from an RF code): contains structured information on the code parameters and the physics results
Programming language flexibility through the use of recent software technologies
– Database structure is defined using XML schemas

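To make the idea of a « Consistent Physical Object » concrete, here is a minimal Python sketch of one subsystem subtree that groups the hardware setup with the data measured by that subsystem. All class names, field names and numerical values (HardwareSetup, MeasuredSignal, HeatingSystem, the 42 MHz frequency, the power trace) are illustrative assumptions, not part of the actual ITM-TF structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HardwareSetup:
    """Static description of the subsystem hardware (hypothetical fields)."""
    name: str
    frequency_hz: float
    n_antennas: int


@dataclass
class MeasuredSignal:
    """A time trace measured by, or related to, the subsystem."""
    name: str
    unit: str
    time_s: List[float] = field(default_factory=list)
    values: List[float] = field(default_factory=list)


@dataclass
class HeatingSystem:
    """One Consistent Physical Object: setup and data live in the same subtree."""
    setup: HardwareSetup
    signals: List[MeasuredSignal] = field(default_factory=list)

    def signal(self, name: str) -> MeasuredSignal:
        """Return the signal with the given name, or raise KeyError."""
        for candidate in self.signals:
            if candidate.name == name:
                return candidate
        raise KeyError(name)


# Placeholder values, chosen only to show how the object is filled and read.
icrh = HeatingSystem(
    setup=HardwareSetup(name="ICRH", frequency_hz=42.0e6, n_antennas=4),
    signals=[MeasuredSignal("power", "W", [0.0, 1.0], [0.0, 2.0e6])],
)
print(icrh.setup.name, icrh.signal("power").values[-1])
```

Compared with a flat list of parameter names, everything needed to interpret the power trace (which system, which frequency, how many antennas) travels with the object itself.
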
Use of XML schemas
XML is a generic, standardised markup language, well suited to describing hierarchical structures
XML files can also contain the actual data, but we do not use this possibility (ASCII format is not convenient for large numerical data)
XML schemas are used to define the data structure (tree hierarchy, types of the objects, …)
User-friendly tools (XML editors) allow fast and easy design of the data structure
Small translation scripts (“parsers”) translate the schema into other languages (HTML, Fortran, C, …), giving an automated translation of the structure into any programming language

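As an illustration of such a translation script, the following Python sketch reads a tiny XML schema and emits a matching Fortran derived type. The schema content (an "equilibrium" type with three fields), the type mapping and the function name schema_to_fortran are assumptions made for this example; the actual ITM-TF parsers are not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by ElementTree for XML Schema elements.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment: one complex type with three simple fields.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="equilibrium">
    <xs:sequence>
      <xs:element name="ip" type="xs:double"/>
      <xs:element name="npsi" type="xs:int"/>
      <xs:element name="code_name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

# Assumed mapping from schema types to Fortran declarations.
FORTRAN_TYPES = {
    "xs:double": "real(kind=8)",
    "xs:int": "integer",
    "xs:string": "character(len=132)",
}


def schema_to_fortran(xsd_text: str) -> str:
    """Translate every top-level complexType into a Fortran derived type."""
    root = ET.fromstring(xsd_text)
    lines = []
    for ctype in root.findall(f"{XS}complexType"):
        type_name = ctype.get("name")
        lines.append(f"type {type_name}")
        for elem in ctype.iter(f"{XS}element"):
            fortran_type = FORTRAN_TYPES[elem.get("type")]
            lines.append(f"  {fortran_type} :: {elem.get('name')}")
        lines.append(f"end type {type_name}")
    return "\n".join(lines)


fortran_source = schema_to_fortran(SCHEMA)
print(fortran_source)
```

Because the schema is the single source of truth, the same walk over the tree could emit C structs or HTML documentation by swapping the mapping and the output templates.
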
Data storage
The data is presently stored on an MDS+ server
– Widely used data access system in the fusion community
– Interfaces already exist for many languages
– Convenient for storing multi-dimensional arrays, no problem with large data sizes
– Not really object oriented (arrays of objects are not possible), slow for large numbers of data calls
The data storage system may evolve in the future
The XML schemas defining the data structure are used to build the MDS+ model tree (automated script)

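The automated construction of the model tree can be pictured as flattening the nested structure into a list of node paths. The sketch below does only that, in pure Python, without calling any MDS+ library. The nested dictionary stands in for an already parsed schema, and the names STRUCTURE and model_tree_nodes are hypothetical. The path syntax (a dot before child structures, a colon before data members, node names of at most 12 characters) is assumed to follow the usual MDS+ convention.

```python
from typing import List, Tuple

# Hypothetical parsed schema: dicts are substructures, strings are leaf usages.
STRUCTURE = {
    "equilibrium": {
        "code": {"name": "text", "version": "text"},
        "profiles": {"psi": "numeric", "q": "numeric"},
    },
}

MAX_NODE_NAME = 12  # assumed MDS+ limit on node name length


def model_tree_nodes(structure: dict, parent: str = "\\TOP") -> List[Tuple[str, str]]:
    """Return (path, usage) pairs for every node, parents before children."""
    nodes = []
    for name, child in structure.items():
        node_name = name.upper()
        if len(node_name) > MAX_NODE_NAME:
            raise ValueError(f"node name too long: {node_name}")
        if isinstance(child, dict):
            # Substructure: joined with a dot, then descend recursively.
            path = f"{parent}.{node_name}"
            nodes.append((path, "STRUCTURE"))
            nodes.extend(model_tree_nodes(child, path))
        else:
            # Leaf holding data: joined with a colon.
            path = f"{parent}:{node_name}"
            nodes.append((path, child.upper()))
    return nodes


nodes = model_tree_nodes(STRUCTURE)
for path, usage in nodes:
    print(f"{usage:<10} {path}")
```

A real script would pass each pair to the MDS+ tree-editing interface; that step is deliberately left out here because it depends on the installed MDS+ tools.
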
The Data Structure (a part of it)
Figure: XML schema as displayed by XMLSpy®

Referencing system (draft proposal)
Unique data structure for experimental data and all kinds of simulations
Each entry of the database corresponds to a unique, consistent physics dataset
– Each new simulation or version of the experimental data creates a new entry
– Competing codes are not allowed to write their results in the same entry
– Different versions of experimental data are not allowed to coexist in the same entry
The MDS+ shot number is used as a Generalised Pulse Number (GPN) containing information on:
– The shot number
– Whether the data is experimental or a simulation result
– The version of the data / reference number of the simulation

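The slides do not specify how the three pieces of information are packed into the GPN, so the following Python sketch uses one possible layout, purely as an assumption: the last three decimal digits hold the version or simulation reference number, the next digit is 0 for experimental data and 1 for simulation results, and the remaining leading digits are the shot number. It also assumes that the result must fit in a signed 32-bit shot number. The names PulseRef, encode_gpn and decode_gpn are hypothetical.

```python
from typing import NamedTuple

MAX_INT32 = 2**31 - 1   # assumed upper bound for a shot number
VERSION_SLOTS = 1000    # assumed: three decimal digits for the version


class PulseRef(NamedTuple):
    shot: int
    is_simulation: bool
    version: int


def encode_gpn(ref: PulseRef) -> int:
    """Pack shot, experimental/simulation flag and version into one integer."""
    if ref.shot < 0 or not 0 <= ref.version < VERSION_SLOTS:
        raise ValueError("shot or version out of range")
    gpn = (ref.shot * 10 + int(ref.is_simulation)) * VERSION_SLOTS + ref.version
    if gpn > MAX_INT32:
        raise ValueError("GPN does not fit in a 32-bit shot number")
    return gpn


def decode_gpn(gpn: int) -> PulseRef:
    """Inverse of encode_gpn for the assumed decimal layout."""
    head, version = divmod(gpn, VERSION_SLOTS)
    shot, kind = divmod(head, 10)
    if kind not in (0, 1):
        raise ValueError(f"invalid experimental/simulation digit: {kind}")
    return PulseRef(shot, bool(kind), version)


# Simulation number 12 based on (hypothetical) shot 70000.
gpn = encode_gpn(PulseRef(shot=70000, is_simulation=True, version=12))
print(gpn, decode_gpn(gpn))
```

With this layout a GPN stays human-readable (700001012 reads as shot 70000, simulation, number 12), at the price of limiting each shot to 1000 versions per kind.
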
Referencing system (draft proposal)
To guarantee data consistency within one entry, each new simulation or version of the experimental data creates a new entry
Copying all data present in the structure would cost a lot of storage space
Only data that are modified are explicitly written in the « output » GPN
The unmodified data can be tracked down using a signal referencing the « input » GPN
– This signal would be located at the top of the tree
– Valid for all subtrees (subtrees of different origin are not allowed, since this may violate data consistency)
Result: simple and efficient bookkeeping

Referencing system (draft proposal)
Figure: chain of database entries, with two entries labelled "Exp. Data" (Ref: none) and four entries labelled "Simulation #", each carrying a "Ref" field
Guarantees data consistency
The referencing system performs a recursive search, hidden from users who do not need to know about it

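The referencing scheme can be sketched in a few lines of Python: each entry stores only the signals it modifies plus a reference to its input entry, and a read follows the references recursively until the signal is found. The in-memory dictionary, the Entry class, the function names and the signal paths are all assumptions for illustration; the GPNs are plain small integers here.

```python
from typing import Any, Dict, Optional


class Entry:
    """One database entry: a reference to its input plus locally stored signals."""

    def __init__(self, input_gpn: Optional[int], signals: Dict[str, Any]):
        self.input_gpn = input_gpn   # None for original experimental data
        self.signals = signals       # only the data written in this entry


def read_signal(db: Dict[int, Entry], gpn: int, path: str) -> Any:
    """Return a signal, searching recursively through the input references."""
    entry = db[gpn]
    if path in entry.signals:
        return entry.signals[path]
    if entry.input_gpn is None:
        raise KeyError(f"{path} not found in entry {gpn} or its inputs")
    return read_signal(db, entry.input_gpn, path)


def write_entry(db: Dict[int, Entry], gpn: int,
                input_gpn: Optional[int], data: Dict[str, Any]) -> None:
    """Create a new entry, storing only what differs from the input entry."""
    if gpn in db:
        raise ValueError(f"entry {gpn} already exists")
    changed = {}
    for path, value in data.items():
        try:
            same = input_gpn is not None and read_signal(db, input_gpn, path) == value
        except KeyError:
            same = False
        if not same:
            changed[path] = value
    db[gpn] = Entry(input_gpn, changed)


db: Dict[int, Entry] = {}
# Entry 1: experimental data (placeholder values), no reference.
write_entry(db, 1, None, {"equilibrium/ip": 2.0e6, "equilibrium/q": [1.0, 2.0]})
# Entry 2: simulation using entry 1 as input, modifies only q.
write_entry(db, 2, 1, {"equilibrium/ip": 2.0e6, "equilibrium/q": [1.1, 2.1]})
# Entry 3: simulation using entry 2 as input, adds a heating source.
write_entry(db, 3, 2, {"heating/power": 5.0e6})

print(sorted(db[2].signals))                  # only the modified signal is stored
print(read_signal(db, 3, "equilibrium/ip"))   # found by following 3 -> 2 -> 1
```

Because each entry has exactly one input reference that applies to the whole tree, the search is a simple chain walk and the user of read_signal never needs to know in which entry the value physically resides.
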
Status of the ITM Database
MDS+ server operational (Frascati, ENEA)
XML philosophy and tools operational
Work has started with IMP1 (equilibrium and linear MHD stability):
– Some experimental data has been put on the server using a temporary data structure (one ITER equilibrium done, JET and MAST data ongoing)
– Equilibrium codes have been coupled to the database structure; first simulations reading their data from the database have been produced
– First benchmarking exercise to be carried out for the EPS (equilibrium codes benchmarking)

Ongoing Work and perspectives
Update the existing database structure in line with the data referencing system
Set up the referencing system tools and provide generic access tools to the users
Gather experimental data from the various machines
Extend the database structure to the other « Integrated Modelling Projects » of the Task Force
Evaluate possible alternatives to MDS+ for data access / storage