© Geodise Project, University of Southampton, Data Management in Geodise Zhuoan Jiao, Jasmin Wason & Marc Molinari { z.jiao, j.l.wason, m.molinari
Providing Data Management Services for Engineering
Engineering design and optimisation is a computationally intensive process. Large quantities of data, with differing characteristics, may be generated at different locations.
Engineering data is traditionally stored in flat files, and the file system provides little descriptive metadata.
Our focus is on leveraging existing database tools that are not commonly used in engineering and making them accessible to users of the system.

Tools and Services (1)
File storage: Applications can archive data, sent over GridFTP, in file systems. The benefits are:
  Accessibility by a larger community (via authorisation)
  Storage capacity
  Additional metadata storage and query facilities
Metadata management service: Data can be stored with additional descriptive information, both standard metadata (e.g. file format, description) and application-domain-specific metadata (e.g. grids, flux_order). An XML database is used because it is flexible enough to store nested, complex engineering data.

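To illustrate why nested metadata fits an XML store, here is a minimal Python sketch that turns a nested record into XML. It is not the Geodise implementation: the `to_xml` helper, the `metadata` root element and the `format` and `description` field names are assumptions made for illustration. Only `grids`, `flux_order` and the `standard` grouping appear in the slides.

```python
import xml.etree.ElementTree as ET


def to_xml(name, value):
    """Build an element called `name`; nested dicts become child elements."""
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_xml(key, child))
    else:
        element.text = str(value)
    return element


# Hypothetical record: standard metadata plus domain-specific fields.
metadata = {
    "standard": {"format": "dat", "description": "solver input"},
    "grids": 1,
    "flux_order": 2,
}

xml_text = ET.tostring(to_xml("metadata", metadata), encoding="unicode")
print(xml_text)
```

Because every level of nesting becomes a level of elements, a new domain-specific field needs no schema change, which is the flexibility the slide refers to.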
Tools and Services (2)
Query service: Queries can be performed over the metadata database to help the user locate the required data intuitively and efficiently.
Authorisation service: Access rights to data can be granted to an authenticated user based on information stored in an authorisation database.
Location service: Files are referenced by a unique handle. The location service provides access to a database that maps handles to file locations.

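The handle-to-location mapping can be pictured with a small in-memory Python sketch. The `LocationService` class, its method names, the handle value and the `gsiftp://` URLs are all hypothetical. The real service is backed by a database and its interface is not shown in the slides.

```python
class LocationService:
    """In-memory stand-in for the database that maps handles to file locations."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, location):
        # A handle identifies exactly one file, so refuse duplicates.
        if handle in self._locations:
            raise ValueError(f"handle already registered: {handle}")
        self._locations[handle] = location

    def relocate(self, handle, new_location):
        # The file can move; the handle that users hold stays the same.
        if handle not in self._locations:
            raise KeyError(handle)
        self._locations[handle] = new_location

    def locate(self, handle):
        return self._locations[handle]


service = LocationService()
service.register("input_dat_0001", "gsiftp://host-a.example.org/archive/input.dat")
service.relocate("input_dat_0001", "gsiftp://host-b.example.org/archive/input.dat")
print(service.locate("input_dat_0001"))
```

The indirection is the point of the design: metadata and user scripts refer only to the handle, so moving a file means updating one mapping.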
Data Management Implementation for MATLAB
To make the file and metadata management services easier for engineers to use, we have implemented a MATLAB toolkit for archiving data in, and querying and retrieving data from, a Geodise repository.

Geodise Database Toolkit for MATLAB – Archive
gd_archive: Store a file with some metadata.
gd_datagroup: Create a datagroup, a collection of related files that may be logically grouped together. A datagroup can also have associated metadata.
Syntax (optional arguments in square brackets):
    groupID = gd_datagroup(name, [metadata])
    fileID = gd_archive(filename, [metadata], [groupID])
Examples:
    m.dimension = '2D';
    m.component.gamma = 1.4;
    groupID = gd_datagroup('2D-LP turbine rotor job9', m)
    meta.grids = 1
    meta.flux_order = 2
    fileID = gd_archive('input.dat', meta, groupID)
    fileID = gd_archive('mesh_ns.grid.1.adf', [], groupID)
    fileID = gd_archive('airfoil.msh')

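The relationship between files, datagroups and their metadata can be modelled with a short Python sketch. This is an illustration only: the `Repository` class is hypothetical, and the ID formats are assumptions (the file ID pattern is loosely inferred from the example IDs shown in the query slide, and the `dg_` prefix is invented).

```python
import uuid


class Repository:
    """Toy model of archived files and datagroups; not the Geodise repository."""

    def __init__(self):
        self.files = {}
        self.datagroups = {}

    def datagroup(self, name, metadata=None):
        group_id = f"dg_{uuid.uuid4()}"  # hypothetical ID format
        self.datagroups[group_id] = {"name": name, "metadata": metadata or {}, "files": []}
        return group_id

    def archive(self, filename, metadata=None, group_id=None):
        # Assumed ID format: file name with dots replaced, plus a UUID.
        file_id = f"{filename.replace('.', '_')}_{uuid.uuid4()}"
        self.files[file_id] = {"name": filename, "metadata": metadata or {}}
        if group_id is not None:
            self.datagroups[group_id]["files"].append(file_id)
        return file_id


repo = Repository()
group_id = repo.datagroup("2D-LP turbine rotor job9", {"dimension": "2D", "component": {"gamma": 1.4}})
input_id = repo.archive("input.dat", {"grids": 1, "flux_order": 2}, group_id)
mesh_id = repo.archive("mesh_ns.grid.1.adf", None, group_id)
loose_id = repo.archive("airfoil.msh")
print(repo.datagroups[group_id]["files"])
```

The calls mirror the three archive variants in the examples: a file with metadata in a group, a file without metadata in a group, and a file on its own.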
XML Toolbox for MATLAB
Marc Molinari – GEM project.
xml_format(): Convert a MATLAB variable into an XML string.
xml_parse(): Convert an XML string into a MATLAB variable.
Example:
    >> A.b = 'Hello World';
    >> A.c.aa = [1 2; 3 4; 5 6];
    >> X = xml_format(A)
    X =
    (XML string whose text content includes Hello World)
    >> Y = xml_parse(X);
    >> str = Y.b
    str =
    Hello World

Application of XML Toolbox for MATLAB
Metadata is set by the user as a MATLAB structure, a more natural format for MATLAB users.
MATLAB structure -> type-based XML: Element names = variable types. Easier to convert to and from structures.
Type-based XML -> name-based XML: Element names = variable names. Easier to query in the database.
Conversion paths:
  MATLAB structure -> xml_format.m -> type-based XML -> XSLT type2name -> name-based XML
  Name-based XML -> XSLT name2type -> type-based XML -> xml_parse.m -> MATLAB structure

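The two XML flavours and the conversion between them can be sketched in Python. The toolkit performs this step with XSLT stylesheets. The code below is a stand-in that uses `xml.etree.ElementTree` instead, and the exact element and attribute layout (a `name` attribute on type-based elements, a `type` attribute on name-based ones) is an assumption, since the slides do not show the real markup.

```python
import xml.etree.ElementTree as ET


def type_to_name(element):
    """Type-based -> name-based: the variable name becomes the element name."""
    renamed = ET.Element(element.get("name"), {"type": element.tag})
    renamed.text = element.text
    for child in element:
        renamed.append(type_to_name(child))
    return renamed


def name_to_type(element):
    """Name-based -> type-based: the variable type becomes the element name."""
    restored = ET.Element(element.get("type"), {"name": element.tag})
    restored.text = element.text
    for child in element:
        restored.append(name_to_type(child))
    return restored


# Hypothetical type-based document for m.dimension = '2D'; m.component.gamma = 1.4
type_based = (
    '<struct name="m"><char name="dimension">2D</char>'
    '<struct name="component"><double name="gamma">1.4</double></struct></struct>'
)

name_based = ET.tostring(type_to_name(ET.fromstring(type_based)), encoding="unicode")
round_trip = ET.tostring(name_to_type(ET.fromstring(name_based)), encoding="unicode")
print(name_based)
```

In the name-based form a path such as `component/gamma` addresses a variable directly, which is what makes that form the easier one to query.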
Geodise Database Toolkit for MATLAB – Query
gd_query
A text-based query expressed over MATLAB variables, for use in MATLAB scripts.
The query is converted to XPath to query the XML database.
The XML Toolbox is used to convert the results into a list of metadata structures.
Syntax:
    Results = gd_query(query, ['file'|'datagroup'])
Example 1: datagroup
    Results = gd_query('dimension = 2D', 'datagroup')
    Results{1}.standard.files.fileID
    ans = input_dat_632d05be-ba26-479b-9607-d1845f3c78ff
    ans = mesh_ns_cs_adf_ce b7-4e25-a5f7-9a8adf8f21b6
Example 2: file
    r = gd_query('standard.userID = me & grids < 2');
    r{1}.grids
    ans = 1

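The conversion from query text to XPath could look roughly like the following Python sketch. It is a guess at the general idea, not the toolkit's translator: the `to_xpath` function, the set of supported operators, the `metadata` root element and the rule that non-numeric values are quoted are all assumptions, and values containing quotes are not escaped.

```python
import re

# One term: a dotted field name, a comparison operator, and a value.
TERM = re.compile(r"^\s*([\w.]+)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*$")


def to_xpath(query, root="metadata"):
    """Translate 'a.b = x & c < 2' into an XPath expression with a predicate."""
    predicates = []
    for term in query.split("&"):
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"cannot parse term: {term!r}")
        field, operator, value = match.groups()
        path = field.replace(".", "/")  # nested fields become child steps
        try:
            float(value)
            literal = value  # numbers are compared numerically
        except ValueError:
            literal = f"'{value}'"  # everything else is a string literal
        predicates.append(f"{path}{operator}{literal}")
    return f"//{root}[{' and '.join(predicates)}]"


print(to_xpath("standard.userID = me & grids < 2"))
print(to_xpath("dimension = 2D"))
```

Dotted field names map naturally onto XPath child steps, which is one reason the name-based XML form suits querying.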
Geodise Database Toolkit for MATLAB – Retrieve
gd_retrieve
Retrieves a file from the repository using its unique handle:
  Asks the Authorisation service whether the user has permission to retrieve the file.
  Asks the Location service where the file is.
  Transfers the file back to the local file system using GridFTP.
Syntax:
    newFileLocation = gd_retrieve(fileID, destination)
Examples:
    gd_retrieve('input_dat_632d05be-ba26-479b-960…', 'E:\tmp')
    ans = E:\tmp\input.dat
    gd_retrieve('input_dat_632d05be-ba26-479b-960…', 'E:\tmp\control42.dat')
    ans = E:\tmp\control42.dat

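The three retrieval steps, and the way the destination argument is treated, can be sketched in Python as follows. The services are replaced by plain callables, and every name here (`retrieve`, `resolve_destination`, the handle, the `gsiftp://` URL) is hypothetical. The rule that a directory destination keeps the original file name is inferred from the two examples, not from documented behaviour.

```python
import ntpath
import posixpath


def resolve_destination(original_name, destination, is_directory):
    """Keep the original file name when the destination is a directory."""
    if is_directory(destination):
        return ntpath.join(destination, original_name)
    return destination


def retrieve(handle, destination, user, is_authorised, locate, transfer, is_directory):
    # 1. Ask the authorisation service whether the user may read the file.
    if not is_authorised(user, handle):
        raise PermissionError(f"{user} may not retrieve {handle}")
    # 2. Ask the location service where the file is stored.
    remote_url = locate(handle)
    # 3. Copy the file to the local file system (GridFTP in the real toolkit).
    local_path = resolve_destination(posixpath.basename(remote_url), destination, is_directory)
    transfer(remote_url, local_path)
    return local_path


# Hypothetical stand-ins for the three services.
locations = {"input_dat_0001": "gsiftp://repository.example.org/archive/input.dat"}
transfers = []
result = retrieve(
    "input_dat_0001",
    "E:\\tmp",
    "userA",
    is_authorised=lambda user, handle: user == "userA",
    locate=locations.__getitem__,
    transfer=lambda source, target: transfers.append((source, target)),
    is_directory=lambda path: path == "E:\\tmp",
)
print(result)
```

Checking authorisation before the location lookup means an unauthorised user learns nothing about where the file is stored.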
Authorisation
Data authorisation
A Globus certificate subject is mapped to a user ID.
Users set access rights for the data they archive, so that it can be queried and retrieved by others.
Access rights are stored in a relational database, accessed through the Authorisation web service.
To grant users and groups access rights, include their user ID or group ID in the metadata structure.
Example:
    m.grids = 1
    m.access.users = {'userA','userB'}
    m.access.groups = {'groupC'}
    gd_archive('input.dat', m)

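An access check over the `access.users` and `access.groups` fields might work along these lines. This Python sketch is illustrative only: the `can_access` function is hypothetical, and the rule that the archiving user always keeps access is an assumption not stated in the slides.

```python
def can_access(user_id, user_groups, owner_id, access):
    """Return True if the user may query and retrieve the archived data."""
    if user_id == owner_id:
        # Assumption: the user who archived the data always keeps access.
        return True
    if user_id in access.get("users", []):
        return True
    return any(group in access.get("groups", []) for group in user_groups)


# Mirrors m.access.users and m.access.groups in the example above.
access = {"users": ["userA", "userB"], "groups": ["groupC"]}

print(can_access("userB", [], "owner", access))          # named user
print(can_access("userD", ["groupC"], "owner", access))  # member of groupC
print(can_access("userD", ["groupE"], "owner", access))  # no grant
```

Group grants let an owner share data with a whole team without listing each member in the metadata.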
Future Work
Archive structures as XML: At present it is not possible to query inside archived files. We will archive MATLAB structures as XML and query them.
OGSA DAI integration: Replace and enhance some of our functionality with that provided by OGSA DAI, e.g. a name mapping interface for authenticating Grid credentials to local IDs (system and relational database IDs).
Change database system: The Xindice XML database is flexible and good for prototyping, but it is not scalable and has no security. We will choose a relational database with XML capabilities (Oracle, DB2, SQL Server).