© Geodise Project, University of Southampton

Data Management in Geodise
Zhuoan Jiao, Jasmin Wason and Marc Molinari
January 2003, Edinburgh

Engineering design and optimisation is a computationally intensive process in which data may be generated at different locations and with different characteristics. Data is traditionally stored in flat files, with little descriptive metadata provided by the file system. Our focus is on providing data management by leveraging existing database tools that are not commonly used in engineering and making them accessible to users of the system.

The main objectives are to provide:

A data management service
- Store and retrieve data files securely from a repository using GridFTP.
- Technical and application-specific metadata is added so that data is easier to search for and locate.

Metadata management services
- Web services provide API access to metadata in databases.
- Use both relational and XML databases.

A familiar interface for engineers
- Work with functions and variables rather than the underlying XML, SOAP, SQL, XPath, etc.

Geodise Database Toolkit

Storage service
Allows applications to archive data, sent over GridFTP, in file systems curated by Geodise. The benefits are accessibility by a larger community (via authorisation), storage capacity, and a uniform query interface.

Example:
Archive data:
    >> fileID = gd_archive('C:\input.dat');
Retrieve data:
    >> gd_retrieve(fileID, 'E:\tmp')
    ans =
    E:\tmp\input.dat

Metadata service
Data can be stored with additional descriptive information detailing technical characteristics (e.g. location, format), ownership, and application-domain-specific metadata.

Example:
Define metadata and archive file:
    >> m.grids = 1;
    >> m.turb_model = 'sa';
    >> fileID = gd_archive('C:\input.dat', m);

Query service
Querying the metadata database helps to locate the required data intuitively and efficiently.

Example:
    >> r = gd_query('standard.userID = me & grids < 2');
To access a value from the first result in cell array r:
    >> r{1}.turb_model
    ans =
    sa

Authorisation service
Access rights to data can be granted to an authenticated user based on information stored in the authorisation database.

Example:
    >> m.grids = 1;
    >> m.access.users = {'userA', 'userB'};
    >> m.access.groups = {'groupC'};
    >> fileID = gd_archive('C:\input.dat', m);
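
To illustrate what the 'grids < 2' condition in the query example selects, the following minimal sketch filters an in-memory cell array of metadata structures in plain MATLAB. It is only a stand-in under stated assumptions: the `records` array and the values of its second entry are hypothetical, and the real gd_query runs against the metadata database through the toolkit's services rather than in the workspace.

```matlab
% Hypothetical in-memory stand-in for the metadata database:
% a cell array with one metadata structure per archived file.
records = { ...
    struct('grids', 1, 'turb_model', 'sa'), ...   % values from the slide example
    struct('grids', 3, 'turb_model', 'ke')};      % made-up second record

% Keep only the records that have a 'grids' field with a value below 2,
% mirroring the 'grids < 2' part of the query string.
keep = cellfun(@(m) isfield(m, 'grids') && m.grids < 2, records);
r = records(keep);

% As with gd_query, results are accessed by indexing the cell array.
disp(r{1}.turb_model)
```

The result is again a cell array of structures, so the access pattern r{1}.turb_model from the query example applies unchanged.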

Database Application of XML Toolbox for MATLAB

The following functions are responsible for converting a Matlab variable to and from an XML string:
- xml_format() - Convert a Matlab variable into an XML string.
- xml_parse() - Convert an XML string into a Matlab variable.

Define Matlab variables:
    >> meta.grids = 1
    >> meta.turb_model = 'sa'

Convert to XML and back:
    X = xml_format(meta)
    meta = xml_parse(X)

Name-based XML (easy to query):
    X = <file_metadata type="struct" idx="0" fields="grids turb_model">
          1 sa

Type-based XML (easy to convert back to Matlab):
    X = <struct xmlns="..." idx="0" fields="grids turb_model">
          1 sa

XSLT transformations between the two forms: type2name and name2type.
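
The idea behind the name-based form, where each field name becomes an element name, can be sketched in plain MATLAB as follows. This is a simplified, hand-rolled illustration and not the output of xml_format(): it omits the type, idx and fields attributes shown above, handles only flat structures with numeric or character values, and takes the root element name file_metadata from the slide.

```matlab
% Metadata structure from the slide example.
meta.grids = 1;
meta.turb_model = 'sa';

% Build a simplified name-based XML string: one element per field,
% named after the field (assumption: flat struct, numeric or char values).
names = fieldnames(meta);
xml = '<file_metadata>';
for k = 1:numel(names)
    value = meta.(names{k});
    if isnumeric(value)
        str = num2str(value);   % numbers are written as text
    else
        str = value;            % character data is written as is
    end
    xml = [xml, sprintf('<%s>%s</%s>', names{k}, str, names{k})]; %#ok<AGROW>
end
xml = [xml, '</file_metadata>'];
disp(xml)
```

Because element names match field names, a condition such as grids < 2 maps directly onto a path to the grids element, which is what makes this form convenient to query.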

Data Management Implementation

To increase the usability of file and metadata management services for engineers, we have implemented a MATLAB toolkit for archiving, querying and retrieving data to and from a Geodise repository.

[Figure: architecture diagram. Labels: Geodise Database Toolkit; Matlab Functions; Java clients; CoG; Apache SOAP; Browser; Client; Grid; SOAP; GridFTP; Globus Server; Java Metadata Archive & Query Services; Authorisation Service; Location Service; Metadata Database; Authorisation Database; Location Database; "Refers to".]

Query Example

[Figure: query example. Annotations: "MATLAB commands to retrieve files"; "Copy and paste".]

Future Work

Grid Data Management
- Replace and enhance some of our functionality with that provided by OGSA-DAI for Grid Database Services, e.g. a name mapping interface for authenticating Grid credentials to local IDs.
- Automatic collection of data and metadata from a higher-level engineering problem setup GUI.

Manage Matlab Structures
- Some data may take the form of Matlab structures rather than files.
- These can be archived as XML in the repository and then queried.
- The structures can also be retrieved back into the Matlab workspace.

Categorisation of metadata based on XML Schemas
- Group metadata with the same XML Schema in the database.
- Users are not expected to write XML Schemas.
- Generate a simple XML Schema from a metadata structure if one does not already exist to describe it.
- Could help future integration of ontologies.
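
One way the proposed schema generation could look is sketched below in plain MATLAB. It is a hypothetical illustration, not a Geodise implementation: the mapping of numeric fields to xs:double and all other fields to xs:string, the flat sequence layout, and the root element name file_metadata are all assumptions made for the example.

```matlab
% Metadata structure from the earlier examples.
meta.grids = 1;
meta.turb_model = 'sa';

% Build a simple XML Schema with one element declaration per field.
% Assumed type mapping: numeric -> xs:double, anything else -> xs:string.
names = fieldnames(meta);
parts = {'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">', ...
         '  <xs:element name="file_metadata"><xs:complexType><xs:sequence>'};
for k = 1:numel(names)
    if isnumeric(meta.(names{k}))
        xsType = 'xs:double';
    else
        xsType = 'xs:string';
    end
    parts{end+1} = sprintf('    <xs:element name="%s" type="%s"/>', ...
                           names{k}, xsType); %#ok<SAGROW>
end
parts{end+1} = '  </xs:sequence></xs:complexType></xs:element>';
parts{end+1} = '</xs:schema>';

% Join the pieces into a single schema document, one declaration per line.
schema = strjoin(parts, newline);
disp(schema)
```

Two metadata structures with the same field names and value types would yield the same schema text under this mapping, which is the property the grouping idea relies on.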