INFSO-RI Enabling Grids for E-sciencE
OGSA-DAI: Data Access and Integration
Marek Ciglan, Institute of Informatics, Slovak Academy of Sciences
Grid Application Development, Bratislava, Enabling Grids for E-sciencE INFSO-RI

Motivation
Different users and applications store data in different formats
–Plain files
–XML databases
–Relational databases (PostgreSQL, Oracle, DB2, MySQL)
Difficult to work with many different data formats
Difficult to integrate data from heterogeneous resources

OGSA-DAI: Overview
Allow different types of data models
–Files
–XML databases
–Relational databases
Allow data to be accessed through uniform interfaces
Provide an extensible framework for integrating data resources on the Grid
Allow metadata about the data, and about the data resources in which they are stored, to be obtained
Facilitate the integration of data from various sources to obtain the required information

Architecture
[architecture diagram]

Data Resource Activities
Relational Activities
–Run an SQL query statement
–Run an SQL update statement
–…
XML Activities
–Run an XPath statement against an XML database
–Run an XUpdate statement against an XML database
–…
File Activities
–Access a directory
–Read data from a file
–Manipulate files in a directory
–Write data into a file

Delivery Activities
Retrieve data from a URL
Deliver data to a URL
Deliver data to a GridFTP server
Retrieve data from a GridFTP server
Deliver results to a stream
…

Transformation Activities
ZIP compress the results
GNU-ZIP compress the results
GNU-ZIP decompress results
Transform data using an XSLT
Break a single block into multiple blocks based on a set of separator characters
Aggregate multiple blocks into a single block
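
Three of the transformation activities listed above (splitting a block, aggregating blocks, and GNU-ZIP compression) can be illustrated with a minimal Python sketch. This uses only the Python standard library and is not OGSA-DAI code; the function names are hypothetical, and the choice to drop empty blocks between adjacent separators is an assumption, since the slide does not say how the real activity handles them.

```python
import gzip
from typing import List


def split_blocks(block: str, separators: str) -> List[str]:
    """Break one block into several at any of the separator characters.

    Assumption: empty blocks between adjacent separators are dropped.
    """
    blocks: List[str] = []
    current: List[str] = []
    for ch in block:
        if ch in separators:
            if current:
                blocks.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        blocks.append("".join(current))
    return blocks


def aggregate_blocks(blocks: List[str]) -> str:
    """Aggregate multiple blocks into a single block by concatenation."""
    return "".join(blocks)


def gzip_compress(data: bytes) -> bytes:
    """GNU-ZIP compress a block of bytes."""
    return gzip.compress(data)


def gzip_decompress(data: bytes) -> bytes:
    """GNU-ZIP decompress a block of bytes."""
    return gzip.decompress(data)
```

Compression and decompression are inverses, so a block passed through both comes back unchanged.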

Data integration
Sources: MySQL, XML database, PostgreSQL, text file, Oracle
Target: Data Warehouse
How to integrate all this heterogeneous data into a central data warehouse?
OGSA-DAI sits between the sources and the data warehouse. Two activity chains appear in the diagram:
–Select data, Write data into file, Compress file, Transfer zip file
–Read subset of file, XSLT Transform, Compress file, Transfer zip file
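
The first chain above (select data, write data into file, compress file, transfer zip file) can be sketched in plain Python to make the steps concrete. This is not OGSA-DAI code: an in-memory SQLite database stands in for the relational source, a local directory copy stands in for the transfer step, and the `id` and `name` columns and sample rows of `littleblackbook` are assumptions made for illustration.

```python
import csv
import shutil
import sqlite3
import tempfile
import zipfile
from pathlib import Path
from typing import List


def export_to_warehouse(conn: sqlite3.Connection, query: str,
                        work_dir: Path, inbox: Path) -> Path:
    """Run the four-step chain and return the path of the delivered archive."""
    # 1. Select data
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    rows = cursor.fetchall()

    # 2. Write data into file
    csv_path = work_dir / "export.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

    # 3. Compress file
    zip_path = work_dir / "export.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(csv_path, arcname=csv_path.name)

    # 4. Transfer zip file (a local copy stands in for a real delivery)
    return Path(shutil.copy(zip_path, inbox))


def demo() -> List[str]:
    """Build a toy source, run the chain, and return the delivered CSV lines."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and rows, for illustration only.
    conn.execute("CREATE TABLE littleblackbook (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO littleblackbook VALUES (?, ?)",
                     [(1, "Ada"), (2, "Bob"), (150, "Cy")])
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        inbox = Path(tmp) / "warehouse_inbox"
        work.mkdir()
        inbox.mkdir()
        delivered = export_to_warehouse(
            conn, "select * from littleblackbook where id<100", work, inbox)
        with zipfile.ZipFile(delivered) as zf:
            return zf.read("export.csv").decode().splitlines()
```

Calling `demo()` returns the header line followed by the rows whose `id` is below 100. In a real deployment each step would be an OGSA-DAI activity rather than a local function call.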

Data integration
How to perform data integration?
–Write a specialized Java application for data integration
–Use OGSA-DAI perform documents
Perform documents
–XML documents
–Describe the activities to be performed
Example query from a perform document: select * from littleblackbook where id=10

Perform documents
Integrating activities with perform documents
Example query from a perform document: select * from littleblackbook where id<100
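
The idea of an XML document that describes a chain of activities can be sketched as follows. The element and attribute names (`perform`, `sqlQuery`, `zipCompress`, `deliverToURL`, `input`, `output`, `from`) are hypothetical and are NOT the actual OGSA-DAI perform document schema; the target URL is also a placeholder. The sketch only shows how one activity's named output can be wired to the next activity's input.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_perform_document(sql: str, target_url: str) -> str:
    """Build an illustrative activity-chain document (hypothetical schema)."""
    root = ET.Element("perform")

    # Activity 1: run the SQL query and expose its result as "rows".
    query = ET.SubElement(root, "sqlQuery", {"name": "query"})
    ET.SubElement(query, "expression").text = sql
    ET.SubElement(query, "output", {"name": "rows"})

    # Activity 2: compress "rows" and expose the result as "archive".
    compress = ET.SubElement(root, "zipCompress", {"name": "compress"})
    ET.SubElement(compress, "input", {"from": "rows"})
    ET.SubElement(compress, "output", {"name": "archive"})

    # Activity 3: deliver "archive" to the target URL.
    deliver = ET.SubElement(root, "deliverToURL",
                            {"name": "deliver", "url": target_url})
    ET.SubElement(deliver, "input", {"from": "archive"})

    return ET.tostring(root, encoding="unicode")


def activity_names(document: str) -> List[str]:
    """Return the activity names in document order."""
    return [activity.get("name") for activity in ET.fromstring(document)]
```

Note that the `<` in a query such as `select * from littleblackbook where id<100` must be escaped inside XML; building the document with an XML library, as here, handles that automatically.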

Data Security
Role mapping is the process of authorizing a client's request to access a data resource
It is a two-step process:
–Check whether the client is allowed to access the data resource
–Determine the database user name and password (or role) to be used for this client
A role map document contains the information required to undertake this process

Data Security
Simple OGSA-DAI Role Map Documents
<User dn="No Certificate Provided" userid="myUser" password="123"/>
<User dn="/C=UK/O=eScience/OU=Aspatria/L=AeSC/CN=tom" userid="superUser" password="myPassword"/>
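
The two-step role mapping process can be sketched in Python using the two `User` entries shown above. This is an illustration, not OGSA-DAI code: the `RoleMap` wrapper element and the function names are hypothetical, and treating a client without a certificate as the "No Certificate Provided" entry is an assumption about how that entry is meant to be used.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# The <RoleMap> wrapper is hypothetical; the <User> entries are from the slide.
ROLE_MAP_XML = """
<RoleMap>
  <User dn="No Certificate Provided" userid="myUser" password="123"/>
  <User dn="/C=UK/O=eScience/OU=Aspatria/L=AeSC/CN=tom"
        userid="superUser" password="myPassword"/>
</RoleMap>
"""

Credentials = Tuple[str, str]


def load_role_map(xml_text: str) -> Dict[str, Credentials]:
    """Parse the role map into {distinguished name: (userid, password)}."""
    root = ET.fromstring(xml_text)
    return {user.get("dn"): (user.get("userid"), user.get("password"))
            for user in root.iter("User")}


def map_role(role_map: Dict[str, Credentials],
             client_dn: Optional[str]) -> Optional[Credentials]:
    """Return database credentials for the client, or None if access is denied.

    Step 1: check whether the client's DN appears in the role map.
    Step 2: look up the database user name and password for that DN.
    """
    # Assumption: a client with no certificate maps to this special entry.
    key = client_dn if client_dn else "No Certificate Provided"
    return role_map.get(key)
```

A client whose distinguished name is not listed gets `None`, which the caller would treat as a refused request.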

The End
Thank you for your attention.