1 OGSA-DAI: Status and Future Plans Neil Chue Hong
2 Status Project started 1st October 2003 –DAIT Funding for 2 years –EPCC, NeSC, Newcastle, Manchester Industry Collaboration –IBM engaged at start of 2004 –Oracle and Fujitsu on Programme Board People: –5 FTEs at EPCC, 5 FTEs at IBM –1 PDRA at each of NeSC, Newcastle, Manchester Releases: –6-monthly major releases, next is April
3 Project cycle 1. Exploitation 2. DAIS 3. OGSA-DAI/T Implementation
4 Milestones
Timeline (Jan '04 to Mar '06): releases R3.1, R4.0, R5.0, R6.0 and R7.0, with markers for "WS-RF Announced" and "Stable Database Services Specification and Primer put forward as a Proposed Recommendation"
–R3.1: Technical preview of parts of R4
–R5: Compliance with DAIS?, distributed query and transactions, improved dependability and security, coordinated contributor community
–R6: Features depend on user priorities, context and research
–R7: Maintainable release for the user community
5 Release 4 Built on top of Globus Toolkit 3.2 –No major change to perform doc schema Supported Client Toolkit library –Easier development of applications Updated ActivityEngine –Have addressed some memory and performance issues Additional DBMSs supported –SQL Server, Postgres New GUI data browser –contributed by FirstDIG Project Bulk load supported –Allows distributed join scenarios using data browser
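The "distributed join via bulk load" scenario mentioned above can be pictured as three steps: query one resource, bulk load the rows into a temporary table on a second resource, then join locally there. The following is a minimal in-memory sketch of that pattern only. The `Resource` class, its methods, and the table and column names are all hypothetical and are not part of OGSA-DAI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration of a join across two resources using bulk load. */
final class BulkLoadJoinSketch {

    /** Hypothetical stand-in for one data resource: named tables of rows. */
    static final class Resource {
        private final Map<String, List<Map<String, Object>>> tables = new HashMap<>();

        /** Appends rows to a table, creating it if needed (the "bulk load" step). */
        void bulkLoad(String table, List<Map<String, Object>> rows) {
            tables.computeIfAbsent(table, t -> new ArrayList<>()).addAll(rows);
        }

        /** Returns a copy of every row in a table (stands in for SELECT *). */
        List<Map<String, Object>> selectAll(String table) {
            return new ArrayList<>(tables.getOrDefault(table, new ArrayList<>()));
        }

        /** Inner equi-join of two local tables on a shared column name. */
        List<Map<String, Object>> join(String left, String right, String key) {
            List<Map<String, Object>> rightRows = selectAll(right);
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> l : selectAll(left)) {
                Object leftKey = l.get(key);
                for (Map<String, Object> r : rightRows) {
                    if (leftKey != null && leftKey.equals(r.get(key))) {
                        Map<String, Object> merged = new HashMap<>(l);
                        merged.putAll(r);
                        out.add(merged);
                    }
                }
            }
            return out;
        }
    }

    /** Builds a row from alternating column names and values. */
    static Map<String, Object> row(Object... kv) {
        Map<String, Object> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    /** Runs the three steps with made-up sample data. */
    static List<Map<String, Object>> demo() {
        Resource a = new Resource();
        Resource b = new Resource();
        a.bulkLoad("people", Arrays.asList(row("id", 1, "name", "Ann"), row("id", 2, "name", "Bob")));
        b.bulkLoad("orders", Arrays.asList(row("id", 1, "item", "book")));

        // Step 1: query resource A. Step 2: bulk load the result into a temp table on B.
        b.bulkLoad("tmp_people", a.selectAll("people"));
        // Step 3: the join now runs entirely inside resource B.
        return b.join("orders", "tmp_people", "id");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The design point is that neither resource needs to know about the other: the client moves the data, and the join itself is an ordinary local operation.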
6 Release 4 Secure Grid Data Transport implementation Documentation now in XHTML format Metadata registered with DAISGR Updated Exception Hierarchy Stored Procedure support DBMS Specific management operations –Not DBMS transparent operations More file access prototypes User support, courses, training material Updated performance report Release date: end April / early May
7 DAIT Roadmap R5 R5 October 2004 –Possible alignment with WS-RF and DAIS Specs Assuming they settle in time –At least a tech preview of OGSA-DAI and GT4 –Possible WS-I interface implementation? –Distributed Relational Query Processing Looking to integrate the OGSA-DQP code –Improved dependability and security integration –Extended & integrated XML and relational facilities –Transaction participation –Coordinated OGSA-DAI contributor community
8 R6 R7 R6 April 2005 –Integrated with GT4? –Functionality driven by user group (i.e. YOU) –New facilities depend on user priorities, context and research –OGSA-DAI components from contributor community –Increased data integration tools R7 October 2005 –Maintainable release for the user community
9 OGSA-DAI and WS-RF OGSA-DAI R4 will still be OGSI OGSA-DAI R5 will probably have a WS-RF interface Currently investigating options for supporting different interfaces in the future –WS-RF, WS-I, legacy OGSI, … OGSA-DAI codebase and architecture make it easier to port –But still costly to support multiple versions
10 Issues Important facets that need to be taken into account: –Metadata Data structure, access policies, data content, provenance, physical properties, etc. –Registries required for resource discovery/matching Allow dynamic binding to data sources based on provision of metadata –Different modes of operation/delivery JDBC, ODBC, bulk data transfer, third-party data transfer, incremental data transfer, delayed data transfer –Security mechanisms Authentication, authorisation, accounting, audit, privacy/encryption –Data transformation processes Reformatting, compression –Facilitate transaction / workflow participation arrangements
11 Protecting Your Users Very rapidly moving field –Technology changes –Standard changes –Middleware changes Need to ensure: –Technology adopters' investment in OGSA-DAI is protected –Shielded from future change Positives: –Document interface helpful –Client toolkit
12 DAIS Documents Perform Document
13 OGSA-DAI Services OGSA-DAI uses three main service types –DAISGR (registry) for discovery –GDSF (factory) to represent a data resource –GDS (data service) to access a data resource
Diagram: DAISGR locates GDSF; GDSF creates GDS and represents the Data Resource; GDS accesses the Data Resource
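The three roles can be read as a locate, create, access sequence. The sketch below models that sequence with plain Java objects, purely to show how the roles relate. Every class and method name in it is hypothetical; it does not use or reproduce the OGSA-DAI service interfaces.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of the registry, factory and data service roles. */
final class ServiceRolesSketch {

    /** Stand-in for a data resource such as a relational database. */
    static final class DataResource {
        final String name;
        DataResource(String name) { this.name = name; }
    }

    /** GDS role: accesses one data resource on behalf of a client. */
    static final class DataService {
        private final DataResource resource;
        DataService(DataResource resource) { this.resource = resource; }

        String perform(String request) {
            return "executed [" + request + "] against " + resource.name;
        }
    }

    /** GDSF role: represents a data resource and creates data services for it. */
    static final class DataServiceFactory {
        private final DataResource resource;
        DataServiceFactory(DataResource resource) { this.resource = resource; }

        DataService createDataService() { return new DataService(resource); }
    }

    /** DAISGR role: lets clients locate factories by resource name. */
    static final class Registry {
        private final Map<String, DataServiceFactory> factories = new HashMap<>();

        void register(String resourceName, DataServiceFactory factory) {
            factories.put(resourceName, factory);
        }

        DataServiceFactory locate(String resourceName) {
            DataServiceFactory factory = factories.get(resourceName);
            if (factory == null) {
                throw new IllegalArgumentException("No factory registered for " + resourceName);
            }
            return factory;
        }
    }

    /** Runs the locate, create, access sequence end to end. */
    static String run(String resourceName, String request) {
        Registry registry = new Registry();
        registry.register(resourceName, new DataServiceFactory(new DataResource(resourceName)));

        DataServiceFactory factory = registry.locate(resourceName); // registry locates factory
        DataService service = factory.createDataService();          // factory creates service
        return service.perform(request);                            // service accesses resource
    }

    public static void main(String[] args) {
        System.out.println(run("exampledb", "SELECT * FROM people"));
    }
}
```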
14 Supported Data Resources
Relational    XML       Other
MySQL         Xindice   Files
DB2           eXist     ?
Oracle
PostgreSQL
SQLServer
15 Client Toolkit Why? Nobody wants to write XML! A programming API which makes writing applications easier –Now: Java –Next: Perl, C, C#?
    // Create a query
    SQLQuery query = new SQLQuery(SQLQueryString);
    ActivityRequest request = new ActivityRequest();
    request.addActivity(query);
    // Perform the query
    Response response = gds.perform(request);
    // Display the result
    ResultSet rs = query.getResultSet();
    displayResultSet(rs, 1);
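To make the "nobody wants to write XML" point concrete, here is a minimal sketch of how a client-side API can build the request document on the caller's behalf. It is an assumption-laden illustration, not the Client Toolkit itself: the class names, the `toXml` methods, and the XML element names (`perform`, `sqlQuery`, `expression`) are invented for this example and do not follow the real perform document schema.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a client API that hides request XML from the application. */
final class PerformDocumentSketch {

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** One unit of work in a request; here, a single SQL query. */
    static final class SqlQueryActivity {
        private final String name;
        private final String sql;

        SqlQueryActivity(String name, String sql) {
            this.name = name;
            this.sql = sql;
        }

        /** Renders this activity as a (hypothetical) XML fragment. */
        String toXml() {
            return "  <sqlQuery name=\"" + escapeXml(name) + "\">\n"
                 + "    <expression>" + escapeXml(sql) + "</expression>\n"
                 + "  </sqlQuery>\n";
        }
    }

    /** Collects activities and renders them as one request document. */
    static final class Request {
        private final List<SqlQueryActivity> activities = new ArrayList<>();

        void addActivity(SqlQueryActivity activity) {
            activities.add(activity);
        }

        String toXml() {
            StringBuilder xml = new StringBuilder("<perform>\n");
            for (SqlQueryActivity activity : activities) {
                xml.append(activity.toXml());
            }
            return xml.append("</perform>").toString();
        }
    }

    public static void main(String[] args) {
        Request request = new Request();
        request.addActivity(new SqlQueryActivity("q1", "SELECT * FROM people WHERE age < 30"));
        System.out.println(request.toXml());
    }
}
```

The application only handles objects and strings; escaping and document structure stay inside the library, which is also what lets the underlying document format change without breaking client code.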
16 Exploitation We have delivered several releases of OGSA-DAI. There is much to be learned this year from taking the application view. Tasks –User Group Meetings Input to DAIS and DAI/T implementation –Support to projects Skyserver, eDiaMoND, SIMDAT etc. –Publications and presentations Globus World, GGF Workshops, All Hands etc. –Surveys and questionnaires
17 DAIS DAIS has been through a number of revisions and is yet to settle down. We need to review the roadmap and absorb the new WS directions. Tasks –Vision and roadmap –Compliance with Data Services –Compliance with WSRF –Compliance with WS Agreement –Compliance with CIM/DMTF –Compliance with other data standards, e.g. SQL/XML –Integration with GDD
18 Requirements What do you consider important? –Data access –Data integration –Application support –Service Extensibility –Performance –Security –Usability –User Support
19 Versioning Which is more important? –OGSI / WS-RF / WS-I –DAIS specifications –Globus Toolkit –Java –Java libraries –Databases –RowSet –XML Schema –Client Toolkit API –OGSA-DAI interfaces How many versions should we support?
20 Platforms Which is more important? –Ease of installation and support –Ability to access underlying platforms Black box versus full extensibility –How many people need to know whether it's running on Tomcat? On Axis?
21 Interfaces Which is more important? –Continued support for OGSI –Port to WS-RF –Port to WS-I –Port to… Web Services –WSDL/SOAP/OGSA –REST –…
22 Data Integration Which is more important? –Distributed Query Processing –Distributed Queries –Distributed Data Management –Federated databases –Virtual federation –Data virtualisation –Data provenance –Transactions
23 Languages Which is more important? Query languages –SQL (92, 99, …), XPath, XQuery, RegExp, … Programming languages –Java –C / C++ –C# –Fortran –Perl / Python –…
24 Security Which is more important? –Transport Level Security –Message Level Security –Standards (GSI, GSS, Kerberos, WS-Security, …) –Scalability –Robustness –Performance –Definition –Management –Usability
25 Usability Which is more important? –Client libraries for usability –Exemplar client framework –Additional Client Toolkit Activity implementation –Support for Client Toolkit version changes –Graphical components
26 User Support Which is more important? –Continue support for community via web site, mail list, announce, help, consultations –Training courses –Talks, demonstrations and presentations –User documentation –"How To" guides –Installation and configuration wizards –Different Installers and Uninstallers (e.g. WAR, …)
27 Requirements (again) Which is more important? –Data access SQL refactoring Interoperability of results –Data integration Schema integration tools Data description tools –Application support –Service Extensibility –Performance –Security –Usability –User Support
28 Summary Many paths which OGSA-DAI can take –Which one to take Future releases – need to be driven by your requirements –You need to tell us It is a trade off –No infinite development team –Port versus support –Functional and non-functional requirements
29 Performance Performance analysis and optimisation –Performance Optimisation –Performance and diagnostic monitoring support –Progress notification –OGSA-DAI performance and benchmarking –Extended performance tests –Progress notification to client side –Instrumentation points through code
30 DBMS DBMS –Access to DBMS management operations, Archive and restore –Access to DBMS native bulk load operations –Full DDL coverage with DBMS transparency –Support for additional databases (relational, XML, object) –Support for portable stored procedures –Transactions over single resource –Transactions over multiple resources –Distributed joins as server activity –Extremely large dataset support –Support for additional SQL types –Keeping up to date with new RowSet schema
31 Files Files –Simple file access –CSV implementation version 2 (Mike?) –Files support (not CSV) e.g. BinX –Regular expression driver implementation for arbitrary files –XML file driver implementation
32 User Support User support and training –Continue support for community via web site, mail list, announce, help, consultations –Training courses –Talks, demonstrations and presentations –User documentation –"How To" guides –Installation and configuration wizards –Uninstaller –WAR installer
33 Client tooling Client Tooling (Developer Usability) –Client libraries for usability –Exemplar client framework –Additional Client Toolkit Activity implementation –Support for Client Toolkit version changes –Graphical components
34
XML Data Services R&D: 1 Investigate viable options for improved XML data services
Distributed Query Processing: 1 Performance evaluation suite; 2 Push more substantial sub queries to data services for evaluation; 3 Improved client; 4 Plan for XML DQP deliverables; 5 Get DQP code integrated into DAI CVS?; 6 DQP productisation
Testing: 1 Automated system tests; 2 Dashboard; 3 Completion of missing unit tests
Refactoring: 1 Improve NLS support; 2 Error handling and error codes; 3 Jar refactoring; 4 Command Line Client refactoring; 5 Refactoring of SQLActivity
Security: 1 Simple Encrypted Rolemapper; 2 GDT security
Runtime usability: 1 Include FirstDIG SQL GUI as tech preview
Performance analysis and optimization: 1 Performance Optimisation
DBMS: 1 Distributed transaction coordination
Files: 1 Extended file facilities
Client Tooling (Developer usability): 1 JDBC based client; 2 ODBC/JDBC Driver as front end to OGSA-DAI; 3 WSAD/Eclipse plugins
User support and Training (Developer usability)
XML Data Services R&D
Distributed Query Processing
Testing: 1 Update automated system tests; 2 Produce unit testing guidelines (especially for activities); 3 Produce consistent approach for integration testing; 4 Dashboard for unit tests; 5 Increased Testing Coverage
Refactoring: 1 Engine Mark II; 2 GUI Client refactoring; 3 Better exception hierarchy
Security: 1 Security Audit Document
Runtime usability: 1 Management interfaces/CIM/WSDM; 2 Monitoring; 3 Extending FirstDIG SQL GUI
Compliance: 1 Compliance with the DAIS specification post GGF11; 2 Compliance with OGSI changes
Dependability: 1 Code hardening
Extended relational facilities
Extended XML facilities
Meta Data: 1 Advertising and discovering data resources
Data Virtualization: 1 Distribution; 2 Replication; 3 Caching; 4 Federation
Compliance: 1 Compliance with WS-Agreement
Integration with GT3 Resource registration Distributed XML query TBD Initial investigations
35 Goals of the session Three stages: –list the requirements of each project –identify requirements common to projects –identify requirements of high priority We should try and avoid the politics and keep an eye on the clock!