Neil Chue Hong Project Manager, EPCC +44 131 650 5957 OGSA-DAI Status and Benchmarks All Hands Meeting 2005 Nottingham, 22 September.

Presentation transcript:

Neil Chue Hong Project Manager, EPCC OGSA-DAI Status and Benchmarks All Hands Meeting 2005 Nottingham, 22 September 2005

AHM2005 2 Overview
The all new OGSA-DAI overview
Benchmarking and profiling work
Project collaboration
Future plans

AHM2005 3 OGSA-DAI team
IBM Development Team, Hursley
NEReSC, Newcastle
NeSC, Edinburgh
EPCC Team, Edinburgh
ESNW, Manchester
IBM Dissemination Team

AHM2005 4 OGSA-DAI In One Slide
An extensible framework for data access and integration.
Expose heterogeneous data resources to a grid through web services.
Interact with data resources:
– Queries and updates.
– Data transformation / compression.
– Data delivery.
Customise for your project using:
– Additional Activities
– Client Toolkit APIs
– Data Resource handlers
A base for higher-level services
– federation, mining, visualisation, …

AHM2005 5 The OGSA-DAI Framework
[Diagram: an application uses the Client Toolkit to call an OGSA-DAI service. The service engine runs activities (SQLQuery, XPath, readFile, XSLT, GZip, GridFTP) against data resources: databases reached through JDBC (MySQL, DB2, SQL Server), XMLDB (Xindice) and files (SWISS-PROT).]

AHM2005 6 Extensibility Example
[Diagram: an OGSA-DAI service engine in which a "Multiple SQL" GDS passes an SQLQuery to several SQL databases, each reached through JDBC.]
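
The labels on this slide suggest one query being sent to several SQL databases. A minimal sketch of that fan-out idea follows, assuming in-memory SQLite databases as stand-ins for the JDBC resources. The names `make_db`, `multiple_sql_query` and the `people` table are hypothetical and are not part of OGSA-DAI.

```python
import sqlite3


def make_db(names):
    """Create a stand-in database holding one table of names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT)")
    conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])
    conn.commit()
    return conn


def multiple_sql_query(connections, sql, params=()):
    """Run the same query on every connection and concatenate the rows."""
    results = []
    for conn in connections:
        results.extend(conn.execute(sql, params).fetchall())
    return results


if __name__ == "__main__":
    databases = [make_db(["Ann", "Bob"]), make_db(["Cy"])]
    for row in multiple_sql_query(databases, "SELECT name FROM people"):
        print(row[0])
```

The client issues one request and receives one combined result, while the set of databases behind it can change without the client noticing.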

AHM Timeline Release 1 interim Release 2 Release 2 interim Release 3 Release 3.1 Release 4 Release 5 OGSI Release 6  Release 1 OGSA-DAI WSRF 1.0 OGSA-DAI WS-I 1.0/ OGSA-DAI WS-I 1.1 (OMII)

AHM20058 Release downloads Data up to 28/07/05

AHM2005 9 Geographical download profiles
OGSI             WSRF            WS-I
China (28%)      China (32%)     UK (30%)
UK (20%)         UK (19%)        China (28%)
US (12%)         Germany (8%)    US (8%)
Unknown (10%)    US (7%)         Japan (7%)
Data up to 29/07/05

AHM Our stakeholders
OMII
– Current version of OGSA-DAI WS-I 1.0 distribution runs on OMII
– Release 1.1 due out soon
– Issues when security is introduced
Globus
– WSRF distribution bundled with GT4.0
– WSRF 1.0 distribution bundled with GT4.0.1
Projects
– A number of projects have used/use/will use OGSA-DAI: AstroGrid, Biogrid, BioSimGrid, Bridges, caGrid, DataMiningGrid, eDiamond, FirstDig, GEDDM, GeneGrid, GEON, GridMiner, INWA, IU RGRBench, LEAD, MCS, myGrid, N2Grid, ODD-Genes, OGSA-WebDB, SIMDAT, GOLD

AHM Out with the old…
[Diagram: the old architecture in three layers. Client: client and Client Toolkit API. Server, reached over SOAP: DAISGR, GDSF, GDS. Data: relational, XML, files.]

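AHM … in with the new!
[Diagram: the new architecture in three layers. Client: client and generic Client Toolkit API with WS-I and WSRF bindings. Server, reached over SOAP: WS-I and WSRF data services on top of the DAI Core and its data service resources (DSR). Data: relational, XML, files.]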

AHM Changes in moving to WSRF/WS-I
Registry component (DAISGR) no longer supported
– Hope to leverage third party registration services
– GRIMOIRES
– Others …
GDS/GDSF roles combined
– Use data services
– Currently static services but
– Reconfigurable services
Improvements to the GDS
– Data resource abstraction decoupled from the service
– Renaming (consistent naming across platform versions)
– Ability to enforce control flow constraints (ordering activities)
– Refactored exception framework
Temporary set-backs (we promise we’ll fix them)
– No security model
– No concurrency
– Previously used GDSs for concurrency
– Support now moving to the engine

AHM The Client Toolkit (CTk)
Provides programmatic abstraction for perform documents
– Do not have to write XML explicitly
Abstraction over WS-I and WSRF services at client side
– don’t need to know what type of service is at the other end (almost)
– security model is the remaining issue
Currently only Java version of CTk
– Stabilising API
– Publish an API document
– Allow 3rd parties to develop CTk for other programming languages
[Diagram: client, generic Client Toolkit API, WS-I and WSRF.]
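
To illustrate what "programmatic abstraction for perform documents" can mean, here is a minimal sketch in Python (the real CTk is Java). The class names and the XML element names (`perform`, `sqlQuery`, `expression`) are hypothetical and do not reproduce the actual OGSA-DAI perform document schema. The point is only that the caller builds objects and never writes XML by hand.

```python
import xml.etree.ElementTree as ET


class SQLQuery:
    """Hypothetical client-side object describing one query activity."""

    def __init__(self, name, expression):
        self.name = name
        self.expression = expression

    def to_element(self):
        element = ET.Element("sqlQuery", {"name": self.name})
        ET.SubElement(element, "expression").text = self.expression
        return element


class Request:
    """Collects activities and serialises them into one XML document."""

    def __init__(self):
        self.activities = []

    def add(self, activity):
        self.activities.append(activity)
        return self  # allow chained calls

    def to_xml(self):
        root = ET.Element("perform")
        for activity in self.activities:
            root.append(activity.to_element())
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    request = Request().add(SQLQuery("q1", "SELECT * FROM littleblackbook"))
    print(request.to_xml())
```

A toolkit for another language would only need to produce the same document, which is why a published API document matters for third-party implementations.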

AHM The Server Side
Server side:
– Presentation layer:
  – Deal with messaging differences
  – Get one version per distribution
– Core/Business Logic:
  – Common to all distributions
  – Data Service Resource (DSR)
– Data Layer:
  – Relational databases
  – XML document repositories
  – File based repositories
New architecture being rolled out
– see Malcolm’s talk in next session
– concurrency, sessions and transactions
[Diagram: WS-I and WSRF data services over the DAI Core and DSRs, with relational, XML and file data underneath.]

AHM Benchmarking/Profiling
Establish benchmark suite to:
– Measure performance gains/losses between releases
– Reveal implementation issues
– Allow focused improvements
– Establish best practice
– Summer intern (Heather Kelly) produced results
Profiling allows us to identify particular areas which are causing poor performance in the benchmarks
– Summer intern (Radoslaw Ostrowski) extended Netlogger and did some profiling
Most of the results are for OGSA-DAI R6
– one slide showing what is happening in R7

AHM Configuration
Measure the time to:
– Send SQL query to server
– Return nRows
– Sum the values in one of the columns
Do this 30 times
– Calculate mean and standard deviation
Repeat the process having increased nRows by stepsize
Try various different databases
Notes:
– Time to establish connection in JDBC runs not included
– JDBC does not return results in WebRowSet format
– Server is already running
Data source little blackbook
– Test database included in distributions
[Diagram: test machines on a 10 Mbit network; each label appears twice in the figure. Windows XP Pro SP2, Intel PIII 863 MHz, 512 MB RAM. SunOS 5.9, UltraSPARC-IIe 502 MHz, 128 MB RAM. Tomcat, GT, OGSA-DAI OGSI R6.0, j2sdk 1.4.2_01.]
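
The measurement procedure on this slide can be sketched as a small harness. This is a minimal sketch, not the benchmark suite used for the talk: it assumes an in-memory SQLite table in place of the real databases, and the table and function names are hypothetical. As in the slide's notes, the connection is opened before timing starts.

```python
import sqlite3
import statistics
import time


def make_database(max_rows):
    """Create a stand-in table with one integer column to sum."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO entries (id, value) VALUES (?, ?)",
        ((i, i) for i in range(1, max_rows + 1)),
    )
    conn.commit()
    return conn


def timed_query(conn, n_rows):
    """Time one run: send the query, fetch n_rows rows, sum one column."""
    start = time.perf_counter()
    cursor = conn.execute("SELECT id, value FROM entries WHERE id <= ?", (n_rows,))
    total = sum(row[1] for row in cursor)
    elapsed = time.perf_counter() - start
    return elapsed, total


def benchmark(conn, max_rows, step_size, runs):
    """Return {n_rows: (mean_seconds, stdev_seconds)} for each step."""
    results = {}
    for n_rows in range(step_size, max_rows + 1, step_size):
        timings = [timed_query(conn, n_rows)[0] for _ in range(runs)]
        results[n_rows] = (statistics.mean(timings), statistics.stdev(timings))
    return results


if __name__ == "__main__":
    connection = make_database(10000)
    for n_rows, (mean, stdev) in benchmark(connection, 10000, 500, 30).items():
        print(f"{n_rows:6d} rows: mean {mean * 1e3:.3f} ms, stdev {stdev * 1e3:.3f} ms")
```

The defaults in the usage example mirror the parameters quoted on the result slides (nRows = 10000, 30 runs, stepsize = 500).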

AHM Some benchmarks
Relational query
– StreamServlet requires two communications; could improve this
– FTP not iterating over result set
– JDBC scales much better than SOAP
ResultSet implementations
– Forwards-backwards implementation builds DOM tree; larger memory footprint

AHM MySQL (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

AHM DB2 (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

AHM PostgreSQL (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

AHM SQL Server (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

AHM Oracle (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

AHM OGSA-DAI WS-I (nRows = 10000, number of runs = 30, stepsize = 500)

AHM Database comparison (OGSA-DAI WSRF 1.0, nRows = 10000, number of runs = 30, stepsize = 500)

AHM Platform comparison (MySQL database, nRows = 10000, number of runs = 30, stepsize = 500)

AHM Profiling: better RowSet conversion ResultSet to RowSet conversion

AHM R6->R7: removal of RowSet

AHM Challenges
Intermediate representation
– between multiple models (relational, XML, …)
– XML WebRowSet is flexible (c.f. GridMiner) but expansive
– DFDL and GridFTP/parallel HTTP?
Query definition
– translation of queries
Data transport and workflow
– workflow is typically compute driven
Move computation to data
– mobile code activities?
– data services hosted on DBMS?

AHM caBIG
“Object-Oriented” view of data
– Data types are well-defined and registered in a repository
– Standardized metadata facilitates discovery
– custom query language implemented as an activity

AHM LEAD
[Diagram: LEAD sites (IU, NCSA Illinois, UA Huntsville, Millersville, UCAR Unidata, Okla Univ) and a master catalog.]
Each satellite replicates its contents to the master catalog

AHM Users Group and DIALOGUE Workshops
3rd Users Group meeting
– June 1st
DIALOGUE Workshops
– Data Integration Applications: Linking Organisations to Gain Understanding and Experience
– Columbus, Edinburgh, Vienna, Indiana
– Bringing together Data Integration middleware and application providers with users

AHM Future plans
A new version of the OGSA-DAI Engine
– should look mostly the same externally
– better support for concurrency, sessions and monitoring
– see Architecture paper/talk presented on Monday
Implementing new versions of specifications
– DAIS Specifications
Key things that we will be addressing after Release 7:
– Performance
– A Security Model which can be applied across platforms
– Full Transactions provision, including implementation of compensatory activities, distributed transactions
– More data integration facilities
– Better abstraction over DBMS variation

AHM Conclusions
OGSA-DAI has had to undergo significant refactoring to keep stakeholders happy
Refactoring has allowed us to create an extensible framework which can be used for many data related tasks
We need to identify the components and improvements which will be useful to users
There is obviously room for improvement on performance, and we are working on it

AHM Further information
The OGSA-DAI Project Site:
The DAIS-WG site:
OGSA-DAI Users Mailing list
– General discussion on grid DAI matters
Formal support for OGSA-DAI releases
OGSA-DAI training courses

AHM Core features of OGSA-DAI – I
A framework for building applications
– Supports data access, insert and update
  – Relational: MySQL, Oracle, DB2, SQL Server, Postgres
  – XML: Xindice, eXist
  – Files: CSV, BinX, EMBL, OMIM, SWISSPROT, …
– Supports data delivery
  – SOAP over HTTP
  – FTP; GridFTP
  – Inter-service
– Supports data transformation
  – XSLT
  – ZIP; GZIP
– Supports security
  – X.509 certificate based security

AHM Core features of OGSA-DAI – II
A framework for building data clients
– Client toolkit library for application developers
A framework for developing functionality
– Extend existing activities, or implement your own
– Mix and match activities to provide functionality you need
Highly-extensible
– Customise our out-of-the-box product
– Provide your own services, client-side support and data-related functionality
Comprehensive documentation and tutorials
Latest release supports GT3.2 (to be deprecated), GT4.0, and Axis 1.2 / OMII_2 using Java 1.4

AHM OGSA-DAI Design Principles – I
Efficient client-server communication
– Minimise where possible
– One request specifies multiple operations
No unnecessary data movement
– Move computation to the data
– Utilise third-party delivery
– Apply transforms (e.g., compression)
Build on existing standards
– Fill-in gaps where necessary
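
The "apply transforms (e.g., compression)" principle can be illustrated with a short sketch: rows are serialised to XML and gzip-compressed on the server side before delivery. The XML layout (`rows`, `row`, `col`) is a hypothetical stand-in and not the WebRowSet format, and `deliver` is an illustrative name rather than an OGSA-DAI call.

```python
import gzip
from xml.sax.saxutils import escape


def rows_to_xml(rows):
    """Serialise rows into a simple, illustrative XML layout."""
    parts = ["<rows>"]
    for row in rows:
        cells = "".join(f"<col>{escape(str(value))}</col>" for value in row)
        parts.append(f"<row>{cells}</row>")
    parts.append("</rows>")
    return "".join(parts).encode("utf-8")


def deliver(rows, compress=True):
    """Return the payload that would cross the network."""
    payload = rows_to_xml(rows)
    return gzip.compress(payload) if compress else payload


if __name__ == "__main__":
    rows = [(i, f"name{i}") for i in range(1000)]
    raw = deliver(rows, compress=False)
    packed = deliver(rows, compress=True)
    print(f"uncompressed: {len(raw)} bytes, gzip: {len(packed)} bytes")
```

Because the markup repeats for every row, XML result sets tend to compress well, so applying the transform next to the data reduces what has to be moved.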

AHM OGSA-DAI Design Principles – II
Do not hide underlying data model
– Users must know where to target queries
– Data virtualisation is hard
Extensible architecture
– Modular and customisable
– e.g., to accommodate stronger security
Extensible activity framework
– Cannot anticipate all desired functionality
– Activity = unit of functionality
– Allow users to plug-in their own
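
A minimal sketch of the "activity = unit of functionality" idea follows: activities are registered under a name, a request lists the activities to run in order, and users can register their own. The registry, the decorator and the activity names are hypothetical and only illustrate the plug-in pattern, not the OGSA-DAI activity API.

```python
import gzip
from typing import Callable, Dict, List

# Hypothetical registry: activity name -> function from bytes to bytes.
ACTIVITIES: Dict[str, Callable[[bytes], bytes]] = {}


def activity(name: str):
    """Decorator that registers a function as a named activity."""
    def register(func):
        ACTIVITIES[name] = func
        return func
    return register


@activity("upperCase")
def upper_case(data: bytes) -> bytes:
    return data.upper()


@activity("gzipCompression")
def gzip_compression(data: bytes) -> bytes:
    return gzip.compress(data)


# A user-supplied activity plugs in the same way as the built-in ones.
@activity("reverse")
def reverse(data: bytes) -> bytes:
    return data[::-1]


def perform(request: List[str], data: bytes) -> bytes:
    """Run the named activities in order, feeding each output to the next."""
    for name in request:
        if name not in ACTIVITIES:
            raise KeyError(f"unknown activity: {name}")
        data = ACTIVITIES[name](data)
    return data


if __name__ == "__main__":
    print(perform(["upperCase", "reverse"], b"abc"))
```

One request naming several activities also matches the earlier principle that a single request should specify multiple operations.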

AHM Data Integration challenges
Metadata extraction
– define a common model for e.g. database schema?
Intermediate representation
– between multiple models (relational, XML, …)
– XML WebRowSet is flexible (c.f. GridMiner) but expansive
– DFDL and GridFTP/parallel HTTP?
Query definition
– translation of queries
Data transport and workflow
– workflow is typically compute driven
Move computation to data
– mobile code activities?
– data services hosted on DBMS?

AHM Contributing to OGSA-DAI
Additional functionality:
– Provide activities which implement specific functionality
– Provide extra client functionality
– Provide different security mechanisms
– Provide higher level components and applications
Different levels of contributions
– Based on OGSA-DAI?
– Works with OGSA-DAI?
– Part of OGSA-DAI?

AHM Distributed Query Processing
Queries mapped to algebraic expressions for evaluation
Parallelism represented by partitioning queries
– Use exchange operators
Prototype available from:
Being integrated into OGSA-DAI
[Diagram: example query plan with the operators table_scan (protein), table_scan termID=S92 (proteinTerm), reduce, hash_join (proteinId), op_call (Blast), reduce and exchange; partition labels as extracted: 3,4 12.]
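
The operators named on this slide can be sketched in a few functions. This is a minimal single-process sketch, not the OGSA-DQP implementation: `exchange` partitions rows by hashing the join key, and each pair of partitions is joined independently, which is the step a distributed engine would run on separate nodes. The row layout and sample values are hypothetical.

```python
from collections import defaultdict


def table_scan(table, predicate=lambda row: True):
    """Yield the rows of a table that satisfy the predicate."""
    return (row for row in table if predicate(row))


def exchange(rows, key, n_partitions):
    """Partition rows by hashing the join key, so matching keys meet."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def hash_join(left, right, key):
    """Build a hash table on the left rows, then probe it with the right rows."""
    table = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    return [{**l, **r} for r in right for l in table.get(r[key], [])]


def partitioned_join(left, right, key, n_partitions=2):
    """Join partition by partition; here sequentially, one loop per partition."""
    left_parts = exchange(left, key, n_partitions)
    right_parts = exchange(right, key, n_partitions)
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        joined.extend(hash_join(left_part, right_part, key))
    return joined


if __name__ == "__main__":
    protein = [{"proteinId": 1, "sequence": "MKV"}, {"proteinId": 2, "sequence": "GAT"}]
    protein_term = [{"proteinId": 1, "termID": "S92"}, {"proteinId": 2, "termID": "T01"}]
    selected = list(table_scan(protein_term, lambda row: row["termID"] == "S92"))
    print(partitioned_join(protein, selected, "proteinId"))
```

Because both inputs are partitioned with the same hash function, rows with equal keys always land in the same partition, so the per-partition joins together give the full result.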

AHM FirstDIG
Data mining with the First Transport Group, UK
– Example: “When buses are more than 10 minutes late there is an 82% chance that revenue drops by at least 10%”
[Diagram: a data mining application acting as an OGSA-DAI client application of OGSA-DAI services.]

AHM GridMiner
Test application area: medical
– traumatic brain injury treatment
– Predicting the outcome of seriously ill patients
– analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
Target:
– provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
– building on and extending OGSA-DAI

AHM GridMiner Scenario
Heterogeneities:
– Name in A is “First Last” (as the target format)
– Name in C has to be combined
Distribution:
– 3 data sources
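
The name heterogeneity described here can be sketched as one small mapping function per source. The field names (`name`, `first`, `last`) and sample values are hypothetical. Only sources A and C are shown because the slide text does not say how the third source stores names.

```python
def from_source_a(row):
    """Source A already stores the name in the target "First Last" form."""
    return {"name": row["name"]}


def from_source_c(row):
    """Source C stores the parts separately, so they are combined."""
    return {"name": f"{row['first']} {row['last']}"}


def integrate(rows_a, rows_c):
    """Map every row into the target format and concatenate the results."""
    return [from_source_a(r) for r in rows_a] + [from_source_c(r) for r in rows_c]


if __name__ == "__main__":
    source_a = [{"name": "Ada Lovelace"}]
    source_c = [{"first": "Alan", "last": "Turing"}]
    print(integrate(source_a, source_c))
```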

AHM Software Process
[Process diagram. Labels as extracted: Testing; Reqs.; Prototype; Prioritisation; Fix Bugs; Use Cases; Requests; Design; Implement; QA; Release; Support; Test Cases; Programme Board; Technical Review Board; Technical Reviewer; DEVELOPERS; USERS; REVIEW; Contribs; Ingest; Dissem.; Training; Nightly unit + system tests; Additional test cases; System tests based on reqs; Continual process → Deep track features; Users’ Group; Peer Review and Inspection.]

AHM INWA
[Diagram: INWA grid linking Curtin, Australia and EPCC, UK. Each site runs Grid Engine with Bank and Telco nodes. Other labels: OGSA-DAI, TOG, Data Browser, Telco data, Bank data, Australian property, UK Property.]