Introduction to OGSA-DAI The OGSA-DAI Team
2 The OGSA-DAI Project A generic framework for integrating data access and computation –Uniform interface to relational, XML, flat file data resources Using the grid to take specific classes of computation nearer to the data Kit of parts for building tailored access and integration applications Investigations to inform DAIS-WG One reference implementation for DAIS Releases publicly available NOW
3 Project Partners Powered by …. Funded by the Grid Core Programme
4 Project Membership Principal Investigators Project Manager Programme Management Board Chair Technical Review Board Chair Research Team IBM Dissemination Team EPCC Team IBM Development Team [Names shown on the slide: Charaka, Mike, Ally, Amy, Mario, Malcolm, Kostas, Norman, Paul, Neil, Andy, Simon, Brian, Dave, Patrick, Neil]
5 OGSA Infrastructure Architecture [Figure: layered architecture diagram. Labels: Data Intensive X Scientists; Data Intensive Applications for Science X; Distributed Simulation, Analysis & Integration Technology for Science X; Virtual Integration Architecture; Generic Virtual Data Access and Integration Layer; Structured Data Integration; Structured Data Access; Structured Data (Relational, XML, Semi-structured); Transformation; Registry; Job Submission; Data Transport; Resource Usage; Banking; Brokering; Workflow; Authorisation; Grid or Web Service Infrastructure; Compute, Data & Storage Resources; OGSA-DAI]
6 Project Status Current release 4.0 –Globus Toolkit 3.2 compliant –Platform and language independent Java 1.4 Document model Work concentrated on data access –Wraps data resources without hiding underlying data model –Provide base for higher-level services Distributed Query Processing (DQP) Data federation services
7 Supported Data Resources
Relational    XML        Other
MySQL         Xindice    Files
DB2           eXist      ?
Oracle
PostgreSQL
SQLServer
8 Web Service Architecture Service Registry Service Consumer Service Provider Publish Bind Discover
9 OGSA-DAI Service Architecture DAISGR Service Consumer GDSF GDS Publish Bind Discover
10 OGSA-DAI Services OGSA-DAI uses three main service types –DAISGR (registry) for discovery –GDSF (factory) to represent a data resource –GDS (data service) to access a data resource This will change [Diagram: DAISGR locates GDSF; GDSF creates GDS; GDSF represents Data Resource; GDS accesses Data Resource]
11 GDSF and GDS Grid Data Service Factory (GDSF) –Represents a data resource –Persistent service Currently static (no dynamic GDSFs) –Cannot instantiate new services to represent other/new databases –Exposes capabilities and metadata –May register with a DAISGR Grid Data Service (GDS) –Created by a GDSF –Generally transient service –Required to access data resource –Holds the client session
12 Grid Data Service [Diagram labels: Xindice, MySQL, Oracle, DB2] Data source abstraction behind GDS instance –plug in “data resource implementations” for different data source technologies –does not mandate any particular query language or data format Heterogeneity
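The plug-in idea on this slide can be illustrated with a minimal sketch. It is not the OGSA-DAI API: the class names `DataResource`, `RelationalResource`, `FileResource`, and `DataService` are hypothetical, and SQLite stands in for a relational database only because it ships with Python. The point is that the service forwards each expression untouched, so every resource keeps its own query language.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataResource(ABC):
    """Hypothetical plug-in interface (an assumption, not the OGSA-DAI API)."""

    @abstractmethod
    def execute(self, expression):
        """Run an expression written in the resource's own query language."""


class RelationalResource(DataResource):
    """Relational plug-in backed by an in-memory SQLite database."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE frogs (name TEXT, legs INTEGER)")
        self._conn.execute("INSERT INTO frogs VALUES ('tree frog', 4)")

    def execute(self, expression):
        # The expression is plain SQL; it is passed through unchanged.
        return self._conn.execute(expression).fetchall()


class FileResource(DataResource):
    """Flat-file plug-in: the 'query' is a substring matched against lines."""

    def __init__(self, lines):
        self._lines = list(lines)

    def execute(self, expression):
        return [line for line in self._lines if expression in line]


class DataService:
    """Stands in for a GDS: forwards requests without translating them."""

    def __init__(self, resource):
        self._resource = resource

    def perform(self, expression):
        return self._resource.execute(expression)


relational = DataService(RelationalResource())
flat_file = DataService(FileResource(["tree frog", "toad"]))
sql_rows = relational.perform("SELECT name FROM frogs")
file_rows = flat_file.perform("frog")
```

Supporting another data source technology in this sketch means adding one more `DataResource` subclass; `DataService` itself does not change.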
13 DAISGR DAI Service Group Registry (DAISGR) –Persistent service –Based on OGSI ServiceGroups –GDSFs may register with DAISGR –Clients access DAISGR to discover Resources Services (may need specific capabilities) –Support a given portType or activity
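Discovery by capability can be sketched as a lookup over registered factory metadata. This is an assumption-laden toy, not the DAISGR interface: `FactoryEntry`, `Registry`, the `gsh://example/...` handles, and the activity names `sqlQuery` and `xPathQuery` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FactoryEntry:
    """Metadata a factory might publish (all field names are hypothetical)."""
    handle: str                                   # stands in for a Grid Service Handle
    resource: str                                 # e.g. "Frogs"
    activities: set = field(default_factory=set)  # activities the factory supports


class Registry:
    """Toy registry: factories register, clients search by metadata."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def find(self, resource=None, activity=None):
        # A criterion left as None matches every entry.
        return [
            e.handle
            for e in self._entries
            if (resource is None or e.resource == resource)
            and (activity is None or activity in e.activities)
        ]


registry = Registry()
registry.register(FactoryEntry("gsh://example/frogs", "Frogs", {"sqlQuery"}))
registry.register(FactoryEntry("gsh://example/papers", "Papers", {"xPathQuery"}))

by_resource = registry.find(resource="Frogs")
by_activity = registry.find(activity="xPathQuery")
```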
14 [Diagram: Analyst; Registry (DAISGR); Factory (GDSF); operations registerService and findServiceData] Data resource publication through registry Data location hidden by factory Data resource meta data available through Service Data Elements Location
15 Interaction Model: Start up OGSI Container GDSF DAISGR 1. Start OGSI containers with persistent services. 2. Here GDSF represents Frog database.
16 Interaction Model: Registration OGSI Container GDSF DAISGR 3. GDSF registers with DAISGR. Frogs: GSH
17 Interaction Model: Discovery OGSI Container GDSF DAISGR 4. Client wants to know about frogs. Can: (i) Query the GDSF directly if known or (ii) Identify suitable GDSF through DAISGR. Frogs: GSH Mmmmm … Frogs? FindService: Frogs GSH: GDSF
18 Interaction Model: Service Creation OGSI Container GDSF DAISGR 5. Having identified a suitable GDSF, the client asks it to create a GDS. Frogs: GSH GDS CreateService GSH: GDS
19 Interaction Model: Perform OGSI Container GDSF DAISGR 6. Client interacts with GDS by sending Perform documents. 7. GDS responds with a Response document. 8. Client may terminate GDS when finished or let it die naturally. Frogs: GSH GDS Perform Document Response Document
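Steps 1 to 8 of the interaction model can be sketched as a small in-process simulation. It is an illustration only, not the OGSA-DAI client API: the class names, the `gsh://example/...` handle, and the `<perform>`, `<query>`, `<response>`, and `<row>` element names are all hypothetical, and real Perform and Response documents follow the OGSA-DAI schemas.

```python
import itertools
import xml.etree.ElementTree as ET


class DataService:
    """Stands in for a GDS: accepts a perform document, returns a response."""

    def __init__(self, handle, rows):
        self.handle = handle
        self._rows = rows
        self.alive = True

    def perform(self, perform_xml):
        wanted = ET.fromstring(perform_xml).findtext("query")
        response = ET.Element("response")
        for name in self._rows:
            if wanted in name:
                ET.SubElement(response, "row").text = name
        return ET.tostring(response, encoding="unicode")

    def destroy(self):
        self.alive = False


class Factory:
    """Stands in for a GDSF: represents one data resource, creates GDSs."""

    _ids = itertools.count(1)

    def __init__(self, handle, rows):
        self.handle = handle
        self._rows = rows

    def create_service(self):
        return DataService(f"{self.handle}/gds-{next(self._ids)}", self._rows)


class Registry:
    """Stands in for a DAISGR: maps a topic to a registered factory."""

    def __init__(self):
        self._factories = {}

    def register_service(self, topic, factory):
        self._factories[topic] = factory

    def find_service(self, topic):
        return self._factories[topic]


# Steps 1-2: persistent services start; the factory represents the frog data.
registry = Registry()
factory = Factory("gsh://example/frogs-gdsf",
                  ["tree frog", "poison dart frog", "toad"])
# Step 3: the factory registers with the registry.
registry.register_service("Frogs", factory)
# Step 4: the client discovers a suitable factory.
found = registry.find_service("Frogs")
# Step 5: the client asks the factory to create a data service.
gds = found.create_service()
# Steps 6-7: the client sends a perform document and gets a response document.
response = gds.perform("<perform><query>frog</query></perform>")
# Step 8: the client terminates the data service when finished.
gds.destroy()
```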
20 Interaction Model: Summary Only described an access use case –Client not concerned with connection mechanism –Similar framework could accommodate service-service interactions Discovery aspect is important –Probably requires a human –Needs adequate definition of metadata Definitions of ontologies and vocabularies - not something that OGSA-DAI is doing …
21 More Complex Behaviour [Diagram labels: Client; Container; GDS; GDT; Data Resource] Deliver data back to the client. Deliver data to a third party. Deliver data to another GDS. And there's a lot more that you can do …
22 Usage Patterns [Figure: message-flow diagrams for the Retrieve, Update/Insert, and Pipeline patterns] Actors: A - Analyst, C - Consumer, G - GDS, P - Producer (drawn as OGSI or non-OGSI processes) Messages: Q - Query, D - Delivery, S - Status, R - Result, U - Update, I - Data id Arrows: Call, Response, Data Flow
23 Projects Using OGSA-DAI
24 Projects Using OGSA-DAI OGSA-DAI, AstroGrid, BioSimGrid, BioGrid, Bridges, eDiaMoND, FirstDig, GeneGrid, GEON, IU RGRBench, myGrid, N2Grid, ODD-Genes, OGSA-WebDB, INWA
25 Project classification OGSA-DAI Biological Sciences Physical Sciences Commercial Applications Computer Sciences FirstDig INWA Bridges AstroGrid BioSimGrid BioGrid eDiamond myGrid ODD-Genes N2Grid GEON MCS IU RGBench OGSA Web-DB GeneGrid GridMiner
26 Points to Note Feedback from users largely positive –Good suggestions –Fair criticisms –How OGSA-DAI is being used –Where it succeeds and where it fails –Helping us to capture requirements Hope to allow user contributions –Plan to establish a policy/framework for this Engage more with User Community –Meetings scheduled for this year OGSA-DAI mini-workshop at AHM 2004 OGSA-DAI tutorials at various meetings/locations
27 e-Digital MammOgraphy National Database –Mammogram - X-ray of the breast Built prototype of a national database of mammographic images –In support of the UK Breast screening programme Employed Grid technologies to facilitate process Thanks to the eDiaMoND project and the Digital Database for Screening Mammography for this image.
28 Breast screening in the UK began in 1988 –Women aged screened every 3 Years –Women aged from 2004 –1 View/Breast → 2 views by 2003 UK has –Over 90 Breast screening units throughout the UK –Each one deals with about women on average p.a. Each centre sees images/year In → –Screened: 1.4M → 1.5M –Recalled for Assessment : → –Cancers detected : → –Lives per year Saved: 300 → 1250 (by 2010) Distributed team of doctors perform the analysis
29 [Figure: eDiaMoND deployment diagram. Site labels: UCL, KCL, UED, CHU. Repeated labels: DB2 Content Manager, Core Services, Core & Training API, Data Load, Training App, OGSA-DAI. Other labels: DB2 Federation, Database, Files, Training Services, Core API, Training API, Training Application]
30 eDiaMoND Findings: –OGSA-DAI provides a flexible framework –Dynamically configure the system through discovery –Activities can operate with different levels of granularity –Federation can be introduced at various levels –Good documentation on how to extend the framework Extended Activities to access IBM DB2 Content Manager –Changes between versions broke some things Low level XML issues
31 FirstDIG Data mining with the First Transport Group, UK –Example: “When buses are more than 10 minutes late there is an 82% chance that revenue drops by at least 10%” –“The results of this exercise will revolutionise the way we do things in the bus industry.” Darren Unwin, Divisional Manager, First South Yorkshire. OGSA-DAI OGSA-DAI Client Application Data Mining Application
32 INWA Innovation Node: Western Australia –Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge Project –Run from Nov Aug 2004 –Involved 10 partners (6 UK + 4 Australia) Aim –Data mine commercially sensitive data –Security an absolute MUST –Employ Grid technologies –Need access to data and computational resources Demonstrator using: –OGSA-DAI Incorporate data resources –Sun DCG's TOG (Transfer-queue Over Globus) Handle job submission to analyse micro array data
33 INWA [Diagram labels: Curtin, Australia; EPCC, UK; Grid Engine; Bank; Telco; OGSA-DAI; TOG; Data Browser; Telco data; Bank data; Australian property; UK Property]
34 INWA: Lessons Learned Performing Data Integration: –TimeZone date problems Security issues: –Bugs in JavaCoG in GT3; OGSA-DAI could not switch security for Grid data transfers; TOG had no security option –All of these have been fixed Middleware not mature enough for commercial deployment
35 Why OGSA-DAI? Why use OGSA-DAI over JDBC? –Can embed additional functionality at the service end Transformations, compressions Third party delivery The extensible activity framework –Avoiding unnecessary data movement –Common interface to heterogeneous data resources Relational, XML databases, and files –Usefulness of the Registry for service discovery Dynamic service binding process Provision of good meta-data is necessary –Language independence at the client end Do not need to use Java –Platform independence Do not have to worry about connection technology, drivers, etc
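The point about embedding functionality at the service end can be illustrated with a rough sketch of a chain of activities in which compression runs next to the data, so fewer bytes cross the network. Everything here is an assumption made for illustration: the function names, the `journeys` table, and its synthetic contents are hypothetical, SQLite stands in for the data resource, and this is not the OGSA-DAI activity framework itself.

```python
import sqlite3
import zlib


def sql_query(conn, statement):
    """Query activity: run SQL and serialise the rows as CSV-like bytes."""
    rows = conn.execute(statement).fetchall()
    text = "\n".join(",".join(str(value) for value in row) for row in rows)
    return text.encode("utf-8")


def compress(data):
    """Compression activity applied at the service end, before delivery."""
    return zlib.compress(data)


def run_pipeline(conn, statement, activities):
    """Run the query, then feed its output through each activity in order."""
    data = sql_query(conn, statement)
    for activity in activities:
        data = activity(data)
    return data


# Synthetic, made-up data used only to exercise the pipeline.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journeys (route TEXT, minutes_late INTEGER)")
conn.executemany(
    "INSERT INTO journeys VALUES (?, ?)",
    [("route-%d" % (i % 5), i % 30) for i in range(1000)],
)

statement = "SELECT route, minutes_late FROM journeys ORDER BY rowid"
raw = run_pipeline(conn, statement, [])            # what a plain query would ship
packed = run_pipeline(conn, statement, [compress])  # what the client receives
```

In this sketch the client recovers the original bytes with `zlib.decompress(packed)`; a transformation or third-party delivery step would be just another function in the `activities` list.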