Published by Tracy Richard. Modified over 9 years ago.
1
Introduction to OGSA-DAI
Neil Chue Hong
15th February 2006
GGF16, Athens
2
Data Services: challenges
- Scale
  - Many sites, large collections, many uses
- Longevity
  - Research requirements outlive technical decisions
- Diversity
  - No “one size fits all” solutions will work
  - Primary Data, Data Products, Meta Data, Administrative data, …
- Many Data Resources
  - Independently owned & managed
  - Geographically distributed
- and I haven’t even mentioned security yet!
3
Use Cases for Data Services
- Data Filtering:
  - Single source producing large amounts of data distributed to many sites downstream
- Data Discovery:
  - many sources, many query entry points in a linked system
- Data Translation:
  - source to sink, conversion of data model / structure
- Data Federation:
  - many sources, linked to provide view as a single source
- Data Replication:
  - full or partial copies to improve throughput
- Data Integration (model aggregation)
  - e.g. integration of time variant data, streams, files
- Data Integration (knowledge expansion)
  - forming links between databases to increase knowledge
4
Trade Offs
- Speed vs completeness
  - do you require the exact answer or an answer?
- Application specific vs language specific queries
  - how will users interrogate a data service?
- Static system vs Dynamic Discovery
  - can you actually have dynamic resources?
- Static vs Dynamic data
  - READ only, INSERT only, UPDATE permitted
- Static vs Dynamic queries
  - optimisation over flexibility
- Intranet vs Internet
  - speed over security
- Single data model versus mixed data models
  - ease/speed over integration
- Queries vs Questions
  - assume that we know the structure when we form the query
5
Requirements on Data Services?
- Common Data Model e.g. RowSet
- Common Query Language(s) e.g. XQuery, SQL
- Standard access to
  - data resource schema information
  - physical data resource information for optimisation purposes
  - data resource descriptive information for discovery / integration
- Single, seamless security model
- Dynamic publication and discovery
- Multiple, efficient delivery methods
- Move computation towards data
- Data aggregation functionality
- Replication information
6
OGSA-DAI In One Slide
- An engineered extensible framework for data access and integration.
- Expose heterogeneous data resources to a grid through web services.
- Interact with data resources:
  - Queries and updates.
  - Data transformation / compression
  - Data delivery.
- Customise for your project using
  - Additional Activities
  - Client Toolkit APIs
  - Data Resource handlers
- A base for higher-level services
  - federation, mining, visualisation, …
7
OGSA-DAI Philosophy
- We provide the basic, general functionality
  - e.g. querying relational databases, delivery mechanisms, schema extractors
- You add the specialist functionality
  - e.g. map overlays
- Several well-defined extension points
  - client toolkit
  - activity plugins
  - data resource accessor model
8
[Architecture diagram: an Application uses the Client Toolkit to call an OGSA-DAI service and its Engine. Activities shown: SQLQuery, XPath, readFile, XSLT, GZip, GridFTP. Data Resources shown: JDBC, XMLDB, File. Databases shown: MySQL, DB2, SQL Server, XIndice, SWISS PROT.]
9
[Architecture diagram: an OGSA-DAI service Engine running SQLQuery over several JDBC / SQL resources (MySQL shown). Labels: Multiple SQL GDS, SQLQuery.]
10
Distributed Query Processing
- Higher level services building on OGSA-DAI
- Queries mapped to algebraic expressions for evaluation
- Parallelism represented by partitioning queries
  - Use exchange operators
[Figure: partitioned query plan. Operators: table_scan (protein); table_scan termID=S92 (proteinTerm); reduce; hash_join (proteinId); op_call (Blast); reduce; exchange. Partition labels as extracted: 3,4 and 12.]
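The operators named on the Distributed Query Processing slide can be illustrated with a small, self-contained Java sketch. It is not OGSA-DQP code: the class name, the `String[]` row layout and the sample data are assumptions made for illustration, the Blast `op_call` and `reduce` steps are left out, and the partitions are evaluated one after another rather than on separate nodes. It shows only the idea of an exchange step that routes rows by a hash of the join key, so that each partition can run its own hash join on `proteinId`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionedHashJoin {
    // Hypothetical row layout: column 0 is always proteinId (the join key).

    /** "exchange": route each row to a partition chosen by hashing its join key. */
    static List<List<String[]>> exchange(List<String[]> rows, int partitions) {
        List<List<String[]>> out = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            out.add(new ArrayList<>());
        }
        for (String[] row : rows) {
            out.get(Math.floorMod(row[0].hashCode(), partitions)).add(row);
        }
        return out;
    }

    /** hash_join on proteinId: build a hash table on proteins, probe it with terms. */
    static List<String> hashJoin(List<String[]> proteins, List<String[]> terms) {
        Map<String, String> sequenceById = new HashMap<>();
        for (String[] p : proteins) {
            sequenceById.put(p[0], p[1]);
        }
        List<String> joined = new ArrayList<>();
        for (String[] t : terms) {
            String sequence = sequenceById.get(t[0]);
            if (sequence != null) {
                joined.add(t[0] + ":" + sequence);
            }
        }
        return joined;
    }

    static List<String> query(List<String[]> proteins, List<String[]> proteinTerms,
                              String termId, int partitions) {
        // table_scan(proteinTerm) with the selection termID = termId
        List<String[]> selected = new ArrayList<>();
        for (String[] t : proteinTerms) {
            if (t[1].equals(termId)) {
                selected.add(t);
            }
        }
        // Rows with equal keys land in the same partition on both sides.
        List<List<String[]>> proteinParts = exchange(proteins, partitions);
        List<List<String[]>> termParts = exchange(selected, partitions);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            result.addAll(hashJoin(proteinParts.get(i), termParts.get(i)));
        }
        Collections.sort(result); // stable order for display only
        return result;
    }

    static List<String> demo() {
        List<String[]> proteins = Arrays.asList(
            new String[] {"P1", "MKV"}, new String[] {"P2", "AAL"}, new String[] {"P3", "GHT"});
        List<String[]> terms = Arrays.asList(
            new String[] {"P1", "S92"}, new String[] {"P2", "S10"}, new String[] {"P3", "S92"});
        return query(proteins, terms, "S92", 2);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [P1:MKV, P3:GHT]
    }
}
```

Because both inputs are partitioned with the same hash function, no join result depends on rows held in a different partition, which is what lets the partitions be evaluated independently.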
11
DQP architecture
12
Map Retrieval: Integration
- Using security and extensibility (overlay)
[Diagram labels: Portlet; NGS Authentication; OGC; ODS 1 (Oracle Census); ODS 2 (GIS Oracle); ODS 3 (Application data); SO-OGC JDBC; SO-OGC SQL/XML.]
13
Integrated service for Data & Metadata
14
MDS/GridFTP/GSI Integration
- Can publish any OGSA-DAI resource property to a local MDS Index Service
  - e.g. databaseSchema, activityTypes
  - information published is on a per-resource basis, and can differ for each resource
- Can transfer results via GridFTP rather than via SOAP
  - still working on tuning options
- Can use X509 certificates to secure services
  - but still a coarse grained security by default
15
Future plans: overview
- A new version of the OGSA-DAI Engine
  - better support for concurrency, sessions, monitoring and notification
- Implementing new DAIS specifications
- Key things that we will be addressing:
  - Performance (particularly format representation and transport)
  - Security Model which can be applied across platforms
  - Transactions provision
  - More data integration facilities
- Integration with other components
  - registries (e.g. GRIMOIRES)
  - workflow editors (e.g. Taverna)
- Working with new projects
  - e.g. CancerGrid, iSpider, GEODE
16
Future plans: Performance
- WebRowSet is not efficient
  - aim to use ResultSet and CSV instead where possible
- SOAP is not efficient
  - aim to use SOAP w/Attachments, MTOM
[Chart annotations (work in progress, Jan 06): ResultSet to RowSet conversion; WebRowSet is larger; CSV scales better for output; conversion and validation take the time.]
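The size argument on the performance slide can be made concrete with a rough Java sketch. It encodes the same rows once as CSV and once as a simplified element-per-value XML layout, then compares byte counts. The XML here is a stand-in chosen for illustration and is not the real WebRowSet schema; the class name and sample data are likewise assumptions, and values are assumed to need no quoting or escaping. The sketch says nothing about conversion or validation time, only about markup overhead.

```java
import java.nio.charset.StandardCharsets;

class RowEncodingSize {
    /** One line per row, comma separated. Assumes values need no quoting. */
    static String toCsv(String[][] rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    /** Simplified XML: one element per row and per value (illustrative layout only). */
    static String toXml(String[][] rows) {
        StringBuilder sb = new StringBuilder("<data>");
        for (String[] row : rows) {
            sb.append("<currentRow>");
            for (String value : row) {
                sb.append("<columnValue>").append(value).append("</columnValue>");
            }
            sb.append("</currentRow>");
        }
        return sb.append("</data>").toString();
    }

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Made-up sample rows: id, name, date. */
    static String[][] sampleRows(int n) {
        String[][] rows = new String[n][];
        for (int i = 0; i < n; i++) {
            rows[i] = new String[] {Integer.toString(i), "name" + i, "2006-02-15"};
        }
        return rows;
    }

    public static void main(String[] args) {
        String[][] rows = sampleRows(1000);
        System.out.println("CSV bytes: " + utf8Bytes(toCsv(rows)));
        System.out.println("XML bytes: " + utf8Bytes(toXml(rows)));
    }
}
```

With short values such as these, the per-value tags dominate the payload, which is one plausible reading of "WebRowSet is larger" on the slide.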
17
From contribution to core
- One of a group of projects moving to GlobDev project (more later)
- Hope to use this as a way of encouraging collaborations and contributions
- Different levels of contributions
  - Based on OGSA-DAI?
  - Works with OGSA-DAI?
  - Part of OGSA-DAI?
18
Contributing to OGSA-DAI
- Additional functionality:
  - Provide activities which implement specific functionality
  - Provide extra client functionality
  - Provide different security mechanisms
  - Provide higher level components and applications
19
Further information
- The OGSA-DAI Project Site:
  - http://www.ogsadai.org.uk
- The DAIS-WG site:
  - http://forge.gridforum.org/projects/dais-wg/
- OGSA-DAI Users Mailing list
  - users@ogsadai.org.uk
- Formal support for OGSA-DAI releases
  - http://bugzilla.globus.org (OGSA-DAI)
- OGSA-DAI training courses (live and online)
20
The OGSA-DAI Team
- IBM Development Team, Hursley
- NEReSC, Newcastle
- NeSC, Edinburgh
- ESNW, Manchester
- IBM Dissemination Team
- EPCC Team, Edinburgh
21
Software Process
[Process diagram linking USERS, DEVELOPERS and REVIEW. Stage labels: Use Cases, Requests, Reqs., Prioritisation, Design, Prototype, Implement, QA, Testing, Test Cases, Fix Bugs, Release, Support. Review labels: Programme Board, Technical Review Board, Technical Reviewer, Users’ Group, Peer Review and Inspection. Other labels: Contribs, Ingest, Dissem., Training; Nightly unit + system tests; Additional test cases; System tests based on reqs; Continual process → Deep track features.]
22
International Cooperation and Recognition
- USA: Globus Alliance, IBM Corporation, caBIG, BIRN, Indiana University, GridSphere, GEON, LEAD, MCS, NCSA, Secure Data Grid, UNC
- Japan: AIST, BioGrid, NAREGI
- Europe: CERN, DataMiningGrid, GridMiner, GridSphere, inteligrid, N2Grid, OntoGrid, Provenance, SIMDAT
- UK: OMII, NGS, NCeSS, NIeeS, AstroGrid, BioSimGrid, BRIDGES, CancerGrid, ConvertGrid, eDiaMoND, EDINA, First Group plc, Fujitsu Labs Europe, GEDDM, GeneGrid, Genomic Technology and Informatics, GOLD, Human Genetics Unit, IBM UK, myGrid, Oracle UK
- China: CAS, ChinaGrid, cnGrid, INWA
- Australia: Curtin Business School, INWA
- South Korea: KISTI
- Tutorials: Boston, Cambridge, CERN, Chicago, Edinburgh, London, San Francisco, Seattle, Seoul, Singapore, Tokyo, ISSGC 03 to 05
- DIALOGUE workshops: Columbus, Edinburgh, Indiana, Vienna, Chicago, Manchester, San Diego
- 1485 registered users
- 5250+ downloads
23
Meeting User Requirements
[Figure labels: LEAD, GeneGrid, caBIG, BRIDGES, OGSA WebDB, FirstDIG, ConvertGrid, eDiaMoND, OGSA-DQP, Grid Miner.]
24
Meeting User Requirements
- Number of users
  - 1485 registered
  - 5250+ downloads
- 3 Users’ Group Meetings
  - Edinburgh
  - Brussels
  - Edinburgh
- Contributors
  - Austria, China, Finland, Poland, Spain, UK, USA

Release Statistics
  Release  Date    Downloads
  R1.0     Jan 03   109
  R1.5     Feb 03   110
  R2.0     Apr 03   254
  R2.5     Jun 03   294
  R3.0     Jul 03   792
  R3.1     Feb 04   686
  R4.0     May 04  1124
  R5.0     Dec 04   766
  R6.0     May 05   985
- 985 downloads of latest release
  - Actual user downloads, not search engine crawlers
  - Does not include downloads as part of GT3.2 and GT4 releases
25
Core features of OGSA-DAI
- A framework for building data clients
  - Client toolkit library for application developers
  - Seamless abstraction across WSI and WSRF services
- Highly-extensible
  - Customise out-of-the-box product
- A framework for developing functionality
  - Compose existing activities with application specific activities
  - Data service concurrency and sessions
- Comprehensive documentation and tutorials
- Shipped to run on
  - OMII_2, GT4.0 and Axis 1.2
26
Functionality of OGSA-DAI
A framework for data applications
- Data access, insert and update
  - Relational: MySQL, Oracle, DB2, SQL Server, Postgres, …
  - XML: Xindice, eXist
  - Files: CSV, EMBL, OMIM, SWISSPROT, …
- Data delivery
  - SOAP over HTTP
  - FTP, GridFTP
  - E-mail
  - Inter-service
- Data transformation
  - XSLT
  - ZIP, GZIP
- Security
  - X.509 certificates
  - Message Level
  - Transport Level
27
OGSA-DAI Motivation
- Entering an age of data
  - Data Explosion
    - CERN: LHC will generate 1 GB/s = 10 PB/y
    - VLBA (NRAO) generates 1 GB/s today
    - Pixar generates 100 TB/movie
  - Storage getting cheaper
- Data stored in many different ways
  - Data resources
    - Relational databases
    - XML databases
    - Flat files
- Need ways to facilitate
  - Data discovery
  - Data access
  - Data integration
- Empower e-Business and e-Science
  - The Grid is a vehicle for achieving this
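A quick check of the LHC figure, as a small Java sketch. The slide does not say how 1 GB/s becomes 10 PB per year; the numbers agree if the machine records data for roughly 10^7 seconds per year, which is an assumption made here and not something stated on the slide. Recording continuously for a full calendar year would give about three times as much. Decimal units (1 PB = 10^6 GB) are also assumed, and the class and method names are illustrative.

```java
class DataRate {
    /** Converts a sustained rate in GB/s into PB per year, using 1 PB = 10^6 GB. */
    static double petabytesPerYear(double gigabytesPerSecond, double secondsOfDataTakingPerYear) {
        return gigabytesPerSecond * secondsOfDataTakingPerYear / 1e6;
    }

    public static void main(String[] args) {
        double assumedRunSeconds = 1e7;               // assumption: about 10^7 s of data taking per year
        double calendarYearSeconds = 365.0 * 24 * 3600; // 31,536,000 s

        System.out.println(petabytesPerYear(1.0, assumedRunSeconds));   // prints 10.0
        System.out.println(petabytesPerYear(1.0, calendarYearSeconds)); // about 31.5
    }
}
```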
28
Goals for OGSA-DAI
- Aim to deliver application mechanisms that:
  - Meet the data requirements of Grid applications
    - Functionality, performance and reliability
    - Reduce development cost of data centric Grid applications
    - Provide consistent interfaces to data resources
  - Acceptable and supportable by database providers
    - Trustable, imposed demand is acceptable, etc.
    - Provide a standard framework that satisfies standard requirements
- A base for developing higher-level services
  - Data federation
  - Distributed query processing
  - Data mining
  - Data visualisation
29
Integration Scenario
- A patient moves hospital
[Figure labels: DB2, Oracle, CSV file; Data A, Data B, Data C; Amalgamated patient record.]
  A: (PID, name, address, DOB)
  B: (PID, first_contact)
  C: (PID, first_name, last_name, address, first_contact, DOB)
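One way to read the integration scenario is that records in layouts A and B, joined on PID, have to be reshaped into layout C. The Java sketch below shows that mapping under stated assumptions: the class and method names are hypothetical, the sample patient is made up, and splitting `name` at the first space into `first_name` and `last_name` is a guess at how the two schemas relate, not something the slide specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PatientRecordMerge {
    /** Builds an ordered record from alternating column names and values. */
    static Map<String, String> row(String... namesAndValues) {
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < namesAndValues.length; i += 2) {
            record.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return record;
    }

    /** Combines an A record and a B record for the same PID into the C layout. */
    static Map<String, String> amalgamate(Map<String, String> a, Map<String, String> b) {
        if (!a.get("PID").equals(b.get("PID"))) {
            throw new IllegalArgumentException("records describe different patients");
        }
        // Assumption: "name" is "<first name> <rest>"; split at the first run of spaces.
        String[] nameParts = a.get("name").trim().split("\\s+", 2);

        return row(
            "PID", a.get("PID"),
            "first_name", nameParts[0],
            "last_name", nameParts.length > 1 ? nameParts[1] : "",
            "address", a.get("address"),
            "first_contact", b.get("first_contact"),
            "DOB", a.get("DOB"));
    }

    static Map<String, String> demo() {
        Map<String, String> a = row(
            "PID", "42", "name", "Ada Lovelace", "address", "12 High St", "DOB", "1815-12-10");
        Map<String, String> b = row("PID", "42", "first_contact", "2005-03-01");
        return amalgamate(a, b);
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>(demo().keySet());
        System.out.println(columns); // column order of layout C
        System.out.println(demo());
    }
}
```

In a real deployment the A and B records would come from different data resources (the slide names DB2, Oracle and a CSV file), so the mismatch in column names and types is exactly what the integration step has to absorb.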
30
Why OGSA-DAI?
- Why use OGSA-DAI over JDBC?
  - Language independence at the client end
    - Do not need to use Java
  - Platform independence
    - Do not have to worry about connection technology and drivers
  - Can handle XML and file resources
  - Can embed additional functionality at the service end
    - Transformations, Compression, Third party delivery
    - Avoiding unnecessary data movement
  - Provision of Metadata is powerful
  - Usefulness of the Registry for service discovery
    - Dynamic service binding process
  - The quickest way to make data accessible on the Grid
    - Installation and configuration of OGSA-DAI is fast and straightforward
31
Core features of OGSA-DAI
- An extensible framework for building applications
  - Supports relational, XML and some file resources
    - MySQL, Oracle, DB2, SQL Server, Postgres, XIndice, CSV, EMBL
  - Supports various delivery options
    - SOAP, FTP, GridFTP, HTTP, files, email, inter-service
  - Supports various transforms
    - XSLT, ZIP, GZip
  - Supports message level security using X509 certificates
  - Client Toolkit library for application developers
  - Comprehensive documentation and tutorials
- Third production release on 3 December 2004
  - OGSI/GT3 based
  - Also previews of WS-I and WS-RF/GT4 releases
32
Activities are the drivers
- Express a task to be performed by a GDS (Grid Data Service)
- Three broad classes of activities:
  - Statement
  - Transformations
  - Delivery
- Extensible:
  - Easy to add new functionality
  - Does not require modification to the service interface
  - Extensions operate within the OGSA-DAI framework
- Functionality:
  - Implemented at the service
  - Work where the data is (no need to move the data back)
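The three classes of activity can be pictured as stages in a chain, each consuming the output of the one before it. The Java sketch below is a toy model of that idea only: the `Activity` interface, the factory methods and the in-memory sink are all hypothetical and are not the OGSA-DAI activity API. A canned CSV string stands in for a query result, GZIP compression (from `java.util.zip`) plays the transformation, and writing to a byte buffer plays the delivery.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ActivityPipeline {
    /** Hypothetical activity: turns the previous activity's output into its own output. */
    interface Activity {
        byte[] process(byte[] input) throws IOException;
    }

    /** Statement: stands in for a query; ignores its input and emits a canned result. */
    static Activity statement(String cannedResult) {
        return input -> cannedResult.getBytes(StandardCharsets.UTF_8);
    }

    /** Transformation: GZIP-compresses whatever it receives. */
    static Activity gzip() {
        return input -> {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
                gz.write(input);
            }
            return buffer.toByteArray();
        };
    }

    /** Delivery: writes the data to a sink (here an in-memory buffer). */
    static Activity deliverTo(ByteArrayOutputStream sink) {
        return input -> {
            sink.write(input);
            return input;
        };
    }

    /** Runs the activities in order, all "at the service"; only the sink sees the data. */
    static byte[] perform(List<Activity> request) throws IOException {
        byte[] data = new byte[0];
        for (Activity activity : request) {
            data = activity.process(data);
        }
        return data;
    }

    /** Helper for the demo: decompresses what was delivered. */
    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    static String demo() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            perform(Arrays.asList(statement("id,name\n1,Ada\n"), gzip(), deliverTo(sink)));
            return gunzip(sink.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Adding a new kind of activity in this model means adding another factory method; `perform` and its signature stay the same, which mirrors the slide's point that extensions do not require changes to the service interface.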
33
Client Toolkit
- Why? Nobody wants to write XML!
- A programming API which makes writing applications easier
  - Now: Java
  - Next: Perl, C, C#?, ML!?

    // Create a query (SQLQueryString holds the SQL text)
    SQLQuery query = new SQLQuery(SQLQueryString);
    ActivityRequest request = new ActivityRequest();
    request.addActivity(query);

    // Perform the query (gds refers to the target Grid Data Service)
    Response response = gds.perform(request);

    // Display the result
    ResultSet rs = query.getResultSet();
    displayResultSet(rs, 1);
34
OGSA-DAI Plans for 2005
- Transition to new platforms and standards
  - WS-RF (GT4), WS-I+ (OMII)
  - Alignment with published DAIS specifications
- Data Integration
  - Implement simple patterns (e.g. AND, OR, PREFERRED, PARTIAL) within service code
  - Tighter integration of relational, XML and other resources
  - Better performance for inter-service data transfer
- Releases, support and community
  - Releases provisionally in April and September
  - Seek contributions in various areas of new architecture
  - Moving forward to new versions of OGSA-DAI