OGSA-DAI 3.0. Ally Hume, Amy Krause. OGSA-DAI Workshop, 17th October 2007.

Presentation transcript:

Overview
• What is OGSA-DAI?
• Sharing data in a grid
• Data-centric workflows
• Accessing OGSA-DAI
• Components and customisation
• Case study – SEE-GEO
• Performance

What is OGSA-DAI?
An extensible framework, accessed via web services, that executes data-centric workflows involving heterogeneous data resources for the purposes of data access, integration, transformation and delivery within a grid. It is intended as a toolkit for building higher-level application-specific data services.

OGSA-DAI 3.0
• OGSA-DAI has evolved constantly since February 2002
• OGSA-DAI 2.2 released April 2006
• As the number of users grew so did the requirements
  o More effective data streaming
  o Standardisation of activity inputs and outputs
  o Targeting multiple data resources in a single workflow
  o Supporting application-specific presentation layers
• OGSA-DAI 2.2 was not suitable for addressing these
• OGSA-DAI 3.0
  o A complete re-design and re-implementation of OGSA-DAI
  o A stable framework for the future
  o Released September 2007

Sharing data in a grid

Motivation
• Grid is about sharing resources
• OGSA-DAI is about sharing structured data resources

Sharing data via web site download
• ZIP up data and put it on a web site
• Pros
  o Easy distribution for providers
  o Easy access for consumers
• Cons
  o Consumers have to download all the data
  o Consumers have to load data into local databases to use it
  o Static snapshot
  o Security

Sharing data via direct access
• Providers tell consumers
  o Database URL – mycomputer.epcc.ed.ac.uk:3306
  o Username – userID
  o Password – password
• Pros
  o Consumers have direct access
• Cons
  o Firewall issues
  o User and password management is hard
  o No consistent security model
  o Hard to use in grid/web service workflows

Sharing data via direct access
• Cons (continued)
  o No server-side layer in which to standardize database heterogeneities
  o Myriad drivers
  o Different APIs across different data types
    - Relational and JDBC
    - XML and XMLDB
    - Indexed files and Lucene

Domain-specific web services
• Manipulate data using domain-specific operations, e.g.
  o Book findByISBN(ISBN)
  o List findByAuthor(Author)
  o List findByKeyword(Word)
• Pros
  o Fits with grid/web service approach
  o Abstraction hides back-end database details
  o Web services are programming language neutral
  o Operations likely to map well to authorization policies

Domain-specific web services
• Cons
  o Slower than direct access
    - Web service layer
    - SOAP transport overhead – especially for large result sets
  o Domain-specific API prevents use of generic data exploration, mining and manipulation tools

OGSA-DAI generic web services
• Manipulate data using OGSA-DAI’s generic web services
• Clients see the data in its ‘raw’ format, e.g.
  o Tables, columns, rows for relational data
  o Collections, elements etc. for XML data
• Clients can obtain the schema of the data
• Clients send queries in appropriate query language, e.g. SQL, XPath
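
The difference between a domain-specific operation and a generic data service can be illustrated with a small sketch. This is not OGSA-DAI code: it uses Python's built-in sqlite3 module as a stand-in database, and the table, rows and function names (find_by_isbn, get_schema, run_query) are assumptions made up for illustration.

```python
import sqlite3


def make_library():
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, author TEXT)")
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", [
        ("111", "Grid Basics", "Smith"),
        ("222", "Data Services", "Jones"),
    ])
    return conn


def find_by_isbn(conn, isbn):
    """Domain-specific style: one fixed operation, database details hidden."""
    row = conn.execute(
        "SELECT title, author FROM books WHERE isbn = ?", (isbn,)).fetchone()
    return None if row is None else {"title": row[0], "author": row[1]}


def get_schema(conn, table):
    """Generic style: let the client discover column names and types.

    The table name is interpolated directly, so only pass trusted names.
    """
    return [(name, ctype)
            for _, name, ctype, *_ in conn.execute(f"PRAGMA table_info({table})")]


def run_query(conn, sql, params=()):
    """Generic style: the client sends any query and sees raw columns and rows."""
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()
```

The domain-specific function can only answer the question it was written for, while the generic pair lets a client explore the schema and then ask anything the query language allows.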

Getting away from SOAP – workflows

Getting away from SOAP
• Aside from FTP there is also…
• SOAP attachments
  o Data comes along with, but external to, a SOAP message
• GridFTP
• …

Data-centric workflows

OGSA-DAI is not just about data access
[Figure: a request made up of activities, with data streaming between activities]
• Access: SQLQuery (SELECT * FROM Bands WHERE name = Bangles;) outputs tuples
• Transform: TupleToWebRowSetCharArrays outputs WebRowSet XML; XSLTransform takes this XML plus the XSL from ObtainFromHTTP (esheets/webRowSetToHTML.xsl) and outputs HTML
• Deliver: DeliverToURL (ftp://)
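
The access, transform, deliver chain in this figure can be imitated with chained Python generators. This is only a sketch of the idea, not the OGSA-DAI activities themselves: sqlite3 replaces the real database, a simple string rewrite stands in for the XSL transform, delivery goes to an in-memory list instead of a URL, and all function names and sample rows are invented for illustration.

```python
import sqlite3
from xml.sax.saxutils import escape


def sql_query(conn, sql):
    """Access step: yield one tuple per result row."""
    for row in conn.execute(sql):
        yield row


def tuples_to_xml_rows(tuples):
    """Transform step: one XML fragment per tuple (simplified stand-in for WebRowSet)."""
    for values in tuples:
        cells = "".join(f"<col>{escape(str(v))}</col>" for v in values)
        yield f"<row>{cells}</row>"


def xml_rows_to_html(xml_rows):
    """Transform step: stand-in for an XSL transform that rewrites rows as HTML."""
    yield "<table>"
    for fragment in xml_rows:
        yield (fragment.replace("<row>", "<tr>").replace("</row>", "</tr>")
               .replace("<col>", "<td>").replace("</col>", "</td>"))
    yield "</table>"


def deliver_to_list(blocks, destination):
    """Deliver step: append each block to an in-memory list instead of a URL."""
    for block in blocks:
        destination.append(block)


def demo():
    """Wire the steps together over a made-up Bands table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Bands (name TEXT, id INTEGER)")
    conn.executemany("INSERT INTO Bands VALUES (?, ?)",
                     [("Bangles", 1), ("Other band", 2)])
    delivered = []
    tuples = sql_query(conn, "SELECT * FROM Bands WHERE name = 'Bangles'")
    deliver_to_list(xml_rows_to_html(tuples_to_xml_rows(tuples)), delivered)
    return "".join(delivered)
```

Because each step is a generator, a row is transformed and delivered as soon as it is read, without first collecting the whole result set.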

Data integration with OGSA-DAI workflows: using a single workflow

Data integration with OGSA-DAI workflows: across OGSA-DAI services

Distributed query processing

Workflows in more detail

Activities
• An activity is a named unit of functionality
  o A well defined workflow unit
  o Pluggable
• Example activities include
  o Execute an SQL query
  o ZIP a batch of data
  o List the files in a directory
  o Execute an XSL transform on an XML document
  o Deliver data to an FTP server
• Comprehensive and consistent standard activity set
  o Karasavvas, K., Atkinson, M.P. and Hume, A.C. OGSA-DAI – Redesigned and New Activities
  o AndNewActivitiesV1.9.pdf

Activity inputs and outputs
• An activity can have
  o 0 or more named inputs
  o 0 or more named outputs
• Blocks of data flow from an activity’s output into another activity’s input

Activity inputs and parameters
• No distinction between inputs and parameters
• Input literal
  o Special kind of input
  o Value is provided by client
• Client chooses whether input value is
  o Specified by the client in a request
  o Obtained from the output of another activity

Activity input and output types and blocks
• Inputs expect blocks of specific types
• Outputs produce blocks of specific types

Block types
• Java’s basic types
  o Object, String, Integer, Long, Double, Number, Boolean
• Binary types
  o char[], byte[], Clob, Blob
• Tuple
  o OGSA-DAI representation of a row of relational data
  o One element per column
• MetadataWrapper
  o OGSA-DAI wrapper for any object to be treated as meta-data
  o Use application-specific meta-data within OGSA-DAI
  o Individual activities handle metadata blocks as they see fit
• Application-specific objects

Blocks and binary data
• BLOBs
  o BLOBs obtained from databases are stored as BLOB objects within Tuples
  o References to entire BLOBs are passed between activities
  o Keep data grouped as a tuple
• Byte arrays
  o Data obtained from FTP
  o Fits pipeline streaming model used in OGSA-DAI
• All binary data processing activities can handle both representations

Blocks and lists
• A list groups related blocks together
  o Special blocks are used to mark the beginning and the end of a list
• For example SQLQuery can dynamically take any number of SQL query expressions as input
  o Lists allow differentiation between the results of query1 and those of query2
• Activities define the granularity of their inputs and outputs
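
The list-marker idea can be sketched in a few lines of Python. The marker objects and function names below (LIST_BEGIN, LIST_END, emit_as_lists, group_lists) are hypothetical and only illustrate how begin and end blocks keep the results of one query apart from the next within a single stream; they are not the OGSA-DAI classes.

```python
# Hypothetical marker blocks; any unique sentinel objects would do.
LIST_BEGIN = object()
LIST_END = object()


def emit_as_lists(result_sets):
    """Producer side: wrap each result set in begin/end marker blocks."""
    for result_set in result_sets:
        yield LIST_BEGIN
        yield from result_set
        yield LIST_END


def group_lists(blocks):
    """Consumer side: rebuild one Python list per marked list in the stream."""
    current = None
    for block in blocks:
        if block is LIST_BEGIN:
            current = []
        elif block is LIST_END:
            yield current
            current = None
        elif current is not None:
            current.append(block)
        else:
            raise ValueError("data block found outside of a list")
```

A consumer that reads the stream can therefore tell where the rows of query1 stop and those of query2 start, even when one of the queries returns no rows at all.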

Activities and resources
• Activities can be targeted at OGSA-DAI resources
• Data resource
  o OGSA-DAI abstraction of a data resource
• Session
  o OGSA-DAI container for state
• Data source
  o Exposes data for asynchronous retrieval (pull)
• Data sink
  o Receives data for asynchronous delivery (push)

Executing workflows – data streaming
• Activities in a workflow execute in parallel
• Data streams through activities in a pipeline-like way
• Each activity operates on a different portion of a data stream
  o If the activities are well defined
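
One way to picture activities running in parallel on different portions of a stream is a chain of threads joined by small bounded queues. The sketch below is an assumption about how such a pipeline could be modelled in Python, not a description of the OGSA-DAI engine; the names (END, start_activity, run_pipeline) are invented and error handling is left out.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def start_activity(func, inbox, outbox):
    """Start a thread that applies func to every block it receives."""
    def run():
        while True:
            block = inbox.get()
            if block is END:
                outbox.put(END)  # pass the end marker downstream
                return
            outbox.put(func(block))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(blocks, funcs, capacity=2):
    """Stream blocks through one thread per function and collect the output."""
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(funcs) + 1)]
    threads = [start_activity(func, queues[i], queues[i + 1])
               for i, func in enumerate(funcs)]

    def feed():
        for block in blocks:
            queues[0].put(block)  # blocks while the first queue is full
        queues[0].put(END)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        block = queues[-1].get()
        if block is END:
            break
        results.append(block)

    feeder.join()
    for thread in threads:
        thread.join()
    return results
```

With a queue capacity of two, a later stage is already working on the first blocks while an earlier stage is still producing the rest, and a slow consumer holds the producers back instead of letting data pile up in memory.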

Types of workflows
• Pipeline workflow
  o A set of chained activities executed in parallel with data flowing between the activities
• Sequence workflow
  o A set of sub-workflows each executed in sequence
  o For example
    - Sub-workflow 1 – create a database table
    - Sub-workflow 2 – bulk load data into the table
• Parallel workflow
  o A set of sub-workflows executed in parallel
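
The difference between sequence and parallel workflows can be sketched by treating each sub-workflow as a plain Python callable. The helper names (run_sequence, run_parallel, load_table_in_sequence) and the sample table are assumptions for illustration; the sequence example mirrors the create-table-then-bulk-load case from the slide using sqlite3.

```python
import sqlite3
import threading


def run_sequence(sub_workflows):
    """Run sub-workflows one after another, in order."""
    return [workflow() for workflow in sub_workflows]


def run_parallel(sub_workflows):
    """Run sub-workflows at the same time, one thread each."""
    results = [None] * len(sub_workflows)

    def worker(index, workflow):
        results[index] = workflow()

    threads = [threading.Thread(target=worker, args=(i, workflow))
               for i, workflow in enumerate(sub_workflows)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def load_table_in_sequence(rows):
    """Sequence example: the table must exist before rows are loaded."""
    conn = sqlite3.connect(":memory:")

    def create_table():
        conn.execute("CREATE TABLE samples (id INTEGER, label TEXT)")
        return "created"

    def bulk_load():
        conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
        return conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]

    return run_sequence([create_table, bulk_load])
```

A sequence is the right shape when one sub-workflow depends on the side effects of the previous one, as with the table here; independent sub-workflows can be handed to the parallel runner instead.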

Example workflows

Query – Transform – Deliver

Query – Transform – Deliver

Inter-database data transfer

Get and deliver BLOBs

Federate resources via resource groups

Spawn sub-workflows

Execute complex data-centric workflows
• Obtain scan data for scans since date d of embryos in stage s showing expression of gene g

Using OGSA-DAI

Accessing OGSA-DAI – executing workflows
• Client submits workflow (= request) to data request execution service
• Data request execution service (DRES)
  o Web service
  o Exposes a data request execution resource (DRER)

Accessing OGSA-DAI – executing workflows
• Request status
  o Returned to client
  o Status of execution of each activity in the workflow
    - Did it complete?
    - Did it run into an error?
  o Status of execution of whole workflow
    - Derived from status of individual activities
    - Did they all complete?
    - Did any run into errors?
    - Was the workflow prematurely terminated by the client?
  o Data – depending upon the activities in the workflow
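
The way a workflow status follows from the statuses of its activities can be written down as a small function. The status names and the precedence used here (client termination first, then errors, then completion) are assumptions chosen for illustration, not the values OGSA-DAI actually reports.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical status values for activities and workflows."""
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"
    TERMINATED = "TERMINATED"


def workflow_status(activity_statuses, terminated_by_client=False):
    """Derive the status of the whole workflow from its activities."""
    statuses = list(activity_statuses)
    if terminated_by_client:
        return Status.TERMINATED
    if any(status is Status.ERROR for status in statuses):
        return Status.ERROR
    if all(status is Status.COMPLETED for status in statuses):
        return Status.COMPLETED
    return Status.PROCESSING
```

A client can then check the single workflow status first and only inspect the per-activity entries when something went wrong.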

Accessing OGSA-DAI
• Data Request Execution Resource (DRER)
  o Workflow execution engine
    - Parses workflow
    - Creates activities
    - Provides activities with target resources (if any)
    - Executes workflow
    - Builds request status
    - Manages sessions
• Data resources
  o OGSA-DAI abstractions of data resources
    - databases, file systems, web services,…
  o Provides access to the data resource
    - e.g. via JDBC, XMLDB, Java File I/O,…

More OGSA-DAI resources and services
• Data sources
  o Expose data for asynchronous retrieval (pull)
• Data sinks
  o Receive data for asynchronous delivery (push)
• Sessions
  o A state container associated with a set of workflows
  o Share state between workflows
• Requests
  o One per workflow submitted to a DRER
  o Access request status

Resources and activities revisited
• Activities can be written to interact with any type of resource
  o SQLQuery – JDBC data resource
  o XPathQuery – XMLDB data resource
  o SQLBag – ResourceGroup data resource
  o ObtainFromDataSource – data source
• Some activities can create resources
  o CreateDataSource
  o CreateDataSink
  o CreateResourceGroup

Components and customisation

OGSA-DAI 3.0 Extension Points
[Figure: layered diagram. Labels shown: OMII, GT, Axis, UNICORE, gLite, WS-DAI, ?, Embedded; OGSA-DAI Core (resource management, activity management, workflow engine); RDB, File, ?, XMLDB; SQLQuery, Transform, ?, DeliverToFTP, ObtainFromGFTP]

OGSA-DAI 3.0 Persistence and Configuration
[Figure: layered diagram. Labels shown: OMII, GT, Axis, UNICORE, gLite, WS-DAI, ?, Embedded; OGSA-DAI Core (resource management, activity management, workflow engine); Activities; Data Resources]

Extending OGSA-DAI – activities
• Additional generic functionality
  o e.g. deliverToMessageQueue
• Additional resource-specific functionality
  o e.g. sqlStoredProcedure
• Application-specific functionality
  o e.g. transformToFasta

Example – application-specific activities

Extending OGSA-DAI – data resources
• A data resource can be anything…
  o Local or remote
  o Real or virtual
  o Persistent or in-memory
• For example
  o A view onto a relational database
  o A new XML database
  o Open Geospatial Consortium (OGC) data access services
  o Application specific web service

Extending OGSA-DAI – presentation layers
• Expose workflows
• Hide OGSA-DAI behind domain-specific web services
  o Map service operations to “template” OGSA-DAI workflows
  o Assist in using OGSA-DAI within workflow engines e.g. Taverna

Example – OGSA-DAI and Globus Security
• Authorization on incoming SOAP request

Example – OGSA-DAI and Globus Security
• SecurityContext object
  o One for each request
  o By default contains DN and credential
• Login provider plug-in objects
  o One for each relational resource
  o Maps SecurityContext -> database login
• ResourceAuthorizer plug-in object
  o Used by ResourceAuthorizerPDP
  o Listens to event from resource manager
• RuntimeWorkflowAuthorizer plug-in object
  o Authorizes dynamically created workflows at runtime
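
The mapping from a security context to a database login can be pictured as a simple lookup. The classes below are a hypothetical Python illustration of that idea only: the names SecurityContext, DatabaseLogin and SimpleLoginProvider, their fields and the get_login method are assumptions and do not reproduce the OGSA-DAI plug-in interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityContext:
    """Per-request caller identity; only the distinguished name is modelled."""
    dn: str


@dataclass(frozen=True)
class DatabaseLogin:
    """Credentials used to open the database connection."""
    username: str
    password: str


class SimpleLoginProvider:
    """Hypothetical provider backed by an in-memory (resource, DN) table."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def get_login(self, resource_id, context):
        """Return the database login for this caller on this resource."""
        try:
            return self._mapping[(resource_id, context.dn)]
        except KeyError:
            raise PermissionError(
                f"no database login for {context.dn!r} on {resource_id!r}") from None
```

Keeping the mapping on the server side means that consumers never see database usernames or passwords; they present their grid identity and the provider decides which login, if any, applies.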

Client Toolkit

Clients and web services
• Clients interact with web services via SOAP over HTTP
  o Deduce service interface from service WSDL description
  o Construct SOAP request to invoke operation
  o Parse SOAP response from service

OGSA-DAI client toolkit
• Client-side abstractions of
  o Activities
  o Workflows
  o Resources
  o Services
• Get client-side proxies for OGSA-DAI resources exposed by OGSA-DAI services
• Submit workflows to these proxies
• Client toolkit manages
  o Submission of workflow to OGSA-DAI service
  o Parsing of the request execution status and data from the service
• Focus on constructing applications

OGSA-DAI case-study – SEE-GEO

SEE-GEO
• SEcurE access to GEOspatial services
  o EDINA, NeSC, NCeSS, MIMAS
  o Access to geospatial information on a grid
• Open Geospatial Consortium (OGC) web services

OGC Geolinking Interoperability Experiment
• OGSA-DAI being extended to offer integrated, distributed resource management for geo-spatial tools
• Using established open interoperability standards
• Web Feature Service (WFS) and Web Map Service (WMS) integrated into OGSA-DAI
• The Interoperability Experiment (IE) is hardening candidate OGC specifications
  o Geolinked Data Access Service (GDAS)
  o GeoLinking Service (GLS)
• Validation of Web Coverage Service (WCS) scheduled
• Extend to support secure access

e-Social Science demonstrator
• Two data resources
  o Census statistics
    - Attributes about a region e.g. the cost of a loaf of bread
    - Geolinked Data Access Service (GDAS)
  o Borders data
    - Unique regions encoded as polygons
    - Web Feature Service (WFS)
• How to link the attributes to the regions?
• A geo-linking service
  o Execute a join across the two data sets
  o Implemented as a Web Processing Service (WPS)

Demographic forecasting
[Figure: data flow diagram. Components shown: Census DB, Borders DB, WFS, GDAS, OGSA-DAI (getData, getFeature, geoLink), Feature Portrayal service (FPS), GLS, Portal, Map Server]
• Arrow labels (order not recoverable): Receive ticket for results; Retrieve annotated image; Store image on server; Send parameterised query; Call out to existing FP service; Cache attributes; Stream polygons; Request attributes; Request features; Run algorithm; Stream relevant annotated polygons
• Concentrate on algorithm
• Access domain-specific data sets
• Utilise existing services
• Efficient delivery methods

What did OGSA-DAI give SEE-GEO?
• Could implement GLS service without OGSA-DAI
• But using OGSA-DAI allowed leverage of
  o Workflow engine
  o Out-of-the-box activities for
    - Queries
    - Delivery
  o Security
  o Other grid technologies e.g. GridFTP

What did OGSA-DAI give SEE-GEO?
• A toolkit to
  o Develop domain-specific activities
  o Develop support for domain-specific data resources
  o Execute workflows using these
  o Build OGC Web Processing Services (WPS)
• Relatively little effort to
  o Choose different data resources dynamically
  o Merge GDAS XML into a relational data resource
  o Transfer data using GridFTP
  o Protect data using GSI
  o Experiment!

What next for SEE-GEO?
• Deployment
  o Component integration
  o Bug fixing
  o Testing
  o Performance testing of OGSA-DAI as a GLS
• Complete participation in the OGC Interoperability Experiment
• Add Web Coverage Service (WCS)
• Look at security and OGC
  o Shibboleth
  o Grid Security Infrastructure (GSI)
  o PrivilEge and Role Management Infrastructure Standards Validation (PERMIS)
• OGSA-DAI:Z SRW/U bridge
  o Ordnance Survey Master Map delivery using a grid

OGSA-DAI and Performance

Performance
• OGSA-DAI is (another) component that sits between clients and the data they want
• How can we minimize the overhead of OGSA-DAI?
• How can OGSA-DAI be used effectively?

Synchronous execution
• Client submits workflow to OGSA-DAI
• OGSA-DAI does not return until workflow has executed
• Request status is then returned to the client
• Pros
  o Good for interactive clients which need constant communication with an OGSA-DAI server for small operations
• Cons
  o Data is returned via SOAP/HTTP which incurs a performance hit
  o Not ideal for complex requests with lots of operations
  o Not ideal for requests which return large volumes of data
  o Not ideal for responsive clients
  o Not ideal for clients that need interim results during time-consuming workflows

Asynchronous execution
• Client submits workflow to OGSA-DAI
• OGSA-DAI returns immediately with initial request status
• Client contacts OGSA-DAI later to retrieve final request status
• Pros
  o Can avoid returning data via SOAP/HTTP
  o Good for complex requests with lots of operations
  o Good for requests which produce large volumes of data
  o Good for responsive clients
  o Good for clients that need interim results during time-consuming workflows
• Cons
  o Client must poll OGSA-DAI to determine when execution is complete
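
The submit-then-poll pattern behind asynchronous execution can be sketched with a toy server built on a thread pool. Everything here is a stand-in: FakeServer, its methods, the status strings and poll_until_done are hypothetical names for illustration and do not correspond to OGSA-DAI service operations or client toolkit calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeServer:
    """Toy stand-in for a server that executes workflows (plain callables)."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._requests = {}

    def submit_sync(self, workflow):
        """Synchronous: do not return until the workflow has executed."""
        return {"status": "COMPLETED", "data": workflow()}

    def submit_async(self, workflow):
        """Asynchronous: start the workflow and return a request id at once."""
        request_id = len(self._requests) + 1
        self._requests[request_id] = self._pool.submit(workflow)
        return request_id

    def get_status(self, request_id):
        """Report whether the request is still running or has finished."""
        future = self._requests[request_id]
        if not future.done():
            return {"status": "PROCESSING"}
        return {"status": "COMPLETED", "data": future.result()}

    def shutdown(self):
        self._pool.shutdown()


def poll_until_done(server, request_id, interval=0.01, timeout=5.0):
    """Client side: poll the request status until it leaves PROCESSING."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = server.get_status(request_id)
        if status["status"] != "PROCESSING":
            return status
        time.sleep(interval)
    raise TimeoutError(f"request {request_id} did not finish in {timeout}s")
```

The polling loop is the cost named under Cons: the client stays free to do other work between checks, but it has to decide how often to ask and when to give up.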

Request status
• Synchronous request – returned by data request execution service
• Asynchronous request – returned by request management service
• Pros
  o Get data directly from OGSA-DAI
  o Easy to manipulate client-side – client toolkit supports extraction of data and parsing into useful objects
• Cons
  o Is transferred from server to client via SOAP/HTTP and so incurs serialization/deserialization overhead
  o Performance (time and memory) quickly degrades if it contains large amounts of data

Improving request status – aggregators
• Aggregator activities make request status more scalable
• Group character and byte arrays into larger chunks
• Improve performance up to 50% in some scenarios
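
Grouping many small character arrays into fewer, larger chunks can be sketched as a short generator. This is an illustration of the aggregation idea only: the function name aggregate and the chunk_size parameter are assumptions, and the sketch handles strings rather than byte arrays.

```python
def aggregate(blocks, chunk_size):
    """Join small string blocks into chunks of at least chunk_size characters."""
    buffer = []
    buffered = 0
    for block in blocks:
        buffer.append(block)
        buffered += len(block)
        if buffered >= chunk_size:
            yield "".join(buffer)
            buffer, buffered = [], 0
    if buffer:
        # Flush whatever is left, even if it is smaller than chunk_size.
        yield "".join(buffer)
```

Fewer, larger blocks mean fewer items to serialize into the response, which is one plausible reason aggregation helps when data is returned in the request status.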

Aggregators

File Transfer Protocol (FTP)
• Standard OGSA-DAI delivery option
• Limited by OGSA-DAI throughput and file transfer rate
• OGSA-DAI streaming model => should be higher than standard FTP as there is data collection and processing being executed simultaneously
  o e.g. 60MB data transfer FTP is 30s, OGSA-DAI transform to CSV file = +15% hit
• Pros
  o Improved scalability over request status
  o Great for large data sets of 10,000,000+ rows
  o Useful if using OGSA-DAI in conjunction with third-party systems
• Cons
  o Requires an FTP server
  o Requires clients to pick up the data from the FTP server

Data sources and data sinks
• Data sources
  o Data can be streamed from an activity into a data source
  o Clients pull the data from the data source via a data source service
  o Options to pull all the data back at once or a set number of blocks at a time
• Pros
  o No need for external components e.g. FTP
  o Can stream back data in small chunks
  o Client toolkit supports interaction with data sources and parsing data into a useful form
  o More performant than using request status
  o Used with asynchronous requests it can handle 1,000,000 row datasets
• Cons
  o Data is returned via SOAP/HTTP but aggregators can offset this
  o Limited storage capacity for synchronous requests – a workflow will block when capacity is reached
• Data sinks
  o Complement of data sources
  o Used for transferring data from clients to OGSA-DAI
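
The pull model and the capacity limit of a data source can be sketched with a bounded queue. The class and method names (DataSource, write, close, get_n_blocks) are hypothetical and chosen for illustration; they are not the OGSA-DAI data source service operations.

```python
import queue
import threading


class DataSource:
    """Hypothetical bounded buffer between a workflow and a pulling client."""

    _END = object()

    def __init__(self, capacity):
        self._blocks = queue.Queue(maxsize=capacity)
        self._closed = False

    def write(self, block):
        """Workflow side: blocks while the buffer is at capacity."""
        self._blocks.put(block)

    def close(self):
        """Workflow side: signal that no more data will follow."""
        self._blocks.put(self._END)

    def get_n_blocks(self, n):
        """Client side: pull up to n blocks; an empty list means finished."""
        pulled = []
        while len(pulled) < n and not self._closed:
            block = self._blocks.get()
            if block is self._END:
                self._closed = True
            else:
                pulled.append(block)
        return pulled


def produce(source, blocks):
    """Start a background thread that streams blocks into the data source."""
    def run():
        for block in blocks:
            source.write(block)
        source.close()

    thread = threading.Thread(target=run)
    thread.start()
    return thread
```

Because the producer runs in its own thread, it simply waits whenever the buffer is full. If the same workflow had to finish before the client could start pulling, as in a synchronous request, it would stall once capacity was reached, which is the limitation listed under Cons.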

Summary

OGSA-DAI 3.0 Releases
• OGSA-DAI Project
  o OGSA-DAI 3.0 on Globus Toolkit
  o OGSA-DAI 3.0 on Apache Axis 1.4 or
• OMII-Europe (OMII-EU) Project
  o OGSA-DAI 3.0 on UNICORE 6
  o OGSA-DAI 3.0 on gLite 3.0

Future Events
• Training courses offered by OMII-Europe and OMII-UK
  o e-Science Institute, Edinburgh
  o Deploying Grid Data Services using OGSA-DAI
  o Thursday 1st – Friday 2nd November 2007

Summary
• OGSA-DAI is not just an out-of-the-box application for data access
• OGSA-DAI is
  o an extensible framework
  o accessed via web services
  o that executes data-centric workflows
  o involving heterogeneous data resources
  o for the purposes of data access, integration, transformation and delivery
  o within a Grid
  o and is intended as a toolkit for building higher-level application-specific data services

Further information
• Come and chat to us at the booth
• Grab anyone wearing our T-shirts
• OGSA-DAI
  o WWW site –
  o Info –
  o Users list –
• OMII-UK
  o WWW site –
  o Info –