Kepler: Towards a Grid-Enabled System for Scientific Workflows Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher*, Steve Mock.

Presentation transcript:

Kepler: Towards a Grid-Enabled System for Scientific Workflows Ilkay Altintas, Chad Berkley, Efrat Jaeger, Matthew Jones, Bertram Ludäscher*, Steve Mock San Diego Supercomputer Center (SDSC) University of California, San Diego (UCSD)

B. Ludäscher et al. – Grid-Enabling Kepler
Outline
Motivation: Scientific Workflows (SEEK, SDM, GEON, ...)
Current Features of the Kepler Scientific Workflow System
Extending Kepler:
– Grid-Enabling Kepler: 3rd-party transfer
– WF planning & optimization
  Shipping and Handling Algebra (SHA)
  Web Service Composition as Declarative Query Plans
– Semantic Types for Scientific Workflows
Conclusions

Kepler Team, Projects, Sponsors
Ilkay Altintas (SDM)
Chad Berkley (SEEK)
Shawn Bowers (SEEK)
Jeffrey Grethe (BIRN)
Christopher H. Brooks (Ptolemy II)
Zhengang Cheng (SDM)
Efrat Jaeger (GEON)
Matt Jones (SEEK)
Edward A. Lee (Ptolemy II)
Kai Lin (GEON)
Bertram Ludäscher (BIRN, GEON, SDM, SEEK)
Steve Mock (NMI)
Steve Neuendorffer (Ptolemy II)
Jing Tao (SEEK)
Mladen Vouk (SDM)
Yang Zhao (Ptolemy II)
… (Ptolemy II)

Example: SEEK – Science Environment for Ecological Knowledge (large NSF ITR)
Architecture Overview (cf. Cyberinfrastructure)
Analysis & Modeling System
– Design and execution of ecological models and analysis
– End user focus – application-/upperware
Semantic Mediation System
– Data integration of hard-to-relate sources and processes
– Semantic Types and Ontologies – upper middleware
EcoGrid
– Access to ecology data and tools – middle-/underware

Ecology: GARP Analysis Pipeline for Invasive Species Prediction
[Figure: workflow diagram. Data products: (a) species presence & absence points, (b) environmental layers, (c) integrated layers, each for the native range and for the invasion area; (d) training sample and test sample; (e) GARP rule set; (f) native range and invasion area prediction maps; (g) model quality parameters; (h) selected prediction maps. Processing steps: EcoGrid Query, Layer Integration, Sample Data, Data Calculation, Map Generation, Validation, User Validation, Generate Metadata, Archive to EcoGrid. Sources: registered EcoGrid databases.]
Source: NSF SEEK (Deana Pennington et al., UNM)

Genomics Example: Promoter Identification Workflow (PIW) Source: Matt Coleman (LLNL)

Source: NIH BIRN (Jeffrey Grethe, UCSD)

Scientific “Workflows”: Some Findings
More dataflow than (business control-/) workflow
– DiscoveryNet, Kepler, SCIRun, Scitegic, Taverna, Triana, …
Need for “programming extension”
– Iterations over lists (foreach); filtering; functional composition; generic & higher-order operations (zip, map(f), …)
Need for abstraction and nested workflows
Need for data transformations (WS1 → DT → WS2)
Need for rich user interaction & workflow steering:
– pause / revise / resume
– select & branch; e.g., web browser capability at specific steps as part of a coordinated SWF
Need for high-throughput transfers (“grid-enabling”, “streaming”)
Need for persistence of intermediate products and provenance

Scientific “Workflows” vs Business Workflows
Scientific “Workflows”
– Dataflow and data transformations
– Data problems: volume, complexity, heterogeneity
– Grid aspects: distributed computation, distributed data
– User interactions/WF steering
– Data, tool, and analysis integration
→ Dataflow and control-flow are married!
Business Workflows (BPEL4WS, …)
– Task-orientation: travel reservations; credit approval; BPM; …
– Tasks, documents, etc. undergo modifications (e.g., flight reservation from reserved to ticketed), but modified WF objects are still identifiable throughout
– Complex control flow, complex process composition (danger of control flow/dataflow “spaghetti”)
→ Dataflow and control-flow are divorced!

In a Flux: Workflow “Standards” Source: W.M.P. van der Aalst et al.

Commercial & Open Source Scientific “Workflow” (well, Dataflow) Systems: Kensington Discovery Edition from InforSense, Taverna, Triana

SCIRun: Problem Solving Environments for Large-Scale Scientific Computing
SCIRun: PSE for interactive construction, debugging, and steering of large-scale scientific computations
New collaboration under Kepler/SDM
Component model, based on generalized dataflow programming
Steve Parker (cs.utah.edu)

Our Starting Point: Ptolemy II & Dataflow Process Networks see! try! read! Source: Edward Lee et al.

Why Ptolemy II?
Ptolemy II Objective:
– “The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation.”
Data & Process oriented: Dataflow process networks
Natural Data Streaming Support
User-Orientation
– “application-ware” (not middle-/under-ware)
– Workflow design & exec console (Vergil GUI)
PRAGMATICS
– mature, actively maintained, well-documented (500+pp)
– open source system
– developed across multiple projects (NSF/ITRs SEEK and GEON, DOE SciDAC SDM, …)
– hoping to leverage e-sister projects (e.g. Taverna, …)

Dataflow Process Networks: Putting Computation Models (“Orchestration”) first!
Synchronous Dataflow Network (SDF)
– Statically schedulable single-threaded dataflow
  Can execute multi-threaded, but the firing sequence is known in advance
– Maximally well-behaved, but also limited expressiveness
Process Network (PN)
– Multi-threaded dynamically scheduled dataflow
– More expressive than SDF (dynamic token rate prevents static scheduling)
– Natural streaming model
Other Execution Models (“Domains”)
– Implemented through different “Directors”
[Figure labels: actor, typed i/o ports, FIFO, advanced push/pull]
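Why an SDF firing sequence can be known in advance is easiest to see on a single edge. If actor A produces p tokens per firing and actor B consumes c tokens per firing, the standard SDF balance equation rA * p = rB * c fixes how often each actor fires per period. The following is a minimal sketch under that assumption, not Ptolemy II's scheduler; the names repetitions and schedule are hypothetical.

```haskell
-- Minimal sketch (not Ptolemy II code): static scheduling of one SDF edge.
-- Actor A produces p tokens per firing; actor B consumes c tokens per firing.
-- Both rates are assumed to be positive.
module Main where

-- Smallest positive firing counts (rA, rB) with rA * p == rB * c.
repetitions :: Int -> Int -> (Int, Int)
repetitions p c = (c `div` g, p `div` g)
  where g = gcd p c

-- One period of a single-threaded schedule: fire B as soon as enough
-- tokens are buffered, otherwise fire A. Returns the firing sequence.
schedule :: Int -> Int -> String
schedule p c = go rA rB 0
  where
    (rA, rB) = repetitions p c
    go 0 0 _ = []
    go a b buf
      | b > 0 && buf >= c = 'B' : go a (b - 1) (buf - c)
      | otherwise         = 'A' : go (a - 1) b (buf + p)

main :: IO ()
main = do
  print (repetitions 2 3)   -- (3,2)
  putStrLn (schedule 2 3)   -- AABAB
```

In a PN model the token rates may change at run time, so no such fixed period exists and the schedule has to be found dynamically.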

Source: Edward Lee et al. Actor-/Dataflow Orientation vs Object-/Control flow Orientation

Marrying or Divorcing Control- & Dataflow Source: Edward Lee et al.

Overview: Scientific Workflows in Kepler
Modeling and Workflow Design
Web services = individual components (“actors”)
“Minute-Made” Application Integration:
– Plugging-in and harvesting web service components is easy, fast
Rich SWF modeling semantics (“directors”):
– Different and precise dataflow models of computation
– Clear and composable component interaction semantics
→ Web service composition and application integration tool
Coming soon:
– Shrink-wrapped, pre-packaged “Kepler-to-Go”
– Structural and semantic typing (better design support)
– Grid-enabled web services (for big data, big computations, …)
– Different deployment models (web service, web site, applet, …)

The KEPLER GUI: Vergil (Steve Neuendorffer, Ptolemy II) Drag and drop utilities, director and actor libraries.

Running a Genomics WF (Ilkay Altintas, SDM)

Support for Multiple Workflow Granularities
Abstraction: Sand to Rocks
[Figure labels: Boulders, Sand, Powder, Plumbing]

Directors and Combining Different Component Interaction Semantics Source: Edward Lee et al.

Application Examples: Mineral Classification with Kepler … (Efrat Jaeger, GEON)

… inside the Classifier

Standard BrowserUI: Client-Side SVG

SWF Reengineering (Ashraf, Efrat, Kai, GEON)

DataMapper Sub-Workflow

Result launched via BrowserUI actor (coupling with ESRI’s ArcIMS)

Distributed Workflows in KEPLER
Web and Grid Service plug-ins
– WSDL (now) and Grid services (stay tuned …)
– ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
– SSH, SCP, SDSC SRB, OGS?-??? … coming
WS Harvester
– Import query-defined WS operations as Kepler actors
XSLT and XQuery Data Transformers
– to link not “designed-to-fit” web services
WS-deployment interface (planned)

Generic Web Service Actor (Ilkay Altintas)
Given a WSDL and the name of an operation of a web service, the actor dynamically customizes itself to implement and execute that method.
Configure: select service operation

Set Parameters and Commit

Specialized WS Actor (after instantiation)

Web Service Harvester (Ilkay Altintas, SDM)
Imports the web services in a repository into the actor library.
Can search for web services based on a keyword.

Composing 3rd-Party WSs (NMI, Steve Mock)
[Figure labels: Output of previous web service; User interaction & Transformations; Input of next web service]

A Special Generic Ingestion Actor for EML Data (SEEK, Chad Berkley)
Ingests any data format described by EML metadata
Converts raw data to Ptolemy format
Data can then be operated on with other actors

Wrapping Legacy Applications

Promoter Identification Workflow (PIW) Source: Matt Coleman (LLNL)

Promoter Identification Workflow in Ptolemy-II [SSDBM’03] Execution Semantics

[Figure annotations on the Ptolemy II version of the PIW:]
hand-crafted control solution; also: forces sequential execution!
designed to fit hand-crafted Web-service actor
Complex backward control-flow
No data transformations available

Promoter Identification Workflow in FP

genBankG       :: GeneId -> GeneSeq
genBankP       :: PromoterId -> PromoterSeq
blast          :: GeneSeq -> [PromoterId]
promoterRegion :: PromoterSeq -> PromoterRegion
transfac       :: PromoterRegion -> [TFBS]
gpr2str        :: (PromoterId, PromoterRegion) -> String

d0 = Gid "7"                 -- start with some gene-id
d1 = genBankG d0             -- get its gene sequence from GenBank
d2 = blast d1                -- BLAST to get a list of potential promoters
d3 = map genBankP d2         -- get list of promoter sequences
d4 = map promoterRegion d3   -- compute list of promoter regions and ...
d5 = map transfac d4         -- ... get transcription factor binding sites
d6 = zip d2 d4               -- create list of pairs promoter-id/region
d7 = map gpr2str d6          -- pretty print into a list of strings
d8 = concat d7               -- concat into a single "file"
d9 = putStr d8               -- output that file
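The slide gives only type signatures, so the pipeline cannot be run as shown. The sketch below fills in made-up stub bodies so that the same composition type-checks and executes. The newtype definitions, the constructor names other than Gid, and every function body are hypothetical placeholders; the real steps would call GenBank, BLAST and Transfac.

```haskell
-- Runnable sketch of the PIW pipeline from the slide. Every function body
-- below is a made-up stub; the real steps call GenBank, BLAST and Transfac.
module Main where

newtype GeneId         = Gid String
newtype GeneSeq        = GeneSeq String
newtype PromoterId     = Pid String
newtype PromoterSeq    = PromoterSeq String
newtype PromoterRegion = PromoterRegion String
newtype TFBS           = TFBS String

genBankG :: GeneId -> GeneSeq
genBankG (Gid g) = GeneSeq ("seq-of-gene-" ++ g)

genBankP :: PromoterId -> PromoterSeq
genBankP (Pid p) = PromoterSeq ("seq-of-" ++ p)

blast :: GeneSeq -> [PromoterId]
blast _ = [Pid "p1", Pid "p2"]          -- stub: two fixed hits

promoterRegion :: PromoterSeq -> PromoterRegion
promoterRegion (PromoterSeq s) = PromoterRegion ("region(" ++ s ++ ")")

transfac :: PromoterRegion -> [TFBS]
transfac (PromoterRegion r) = [TFBS ("tfbs@" ++ r)]

gpr2str :: (PromoterId, PromoterRegion) -> String
gpr2str (Pid p, PromoterRegion r) = p ++ ": " ++ r ++ "\n"

-- The whole workflow as one forward-only function of a gene id
-- (steps d1..d4 and d6..d8 of the slide).
piw :: GeneId -> String
piw d0 = concat (map gpr2str (zip d2 d4))
  where
    d2 = blast (genBankG d0)
    d4 = map (promoterRegion . genBankP) d2

main :: IO ()
main = putStr (piw (Gid "7"))
```

As on the slide, the transfac result (d5) does not feed into the printed output; it is kept here only so that all six signatures have a body.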

Cleaned up Process Network PIW
Back to purely functional dataflow process network (= also a data streaming model!)
Re-introducing map(f) to Ptolemy II (was there in PT Classic)
→ no control-flow spaghetti
→ data-intensive apps
→ free concurrent execution
→ free type checking
→ automatic support to go from piw(GeneId) to PIW := map(piw) over [GeneId]
[Figure annotations: map(f)-style iterators; Powerful type checking; Generic, declarative “programming” constructs; Generic data transformation actors; Forward-only, abstractable sub-workflow piw(GeneId)]

Optimization by Declarative Rewriting I
PIW as a declarative, referentially transparent functional process
→ optimization via functional rewriting possible, e.g. map(f o g) = map(f) o map(g)
Technical report & PIW specification in Haskell
[Figure annotations: map(f o g) instead of map(f) o map(g); Combination of map and zip]
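The rewrite rule can be checked on sample data. The sketch below compares the two-pass and one-pass forms and also shows one possible reading of "combination of map and zip"; that second rewrite and the names twoPass, onePass and pairWith are assumptions for illustration, not taken from the technical report.

```haskell
-- Sketch: the rewrite map f . map g  ==>  map (f . g), checked on sample data.
module Main where

-- Two traversals, with an intermediate list between the stages.
twoPass :: (b -> c) -> (a -> b) -> [a] -> [c]
twoPass f g = map f . map g

-- One traversal after rewriting; no intermediate list is built.
onePass :: (b -> c) -> (a -> b) -> [a] -> [c]
onePass f g = map (f . g)

-- Related rewrite combining map and zip (an assumed example): pairing xs
-- with (map g xs) is the same as mapping each x to (x, g x).
pairWith :: (a -> b) -> [a] -> [(a, b)]
pairWith g = map (\x -> (x, g x))

main :: IO ()
main = do
  let xs = [1 .. 5] :: [Int]
  print (twoPass show (* 2) xs == onePass show (* 2) xs)
  print (zip xs (map (+ 1) xs) == pairWith (+ 1) xs)
```

Both rewrites are only valid because the stages are referentially transparent; an actor with side effects could observe the difference in evaluation order.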

Optimizing II: Streams & Pipelines
Clean functional semantics facilitates algebraic workflow (program) transformations (Bird-Meertens); e.g. mapS f o mapS g → mapS (f o g)
Source: Real-Time Signal Processing: Dataflow, Visual, and Functional Programming, Hideki John Reekie, University of Technology, Sydney
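For streams the same fusion applies even when the input never ends. The sketch below models a stream as a lazy Haskell list, which is an assumption made for illustration and not the stream type used in the cited thesis; mapS is simply map under that assumption.

```haskell
-- Sketch: streams modelled as lazy (possibly infinite) Haskell lists.
module Main where

type Stream a = [a]

mapS :: (a -> b) -> Stream a -> Stream b
mapS = map

-- Two-stage pipeline and its fused single-stage form.
pipeline, fused :: Stream Int -> Stream Int
pipeline = mapS (+ 1) . mapS (* 2)
fused    = mapS ((+ 1) . (* 2))

main :: IO ()
main = do
  let sensor = [0 ..] :: Stream Int      -- unbounded input stream
  print (take 5 (pipeline sensor))       -- [1,3,5,7,9]
  print (take 5 (fused sensor))          -- [1,3,5,7,9]
```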

Middle/Underware Access: Querying Databases
Database connection actor:
– Opens a database connection and passes it to all actors accessing this database.
Database query actor:
– A generic actor that queries a database and provides its result.
DBConnection type and DBConnectionToken:
– A new IOPort type and a token to distinguish a database connection from any general type.

Database Connection Actor
OpenDBConnection actor:
– Input: database connection information
– Output: DBConnectionToken (reference to a DB connection instance, via a DBConnection output port)

Database Query Actor
Database Query actor:
– Input: SQL query string and a DB connection token
– Parameters: output type (XML, Record, or String); tuple-at-a-time vs set-at-a-time
– Process: execute query; produce results according to parameters
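The division of labour between the two actors can be sketched as two functions that communicate through an opaque token. This is an illustration only: no real database is contacted, the "connection" is an in-memory table, the query is a row predicate instead of an SQL string, and all names except DBConnectionToken are hypothetical.

```haskell
-- Sketch of the two database actors as plain functions over an in-memory
-- table. No real database is contacted; all names here are hypothetical.
module Main where

import Data.List (intercalate)

type Row = [(String, String)]            -- column name / value pairs

-- Opaque token passed between actors instead of the connection itself.
newtype DBConnectionToken = DBConnectionToken [Row]

data OutputType = AsXML | AsRecord | AsString

-- "OpenDBConnection": connection info in, token out. Here the "connection
-- info" is simply the table contents.
openDBConnection :: [Row] -> DBConnectionToken
openDBConnection = DBConnectionToken

-- "Database Query": the query is modelled as a row predicate rather than an
-- SQL string; results are rendered according to the output-type parameter.
dbQuery :: OutputType -> (Row -> Bool) -> DBConnectionToken -> [String]
dbQuery out p (DBConnectionToken rows) = map (render out) (filter p rows)

render :: OutputType -> Row -> String
render AsXML    r = concat ["<" ++ k ++ ">" ++ v ++ "</" ++ k ++ ">" | (k, v) <- r]
render AsRecord r = "{" ++ intercalate ", " [k ++ " = " ++ v | (k, v) <- r] ++ "}"
render AsString r = unwords (map snd r)

main :: IO ()
main = do
  let conn = openDBConnection
        [ [("id", "1"), ("mineral", "quartz")]
        , [("id", "2"), ("mineral", "albite")] ]
  mapM_ putStrLn (dbQuery AsXML ((== Just "2") . lookup "id") conn)
```

Because the result is a lazy list, a downstream consumer can take rows one at a time or force the whole set, which loosely mirrors the tuple-at-a-time vs set-at-a-time parameter.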

Querying Example

An (oversimplified) Model of the Grid
Hosts: {h1, h2, h3, …}
[Two further declarations whose symbols were lost in extraction: “Hosts : i }, j }, …”]
Given: data/workflow …
… as a functional plan: […; Y := f(X); Z := g(Y); …]
… as a logic plan: […; f(X,Y) ∧ g(Y,Z); …]
Find Host Assignment: d_i → h_i, f_j → h_j for all d_i, f_j … s.t. […; := …] is a valid plan
[Figure: X → f → Y → g → Z]
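The host-assignment problem can be made concrete with a small enumeration. The sketch below assumes that each function is offered by a known set of hosts and counts how often data has to move between hosts for a given assignment. The host names, the offers table and the cost measure are made-up example data, not part of the model on the slide.

```haskell
-- Sketch of host assignment for the plan [Y := f(X); Z := g(Y)].
-- Host names and the offers table are made-up example data.
module Main where

type Host = String
type Step = String                       -- name of a function in the plan

-- Which hosts offer which function (hypothetical).
offers :: [(Step, [Host])]
offers = [("f", ["h1", "h2"]), ("g", ["h2", "h3"])]

-- All assignments that place every step on a host offering it.
assignments :: [Step] -> [[(Step, Host)]]
assignments = mapM place
  where place s = [(s, h) | h <- maybe [] id (lookup s offers)]

-- Number of times data must move between hosts, given where the input X
-- lives: one shipment for each consecutive pair of differing hosts.
shipments :: Host -> [(Step, Host)] -> Int
shipments src plan = length (filter id (zipWith (/=) hosts (tail hosts)))
  where hosts = src : map snd plan

main :: IO ()
main = mapM_ report (assignments ["f", "g"])
  where report a = putStrLn (show a ++ "  shipments: " ++ show (shipments "h1" a))
```

A planner would pick an assignment with few shipments; a step that no host offers yields no valid assignment at all.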

Shipping and Handling Algebra (SHA)
Logical view: plan = of = [operands lost in extraction]
Physical view: SHA plans (1), (2), (3) [operands lost in extraction]
1. [ to A, := to C ]
2. [ => B, := to C ]
3. [ to C, => C, := ]

Grid-Enabling PTII: Handles
[Figure: actors A and B in Kepler space; grid nodes GA and GB in Grid space]
1. A → GA: get_handle
2. GA → A: return &X
3. A → B: send &X
4. B → GB: request &X
5. GB → GA: request &X
6. GA → GB: send *X
7. GB → B: send done(&X)
Example: &X = “GA.17”, *X = [value lost in extraction]
Candidate Formalisms: GridFTP; SSH, SCP; SDSC SRB; OGS?-??? …; WSRF?
Logical token transfer (3) requires get_handle (1, 2), then exec_handle (4, 5, 6, 7) for completion.
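The point of the protocol is that the Kepler actors exchange only the handle while the payload moves directly between grid nodes. The sketch below simulates that with two in-memory stores. The Store type, the function bodies and the sample payload are hypothetical; get_handle and exec_handle are modelled as getHandle and execHandle.

```haskell
-- Sketch of the seven-step handle protocol. Kepler actors A and B exchange
-- only the handle &X; the payload *X moves directly between grid nodes.
-- The stores and the sample payload are hypothetical.
module Main where

import qualified Data.Map as Map

type Handle = String                      -- e.g. "GA.17"
type Store  = Map.Map Handle String       -- handle -> payload held by a node

-- Steps 1-2: A asks GA for a handle to data GA already holds.
getHandle :: Store -> Handle -> Maybe Handle
getHandle ga h = if Map.member h ga then Just h else Nothing

-- Steps 4-7: B asks GB to fetch the payload from GA; GB stores it and
-- reports completion. Nothing means GA did not know the handle.
execHandle :: Store -> Store -> Handle -> Maybe Store
execHandle ga gb h = do
  payload <- Map.lookup h ga              -- 5-6: GB requests &X, GA sends *X
  return (Map.insert h payload gb)        -- 7: GB now holds *X, done(&X)

main :: IO ()
main = do
  let ga = Map.fromList [("GA.17", "large payload")]
      gb = Map.empty
  case getHandle ga "GA.17" of            -- 1-2
    Nothing -> putStrLn "no such data"
    Just h  -> do                         -- 3: A sends only the handle to B
      print (fmap Map.toList (execHandle ga gb h))
```

In this sketch the payload string never appears in the code path of A or B, only inside execHandle, which stands in for the GA-to-GB transfer.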

Extensions: Semantic Type
Take concepts and relationships from an ontology to “semantically type” the data-in/out ports
Application: e.g., design support:
– smart/semi-automatic wiring, generation of “massaging actors”
[Figure: actor m1 (normalize) with ports p3 and p4]
Takes Abundance Count Measurements for Life Stages
Returns Mortality Rate Derived Measurements for Life Stages


Semantic Types
The semantic type signature
– Type expressions over the (OWL) ontology
[Figure: actor m1 (normalize) with ports p3 and p4]
SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
           -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty

Extended Type System (here: OWL Semantic Types)
SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty
           -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
Substructure association: XML raw-data =(X)Query=> object model =link=> OWL ontology
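How such a signature could support wiring decisions can be sketched with a toy subsumption check. The is-a hierarchy below is a made-up stand-in for an OWL ontology, only the concept names come from the slide, and role paths such as itemMeasured.X are deliberately ignored; a real implementation would delegate subsumption to an OWL reasoner.

```haskell
-- Sketch: checking whether an output port may be wired to an input port
-- using a tiny is-a hierarchy. The hierarchy is a made-up stand-in for an
-- OWL ontology; only the concept names are taken from the slide.
module Main where

type Concept = String

-- Direct subconcept relation (child, parent); hypothetical.
isA :: [(Concept, Concept)]
isA = [ ("DerivedObservation", "Observation")
      , ("AbundanceCount", "Measurement")
      , ("MortalityRate", "Measurement") ]

-- Reflexive, transitive closure of isA (assumes the relation is acyclic).
subsumedBy :: Concept -> Concept -> Bool
subsumedBy c d = c == d || any (`subsumedBy` d) [p | (c', p) <- isA, c' == c]

-- A semantic type is a conjunction of concepts. An output fits an input
-- if every concept the input requires subsumes some concept the output has.
compatible :: [Concept] -> [Concept] -> Bool
compatible out inp = all (\need -> any (`subsumedBy` need) out) inp

main :: IO ()
main = do
  let m1Out = ["DerivedObservation", "MortalityRate"]
  print (compatible m1Out ["Observation", "Measurement"])   -- True
  print (compatible m1Out ["AbundanceCount"])                -- False
```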

Semantic Types for Scientific Workflows

Deriving Data Transformations from Semantic Service Registration [Bowers-Ludaescher, DILS’04]

Structural and Semantic Mappings [Bowers-Ludaescher, DILS’04]

Workflow Planning as Planning Queries with Limited Access Patterns
User query Q: answer(ISBN, Author, Title) ← book(ISBN, Author, Title), catalog(ISBN, Author), not library(ISBN).
Limited (web service) Access Patterns (API)
– Src1.books: in: ISBN; out: Author, Title
– Src1.books: in: Author; out: ISBN, Title
– Src2.catalog: in: {}; out: ISBN, Author
– Src3.library: in: {}; out: ISBN
Q is not executable, but feasible (equivalent to the executable Q’: catalog ; book ; not library)
→ ICDE (poster), EDBT, PODS (papers), [Nash-Ludaescher, 2004]
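The step from Q to Q’ can be sketched as a reordering under binding constraints. The code below covers only that simple case, where a left-to-right executable order exists; it is not the general feasibility test of the cited papers. The Literal record, the greedy strategy and the choice of the ISBN-in pattern for book are assumptions made for illustration.

```haskell
-- Sketch: finding an executable order for query literals under limited
-- access patterns. A literal can be called once all of its input
-- arguments are bound; a negated literal needs every argument bound.
module Main where

import Data.List (partition, nub)

data Literal = Literal
  { name    :: String
  , negated :: Bool
  , inputs  :: [String]    -- variables that must be bound before the call
  , outputs :: [String]    -- variables bound by the call
  }

callable :: [String] -> Literal -> Bool
callable bound l
  | negated l = all (`elem` bound) (inputs l ++ outputs l)
  | otherwise = all (`elem` bound) (inputs l)

-- Greedy reordering: repeatedly call whatever is callable. Returns the
-- names in execution order, or Nothing if the greedy pass gets stuck.
executableOrder :: [Literal] -> Maybe [String]
executableOrder = go []
  where
    go _ [] = Just []
    go bound ls = case partition (callable bound) ls of
      ([], _)       -> Nothing
      (ready, rest) -> fmap (map name ready ++)
                            (go (nub (bound ++ concatMap outputs ready)) rest)

-- The query from the slide, using the ISBN-in pattern for book.
query :: [Literal]
query =
  [ Literal "book"    False ["ISBN"] ["Author", "Title"]
  , Literal "catalog" False []       ["ISBN", "Author"]
  , Literal "library" True  []       ["ISBN"]
  ]

main :: IO ()
main = print (executableOrder query)
```

For the slide's query this yields catalog, then book, then library, matching Q’: catalog binds ISBN, which makes both the book call and the negated library check executable.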

Conclusions
Summary
– Kepler Scientific Workflow System
– Open source, cross-project collaboration (SEEK, GEON, SDM, …)
– Actor & Dataflow-oriented Modeling, Design, Execution (Ptolemy II heritage)
– Prototyping, static analysis, web services, data transformations
Next Steps
– First official release (“Kepler-to-Go”) April/May ’04, e-Science meeting NeSC, Edinburgh
– Grid-enabling: 3rd-party transfer, planning, optimization, …
– Semantic Typing [DILS’04]
– Provenance, Fault tolerance, …
– Link-Up w/ e.g. Taverna, Pegasus, …
– Become a member or co-developer (You!)