January 23, 2006 Ilkay Altintas


New Developments in Kepler

Kepler System Architecture
[Diagram; labels shown: Authentication GUI, Kepler GUI Extensions, Vergil, Documentation, Kepler Object Manager, SMS, Smart Re-run / Failure Recovery, Provenance Framework, Type System Ext, Actor & Data Search, Kepler Core Extensions, Ptolemy]

Joint Authentication Framework
Requirements:
- Coordinate between the different security architectures
  - GEON uses GAMA, which requires a single certificate authority.
  - SEEK uses LDAP, which has a centralized certificate authority with distributed subordinate CAs.
  - Connect LDAP with GAMA
  - Coordinate between two different GAMA servers
- Single sign-on/authentication at the initialize step of the run for multiple actors that use authentication
  - This raises the issue of a single GAMA repository versus multiple repositories, and requires users to have accounts on all servers.
- Kepler needs to handle expired certificates for long-running workflows and for users who keep it open for a long time.
- A trust relation between the different GAMA servers must be established to allow single authentication.
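The expired-certificate requirement above can be illustrated with a minimal sketch. It assumes a credential that carries an expiry time and a caller-supplied renewal callback; `Credential`, `needs_renewal`, `ensure_valid` and the renewal policy are hypothetical names for illustration, not GAMA, LDAP or Kepler APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Credential:
    """Hypothetical stand-in for a user certificate; not a GAMA or LDAP type."""
    subject: str
    expires_at: datetime


def needs_renewal(cred, now, margin):
    """True when the credential expires within `margin` of `now`."""
    return cred.expires_at - now <= margin


def ensure_valid(cred, now, margin, renew):
    """Return a usable credential, calling the supplied `renew` callback if needed."""
    return renew(cred, now) if needs_renewal(cred, now, margin) else cred


# Sign on once at the initialize step, then re-check before each actor fires.
start = datetime(2006, 1, 23, 9, 0)
cred = Credential("alice", start + timedelta(hours=2))
margin = timedelta(minutes=10)


def renew(old, now):
    # Assumed policy for the sketch: a renewed credential lasts 12 hours.
    return Credential(old.subject, now + timedelta(hours=12))


fresh = ensure_valid(cred, start, margin, renew)
late = start + timedelta(hours=1, minutes=55)
renewed = ensure_valid(cred, late, margin, renew)
```

Checking before each firing, rather than only at sign-on, is what lets a long-running workflow survive a certificate that expires mid-run.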

Functional Prototype Completed
- APIs and test cases in place
- More work required on certificate renewal and multiple-server access

Vergil is the GUI for Kepler
- Actor Search
  - Actor ontology and semantic search for actors
  - Search -> Drag and drop -> Link via ports
- Data Search
  - Metadata-based search for datasets

Actor Search
- Challenges:
  - Building/searching a repository …
  - Making changes to MoML (see KAR)
  - GUI changes
  - Ontology management
- Kepler Actor Ontology
  - Used in searching actors and creating conceptual views (= folders)
  - Currently 160 Kepler actors added!
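As a rough sketch of how an actor ontology can drive both semantic search and conceptual views, the example below stores concepts as a child-to-parent map and matches an actor when its annotated concept falls under the query concept. Only `ArithmeticMathOperationActor` appears in the slides; the other concept names, the actor annotations and all function names are assumptions made for illustration.

```python
# Hypothetical mini-ontology: concept -> parent concept (None marks the root).
PARENT = {
    "Actor": None,
    "MathOperationActor": "Actor",
    "ArithmeticMathOperationActor": "MathOperationActor",
    "DataAccessActor": "Actor",
}

# Hypothetical annotations: actor name -> concept.
ANNOTATIONS = {
    "Multiply or Divide": "ArithmeticMathOperationActor",
    "Add or Subtract": "ArithmeticMathOperationActor",
    "Database Query": "DataAccessActor",
}


def is_a(concept, ancestor):
    """Walk up the parent chain to test whether `concept` falls under `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def search_actors(concept):
    """Return actors annotated with `concept` or any of its sub-concepts, sorted."""
    return sorted(a for a, c in ANNOTATIONS.items() if is_a(c, concept))


def conceptual_view():
    """Group actors into folders keyed by their annotated concept."""
    folders = {}
    for actor, concept in sorted(ANNOTATIONS.items()):
        folders.setdefault(concept, []).append(actor)
    return folders


math_actors = search_actors("MathOperationActor")
folders = conceptual_view()
```

A query for the broader concept returns actors annotated with the narrower one, which a plain name search would not.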

Data Search and Usage of Results
- Kepler DataGrid
  - Discovery of data resources through local and remote services
    - SRB, Grid and Web Services, DB connections
  - Registry of datasets on the fly using workflows

Vergil Updates
To make it more useful to the user
- Updated actor icons
  - Improve readability
  - Develop cohesive visual language
- Menu redesign
  - Follow standard HF principles
  - Improve organization
- Icon labels shown: Composite, DB Query, Computation or Operation, Transformation, Filter, File Operation, Web Service

Kepler Archives
Purpose:
- Encapsulate WF data and actors in an archive file
  - … inlined or by reference
  - … version control
- More robust workflow exchange
- Easy management of semantic annotations
- Plug-in architecture (drop in and use)
- Easy documentation updates
A jar-like archive file (.kar) including a manifest
- All entities have unique IDs (LSID)
- Custom object manager and class loader
- UI and API to create, define, search and load .kar files
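To make the "jar-like archive with a manifest" idea concrete, here is a minimal sketch using only the Python standard library. The `Name` attribute follows the jar manifest convention; the `lsid` manifest attribute, the entry file name and both function names are assumptions for illustration and are not taken from the Kepler KAR format.

```python
import io
import zipfile


def build_kar(entries):
    """Pack {entry_name: (lsid, xml_text)} into an in-memory jar-like archive."""
    manifest = ["Manifest-Version: 1.0", ""]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as kar:
        for name, (lsid, xml_text) in entries.items():
            kar.writestr(name, xml_text)
            # "lsid" is an assumed per-entry attribute, used here for lookup.
            manifest += [f"Name: {name}", f"lsid: {lsid}", ""]
        kar.writestr("META-INF/MANIFEST.MF", "\n".join(manifest))
    return buf.getvalue()


def index_by_lsid(kar_bytes):
    """Read the manifest back and map each LSID to its entry name."""
    with zipfile.ZipFile(io.BytesIO(kar_bytes)) as kar:
        text = kar.read("META-INF/MANIFEST.MF").decode("utf-8")
    index, name = {}, None
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        if key == "Name":
            name = value
        elif key == "lsid" and name is not None:
            index[value] = name
    return index


kar_bytes = build_kar({
    "MultiplyDivide.xml": (
        "urn:lsid:localhost:actor:80:1",
        "<entity name='Multiply or Divide'/>",
    ),
})
index = index_by_lsid(kar_bytes)
```

Because every entity carries a unique ID, a loader can search an archive through the manifest alone, without opening each entry.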

KAR File Example

<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
  <property name="entityId" value="urn:lsid:localhost:actor:80:1" class="org.kepler.moml.NamedObjId"/>
  <property name="documentation" class="org.kepler.moml.DocumentationAttribute"></property>
  <property name="class" value="ptolemy.actor.lib.MultiplyDivide" class="ptolemy.kernel.util.StringAttribute">
    <property name="id" value="urn:lsid:localhost:class:955:1" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="multiply" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="dataType" value="unknown" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="divide" class="org.kepler.moml.PortAttribute">
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="output" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
    <property name="isMultiport" value="false" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="semanticType00" value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor" class="org.kepler.sms.SemanticType"/>
</entity>
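The entity description above is plain XML, so a generic XML parser can pull out the actor's ID, ports and semantic types. The sketch below does this with Python's standard library on a shortened copy of the example; the `describe` function and the shape of the dictionary it returns are illustrative assumptions, not part of Kepler.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the entity shown above.
MOML = """
<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
  <property name="entityId" value="urn:lsid:localhost:actor:80:1" class="org.kepler.moml.NamedObjId"/>
  <property name="multiply" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="input" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="divide" class="org.kepler.moml.PortAttribute">
    <property name="isMultiport" value="true" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="output" class="org.kepler.moml.PortAttribute">
    <property name="direction" value="output" class="ptolemy.kernel.util.StringAttribute"/>
  </property>
  <property name="semanticType00" value="http://seek.ecoinformatics.org/ontology#ArithmeticMathOperationActor" class="org.kepler.sms.SemanticType"/>
</entity>
"""


def describe(moml_text):
    """Summarize an entity: name, LSID, port directions and semantic types."""
    root = ET.fromstring(moml_text.strip())
    props = root.findall("property")  # direct children only

    def by_class(cls):
        return [p for p in props if p.get("class") == cls]

    ports = {}
    for port in by_class("org.kepler.moml.PortAttribute"):
        direction = port.find("property[@name='direction']")
        # A port without a direction property is reported as None, not guessed.
        ports[port.get("name")] = None if direction is None else direction.get("value")

    return {
        "name": root.get("name"),
        "lsid": by_class("org.kepler.moml.NamedObjId")[0].get("value"),
        "ports": ports,
        "semantic_types": [p.get("value") for p in by_class("org.kepler.sms.SemanticType")],
    }


info = describe(MOML)
```

Note that the `divide` port in the example carries no `direction` property, so the sketch leaves its direction unset.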

Kepler Object Manager
- Designed to access local and distributed objects
- Objects: data, metadata, annotations, actor classes, supporting libraries, native libraries, etc., archived in .kar files
- Advantages:
  - Reduce the size of the Kepler distribution
    - Only ship the core set of generic actors and domains
  - Easy exchange of full or partial workflows for collaborations
  - Publish full workflows with their bound data
    - Becomes a provenance system for derived data objects
=> Separate workflow repository and distributions easily
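One way to picture "access local and distributed objects" is an ID-keyed lookup that tries the local store first and falls back to a remote source, caching whatever it fetches. The class below is a sketch under that assumption; `ObjectManager`, `remote_fetch` and the example.org LSID are hypothetical and do not describe the real Kepler Object Manager API.

```python
class ObjectManager:
    """Sketch of an LSID-keyed lookup that prefers local objects over remote ones."""

    def __init__(self, local, remote_fetch):
        self._local = dict(local)          # LSID -> object already available locally
        self._remote_fetch = remote_fetch  # callable(lsid) -> object, or None if unknown
        self.remote_calls = 0

    def get(self, lsid):
        if lsid in self._local:
            return self._local[lsid]
        self.remote_calls += 1
        obj = self._remote_fetch(lsid)
        if obj is None:
            raise KeyError(f"no object registered for {lsid}")
        self._local[lsid] = obj  # cache so later lookups stay local
        return obj


# Hypothetical remote repository, modeled as a dictionary.
remote_repo = {"urn:lsid:example.org:actor:1:1": "remote actor"}
manager = ObjectManager(
    {"urn:lsid:localhost:actor:80:1": "Multiply or Divide"},
    remote_repo.get,
)

local_hit = manager.get("urn:lsid:localhost:actor:80:1")
first = manager.get("urn:lsid:example.org:actor:1:1")
second = manager.get("urn:lsid:example.org:actor:1:1")
```

Fetching on demand is what allows the distribution to ship only a core set of actors.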

Initial Work on Provenance Framework
Track origin and derivation information about scientific workflows, their runs and derived information (datasets, metadata, …)
- Need for provenance
  - Association of process and results
  - Reproduce results
  - "Explain & debug" results (via lineage tracing, parameter settings, …)
  - Optimize: "Smart Re-Runs"
- Types of provenance information:
  - Data provenance
    - Intermediate and end results, including files and DB references
  - Process (= workflow instance) provenance
    - Keep the WF definition with data and parameters used in the run
  - Error and execution logs
  - Workflow design provenance (quite different)
    - WF design is a (little supported) process (art, magic, …)
    - For free via CVS: edit history
    - Need more "structure" (e.g. templates) for individual & collaborative workflow design
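Lineage tracing, mentioned above as a way to explain and debug results, amounts to walking derivation records backwards from a result. The sketch below assumes each derived item records the step that produced it and its inputs; the file names, step names and the `lineage` function are invented for illustration.

```python
# Hypothetical derivation records: item -> (producing step, input items).
DERIVED_FROM = {
    "plot.png": ("Plotter", ["stats.csv"]),
    "stats.csv": ("Summarize", ["clean.csv"]),
    "clean.csv": ("Filter", ["raw.csv", "params.xml"]),
}


def lineage(item):
    """Return every upstream item of `item`, nearest first, without duplicates."""
    seen, order, queue = set(), [], [item]
    while queue:
        current = queue.pop(0)
        _, inputs = DERIVED_FROM.get(current, (None, []))
        for parent in inputs:
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


upstream = lineage("plot.png")
```

Items with no derivation record, such as raw inputs and parameter files, are the end points of the trace.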

Kepler Provenance Recording Utility
- Parametric and customizable
  - Different report formats
  - Variable levels of detail
    - Verbose-all, verbose-some, medium, on error
  - Multiple cache destinations
- Saves information on
  - User name, date, run, etc.
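A recorder with variable levels of detail and multiple destinations might look roughly like the sketch below. The four level names come from the slide, but what each level records is an assumption, as are the event kinds, the JSON line format and the `ProvenanceRecorder` class itself.

```python
import json

# Assumed meaning of each level: which event kinds get recorded.
LEVELS = {
    "verbose-all": {"token", "fire", "run", "error"},
    "verbose-some": {"fire", "run", "error"},
    "medium": {"run", "error"},
    "on-error": {"error"},
}


class ProvenanceRecorder:
    def __init__(self, level, user, destinations):
        self.kinds = LEVELS[level]
        self.user = user
        self.destinations = destinations  # list-like sinks, one per cache destination

    def record(self, kind, run_id, date, detail):
        """Write one event to every destination if the level admits it."""
        if kind not in self.kinds:
            return False
        line = json.dumps(
            {"user": self.user, "date": date, "run": run_id,
             "kind": kind, "detail": detail},
            sort_keys=True,
        )
        for sink in self.destinations:
            sink.append(line)
        return True


memory_cache, backup_cache = [], []
recorder = ProvenanceRecorder("medium", "alice", [memory_cache, backup_cache])
skipped = recorder.record("fire", 1, "2006-01-23", "Multiply or Divide fired")
kept = recorder.record("error", 1, "2006-01-23", "division by zero")
```

Keeping the level as data rather than code is one way to make the utility parametric: a new level is a new dictionary entry.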

Provenance: Possible Next Steps
- Provenance meeting: last week at SDSC
- Deciding on terms and definitions
- .kar file generation, registration and search for provenance information
- Possible data/metadata formats
- Automatic report generation from accumulated data
- A GUI to keep track of the changes
- Adding provenance repositories
- A relational schema for the provenance info in addition to the existing XML
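The slide leaves the relational schema open. Purely as an illustration of what one could look like, the sketch below stores runs, parameters and data products in an in-memory SQLite database; every table name, column name and sample value is an assumption, not a schema that Kepler adopted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE workflow_run (
    run_id      INTEGER PRIMARY KEY,
    workflow    TEXT NOT NULL,   -- ID of the workflow definition
    user_name   TEXT NOT NULL,
    started_on  TEXT NOT NULL
);
CREATE TABLE parameter (
    run_id  INTEGER NOT NULL REFERENCES workflow_run(run_id),
    name    TEXT NOT NULL,
    value   TEXT NOT NULL
);
CREATE TABLE data_product (
    run_id    INTEGER NOT NULL REFERENCES workflow_run(run_id),
    actor     TEXT NOT NULL,
    location  TEXT NOT NULL      -- file path or database reference
);
"""


def open_store():
    """Create an empty in-memory provenance store."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def products_of(conn, user_name):
    """List the locations of all data products from runs started by `user_name`."""
    rows = conn.execute(
        "SELECT p.location FROM data_product p "
        "JOIN workflow_run r ON r.run_id = p.run_id "
        "WHERE r.user_name = ? ORDER BY p.location",
        (user_name,),
    )
    return [row[0] for row in rows]


conn = open_store()
conn.execute(
    "INSERT INTO workflow_run VALUES (?, ?, ?, ?)",
    (1, "urn:lsid:localhost:workflow:1:1", "alice", "2006-01-23"),
)
conn.execute("INSERT INTO parameter VALUES (?, ?, ?)", (1, "threshold", "0.5"))
conn.execute("INSERT INTO data_product VALUES (?, ?, ?)", (1, "Filter", "out/clean.csv"))
alice_products = products_of(conn, "alice")
```

Compared with XML records, a schema like this makes cross-run questions ("which products did this user create?") a single join.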

What other system functions does provenance relate to?
- Failure recovery
- Smart re-runs
- Semantic extensions
- Kepler Data Grid
- Reporting and documentation
- Authentication
- Data registration
- Re-run only the updated/failed parts
- Guided documentation generation and updates
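"Re-run only the updated/failed parts" can be sketched as a graph traversal: mark the changed or failed steps as dirty, propagate that mark downstream, and reuse recorded results for everything else. The workflow graph, step names and both functions below are hypothetical; this is not Kepler's smart re-run implementation.

```python
# Hypothetical workflow: step -> steps that consume its output.
DOWNSTREAM = {
    "read": ["filter"],
    "filter": ["summarize", "map"],
    "summarize": ["report"],
    "map": ["report"],
    "report": [],
}


def steps_to_rerun(changed_or_failed):
    """Return the dirty steps plus everything downstream of them."""
    dirty, stack = set(), list(changed_or_failed)
    while stack:
        step = stack.pop()
        if step not in dirty:
            dirty.add(step)
            stack.extend(DOWNSTREAM[step])
    return dirty


def plan(order, changed_or_failed):
    """Split an execution order into steps to re-run and steps whose results are reused."""
    dirty = steps_to_rerun(changed_or_failed)
    rerun = [s for s in order if s in dirty]
    reuse = [s for s in order if s not in dirty]
    return rerun, reuse


rerun, reuse = plan(["read", "filter", "summarize", "map", "report"], ["summarize"])
```

The reused steps are exactly those whose recorded provenance still matches their inputs, which is why smart re-runs depend on the provenance framework.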

Hot Topics in Kepler http://kepler-project.org/Wiki.jsp?page=HotTopics