Kepler Scientific Workflows: Current and Future Development. Ilkay ALTINTAS, Lab Director, Scientific Workflow Automation Technologies, San Diego Supercomputer Center, UCSD. October, 2007.

Presentation transcript:

1 Kepler Scientific Workflows: Current and Future Development
Ilkay ALTINTAS, Lab Director, Scientific Workflow Automation Technologies
San Diego Supercomputer Center, UCSD
October, 2007

2 Scientific Workflow Systems
Combination of
  – data integration, analysis, and visualization steps
  – automated “scientific process”
Mission of scientific workflow systems
  – Promote “scientific discovery” by providing tools and methods to generate scientific workflows
  – Create an extensible and customizable graphical user interface for scientists from different scientific domains
  – Support computational experiment creation, execution, sharing, reuse and provenance
  – Design frameworks which define efficient ways to connect to the existing data and integrate heterogeneous data from multiple resources
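As a rough illustration of the "combination of steps" idea on this slide, the sketch below chains an integration, an analysis, and a visualization step and records a simple provenance entry for each. It is a minimal sketch in plain Python and not Kepler code: the function names, the sample readings, and the provenance record layout are all assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(value):
    """Short content hash used to identify a step's input or output."""
    encoded = json.dumps(value, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_workflow(steps, data):
    """Run the steps in order and record which data each step consumed and produced."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        data = result
    return data, provenance


def integrate(sources):
    # Data integration: merge readings from several (hypothetical) sources.
    return [reading for readings in sources.values() for reading in readings]


def analyze(readings):
    # Analysis: reduce the merged readings to summary statistics.
    return {"count": len(readings), "mean": sum(readings) / len(readings)}


def visualize(summary):
    # Visualization stand-in: render the summary as a text report.
    return f"n={summary['count']} mean={summary['mean']:.1f}"


# Made-up sample data, for illustration only.
sources = {"site_a": [10.0, 12.0], "site_b": [14.0]}
steps = [("integrate", integrate), ("analyze", analyze), ("visualize", visualize)]
report, provenance = run_workflow(steps, sources)
```

Because each provenance entry stores the hash of the step's input and output, the output of one step can be matched to the input of the next, which is the basic property needed to trace a result back through the workflow.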

3 Ptolemy II: A laboratory for investigating design
KEPLER: A problem-solving environment for Scientific Workflow
KEPLER = “Ptolemy II + X” for Scientific Workflows
Kepler is a Scientific Workflow System … and a cross-project collaboration
3rd Beta release (Jan 8, 2007)
Builds upon the open-source Ptolemy II framework

4 Kepler use cases represent many science domains!
Ecology
  – SEEK: Ecological Niche Modeling and climate change
  – REAP: Modeling parasite invasions in grasslands using sensor networks
  – NEON: Ecological sensor networks
  – COMET: Environmental science
Geosciences
  – GEON: LiDAR data processing, Geological data integration
  – NEESit: Earthquake engineering
Molecular biology
  – SDM: Gene promoter identification and ScalaBLAST
  – ChIP-chip: Genome-scale research
  – CAMERA: Metagenomics
Oceanography
  – REAP: SST data processing
  – LOOKING/OOI CI: ocean observing CI
  – ROADNet: real-time data modeling and analysis
  – Ocean Life project
Phylogenetics
  – ATOL: Processing Phylodata
  – CiPRES: Phylogenetic tools
Chemistry
  – Resurgence: Computational chemistry
  – DART/ARCHER: X-Ray crystallography
Library science
  – DIGARCH: Digital preservation
  – UK Text Mining Center: Cheshire feature and archival
Conservation biology
  – SanParks: Thresholds of Potential Concerns
Physics
  – SDM: astrophysics TSI-1 and TSI-2
  – CPES: Plasma fusion simulation
  – ITER-EU: ITM fusion workflows

5 Kepler today is a research prototype and a production workflow tool!
Some of the current R&D
  – Distributed execution of workflow parts (peer to peer)
  – Efficient data transfer
  – Provenance tracking of data and processes
  – Tracking workflow evolution
  – Streaming data analysis
  – Easy-to-deploy batch interfaces
  – Intuitive workflow design
  – Customizable semantic typing
  – Interoperability with other workflow and analytical environments (at exec level)
Production workflow examples:
  – GEON LiDAR workflow (GLW)
      116 registered, 106 active users
      2076 submitted jobs to date
  – Center for Plasma Edge Simulation Code-Coupling Workflow (CPES-CCW)
      2000 actors, 5 levels of model hierarchy
      Longest run duration 3 hours
  – PtII AirForce Lab Model
      actors, attributes
      Longest run duration: 10 minutes
  – Longest running real-time simple monitoring model in PtII - months at a time
All generated using the GUI and executed in batch mode…
  – No coding and text manipulation

6 REAP: Realtime Environment for Analytical Processing
Management and Analysis of Observatory Data using Kepler Scientific Workflows
Funded
  – NSF CEO:P Jones (PI), Altintas, Baru, Ludaescher, Schildhauer
  – Partners: NCEAS/UCSB (Lead), SDSC/UCSD, UCDavis, CENS/UCLA, OpenDAP, OSU
The vision:
  – An integrated environment for analyzing data from observatories
Two scientific use cases:
  – Terrestrial ecology
  – Oceanography
reap.ecoinformatics.org

7 REAP Views
For data-grid engineers
  – monitoring and management capabilities of underlying sensor networks
For outside users
  – access to observatory data and results of models, approachable to non-scientists
For scientists
  – capabilities for designing and executing complex analytical models over near real-time and archived data sources

8 REAP: Terrestrial Ecology Usecase
Workflows to develop and test models exploring the impacts of abiotic factors (real-time light, temperature, and rainfall measurements) on the dynamics of plant host populations and their susceptibility to viral pathogens.

9 REAP: RBNB Streaming Data Actor
Example data from Terrestrial UseCase
Hardware: a Campbell Scientific CR800 datalogger with eight attached sensors, operating on a workbench.
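To make the idea of a streaming data actor more concrete, here is a minimal sketch of an actor that reads the newest samples from a bounded buffer and computes a moving average. It does not use the RBNB API. The `RingBufferChannel` class, the `moving_average_actor` function, and the temperature values are hypothetical stand-ins, under the assumption that the source keeps only the most recent samples.

```python
from collections import deque


class RingBufferChannel:
    """Hypothetical in-memory channel that keeps only the newest `capacity` samples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest sample when full.
        self.samples = deque(maxlen=capacity)

    def put(self, timestamp, value):
        self.samples.append((timestamp, value))

    def latest(self, n):
        """Return up to the n most recent (timestamp, value) pairs."""
        return list(self.samples)[-n:]


def moving_average_actor(channel, window):
    """Actor firing: average the values of the most recent `window` samples."""
    recent = channel.latest(window)
    if not recent:
        return None
    return sum(value for _, value in recent) / len(recent)


# Made-up sensor readings, for illustration only.
channel = RingBufferChannel(capacity=4)
for timestamp, temperature in enumerate([20.0, 21.0, 23.0, 22.0, 26.0]):
    channel.put(timestamp, temperature)

oldest_timestamp = channel.samples[0][0]  # sample 0 has been overwritten
average = moving_average_actor(channel, window=3)
```

The bounded buffer means the actor can be fired at any time against near real-time data without the stream growing without limit. Archived data would need a separate store.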

10 REAP: Oceanographic Usecase
Facilitate the quantitative evaluation of SST data sets.

11 Kepler/C.O.R.E.
SDCI NMI Improvement: Development of Kepler/CORE – A Comprehensive, Open, Reliable, and Extensible Scientific Workflow Infrastructure
Funded
  – NSF SDCI Ludaescher (PI), Altintas, Bowers, Jones, McPhillips, Schildhauer
  – Partners: Genome Center/UCDavis (Lead), SDSC/UCSD, NCEAS/UCSB
The vision:
  – Coordinate development of a comprehensive, open, reliable and extensible Kepler scientific workflow infrastructure
Builds on community participation as a driving force for Kepler.
kepler-project.org

12 Kepler/C.O.R.E.
Comprehensive
  – First-class support for technical features
Open
  – Well-designed and clearly articulated mechanisms and interfaces provided to facilitate developing extensions
Reliable
  – Both as a development platform and as a run-time environment for the user
Extensible
  – Independently extensible by groups not directly collaborating with the team

13 Directors in Kepler
Means to execute networks of components under multiple execution models
  – Dataflow (SDF, PN, DDF) vs. time-based (CT) vs. event-based (DE) vs. all combined
Makes use of separation of concerns principle
  – e.g., component execution, workflow execution and provenance tracking
The manager acts like a “common execution environment”
  – governing different concerns related to execution of the network and services
Ptolemy and Kepler are unique in combining different execution models in heterogeneous models!
Execution models shown: Process Networks, Rendezvous, Publish and Subscribe, Continuous Time, Finite State Machines, Dataflow, Time Triggered, Synchronous/reactive model, Discrete Event, Wireless
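The separation between components and the director that executes them can be sketched as follows. This is a simplified illustration in plain Python, not the Ptolemy II or Kepler API. The `Actor` class and both director classes are hypothetical, and each actor is assumed to consume one token per input and emit one token per firing. The first director precomputes a fixed firing order, loosely in the spirit of SDF. The second fires any actor whose inputs are ready, loosely in the spirit of data-driven dataflow.

```python
from collections import deque


class Actor:
    """Hypothetical actor: consumes one token per input and emits one token."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func            # computation performed on each firing
        self.inputs = list(inputs)  # names of upstream actors

    def fire(self, tokens):
        return self.func(*tokens)


class StaticScheduleDirector:
    """Computes the firing order once, then replays it every iteration."""

    def __init__(self, actors):
        self.actors = {a.name: a for a in actors}
        self.schedule = []
        for name in self.actors:
            self._visit(name)

    def _visit(self, name):
        # Depth-first walk so that upstream actors are scheduled first.
        if name in self.schedule:
            return
        for upstream in self.actors[name].inputs:
            self._visit(upstream)
        self.schedule.append(name)

    def run(self, iterations, sink):
        results = []
        for i in range(iterations):
            produced = {}
            for name in self.schedule:
                actor = self.actors[name]
                # Source actors have no inputs and receive the iteration index.
                args = [produced[u] for u in actor.inputs] or [i]
                produced[name] = actor.fire(args)
            results.append(produced[sink])
        return results


class DataDrivenDirector:
    """No precomputed schedule: fires any actor whose input queues hold tokens."""

    def __init__(self, actors):
        self.actors = {a.name: a for a in actors}
        # One FIFO queue per (upstream, downstream) connection.
        self.queues = {(u, a.name): deque() for a in actors for u in a.inputs}

    def run(self, iterations, sink):
        results = []
        pending = {a.name: deque(range(iterations))
                   for a in self.actors.values() if not a.inputs}
        progress = True
        while progress:
            progress = False
            for actor in self.actors.values():
                if actor.inputs:
                    in_queues = [self.queues[(u, actor.name)] for u in actor.inputs]
                    if not all(in_queues):  # some input queue is empty
                        continue
                    args = [q.popleft() for q in in_queues]
                elif pending[actor.name]:
                    args = [pending[actor.name].popleft()]
                else:
                    continue
                token = actor.fire(args)
                progress = True
                if actor.name == sink:
                    results.append(token)
                for (upstream, _), queue in self.queues.items():
                    if upstream == actor.name:
                        queue.append(token)
        return results


# The same three-actor workflow, deliberately listed out of order.
workflow = [
    Actor("offset", lambda x: x + 1, inputs=["scale"]),
    Actor("scale", lambda x: x * 10, inputs=["source"]),
    Actor("source", lambda i: i),
]
static_results = StaticScheduleDirector(workflow).run(3, sink="offset")
driven_results = DataDrivenDirector(workflow).run(3, sink="offset")
```

The actors are written once and know nothing about scheduling. Swapping the director changes how the network is executed, not what each component computes, which is the point the slide makes about separation of concerns.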

14 Credits
Kepler community and colleagues
On REAP and Kepler/CORE:
  – Shawn Bowers, Bertram Ludaescher, Timothy McPhillips, Genome Center, UCD
  – Matt Jones, Derik Barseghian, Mark Schildhauer, NCEAS, UCSB
  – Eric Seabloom, OSU
  – Peter Cornillion, OpenDAP

15 Questions…
Ilkay Altintas
+1 (858)