Streamflow - Programming Model for Data Streaming in Scientific Workflows Chathura Herath.


Outline
Background
Motivation
Approach
Architecture
Programming Model
Domain application

Background
Scientific workflows are a good programming model for scientific computing.
Scientific domains have high volumes of data.
Most of the data come from sensors, catalogs and other experiments.
Most data sources are data streams or can be modeled as streams.

Motivation
Huge data sources require preprocessing, mining and scaling down of data volumes.
Compute resources are limited relative to the scale of the data.
Currently, experts determine which data sets contain the interesting data.
Preserve the workflow programming model for the user; users are familiar with DAG execution.
Define workflow patterns for use as new workflow semantics that can capture data streams.
Goal
◦ Real-time data mining, filtering and preprocessing
◦ Data-driven reactive workflow systems
◦ Feedback systems

Data to Information
[Figure labels: Data Storage, Supercomputing, Information Rate, Data Rate]

Data to Information
[Figure labels: Data Storage, Supercomputing, Information Rate, Data Rate, Scientific workflow, Stream Mining]

Streamflow
[Figure labels: Data Storage, Supercomputing, Information Rate, Data Rate, Streamflow]

Why Workflow Streaming?
Most scientific workflows are static.
A considerable segment of the data for scientific workflows is produced by scientific sensors.
Sensor data tend to behave as repeating data streams.
Is it possible to provide a programming abstraction to capture data search and filtration?

Possible approaches
Completely decoupled systems, where the workflows and the data mining are separate
◦ Data mining rules or queries produce outputs, which may get refined again and again.
◦ Some interesting event launches the workflow.
◦ This may lose the insight and abstraction provided by the workflows.
◦ The data mining itself may have complex data and control dependencies.
Pure workflow approach
◦ Workflow languages are not designed for streaming.

Stream Integration Approach
Complex Event Processing (CEP) system
◦ Interacts with the streams
◦ Filters and bundles data
◦ Publishes input datasets to workflows
Workflow system
◦ Handles the scientific computations
◦ Gets invoked when a dataset of a specified nature gets published to the CEP system
[Figure labels: Resources, Streamflow Semantics, StreamBase, Workflow, Streamflow Composer, Esper]
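The division of labor on this slide, where the CEP system publishes datasets and the workflow system is invoked when a dataset of a specified nature appears, can be sketched as a small publish/subscribe loop. This is only an illustrative sketch: `CepBroker`, `subscribe`, `publish` and the dataset dictionary layout are hypothetical names, not part of Streamflow, Esper or StreamBase.

```python
from typing import Any, Callable, Dict, List, Tuple

Dataset = Dict[str, Any]


class CepBroker:
    """Hypothetical stand-in for the CEP side: it matches published datasets to workflows."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Callable[[Dataset], bool], Callable[[Dataset], Any]]] = []

    def subscribe(self, predicate: Callable[[Dataset], bool],
                  workflow: Callable[[Dataset], Any]) -> None:
        # Register a workflow together with the "specified nature" of dataset it waits for.
        self._subscriptions.append((predicate, workflow))

    def publish(self, dataset: Dataset) -> List[Any]:
        # Invoke every workflow whose predicate accepts the dataset; collect their results.
        return [workflow(dataset)
                for predicate, workflow in self._subscriptions
                if predicate(dataset)]


def forecast_workflow(dataset: Dataset) -> str:
    # Placeholder for the scientific computation handled by the workflow system.
    return "forecast for " + str(dataset["id"])


broker = CepBroker()
broker.subscribe(lambda d: d["kind"] == "storm", forecast_workflow)
calm_results = broker.publish({"kind": "calm", "id": 1})
storm_results = broker.publish({"kind": "storm", "id": 2})
```

Keeping the predicate on the broker side reflects the slide's point that filtering and bundling happen before any workflow is launched.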

STREAMing workFLOWS - Streamflows
Streamflows are an enhancement of workflows to handle data streams.
They allow complex experimental logic to be encapsulated using scientific workflows.
They allow large streams of data to be managed with stream mining.
They provide a programming model similar to workflow composition to handle streams.
[Figure labels: Workflow, Streamflow]

Stream Integration
Select * from DataminedRUCDATA(reflectivity > 3.5).win:time_batch(1h)
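The query on this slide filters events by reflectivity and releases them in one-hour batches. The following Python sketch loosely mirrors that behavior for readers unfamiliar with CEP query languages. It assumes events arrive ordered by timestamp, starts the first window at the first event that passes the filter, and skips empty batches; it does not claim to reproduce the exact windowing semantics of any CEP engine. `Event` and `time_batch` are hypothetical names.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class Event(NamedTuple):
    timestamp: float     # seconds, assumed non-decreasing across the stream
    reflectivity: float


def time_batch(events: Iterable[Event], threshold: float,
               window_s: float) -> Iterator[List[Event]]:
    """Keep events above the threshold and emit them in consecutive fixed-length batches."""
    batch: List[Event] = []
    window_end: Optional[float] = None
    for ev in events:
        if ev.reflectivity <= threshold:
            continue                      # filter step: reflectivity > threshold
        if window_end is None:
            window_end = ev.timestamp + window_s
        while ev.timestamp >= window_end:
            if batch:                     # close the current window, skipping empty ones
                yield batch
                batch = []
            window_end += window_s
        batch.append(ev)
    if batch:
        yield batch                       # flush the last, possibly partial, window


sample = [Event(0.0, 4.0), Event(10.0, 1.0), Event(1800.0, 5.0),
          Event(3600.0, 3.6), Event(9000.0, 9.9)]
batches = list(time_batch(sample, threshold=3.5, window_s=3600.0))
```

Each emitted batch corresponds to one input dataset that could be handed to a workflow invocation.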

Workflow Semantics
Conventional SOA components can be used as is.
Workflow components may change behavior based on the input data or stream.
Filter nodes change the “cardinality” of the output stream.
Aggregator nodes aggregate data over a window.
Generator nodes interface an external stream to the Streamflow.
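The three node types named on this slide can be illustrated with plain Python generators. This is a sketch under assumptions: the function names are hypothetical, the aggregator uses a count-based window and a mean (the slide does not say which window or aggregate Streamflow uses), and an incomplete trailing window is dropped.

```python
from typing import Callable, Iterable, Iterator, List


def generator_node(source: Iterable[float]) -> Iterator[float]:
    """Interface an external stream to the streamflow."""
    for item in source:
        yield item


def filter_node(stream: Iterable[float],
                predicate: Callable[[float], bool]) -> Iterator[float]:
    """Pass only matching items, so the output has fewer items than the input."""
    for item in stream:
        if predicate(item):
            yield item


def aggregator_node(stream: Iterable[float], window_size: int) -> Iterator[float]:
    """Emit the mean of each full window of `window_size` consecutive items."""
    window: List[float] = []
    for item in stream:
        window.append(item)
        if len(window) == window_size:
            yield sum(window) / window_size
            window = []


readings = [1.0, 4.0, 5.0, 2.0, 6.0, 8.0]
pipeline = aggregator_node(
    filter_node(generator_node(readings), lambda r: r > 3.5),
    window_size=2,
)
aggregated = list(pipeline)
```

Composing the nodes by nesting generators keeps the pipeline lazy: each reading flows through all three nodes before the next one is read.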

Programming model
Join semantics
◦ Constant inputs need to be matched to streams.
Inputs are streamed into the workflow from the stream engine.
Outputs are published back by stream sinks and may be used for feedback.
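One plausible reading of the join semantics above is that each item arriving on a stream is paired with the same set of constant inputs to form a complete input set for one workflow invocation. The sketch below shows that reading only; the slide does not define the actual matching rule, and `join_constants_with_stream` and the input names are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator


def join_constants_with_stream(constants: Dict[str, Any], stream_input: str,
                               stream: Iterable[Any]) -> Iterator[Dict[str, Any]]:
    """Yield one complete input set per stream item, repeating the constant inputs."""
    for item in stream:
        inputs = dict(constants)      # copy so each invocation gets its own input set
        inputs[stream_input] = item
        yield inputs


constants = {"config": "forecast-config"}   # hypothetical constant input
invocations = list(join_constants_with_stream(constants, "dataset", ["batch-a", "batch-b"]))
```

Under this reading, a workflow with one streamed input and any number of constant inputs runs once per stream item.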

Evaluation
Deployment overhead
◦ Extra overhead, as the workflow is flat: Θ(1)
◦ The extra overhead is comparable to normal workflow deployment, because new workflows may need to be deployed.
Runtime latency
◦ Latency from an event arriving at the framework to its delivery to the workflow.

Evaluation

Domains
Meteorology
Astronomy
[Figure labels: On-Demand Grid Computing, Streaming Observations, Storms Forming, Forecast Model, Data Mining, Astronomy, Meteorology]

Related work
B. Biornstad. A workflow approach to stream processing. PhD Thesis, Computer Science Department, ETH Zurich.
Y. Liu, N. Vijayakumar, and B. Plale. Stream processing in data-driven computational science. In Proceedings of the 7th IEEE/ACM International Conference on Grid Computing, pages 160–167. IEEE Computer Society, Washington, DC, USA.
J. Buck, S. Ha, E. Lee, and D. Messerschmitt. Ptolemy: A framework for simulating and prototyping heterogeneous systems. International Journal of Computer Simulation, 4(2):155–182.
DataTurbine
Y. Cai et al. MAIDS: Mining Alarming Incidents from Data Streams. Automated Learning Group, NCSA, University of Illinois at Urbana-Champaign, U.S.A.

Future work
Develop a formal model for the workflow semantics.
Event order guarantees.
How to handle missing streams.
Provenance for data streams.

Questions?