Liang Chen Advisor: Gagan Agrawal Computer Science & Engineering


A Grid-Based Middleware’s Support for Processing Distributed Data Streams

Introduction-Motivation
Data stream processing and analysis
- Data stream: data arrive continuously and need to be processed in real time
- Data stream applications:
  - Online network intrusion detection
  - Sensor networks
  - Network fault management system for telecommunication network elements
  - Computer vision based surveillance
- Common features of data streams:
  - Continuous arrival
  - Enormous volume
  - Real-time constraints
  - Data sources could be distributed

Introduction-Motivation
Network Fault Management System analyzing alarm message streams
[Figure: Switch Network; Network Fault Management System]

Introduction-Motivation Computer Vision Based Surveillance

Introduction-Motivation
Challenges & possible solutions
- Challenge 1: data and/or computation intensive
  - Solution: Grid computing technologies
- Challenge 2: real-time analysis is required
  - Solution: self-adaptation functionality is desired

Introduction-Motivation
From the point of view of developers who are interested in data stream applications:
- They would like to concentrate on the applications themselves
- They would not like to spend effort on:
  - Grid computing
  - Adaptation functionality

Introduction-Our Approach
A middleware that is based on Grid standards and tools and provides self-adaptation functionality
- The middleware is referred to as GATES (Grid-based AdapTive Execution on Streams)
- Applications are automatically distributed to suitable computing nodes
- Applications adapt automatically to a varying environment, without developers having to implement specific adaptation algorithms

System Architecture and Design (From Application Perspective)
- Break the task down into several sub-tasks so that the sub-tasks form a pipeline
- Implement each sub-task in Java
- Write an XML configuration file so that the sub-tasks can be deployed automatically, i.e.
  - specify how many stages (sub-tasks) the pipeline has
  - specify where the code implementing each sub-task resides
- Launch the application by running a Java program (StreamClient.class) provided by GATES

System Architecture and Design (Architecture)
[Figure: pipeline of Stage A, Stage B and Stage C. Legend: Grid services of the GATES; stages of an application; queues between Grid services; buffers for applications]

System Architecture and Design (Example)

    public class Sampling-Stage implements StreamProcessing {
        …
        void init() { … }

        void work(buffer in, buffer out) {
            // Tell GATES about the adjustable parameter and its range
            GATES.Information-About-Adjustment-Parameter(min, max, 1);
            while (true) {
                Image img = get-from-buffer-in-GATES(in);
                Image img-sample = Sampling(img, sampling-ratio);
                put-to-buffer-in-GATES(img-sample, out);
                // Pick up the value currently suggested by GATES
                sampling-ratio = GATES.getSuggestedParameter();
            }
        }
    }
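The slide's example is pseudocode (its hyphenated names are not valid Java identifiers). As a rough, self-contained sketch of the same pattern, the class below reads items from an input queue, forwards a sampled subset, and re-reads the suggested sampling ratio for every item. `ParameterSource` and `SamplingStage` are hypothetical stand-ins and not the GATES API; unlike the slide's `while(true)` loop, the sketch drains a finite queue so that it terminates.

```java
import java.util.Queue;

// Hypothetical stand-in for the middleware's parameter suggestion; not the GATES API.
interface ParameterSource {
    double suggestedParameter();
}

// Hypothetical stage that forwards roughly a fraction `ratio` of the items it reads.
final class SamplingStage {
    private final ParameterSource source;
    private double credit = 0.0; // accumulated fraction of items still owed to the output

    SamplingStage(ParameterSource source) {
        this.source = source;
    }

    // Drains `in` and forwards sampled items to `out`.
    // The ratio is re-read for every item, so a changed suggestion takes effect immediately.
    void work(Queue<Integer> in, Queue<Integer> out) {
        Integer item;
        while ((item = in.poll()) != null) {
            double ratio = source.suggestedParameter();
            credit += ratio;
            if (credit >= 1.0) {
                credit -= 1.0;
                out.add(item);
            }
        }
    }
}
```

With a constant suggestion of 0.5, every second item is forwarded. A real stage would block on its input buffer instead of stopping when the queue is empty.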

Self-adaptation Algorithm
Given a queue’s long-term factor at each stage, we want to improve the method of adjusting the value of an adaptation parameter
- Should the adaptation parameter be modified, and if so, in which direction?
- How to find a new value (update the value) of the adaptation parameter?

Enhanced Self-adaptation Algorithm
Should the adaptation parameter be modified, and if so, in which direction?
- The answer is related to the load status of the queues at two consecutive stages

Enhanced Self-adaptation Algorithm
[Figure: load states of pipeline stages A, B and C, grouped into Convergent States and Non-Convergent States; label: Performance Parameter]

Enhanced Self-adaptation Algorithm Summary of Load States

Enhanced Self-adaptation Algorithm
How to determine the new value for the adaptation parameter
- Linear update (previous method): increase or decrease by a fixed value
  - Hard to find a proper fixed value
- Binary tree search

Enhanced Self-adaptation Algorithm
[Figure: parameter range with the labels Left Border, Current Value, New Value and Right Border]
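One possible reading of the binary-search-style update, sketched under assumptions: the parameter lives in an interval bounded by a left and a right border, and each update moves the current value halfway toward the border in the chosen direction. The class name `BinarySearchUpdater` and the rule that the old value becomes the new opposite border are illustrative assumptions, not details taken from the slides.

```java
// Hypothetical helper illustrating a binary-search-style update; not part of GATES.
final class BinarySearchUpdater {
    private double left;    // left border of the search interval
    private double right;   // right border of the search interval
    private double current; // current value of the adaptation parameter

    BinarySearchUpdater(double min, double max, double initial) {
        if (!(min <= initial && initial <= max)) {
            throw new IllegalArgumentException("initial value must lie in [min, max]");
        }
        this.left = min;
        this.right = max;
        this.current = initial;
    }

    double current() {
        return current;
    }

    // Move halfway toward the right border.
    // Assumption: the old value becomes the new left border.
    double increase() {
        left = current;
        current = (current + right) / 2.0;
        return current;
    }

    // Move halfway toward the left border.
    // Assumption: the old value becomes the new right border.
    double decrease() {
        right = current;
        current = (left + current) / 2.0;
        return current;
    }
}
```

Starting from 0.5 in the range [0, 1], `increase()` yields 0.75 and a following `decrease()` yields 0.625. Compared with a linear update, no fixed step size has to be chosen. Because the interval only shrinks in this sketch, a long-running system would probably need to widen the borders again when the environment changes.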

Data Mining Applications & System Evaluation
Two data mining applications
- Clustream: clustering data arriving in data streams

Data Mining Applications & System Evaluation
- Dist-Freq-Counting: finding frequent itemsets from distributed streams

Resource Allocation Schemes
Problem definition
- Grid resource scheduling for pipelined processing and real-time distributed streaming applications
- Mapping workflows onto a Grid is an NP-complete problem
- Static part: the resource allocation problem for GATES is to determine a deployment configuration
- Dynamic part

Static Allocation Scheme
Static allocation problem: determining a deployment configuration
Objective: automatically generate a deployment configuration according to the information about available resources
- The number of data sources and their locations
- The destination
- The number of stages making up the pipeline
- The number of instances of each stage
- How the instances connect to each other
- The node where each instance is placed
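To make the items above concrete, here is a minimal sketch of how a deployment configuration could be held in memory. The class and field names are hypothetical and do not reflect the GATES configuration schema. The `isPipeline` check encodes one assumption that the slides do not state: an instance only sends data to instances of the next stage.

```java
import java.util.List;

// Hypothetical data model; names are illustrative, not the GATES configuration schema.
final class StageInstance {
    final int stage;                // index of the pipeline stage this instance belongs to
    final String node;              // node where the instance is placed
    final List<Integer> downstream; // indices of the instances this one sends data to

    StageInstance(int stage, String node, List<Integer> downstream) {
        this.stage = stage;
        this.node = node;
        this.downstream = downstream;
    }
}

final class DeploymentConfiguration {
    final List<String> dataSources;      // locations of the data sources
    final String destination;            // where the final results are delivered
    final int stageCount;                // number of stages in the pipeline
    final List<StageInstance> instances; // all instances of all stages

    DeploymentConfiguration(List<String> dataSources, String destination,
                            int stageCount, List<StageInstance> instances) {
        this.dataSources = dataSources;
        this.destination = destination;
        this.stageCount = stageCount;
        this.instances = instances;
    }

    // Number of instances deployed for the given stage.
    int instancesOfStage(int stage) {
        int count = 0;
        for (StageInstance instance : instances) {
            if (instance.stage == stage) {
                count++;
            }
        }
        return count;
    }

    // Assumption: a valid pipeline only connects an instance to instances of the next stage.
    boolean isPipeline() {
        for (StageInstance instance : instances) {
            if (instance.stage < 0 || instance.stage >= stageCount) {
                return false;
            }
            for (int target : instance.downstream) {
                if (target < 0 || target >= instances.size()) {
                    return false;
                }
                if (instances.get(target).stage != instance.stage + 1) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A static allocation scheme would fill in such a structure from the list of available resources. The sketch only represents and validates a configuration; it does not choose one.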

Static Allocation Scheme Examples of deployment configurations

Related work
Grid resource allocation
- Condor
- Realtor
- ACDS
- etc.
- Main difference: our work focuses on Grid resource allocation for workflow applications
Adaptation through a middleware
- Cheng et al.’s adaptation framework
- SWiFT
- Conductor
- DART
- ROAM
- Main difference: our work focuses on general support for adaptation at run time

Summary
- Grid computing could be an effective solution for distributed data stream processing
- GATES
  - Distributed processing
  - Exploit grid web services
  - Self-adaptation to meet the real-time constraints
  - Grid resource allocation schemes