
Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing in the Cloud

Abstract Major cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the dynamic nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job and unnecessarily increase processing time and cost. In this project we study the opportunities and challenges of efficient parallel data processing in clouds and present our research project, "Nephele".

What is Nephele? "Nephele" is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines, which are automatically instantiated and terminated during job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results to the popular data processing framework Hadoop.

Nephele’s Architecture

Important Terms in the Architecture: Job Manager: Before submitting a Nephele compute job, a user must start a VM in the cloud which runs the so-called Job Manager (JM). The Job Manager receives the client's jobs, is responsible for scheduling them, and coordinates their execution. Task Managers: The actual execution of the tasks a Nephele job consists of is carried out by a set of instances. Each instance runs a so-called Task Manager (TM). A Task Manager receives one or more tasks from the Job Manager at a time, executes them, and afterwards informs the Job Manager about their completion or possible errors. Until a job is submitted to the Job Manager, we expect the set of instances (and hence the set of Task Managers) to be empty. Upon receiving a job, the Job Manager decides, depending on the job's particular tasks, how many and what type of instances the job should be executed on, and when the respective instances must be allocated or deallocated to ensure continuous but cost-efficient processing.
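The allocation lifecycle described above can be illustrated with a minimal Python sketch. All class, method, and instance-type names here are hypothetical illustrations, not Nephele's real (Java-based) API:

```python
class TaskManager:
    """Runs on an allocated instance and executes tasks for the Job Manager."""
    def __init__(self, instance_type):
        self.instance_type = instance_type

    def execute(self, task):
        # Execute the task, then report completion (or errors) to the Job Manager.
        return f"done:{task['name']}"

class JobManager:
    def __init__(self):
        self.task_managers = []  # empty until a job is submitted

    def submit_job(self, tasks):
        results = []
        for task in tasks:
            # Decide how many and what type of instances the job needs,
            # and allocate them only for the duration of the job.
            tm = TaskManager(task["instance_type"])
            self.task_managers.append(tm)
            results.append(tm.execute(task))
        self.task_managers.clear()  # instances are deallocated after the job
        return results

jm = JobManager()
results = jm.submit_job([{"name": "map", "instance_type": "standard"},
                         {"name": "reduce", "instance_type": "highmem"}])
```

The key point the sketch captures is that Task Managers exist only between allocation and deallocation, unlike a static cluster where workers run permanently.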

Terms… Cloud Controller: The Job Manager is capable of communicating with the interface the cloud operator provides to control the instantiation of VMs. We call this interface the "Cloud Controller". In our implementation, the Cloud Controller is the program running alongside the Job Manager that schedules jobs according to the time complexity of each job.

Nephele Job Similar to Microsoft's Dryad [14], jobs in Nephele are expressed as a directed acyclic graph (DAG). Each vertex in the graph represents a task of the overall processing job; the graph's edges define the communication flow between these tasks. We decided to use DAGs to describe processing jobs for two major reasons: First, DAGs allow tasks to have multiple input and multiple output edges. This tremendously simplifies the implementation of classic data-combining functions such as join operations. Second, and more importantly, the DAG's edges explicitly model the communication paths of the processing job.

Nephele Job… Defining a Nephele job comprises three mandatory steps: First, the user must write the program code for each task of the processing job or select it from an external library. Second, each task program must be assigned to a vertex. Finally, the vertices must be connected by edges to define the communication paths of the job.
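The three steps above can be sketched in a few lines. This is a hypothetical Python illustration, not Nephele's real API; the vertex names and task functions are invented for the example:

```python
class Vertex:
    """A task of the processing job; edges model its communication paths."""
    def __init__(self, name, task):
        self.name, self.task = name, task
        self.outputs = []  # downstream vertices this task sends data to

    def connect_to(self, other):
        self.outputs.append(other)

# Step 1: write (or select) the program code for each task.
def split_task(record): ...
def count_task(record): ...
def merge_task(record): ...

# Step 2: assign each task program to a vertex.
source = Vertex("InputSplitter", split_task)
counter = Vertex("Counter", count_task)
sink = Vertex("Merger", merge_task)

# Step 3: connect the vertices by edges to define the communication paths.
source.connect_to(counter)
counter.connect_to(sink)
```

Because a vertex may hold several output edges, a join would simply be a vertex with two upstream vertices connected to it, which is the simplification the DAG model buys.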

Scheduling Algorithm We chose an FCFS algorithm that also takes each job's time complexity into account. Each Task Manager keeps a timespan counter; a value of zero or less means the Task Manager is idle. The incoming process is assigned to the first idle Task Manager, whose counter is set to the process's time complexity, while the counters of the other Task Managers are decremented:

Algorithm: Scheduler(process)
{
    if tm1.timespan <= 0 then
        tm1.timespan = process.timeComplexity
        tm2.timespan--
        tm3.timespan--
    else if tm2.timespan <= 0 then
        tm1.timespan--
        tm2.timespan = process.timeComplexity
        tm3.timespan--
    else if tm3.timespan <= 0 then
        tm1.timespan--
        tm2.timespan--
        tm3.timespan = process.timeComplexity
    end if
}
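The scheduler above can be written as a short runnable Python sketch. The field name timespan follows the pseudocode; generalizing from three fixed Task Managers to a list is our own simplification:

```python
class TaskManager:
    """Tracks the remaining timespan of the task currently assigned to it."""
    def __init__(self, name):
        self.name = name
        self.timespan = 0  # zero or less means the Task Manager is idle

def schedule(task_managers, time_complexity):
    """Assign an incoming process to the first idle Task Manager (FCFS) and
    decrement the timespan counters of all the others, as in the pseudocode.
    Returns the chosen Task Manager, or None if all are busy."""
    chosen = None
    for tm in task_managers:
        if chosen is None and tm.timespan <= 0:
            tm.timespan = time_complexity
            chosen = tm
        else:
            tm.timespan -= 1
    return chosen

tms = [TaskManager("TM1"), TaskManager("TM2"), TaskManager("TM3")]
first = schedule(tms, 5)   # TM1 is idle and receives the process
second = schedule(tms, 3)  # TM1 is still busy, so TM2 receives it
```

Note that, as in the pseudocode, each call advances the clocks of all non-selected Task Managers by one tick, so a busy Task Manager becomes idle again once its assigned time complexity has elapsed.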

Architecture in the Implementation Phase (diagram): the Job Manager (JM) with the Scheduler, and three Task Managers (TM1, TM2, TM3), each exposed as a Web service.

Process Control Architecture (diagram): User → Cloud App → Job Manager → Scheduler → Selected Task Manager.

Selected Real-Time Application: an Online Banking System. Let us demonstrate… Thank you…