Gueyoung Jung, Nathan Gnanasambandam, and Tridib Mukherjee. International Conference on Cloud Computing, 2012.

• Introduction
• Related Work
• Problem Statement
• Maximally Overlapped Bin-packing driven Bursting (MOBB) approach
• Experimental Evaluation
• Conclusion

• Collected data can exceed hundreds of terabytes and is generated continuously
  ◦ Sources: sensors, social media, click-streams, log files, and mobile devices
• The solution: cloud computing
  ◦ Analyze big data by leveraging vast amounts of computing resources available on demand at low resource usage cost

• Parallel data mining
  ◦ Topic mining, pattern mining
  ◦ Analyzes large amounts of unstructured data
  ◦ Subject to a time constraint
• Part of the big data is analyzed on local private resources while the rest is transferred to external computing nodes
  ◦ More flexible, with obvious cost benefits

• Considerations for optimizing parallel data mining
  ◦ Node determination
  ◦ Synchronized completion
  ◦ Data partition determination
• Proposed approach: Maximally Overlapped Bin-packing driven Bursting (MOBB)

• Goals of the MOBB algorithm
  ◦ Load balancing across computing nodes
  ◦ Time overlap between data transfer delay and computation time in each computing node

• Load distribution
  ◦ The overhead of data transfer
• Maximum overlap between data transfer and computation
  ◦ Determine the order in which data chunks of different sizes are transferred to each node
• Task scheduling among computing nodes
  ◦ Load balancing (CometCloud)
  ◦ Heterogeneous clouds

SLA: Service Level Agreement

Figure annotation: made by the unit of data

• Estimation of computation time
  ◦ Response surface model
  ◦ Queueing model
• Estimation of data transfer delay
  ◦ More dynamic than computation time
  ◦ Auto-regressive moving average (ARMA) model
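
To make the delay-estimation idea concrete, here is a minimal sketch that forecasts the next data transfer delay from recent measurements. It is a deliberate simplification: it fits only a first-order auto-regressive model, AR(1), by least squares and omits the moving-average term of a full ARMA model. The function names and the sample delays are hypothetical and do not come from the presentation.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] (AR(1), no MA term)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    xs = series[:-1]  # previous values
    ys = series[1:]   # next values
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Constant history: fall back to predicting the mean.
        return mean_y, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = cov / var_x
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next_delay(series):
    """One-step-ahead forecast of the transfer delay."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


# Hypothetical measured delays (seconds per data unit) toward a remote node.
observed = [10.0, 6.0, 4.0, 3.0, 2.5]
print(forecast_next_delay(observed))
```

In practice a library ARMA implementation would replace this hand-rolled fit; the sketch only shows how a history of measured delays turns into the per-node estimate that the later allocation steps consume.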

• Determine the bucket size of each node
• Sort the data chunks in descending order of size
• Sort the node bucket sizes in descending order (higher delay = smaller bucket)
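
The three steps above can be sketched as follows. The weighting rule is an assumption made for illustration: each node's bucket is taken to be proportional to the inverse of its estimated time per data unit (transfer delay plus computation), so that all nodes would finish at about the same time. The presentation does not give the exact formula, and the per-unit costs below are hypothetical.

```python
def bucket_sizes(total_data, unit_costs):
    """Split total_data across nodes in proportion to 1 / (seconds per data unit).

    unit_costs maps node name -> estimated seconds to transfer and process
    one unit of data (an assumed, illustrative cost model).
    """
    weights = {node: 1.0 / cost for node, cost in unit_costs.items()}
    total_weight = sum(weights.values())
    return {node: total_data * w / total_weight for node, w in weights.items()}


def sort_for_packing(chunks, buckets):
    """Return chunks and (node, bucket size) pairs, both largest first."""
    sorted_chunks = sorted(chunks, reverse=True)
    sorted_buckets = sorted(buckets.items(), key=lambda item: item[1], reverse=True)
    return sorted_chunks, sorted_buckets


# Hypothetical per-unit costs: the remote worker has the highest delay.
costs = {"HLW": 1.0, "LLW": 2.0, "MRW": 4.0}
buckets = bucket_sizes(70.0, costs)
chunks, ordered = sort_for_packing([3, 9, 5, 1], buckets)
print(ordered)
print(chunks)
```

With these numbers the node with the highest per-unit cost receives the smallest bucket, which matches the "higher delay = smaller bucket" rule on the slide.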

• Weighted load distribution
• Delay-based preference
• Buckets are completely filled one at a time
  ◦ Reduces fragmentation of buckets
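
A minimal sketch of filling buckets one at a time is shown below. It uses a first-fit-decreasing style pass, which is a standard bin-packing heuristic chosen here for illustration and not necessarily the exact procedure in the presentation. The bucket capacities, chunk sizes, and function name are hypothetical.

```python
def fill_buckets(chunks, buckets):
    """Fill each bucket as full as possible before moving to the next one.

    chunks:  list of chunk sizes.
    buckets: list of (node, capacity) pairs in preference order, for example
             lowest transfer delay first.
    Returns (assignment, leftover), where leftover holds chunks that fit nowhere.
    """
    remaining = sorted(chunks, reverse=True)  # largest chunks first
    assignment = {}
    for node, capacity in buckets:
        free = capacity
        placed, skipped = [], []
        for chunk in remaining:
            if chunk <= free:
                placed.append(chunk)
                free -= chunk
            else:
                skipped.append(chunk)
        assignment[node] = placed
        remaining = skipped  # only unplaced chunks move on to the next bucket
    return assignment, remaining


assignment, leftover = fill_buckets(
    [5, 4, 3, 3, 2, 2, 1],
    [("HLW", 10), ("LLW", 6), ("MRW", 4)],
)
print(assignment)  # {'HLW': [5, 4, 1], 'LLW': [3, 3], 'MRW': [2, 2]}
print(leftover)    # []
```

Scanning the whole remaining list for each bucket lets small chunks top up a nearly full bucket, which is one way to keep fragmentation low.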

• Organize the sequence of chunks to maximize the overlap between data transfer and computation
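
To illustrate why the chunk sequence matters, the sketch below models one node as a two-stage pipeline: chunks are transferred one after another, and a chunk can be processed once it has arrived and the previous chunk has finished. The constant rates, the chunk sizes, and the brute-force search over orderings are assumptions for illustration only; the presentation does not describe its ordering rule at this level of detail.

```python
from itertools import permutations


def completion_time(chunk_order, transfer_rate, compute_rate):
    """Finish time for one node when transfer and computation overlap."""
    transfer_done = 0.0
    compute_done = 0.0
    for size in chunk_order:
        transfer_done += size / transfer_rate
        # Computation waits for both the data and the previous chunk.
        start = max(transfer_done, compute_done)
        compute_done = start + size / compute_rate
    return compute_done


def best_order(chunks, transfer_rate, compute_rate):
    """Exhaustively search orderings (only practical for a handful of chunks)."""
    return min(
        permutations(chunks),
        key=lambda order: completion_time(order, transfer_rate, compute_rate),
    )


chunks = [6, 3, 1]
# Compute-bound node: transfer is twice as fast as computation.
print(completion_time([1, 3, 6], 2.0, 1.0))  # 11.0
print(completion_time([6, 3, 1], 2.0, 1.0))  # 13.0
print(best_order(chunks, 2.0, 1.0))          # (1, 3, 6)
```

In this toy model, sending a small chunk first lets computation start early when the node is compute-bound, while a transfer-bound node benefits from ending with a small chunk so that little computation remains after the last transfer.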

• Frequent pattern mining
  ◦ Data: a phone call log obtained from a call center, and a web access log
  ◦ Size: 200 GB (collected over one year)
  ◦ Objective: obtain patterns of each user's activities on human resource information systems

• Four computing nodes
  ◦ Low-end Local Central node (LLC)
    ▪ 5 VMs, each with two 2.8 GHz cores, 1 GB memory, 1 TB hard drive
  ◦ Low-end Local Worker (LLW)
    ▪ Similar to LLC
  ◦ High-end Local Worker (HLW)
    ▪ 6 non-virtualized servers, each has GHz cores, 48 GB memory, 10 TB hard drive
    ▪ Shared by other applications
  ◦ Mid-end Remote Worker (MRW)
    ▪ 9 VMs, each with two 2.8 GHz cores, 4 GB memory, 1 TB hard drive

Figure label: HLW+MRW

• Ideal optimal data allocation
  ◦ The slack time must be 0
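
As a small worked example of this criterion, the sketch below computes a per-node slack time. The definition used here is an assumption: slack is taken as the gap between a node's completion time and the latest completion time across all nodes, so a perfectly synchronized allocation gives 0 everywhere. The completion times are hypothetical and are not measurements from the experiments.

```python
def slack_times(completion_times):
    """Per-node idle time while waiting for the slowest node (assumed definition)."""
    latest = max(completion_times.values())
    return {node: latest - finished for node, finished in completion_times.items()}


def is_ideal_allocation(completion_times, tolerance=0.0):
    """True when every node finishes together, i.e. all slack is (near) zero."""
    return all(s <= tolerance for s in slack_times(completion_times).values())


# Hypothetical completion times in seconds.
finish = {"HLW": 100.0, "MRW": 90.0}
print(slack_times(finish))          # {'HLW': 0.0, 'MRW': 10.0}
print(is_ideal_allocation(finish))  # False
```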

• Proposes a cloud-bursting approach based on a maximally overlapped load-balancing algorithm to optimize the performance of big-data analytics
• Results show that performance improves by 20% to 60% over other approaches
