Time & Cost Sensitive Data-Intensive Computing on Hybrid Clouds
Tekin Bicer, David Chiu, Gagan Agrawal

Presentation transcript:

Time & Cost Sensitive Data-Intensive Computing on Hybrid Clouds
Tekin Bicer, David Chiu†, Gagan Agrawal
Department of Computer Science and Engineering, The Ohio State University
†School of Engineering and Computer Science, Washington State University
CCGrid 2012 – Ottawa, Canada

Outline: Introduction, Motivation, Challenges, System Overview, Resource Allocation Framework, Experiments, Related Work, Conclusion

Introduction
– Big Data: scientific datasets (simulation, climate, etc.)
– Shared resources: limitations on usage, application deadlines, long wait times
– Cloud technologies: elasticity, pay-as-you-go

Hybrid Cloud Motivation
– Co-locating all resources is not always possible
– In-house dedicated machines: demand for more resources, and workload may vary over time
– Hybrid cloud: local resources + cloud resources

Hybrid Cloud and Data-Intensive Computing
– Large dataset split across local and cloud resources: too large to fit locally, so local resources are used first
– How do we analyze such a split dataset? Data movements are extremely expensive
– We build on the middleware developed in our recent work (Cluster 2011)

Challenges
– Meeting user constraints: given a time constraint, minimize cost; given a cost constraint, minimize time (formalized below)
– Resource allocation: a model for capturing time & cost constraints
– Data-intensive processing: Map-Reduce type of processing
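One way to state the two constraint modes formally, as dual optimization problems over the number of allocated cloud instances n_c. The notation here is ours, reconstructed from the slide text; the paper's exact formulation may differ:

```latex
% Time-constrained mode: pick the cloud allocation n_c that minimizes cost
% while the estimated completion time stays within the deadline T_max.
\min_{n_c} \; \mathrm{Cost}(n_c) \quad \text{s.t.} \quad \mathrm{Time}(n_c) \le T_{\max}

% Cost-constrained mode: the dual problem, minimizing time within budget C_max.
\min_{n_c} \; \mathrm{Time}(n_c) \quad \text{s.t.} \quad \mathrm{Cost}(n_c) \le C_{\max}
```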

Outline: Introduction, Motivation, Challenges, System Overview, Resource Allocation Framework, Experiments, Related Work, Conclusion

System Overview for Hybrid Cloud
– Local cluster and cloud environment; Map-Reduce type of processing
– All clusters connect to a centralized head node: coarse-grained job assignment, with consideration of locality
– Each cluster has a Master node: fine-grained job assignment
– Job stealing (see the sketch after this slide)
(Slide figure: clusters at different sites, e.g., Texas/Austin, connecting to the centralized node.)
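A minimal sketch of the two-level assignment with job stealing, assuming a per-cluster queue; the class and function names are ours for illustration, not the middleware's actual API:

```python
from collections import namedtuple
import queue

# A job records which cluster holds its input split ("local" or "cloud").
Job = namedtuple("Job", ["id", "location"])

class Cluster:
    """Master-node view: a per-cluster queue with fine-grained dispatch."""
    def __init__(self, name):
        self.name = name
        self.jobs = queue.Queue()

    def next_job(self, other_clusters):
        """Return this cluster's next job, stealing from others when idle."""
        try:
            return self.jobs.get_nowait()
        except queue.Empty:
            pass
        for other in other_clusters:
            try:
                return other.jobs.get_nowait()  # job stealing
            except queue.Empty:
                continue
        return None  # no work left anywhere

def assign_jobs(clusters, jobs):
    """Head-node view: locality-aware, coarse-grained assignment."""
    by_name = {c.name: c for c in clusters}
    for job in jobs:
        # Prefer the cluster that already holds the job's data.
        by_name.get(job.location, clusters[0]).jobs.put(job)
```

A stolen job's data resides at the other site and must be moved, which is why the allocation model below prices stolen jobs separately.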

Middleware Design for Hybrid Cloud
– Head Node: resource allocation, job assignment (coarse), global reduction
– Master (in-cluster): job assignment (fine), reduction
– Slave: local Map-Reduce, remote Map-Reduce

Resource Allocation Framework
– Estimate the time required for local-cluster processing
– Estimate the time required for cloud-cluster processing
– All variables can be profiled during execution, except the estimated number of stolen jobs
– Estimate the number of jobs that will be stolen
– Estimate the processing time of a cloud job on a local node
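A minimal sketch of the resulting estimate; the variable names and exact formulas are our reconstruction from the bullets above, not the paper's model:

```python
def estimate_times(n_local_jobs, n_cloud_jobs, n_stolen,
                   t_local_job, t_cloud_job, t_stolen_job,
                   n_local_nodes, n_cloud_nodes):
    """Estimate the remaining local and cloud processing times.

    t_local_job, t_cloud_job -- per-job times profiled during execution
    t_stolen_job -- estimated time for a local node to process a cloud job
                    (its data must be pulled from cloud storage)
    n_stolen -- estimated number of cloud jobs local nodes will steal
    """
    # The local cluster processes its own jobs plus the jobs it steals.
    t_local = (n_local_jobs * t_local_job + n_stolen * t_stolen_job) / n_local_nodes
    # The cloud cluster processes whatever is left after stealing.
    t_cloud = (n_cloud_jobs - n_stolen) * t_cloud_job / max(n_cloud_nodes, 1)
    return t_local, t_cloud
```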

Executing the Model
– The head node executes the model and estimates the number of cloud instances before each job assignment
– The Master initiates the nodes
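How the head node might use such an estimate before each coarse assignment. This builds on the `estimate_times` sketch above; `profile` is assumed to be a dict of the other profiled variables, and the linear search over instance counts is our assumption:

```python
def choose_cloud_instances(time_constraint, max_instances, profile):
    """Pick the smallest cloud allocation whose estimated makespan fits
    the deadline; fall back to the maximum when no allocation can."""
    for n in range(1, max_instances + 1):
        t_local, t_cloud = estimate_times(n_cloud_nodes=n, **profile)
        if max(t_local, t_cloud) <= time_constraint:
            return n  # cheapest allocation that still meets the deadline
    return max_instances  # constraint cannot be met with available instances
```

The cost-constrained mode would run the symmetric search: the largest allocation whose estimated cost stays under the budget.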

Outline: Introduction, Motivation, Challenges, System Overview, Resource Allocation Framework, Experiments, Related Work, Conclusion

Goals of Experiments
– Analyze the behavior of our model
– Observe whether user constraints are met
– Evaluate the system in a cloud-bursting scenario: local nodes are dropped during execution, and we observe how the system adapts

Experimental Setup
– Two applications:
  – KMeans (520GB; local: 104GB, cloud: 416GB): k=5000, 48.2×10^9 points
  – PageRank (520GB; local: 104GB, cloud: 416GB): 50×10^6 links with 41.7×10^8 edges
– Local cluster (Ohio State University, Columbus): 16 nodes, each with 8 cores (128 cores)
– Cloud cluster (Amazon S3, Virginia): max. 16 nodes, each with 8 cores (128 cores)

KMeans – Time Constraint
– # local instances: 16 (fixed); # cloud instances: max. 16 (varies); local: 104GB, cloud: 416GB
– The system cannot meet the time constraint once the maximum number of cloud instances is reached
– All other configurations meet the time constraint with an error rate below 1.5%

PageRank – Time Constraint
– Same configuration: # local instances: 16 (fixed); # cloud instances: max. 16 (varies); local: 104GB, cloud: 416GB
– Results are similar to KMeans; the error rate is below 1.3%

KMeans – Cloud Bursting
– # local instances: 16 (fixed); # cloud instances: max. 16 (varies); local: 104GB, cloud: 416GB
– 4 local nodes are dropped during execution
– After 25% and 50% of the time constraint has elapsed: error rate below 1.9%
– After 75% of the time constraint has elapsed: error rate below 3.6%
– Reason for the higher error rate: less time to profile the new environment

KMeans – Cost Constraint
– The system meets the cost constraints with an error rate below 1.1%
– When the maximum number of cloud instances is allocated, the error rate is again below 1.1%
– The system minimizes execution time under the provided cost constraint

Related Work
– Mao et al. (SC'11, GRID'10): dynamically (de)allocate cloud instances to meet user constraints (single cluster); considers different instance types on EC2
– De Assuncao et al. (HPDC'09): job scheduling for cloud bursting
– Marshall et al., Elastic Site (CCGRID'10): extends the computational capacity of local resources with the cloud; considers the local cluster's job queue
– Map-Reduce on the cloud: Kambatla et al. (HotCloud'09); Zaharia et al. (OSDI'08); Lin et al., MOON (HPDC'10)

Conclusion
– Map-Reduce type of applications in a hybrid cloud setting
– Developed a resource allocation model for time and cost constraints, based on a feedback mechanism
– Evaluated with two data-intensive applications (KMeans, PageRank): error rate for time below 3.6%, error rate for cost below 1.2%

Thanks! Any questions?