ANL 2014 - Chicago
Elastic and Efficient Execution of Data-Intensive Applications on Hybrid Cloud
Tekin Bicer
Computer Science and Engineering, Ohio State University
Introduction
Scientific simulations and instruments
– X-ray Photon Correlation Spectroscopy: CCD detector generates 120 MB/s now; 44 GB/s by 2015
– Global Cloud Resolving Model: 1 PB for a 4 km grid-cell
Performed on local clusters
– Not sufficient
Problems
– Data analysis, storage, I/O performance
Cloud technologies
– Elasticity
– Pay-as-you-go model
Hybrid Cloud Motivation
Cloud technologies
– Typically associated with computational resources
Massive data generation
– Exhausts local storage
Hybrid cloud
– Local resources: base
– Cloud resources: additional
Cloud
– Compute and storage resources
Usage of Hybrid Cloud
[Diagram: data flows from the data source into local storage and local nodes, with cloud storage and cloud compute nodes as additional resources]
Challenges
Data-Intensive Processing
– Transparent data access and analysis
– Programmability of large-scale applications
Meeting User Constraints
– Enabling cloud bursting
Minimizing Storage and I/O Cost
– Domain-specific compression
– In-situ and in-transit data analysis

MATE-HC: Map-reduce with AlternaTE API over Hybrid Cloud
Dynamic Resource Allocation Framework for Hybrid Cloud
Compression Methodology and System for Large-Scale Applications
Programmability of Large-Scale Applications on Hybrid Cloud
Geographically distributed resources
Ease of programmability
– Reduction-based processing structures
MATE-HC
– A middleware for transparent data access and processing
– Selective job assignment
– Multi-threaded data retrieval
Middleware for Hybrid Cloud
[Diagram: remote data analysis, with job assignment across clusters and a final global reduction]
MATE vs. Map-Reduce Processing Structure
The reduction object represents the intermediate state of the execution
The reduce function is commutative and associative
Sorting and grouping overheads are eliminated by the reduction function/object
Simple Example
Dataset: 3 5 8 4 1 3 5 2 6 7 9 4 2 4 8 9, partitioned across the compute nodes
Each node accumulates its partition into a local reduction object: Robj = local reduction (+)
The final result is the global reduction (+) over the nodes' reduction objects
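The local/global reduction flow above can be sketched in a few lines of Python; the four-way partitioning and the use of summation as the reduce function are illustrative assumptions, not the middleware's actual API.

```python
# Hypothetical sketch of MATE's reduction-object model: each node folds
# its partition directly into a reduction object (no sort/group phase),
# then the objects are merged by a commutative/associative global reduce.
from functools import reduce

def local_reduction(partition):
    robj = 0                      # reduction object: here, a running sum
    for value in partition:
        robj += value             # accumulate in place
    return robj

def global_reduction(robjs):
    return reduce(lambda a, b: a + b, robjs)

data = [3, 5, 8, 4, 1, 3, 5, 2, 6, 7, 9, 4, 2, 4, 8, 9]
partitions = [data[i:i + 4] for i in range(0, len(data), 4)]  # 4 "nodes"
robjs = [local_reduction(p) for p in partitions]              # [20, 11, 26, 23]
result = global_reduction(robjs)                              # 80
```

Because the reduce function is commutative and associative, the partial objects can be merged in any order, which is what eliminates the sorting and grouping overheads of classic map-reduce.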
Experiments
2 geographically distributed clusters
– Cloud: EC2 instances running in Virginia (22 nodes x 8 cores)
– Local: campus cluster in Columbus, OH (150 nodes x 8 cores)
3 applications with 120 GB of data
– KMeans: k=1000; KNN: k=1000
– PageRank: 50x10 links with 9.2x10 edges
Goals
– Evaluate the system overhead with different job distributions
– Evaluate the scalability of the system
System Overhead: K-Means

Env (Local/EC2) | Global Reduction | Idle Time | Total Slowdown | # Stolen Jobs (of 960)
50/50           | 0.0670           | 93.871    | 20.430 (0.5%)  | 0
33/67           | 0.0660           | 31.232    | 142.403 (5.9%) | 128
17/83           | 0.0660           | 25.101    | 243.31 (10.4%) | 240
Scalability: K-Means
[Chart: scalability results]
Summary
MATE-HC is a data-intensive middleware developed for hybrid cloud
Our results show that
– Inter-cluster communication overhead is low
– Job distribution is important
– Overall slowdown is modest
– The proposed system is scalable
Outline
Data-Intensive Processing
– Programmability of large-scale applications
– Transparent data access and analysis
Meeting User Constraints
– Enabling cloud bursting
Minimizing Storage and I/O Cost
– Domain-specific compression
– In-situ and in-transit data analysis

MATE-HC: Map-reduce with AlternaTE API over Hybrid Cloud
Dynamic Resource Allocation Framework for Cloud Bursting
Compression Methodology and System for Large-Scale Applications
Dynamic Resource Allocation for Cloud Bursting
Performance of cloud resources and workload vary
– Problems: extended execution times; inability to meet user constraints
– Cloud resources can scale dynamically
Cloud bursting
– In-house resources: base workload
– Cloud resources: adapt to performance requirements
Dynamic resource allocation framework
– A model for capturing "time" and "cost" constraints with cloud bursting
System Components
Local cluster and cloud
MATE-HC processing structure
Pull-based job distribution
Head node
– Coarse-grained job assignment
– Consideration of locality
Master node
– Fine-grained job assignment
Job stealing
– Remote data processing
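The pull-based distribution with job stealing described above can be sketched as follows; the two-queue structure and the names are hypothetical simplifications of the head/master/slave hierarchy.

```python
# Hypothetical sketch of pull-based job distribution with job stealing:
# workers request jobs from their own cluster's master, and once that
# queue drains they steal jobs from the other cluster's master.
from collections import deque

class Master:
    def __init__(self, jobs):
        self.queue = deque(jobs)

    def request_job(self):
        return self.queue.popleft() if self.queue else None

def run_worker(own_master, other_master):
    done = []
    while True:
        job = own_master.request_job()
        if job is None:                       # own queue drained:
            job = other_master.request_job()  # steal a remote job
        if job is None:
            break
        done.append(job)                      # "process" the job
    return done

local = Master(["L0", "L1"])
cloud = Master(["C0", "C1", "C2"])
processed = run_worker(local, cloud)   # ["L0", "L1", "C0", "C1", "C2"]
```

Pulling rather than pushing lets faster workers naturally take more jobs, which is how the system absorbs heterogeneous node performance.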
Resource Allocation Framework
Estimate the required time for local cluster processing
Estimate the required time for cloud cluster processing
All variables can be profiled during execution, except the estimated number of stolen jobs
Estimate the number of remaining jobs after local jobs are consumed
Ratio of local computational throughput in the system
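As a numeric illustration of these estimates, the profiled throughputs can be combined to size the cloud allocation against a time constraint. The function, names, and numbers below are hypothetical, not the framework's actual model.

```python
# Hypothetical sketch: given profiled throughputs and the remaining job
# count, estimate how many cloud instances are needed to finish the
# leftover work before the time constraint expires.
import math

def estimate_cloud_instances(remaining_jobs, local_tput,
                             cloud_tput_per_inst, time_left):
    local_share = local_tput * time_left      # jobs local can still finish
    leftover = max(0.0, remaining_jobs - local_share)
    if leftover == 0:
        return 0
    return math.ceil(leftover / (cloud_tput_per_inst * time_left))

# 1000 jobs left, local cluster does 2 jobs/s, each cloud instance 1 job/s,
# 200 s until the deadline: local covers 400 jobs, cloud must absorb 600.
n = estimate_cloud_instances(1000, 2.0, 1.0, 200.0)   # 3 instances
```

In the real framework the inputs are refreshed from profiled execution, so the estimate improves as more of the run is observed.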
Execution of the Resource Allocation Framework
Head node
– Evaluates profiled information and estimates the number of cloud instances before each job assignment
– Informs master nodes
Master node
– One per cluster
– Collects profile information during job request time
– (De)allocates resources
Slave nodes
– Request and consume jobs
Experimental Setup
Two applications
– KMeans (520 GB): local = 104 GB; cloud = 416 GB
– PageRank (520 GB): local = 104 GB; cloud = 416 GB
Local cluster: max. 16 nodes x 8 cores = 128 cores
Cloud resources: max. 16 nodes x 8 cores = 128 cores
Evaluation of the model
– Local nodes are dropped during execution
– Observed how the system adapts
KMeans – Time Constraint
# local instances: 16 (fixed); # cloud instances: max 16 (varies)
Local: 104 GB; cloud: 416 GB
The system fails to meet the time constraint only when the maximum number of cloud instances is reached
All other configurations meet the time constraint with <1.5% error rate
KMeans – Cloud Bursting
# local instances: 16 (fixed); # cloud instances: max 16 (varies)
Local: 104 GB; cloud: 416 GB
4 local nodes are dropped during execution
After 25% and 50% of the time constraint has elapsed, error rate <1.9%
After 75% of the time constraint has elapsed, error rate <3.6%
Reason for the higher error rate: shorter time to profile the new environment
Summary
MATE-HC: MapReduce-style processing on federated resources
Developed a resource allocation model
– Based on a feedback mechanism
– Time and cost constraints
Two data-intensive applications (KMeans, PageRank)
– Error rate for time: <3.6%
– Error rate for cost: <1.2%
Outline
Data-Intensive Processing
– Programmability of large-scale applications
– Transparent data access and analysis
Meeting User Constraints
– Enabling cloud bursting
Minimizing Storage and I/O Cost
– Domain-specific compression
– In-situ and in-transit data analysis

MATE-HC: Map-reduce with AlternaTE API over Hybrid Cloud
Dynamic Resource Allocation Framework for Cloud Bursting
Compression Methodology and System for Large-Scale Applications
Data Management using Compression
Generic compression algorithms
– Good for low-entropy byte sequences
– Scientific datasets are hard to compress
Floating-point numbers consist of an exponent and a mantissa; the mantissa can be highly entropic
Using compression is challenging
– Suitable compression algorithms
– Utilization of available resources
– Integration of compression algorithms
Compression Methodology
Common properties of scientific datasets
– Multidimensional arrays
– Consist of floating-point numbers
– Relationship between neighboring values
Domain-specific solutions can help
Approach: prediction-based differential compression
– Predict the values of neighboring cells
– Store the difference
Example: GCRM Temperature Variable Compression
E.g., a temperature record: the values of neighboring cells are highly correlated
X' table (after prediction); X'' compressed values
– 5 bits for prediction + difference
Supports lossless and lossy compression
Fast, with good compression ratios
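A much-simplified sketch of the idea: predict each value from its neighbor, XOR the IEEE-754 bit patterns, and store only the differing bytes. The previous-value predictor and byte-granular differences are assumptions for illustration; the slide's actual scheme uses 5-bit prediction codes.

```python
# Hypothetical sketch of prediction-based differential compression for
# floats: neighboring values share high-order bits, so the XOR of their
# bit patterns is small and can be stored in fewer bytes.
import struct

def compress(values):
    out, prev = [], 0
    for v in values:
        bits = struct.unpack("<Q", struct.pack("<d", v))[0]
        diff = bits ^ prev                          # small for close values
        nbytes = max(1, (diff.bit_length() + 7) // 8)
        out.append((nbytes, diff.to_bytes(nbytes, "little")))
        prev = bits
    return out

def decompress(chunks):
    values, prev = [], 0
    for nbytes, raw in chunks:
        bits = prev ^ int.from_bytes(raw, "little")
        values.append(struct.unpack("<d", struct.pack("<Q", bits))[0])
        prev = bits
    return values

temps = [288.15, 288.17, 288.16, 288.20]   # smooth temperature series
assert decompress(compress(temps)) == temps  # lossless round trip
```

A lossy variant would drop low-order mantissa bits before the XOR, trading a bounded error for a higher compression ratio, as in the lossy results later in the deck.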
Compression Framework
Improve end-to-end application performance
Minimize application I/O time
– Pipelining I/O and (de)compression operations
Hide computational overhead
– Overlapping application computation with the compression framework
Easy implementation of compression algorithms
Easy integration with applications
– API similar to POSIX I/O
A Compression Framework for Data-Intensive Applications
Chunk Resource Allocation (CRA) layer
– Initializes the system
– Generates chunk requests and enqueues them for processing
– Converts original offset and data-size requests to compressed ones
Parallel Compression Engine (PCE)
– Applies encode()/decode() functions to chunks
– Manages an in-memory cache with informed prefetching
– Creates I/O requests
Parallel I/O Layer (PIOL)
– Creates parallel chunk requests to the storage medium
– Each chunk request is handled by a group of threads
– Provides abstraction for different data transfer protocols
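The overlap between the I/O layer and the compression engine can be illustrated with a toy pipeline; zlib stands in for the pluggable encode()/decode() functions, and the structure is an assumption rather than the framework's actual code.

```python
# Hypothetical sketch of pipelined I/O and decompression: a reader
# thread fetches compressed chunks into a bounded queue while the
# consumer decompresses earlier ones, so the two stages overlap.
import queue
import threading
import zlib

def reader(chunks, q):
    for c in chunks:
        q.put(c)        # simulate fetching a compressed chunk
    q.put(None)         # sentinel: no more chunks

def consume(chunks):
    q = queue.Queue(maxsize=4)          # bounded: reader can't run far ahead
    t = threading.Thread(target=reader, args=(chunks, q))
    t.start()
    out = []
    while (c := q.get()) is not None:
        out.append(zlib.decompress(c))  # overlaps with further reads
    t.join()
    return b"".join(out)

data = [zlib.compress(bytes([i]) * 1000) for i in range(8)]
assert consume(data) == b"".join(bytes([i]) * 1000 for i in range(8))
```

The bounded queue is the key design choice: it caps memory use while still keeping both the I/O stage and the (de)compression stage busy.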
Integration with a Data-Intensive Computing System
Remote data processing
– Sensitive to I/O bandwidth
Processes data in the local cluster, the cloud, or both (hybrid cloud)
Experimental Setup
Two datasets
– GCRM: 375 GB (L: 270 + R: 105)
– NPB: 237 GB (L: 166 + R: 71)
16 x 8 cores (Intel Xeon 2.53 GHz)
Storage of datasets
– Lustre FS (14 storage nodes)
– Amazon S3 (Northern Virginia)
Compression algorithms: CC, FPC, LZO, bzip, gzip, lzma
Applications: AT, MMAT, KMeans
Performance of MMAT
[Chart: breakdown of performance]
Overhead (local): 15.41%
Read speedup: 1.96
Lossy Compression (MMAT)
Lossy #e: number of dropped bits
Error bound: 5x(1/10^5)
Summary
Management and analysis of scientific datasets are challenging
– Generic compression algorithms are inefficient for scientific datasets
Proposed a compression framework and methodology
– Domain-specific compression algorithms are fast and space efficient: 51.68% compression ratio; 53.27% improvement in execution time
– Easy plug-and-play of compression algorithms
– Integrated the proposed framework and methodology with a data analysis middleware
Outline
Data-Intensive Processing
– Programmability of large-scale applications
– Transparent data access and analysis
Meeting User Constraints
– Enabling cloud bursting
Minimizing Storage and I/O Cost
– Domain-specific compression
– In-situ and in-transit data analysis

MATE-HC: Map-reduce with AlternaTE API over Hybrid Cloud
Dynamic Resource Allocation Framework for Cloud Bursting
Compression Methodology and System for Large-Scale Applications
In-Situ and In-Transit Analysis
Compression can ease data management
– But may not always be sufficient
In-situ data analysis
– Co-locate the data source and the analysis code
– Analyze data during data generation
In-transit data analysis
– Remote resources are utilized
– Forward generated data to "staging nodes"
In-Situ and In-Transit Data Analysis
Significant reduction in generated dataset size
– Noise elimination, data filtering, stream mining, ...
– Timely insights
Parallel data analysis
– MATE-Stream
Dynamic resource allocation and load balancing
– Hybrid data analysis: both in-situ and in-transit
Parallel In-Situ Data Analysis
[Diagram: data source feeding local reduction processes, whose Robj objects merge in a local combination step]
– Data generation: scientific instruments, simulations, etc.; (un)bounded data
– Local reduction: filtering, stream mining; data reduction; continuous local reduction
– Local combination: intermediate results; timely insights; continuous global reduction
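A minimal sketch of the continuous local reduction shown in the diagram; the filtering rule, window size, and data are hypothetical.

```python
# Hypothetical sketch of continuous in-situ reduction over a stream:
# each datum is filtered and folded into a reduction object as it is
# generated, and the object is periodically forwarded for combination.
def in_situ_reduce(stream, window=4):
    robj = 0            # reduction object: a running sum
    forwarded = []      # intermediate results shipped downstream
    count = 0
    for value in stream:
        if value < 0:
            continue            # filtering / noise elimination
        robj += value           # continuous local reduction
        count += 1
        if count % window == 0:
            forwarded.append(robj)   # forward, keep reducing
    return forwarded

# negatives are filtered out; a partial result leaves every 4 values
partials = in_situ_reduce([1, -2, 3, 5, 7, -1, 2, 4, 6, 8])   # [16, 36]
```

Because the analysis runs while data is being generated, the dataset never needs to be stored in full before insights become available.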
Elastic In-Situ Data Analysis
[Diagram: when resource utilization is insufficient, resources are dynamically extended with new local reduction processes]
Elastic In-Situ and In-Transit Data Analysis
[Diagram: a staging node is set up on a second node (N1) and data is forwarded to it from N0]
Reduction process: 1. local combination; 2. global combination
Future Directions
Scientific applications are difficult to modify
– Integration with existing data sources: GridFTP, (P)NetCDF, HDF5, etc.
Data transfer is expensive (especially for in-transit analysis)
– Utilization of advanced network technologies, e.g., Software-Defined Networking (SDN)
Long-running nature of large-scale applications
– Failures are inevitable
– Exploit features of the processing structure
Conclusions
Data-intensive applications and instruments can easily exhaust local resources; a hybrid cloud can provide additional resources
Challenges: transparent data access and processing; meeting user constraints; minimizing I/O and storage cost
MATE-HC: transparent and efficient data processing on hybrid cloud
Developed a dynamic resource allocation framework and integrated it with MATE-HC
– Time- and cost-sensitive data processing
Proposed a compression methodology and system to minimize storage cost and the I/O bottleneck
Design of in-situ and in-transit data analysis (ongoing work)
Thanks for your attention!
MATE-EC2 Design
Data organization
– Three levels: buckets, chunks, and units
– Metadata information
Chunk retrieval
– Threaded data retrieval
– Selective job assignment
Load balancing and handling heterogeneity
– Pooling mechanism
MATE-EC2 vs. EMR
KMeans speedups vs. combine: 3.54 – 4.58
PageRank speedups vs. combine: 4.08 – 7.54
Different Chunk Sizes
KMeans, 1 retrieval thread
Performance increase
– 128KB vs. >8M: 2.07 to 2.49
K-Means (Data Retrieval)
Dataset: 8.2 GB
Fig. 1: 16 retrieval threads; 8M vs. others speedup: 1.13 – 1.30
Fig. 2: 128M chunk size; 1 thread vs. others speedup: 1.37 – 1.90
Job Assignment
KMeans: 1.01 (8M) and 1.10 – 1.14 (others)
PCA (2 iterations): speedups 1.19 – 1.68
Heterogeneous Configuration
Overheads
– KMeans: 1%
– PCA: 1.1%, 7.4%, 11.7%
KMeans – Cost Constraint
The system meets the cost constraints with <1.1% error rate
When the maximum number of cloud instances is allocated, the error rate is again <1.1%
The system tries to minimize execution time within the provided cost constraint
Prefetching and In-Memory Cache
Overlapping application-layer computation with I/O
Reusability of already accessed data is small
Prefetching and caching of prospective chunks
– Default eviction policy is LRU
– The user can analyze history and provide a prospective chunk list (informed prefetching)
The cache uses a row-based locking scheme for efficient consecutive chunk requests
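A toy version of such a cache might look as follows; the class name, the prefetch() hook, and the fetch callback are illustrative, and the row-based locking is omitted for brevity.

```python
# Hypothetical sketch of an LRU chunk cache with informed prefetching:
# eviction defaults to least-recently-used, and the application can
# hint upcoming chunk IDs so they are loaded before being requested.
from collections import OrderedDict

class ChunkCache:
    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch          # callback that reads a chunk from storage
        self.cache = OrderedDict()  # insertion order doubles as LRU order

    def _insert(self, cid, data):
        self.cache[cid] = data
        self.cache.move_to_end(cid)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used

    def prefetch(self, hinted_ids):          # informed prefetching hook
        for cid in hinted_ids:
            if cid not in self.cache:
                self._insert(cid, self.fetch(cid))

    def get(self, cid):
        if cid in self.cache:
            self.cache.move_to_end(cid)      # refresh LRU position
            return self.cache[cid]
        data = self.fetch(cid)
        self._insert(cid, data)
        return data

fetched = []
cache = ChunkCache(capacity=2,
                   fetch=lambda cid: fetched.append(cid) or f"chunk-{cid}")
cache.prefetch([0, 1])                     # hinted chunks loaded ahead of time
assert cache.get(0) == "chunk-0"           # cache hit: no extra fetch
assert fetched == [0, 1]
```

With low data reuse, the hinted prefetch list matters more than the eviction policy: it lets the cache hide fetch latency rather than exploit repeated accesses.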
Performance of KMeans
NPB dataset; compression ratio: 24.01% (180 GB)
More computation
– More opportunity to overlap fetching and decompression