1
The Cloud Workloads Archive: A Status Report
Special thanks to Ion for this opportunity!
Alexandru Iosup, Rean Griffith, Andrew Konwinski, Matei Zaharia, Ali Ghodsi, Ion Stoica
Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands
RADLab, University of California, Berkeley, USA
April 22, 2017, Berkeley, CA, USA
2
About the Team
Speaker: Alexandru Iosup
Recent Work in Performance
  The Grid Workloads Archive (Nov 2006)
  The Failure Trace Archive (Nov 2009)
  Analysis of Facebook, Yahoo, and Google data center workloads ( )
  The Peer-to-Peer Trace Archive (Apr 2010)
  Tools: GrenchMark workload-based grid benchmarking, RAIN
Systems work: Tribler (P2P file sharing), Koala (grid scheduling), POGGI and CAMEO (massively multiplayer online gaming)
Performance evaluation of clouds for sci.comp.: EC2 & three others
Team of 15+ active collaborators in NL, AT, RO, US
Happy to be in Berkeley until September
3
Traces: Sine Qua Non in Comp.Sys.Res.
“My system/method/algorithm is better than yours (on my carefully crafted workload)”
  Unrealistic (trivial): Prove that ‘prioritize jobs from users whose name starts with A’ is a good scheduling policy
  Realistic? 85% of jobs are short, 15% are long
Major problem in Computer Systems research
Workload Trace = recording of real activity from a (real) system, often as a sequence of jobs / requests submitted by users for execution
Main use: compare and cross-validate new job and resource management techniques and algorithms
Major problem: obtaining and using real workload traces
4
Previous Data Sharing Efforts
Critical datasets in computer science
  Grid Workloads Archive
  Failure Trace Archive
  Peer-to-Peer Trace Archive
  Game Trace Archive (soon)
  … PWA, ITA, CRAWDAD, …
1,000s of scientists
From theory to practice
Research Question: Are data center workloads unique? (vs GWA, PWA, …)
[Figure: dataset size (1GB, 10GB, 100GB, 1TB, 1TB/yr) by year (‘06, ‘09, ‘10, ‘11) for the trace archives, including P2PTA and GamTA]
5
Agenda
Introduction & Motivation
The Cloud Workloads Archive:
  What’s in a Name?
  Format and Tools
  Contents
Analysis & Modeling
Applications
Take Home Message
6
The Cloud Workloads Archive (CWA): What’s in a Name?
CWA = Public collection of cloud/data center workload traces and of tools to process these traces; allows us to:
  Compare and cross-validate new job and resource management techniques and algorithms, across various workload traces
  Determine which (part of a) trace is most interesting for a specific job and resource management technique or algorithm
  Design a general model for data center workloads, and validate it with various real workload traces
  Evaluate the generality of a particular workload trace, to determine if results are biased towards a particular trace
  Analyze the evolution of workload characteristics across long timescales, both intra- and inter-trace
7
One Format Fits Them All
Flat format
Job and Tasks
  Summary (20 unique data fields) and Detail (60 fields)
Categories of information
  Shared with GWA, PWA: Time, Disk, Memory, Net
  Jobs/Tasks that change resource consumption profile
  MapReduce-specific (two-thirds of the data fields)
CWJ, CWJD, CWT, CWTD
A. Iosup, R. Griffith, A. Konwinski, M. Zaharia, A. Ghodsi, I. Stoica, Data Format for the Cloud Workloads Archive, v.3, 13/07/10
8
CWA Contents: Large-Scale Workloads
Trace ID   System       Size (J/T/Obs)   Period     Notes
CWA-01     Facebook     1.1M/-/-         5m/2009    Time & IO
CWA-02     Yahoo M      28K/28M/-        20d/2009   ~Full detail
CWA-03     Facebook 2   61K/10M/-        10d/2009   Full detail
CWA-04     Facebook 3   ?/?/-            10d/
CWA-05     Facebook 4                    3m/
CWA-06     Google 2     25 Aug 2010
CWA-07     eBay         23 Sep 2010
CWA-08     Twitter      Need help!
CWA-09?    Google       9K/177K/4M       7h/2009    Coarse, Period

Tools
  Convert to CWA format
  Analyze and model automatically
  Report
10
Types of Analysis
Analysis Focus
  Time-related: Run, Wait, Resp.Time; Bounded Slowdown
  Structure-related: Number of tasks
  IO-related: IO sizes and ratios
  Status-related, Sys. Utilization-related: Counts/Ratios
Analysis Type
  Basic statistics
  Evolution over time
  Correlations
Data Break-down
  Overall
  By Task Type (M/R)
  By App. Type (ID)
  By User (ID)
  By Duration (Short)
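The time-related metrics on this slide can be sketched as follows. This is a minimal illustration, not the CWA tools: the `Job` record and its field names are hypothetical, and the 10-second threshold `tau` in the bounded slowdown is a commonly used convention assumed here, not a value given in the slides.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative only."""
    submit: float  # submission time [s]
    start: float   # time the job started running [s]
    end: float     # time the job finished [s]


def wait_time(job: Job) -> float:
    return job.start - job.submit


def run_time(job: Job) -> float:
    return job.end - job.start


def response_time(job: Job) -> float:
    return job.end - job.submit


def bounded_slowdown(job: Job, tau: float = 10.0) -> float:
    """Response time relative to run time, with run times below tau
    clamped to tau so that very short jobs do not dominate the metric."""
    return max(1.0, response_time(job) / max(run_time(job), tau))
```

For a job that waits 30 s and runs 10 s, the bounded slowdown is 40 / 10 = 4; for a 1 s job with no wait, the clamp yields 1.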
11
Types of Analysis: Sys.U., Over Time, By RunTime
Also 1h, 10mins, … counting intervals
  Study Short-/Long-Range Dependence (self-similarity)
Also Job count, Running/Waiting counts, …
  Study system utilization behavior
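Counting over intervals of different widths can be sketched as below. This is an assumed, simplified version: the function names and the sample submission times are made up for illustration. Re-aggregating a fine-grained count series into coarser intervals is the usual starting point for range-dependence studies, which compare the series across aggregation levels.

```python
def arrivals_per_interval(submit_times, width):
    """Count job submissions in consecutive intervals of `width` seconds,
    starting at the earliest submission."""
    if not submit_times:
        return []
    origin = min(submit_times)
    n_bins = int((max(submit_times) - origin) // width) + 1
    counts = [0] * n_bins
    for t in submit_times:
        counts[int((t - origin) // width)] += 1
    return counts


def aggregate(counts, m):
    """Sum every m consecutive bins (the last group may be partial)."""
    return [sum(counts[i:i + m]) for i in range(0, len(counts), m)]


submits = [0, 10, 70, 3600, 3700]            # made-up submission times [s]
per_minute = arrivals_per_interval(submits, 60)
per_hour = aggregate(per_minute, 60)         # same series at 1 h granularity
```

The same pattern applies to running or waiting job counts: replace the submission times with the sampled quantity of interest.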
12
Modeling Process
Well-known prob. distrib.
  Normal, Exp, LogNormal, Gamma, Weibull, Gen-Pareto
MLE to fit
  Fit the parameters of a known distribution to the empirical distribution
Goodness-of-Fit
  Assess how good the fit is; select best-fitting distribution
  Kolmogorov-Smirnov: sensitive to body of distribution + D stat
  Anderson-Darling: sensitive to tails of distribution
  Hybrid method*: works for very large populations
*Kondo et al., Failure Trace Archive, CCGrid’10, Best Paper Award.
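The fit-then-select loop can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: only two candidates (exponential and lognormal) are fitted, selection uses the Kolmogorov-Smirnov D statistic only, and the function names are hypothetical. It is not the hybrid method cited on the slide, and it expects a positive, non-constant sample.

```python
import math


def fit_exponential(xs):
    """MLE for the exponential rate: 1 / sample mean."""
    return {"rate": len(xs) / sum(xs)}


def fit_lognormal(xs):
    """MLE for lognormal: mean and (biased) std. dev. of log-values."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return {"mu": mu, "sigma": sigma}


def exp_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def lognormal_cdf(x, mu, sigma):
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))


def ks_statistic(xs, cdf):
    """D = largest gap between the empirical CDF and the model CDF."""
    s = sorted(xs)
    n = len(s)
    d = 0.0
    for i, x in enumerate(s):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def best_fit(xs):
    """Fit each candidate by MLE and pick the one with the smallest D."""
    exp_p = fit_exponential(xs)
    ln_p = fit_lognormal(xs)
    d_stats = {
        "exponential": ks_statistic(xs, lambda x: exp_cdf(x, **exp_p)),
        "lognormal": ks_statistic(xs, lambda x: lognormal_cdf(x, **ln_p)),
    }
    return min(d_stats, key=d_stats.get), d_stats
```

Extending the sketch to the other listed distributions means adding one fit function and one CDF per candidate; a tail-sensitive statistic such as Anderson-Darling would replace or complement `ks_statistic`.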
13
Main Results: Basic Stats
Trace ID   TRunTime [s]   #Tasks/Job    Pk.Arr.Rate/D   # users
CWA-01     165J           n/a           21KJ/-T
CWA-02     512/80med      901/712Map    6KJ/3.2MT
CWA-03     433/86med      153/143Map    8KJ/2MT         18
GWA-T1     370            5-20          -/20KT          332
GWA-T3     89,274                       -/8KT           387
GWA-T6     14,599                       -/22.5KT        206
GWA-T10    31,964                       -/1.6KTph       216
GWA-T11    8,971                        -/22KTph        412

MapReduce vs Grid workloads [vs Parallel Prod. Env.]
  Massive short tasks vs Many long tasks vs Few very long tasks
  Fewer users for MapReduce environments?
TODO: Analyse amounts per core
15
Applications
Mesos running mixtures of workloads
  Workloads: MPI, MapReduce, grid, …
  Find bottlenecks
  Find workloads that are particularly difficult to run
  Improve the system!
  Status: in progress, using cluster in Finland (Petri Savolainen)
All the apps typical to trace-based work: design, validation, and comparison of algorithms, methods, and systems.
17
Take Home Message
Cloud Workloads Archive
  Datasets
  Tools to convert, analyze, and model the datasets
  Need your help to collect more traces
Converted and analyzed three MapReduce workloads
  Different from grid and parallel production environment workloads (ask about additional proof and let me show a couple more slides)
  Invariants?
Applications
  1: Model of Cloud/MapReduce workloads
  2: Test and improve Mesos
18
Continuing Our Collaboration
Scheduling mixtures of grid/HPC/cloud workloads
  Scheduling and resource management in practice
  Modeling aspects of cloud infrastructure and workloads
  Condor on top of Mesos
Massively Social Gaming and Mesos
  Step 1: Game analytics and social network analysis in Mesos
…
19
Thank you! Questions? Observations?
Alex Iosup, Rean Griffith, Andrew Konwinski, Matei Zaharia, Ali Ghodsi, Ion Stoica
Thanks for all: AliG, Andrew, AndyK, Ari, Beth, Blaine, David, Ion, Justin, Lucian, Matei, Petri, Rean, Tim, …
More Information:
  The Grid Workloads Archive: gwa.ewi.tudelft.nl
  The Failure Trace Archive: fta.inria.fr
  The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl
  Cloud research: see PDS publication database at:
Big thanks to our collaborators: U. Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …
20
Additional Slides
21
Main Results: Basic Stats
Trace ID   Total IO [MB]   Rd. [MB]   Wr [%]   HDFS Wr [MB]
CWA-01     10,934          6,805      38%      1,538
CWA-02     75,546          47,539     37%      8,563
CWA-03     -
GWA12.1    469             174        63%      n/a
GWA12.2    144             114        21%
GWA12.3    161             130        19%
GWA12.4    389             33         92%
GWA12.5    330             31         91%

MapReduce vs Grid workloads
  IO-intensive vs Compute-intensive
  Constant Wr[%]~40%IO for MapReduce traces?
TODO: More MapReduce traces to validate findings
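The Wr [%] column appears consistent with written volume as a share of total IO, that is (Total IO - Rd.) / Total IO. This reading is inferred from the numbers in the table, not stated on the slide; the short check below, with a hypothetical helper name, recomputes the column under that assumption.

```python
def write_share_percent(total_io_mb, read_mb):
    """Share of IO volume that is written, assuming write = total - read."""
    return 100.0 * (total_io_mb - read_mb) / total_io_mb


# (Total IO [MB], Rd. [MB]) per trace, copied from the table
rows = {
    "CWA-01": (10934, 6805),
    "CWA-02": (75546, 47539),
    "GWA12.1": (469, 174),
    "GWA12.2": (144, 114),
    "GWA12.3": (161, 130),
    "GWA12.4": (389, 33),
    "GWA12.5": (330, 31),
}

# Rounded write share per trace, for comparison with the Wr [%] column
shares = {trace: round(write_share_percent(total, read))
          for trace, (total, read) in rows.items()}
```

Under this assumption the recomputed values match the reported column for every row that lists both a total and a read volume.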
22
Main Results
Two-mode trace: do NOT analyze as a whole