Three Perspectives & Two Problems Shivnath Babu Duke University.

1 Three Perspectives & Two Problems Shivnath Babu Duke University

2 Outline: I want to highlight two problems (or thoughts). First, some context.

3 Three Perspectives: The cloud era is ringing in interesting changes, and roles are increasingly overlapping. Joe Schmoe can now provision a 100-node Hadoop cluster in minutes, while administrators in traditional roles are getting laid off. The three perspectives are System Designers/Developers, Users of the System, and Administrators.

4 Three Perspectives: The cloud era is ringing in interesting changes. Software abstractions, packaging, and release cycles have changed, and there is more visibility into how users use the software. (Perspectives: System Designers/Developers, Users of the System, Administrators.)

5 Problem 1: Automated Experiment-driven System Management

6 Taking the (Next) Bite Out of System Administration: The cloud has automated some system administration tasks. Can we automate others? System tuning (configuration parameters, SQL queries, MapReduce jobs); detecting and repairing data corruption (disaster recovery); software/service testing.

7 Database Performance Tuning: 2-dimensional projection of an 11-dimensional surface

8 MapReduce Job Tuning in Hadoop: 2-dimensional projection of a 13-dimensional surface

10 Data Corruption: Stored data becomes different from what it is supposed to be. Causes include bugs in software/firmware, alpha particles, bit rot, and human mistakes. Bad things have happened: data loss, system unavailability, and incorrect results. (Diagram: Stored Data; Applications; File-System; Storage; Database.)
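One common way to notice that stored data "becomes different from what it is supposed to be" is to record a checksum when data is written and recompute it later. The sketch below is a minimal illustration of that idea, assuming data is split into named blocks held in memory; the helper names (`checksum`, `find_corrupt_blocks`) and the block IDs are hypothetical, and SHA-256 is just one reasonable choice of digest.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a block of data."""
    return hashlib.sha256(data).hexdigest()

def find_corrupt_blocks(blocks: dict[str, bytes], expected: dict[str, str]) -> list[str]:
    """Return IDs of blocks whose current checksum differs from the recorded one.

    Blocks with no recorded checksum are also flagged (expected.get returns None).
    """
    return [block_id for block_id, data in blocks.items()
            if checksum(data) != expected.get(block_id)]

# Record checksums at write time (hypothetical blocks)...
blocks = {"b1": b"customer records", "b2": b"order history"}
expected = {bid: checksum(d) for bid, d in blocks.items()}

# ...then simulate bit rot: flip one bit in block b2.
corrupted = bytearray(blocks["b2"])
corrupted[0] ^= 0x01
blocks["b2"] = bytes(corrupted)

print(find_corrupt_blocks(blocks, expected))  # ['b2']
```

A check like this only detects corruption; repairing it still needs a known-good copy, such as a replica or snapshot.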

12 Key Insight: We Need to Run Experiments. System tuning: running the workload under various system settings. Detecting data corruption: running integrity checks to verify data correctness. Software/service testing: running the tests. (Diagram: Stored Data; Applications; File-System; Storage; Database.) Challenge: where, how, and when should the experiments be run?
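"Running the workload under various system settings" can be illustrated with the most naive strategy: an exhaustive sweep that runs one experiment per setting and keeps the fastest. This is a minimal sketch, not the experiment planner the talk argues for. `run_workload` is a hypothetical stand-in that returns a made-up runtime instead of launching a real job, and the parameter names `num_reduce_tasks` and `io_sort_mb` and their values are illustrative assumptions.

```python
import itertools

def run_workload(settings: dict) -> float:
    """Hypothetical stand-in for running the workload on a test instance.

    Returns a synthetic runtime in seconds; a real harness would launch
    the job with these settings and time it.
    """
    reducers = settings["num_reduce_tasks"]
    sort_mb = settings["io_sort_mb"]
    # Made-up model: too few reducers under-parallelize, too many add
    # overhead; a larger sort buffer cuts spills but costs memory.
    return 1000 / reducers + 5 * reducers + 20000 / sort_mb + 0.1 * sort_mb

def tune(search_space: dict) -> tuple[dict, float]:
    """Run one experiment per point in the grid and keep the fastest."""
    names = list(search_space)
    best_settings, best_time = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        settings = dict(zip(names, values))
        runtime = run_workload(settings)
        if runtime < best_time:
            best_settings, best_time = settings, runtime
    return best_settings, best_time

space = {"num_reduce_tasks": [4, 8, 16, 32], "io_sort_mb": [100, 200, 400]}
best, t = tune(space)
print(best, round(t, 1))  # {'num_reduce_tasks': 16, 'io_sort_mb': 400} 232.5
```

The number of experiments in a full grid is the product of the value counts per parameter, so it explodes on surfaces with 11 or 13 dimensions like those on slides 7 and 8. That cost is what motivates deciding where, how, and when to run each experiment rather than sweeping blindly.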

13 Cloud Is Part of the Answer: Take snapshots of production data at low overhead; fire up production-like instances of the system (pay-as-you-go, elasticity); run the experiments there. (Diagram: the production stack of Applications, File-System, Storage, and Database over Production Data, alongside a copy of the same stack over the data on the system used for doing experiments.)

14 Power of Experiments to the People: Resources and a declarative language feed into planning an optimized sequence of experiments, which are then conducted automatically. Applications: declarative benchmarking & tuning; protecting against data corruption.

15 Problem 2: Data-Parallel Computing for the Masses

16 Challenges: Joe Schmoe can now provision a 100-node Hadoop cluster in minutes. Is that enough? Joe may need answers to questions such as: (a) How many reduce tasks should MapReduce job J use to get the best performance on my 8-node production cluster? (b) My current cluster needs more than 6 hours to process 1 day's worth of data, and I want to reduce that to under 3 hours. How many Amazon EC2 nodes, and of what type, should I use?
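A first back-of-the-envelope answer to question (b) assumes perfectly linear scaling, so the total work in node-hours stays constant. The sketch below works that arithmetic under two labeled assumptions: the current cluster size is not given on the slide, so 8 nodes is borrowed from question (a) purely for illustration, and node type is reduced to a single hypothetical `relative_speed` factor.

```python
import math

def nodes_needed(current_nodes: int, current_hours: float,
                 target_hours: float, relative_speed: float = 1.0) -> int:
    """Smallest node count that meets target_hours, assuming perfectly
    linear scaling (total work in node-hours stays constant)."""
    node_hours = current_nodes * current_hours
    # A node type that is k times faster does k times more work per hour.
    return math.ceil(node_hours / (target_hours * relative_speed))

print(nodes_needed(8, 6, 3))                      # 16 nodes of the same type
print(nodes_needed(8, 6, 3, relative_speed=2.0))  # 8 nodes that are 2x faster
```

Because the slide says "more than 6 hours", these figures are lower bounds. Real MapReduce jobs rarely scale linearly (shuffle cost and data skew grow with cluster size), which is exactly why a simple formula is not enough to answer Joe's question reliably.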

17 Performance Vs. Price Tradeoff
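One way to frame the performance vs. price tradeoff is to keep only the cluster options that are not beaten on both completion time and cost at once (the Pareto front). This is a minimal sketch; the option names, node counts, runtimes, and per-node-hour prices are all hypothetical numbers, not measurements or actual EC2 prices, and cost is simplified to nodes × hours × price.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Option:
    name: str
    nodes: int
    hours: float                # estimated completion time (hypothetical)
    price_per_node_hour: float  # hypothetical price in dollars

    @property
    def cost(self) -> float:
        return self.nodes * self.hours * self.price_per_node_hour

def pareto_front(options: list[Option]) -> list[Option]:
    """Keep options that no other option dominates, i.e. no other option
    is at least as fast and as cheap while strictly better on one axis."""
    def dominates(a: Option, b: Option) -> bool:
        return (a.hours <= b.hours and a.cost <= b.cost
                and (a.hours < b.hours or a.cost < b.cost))
    return [o for o in options if not any(dominates(p, o) for p in options)]

candidates = [
    Option("8 small", 8, 6.0, 0.10),
    Option("16 small", 16, 3.5, 0.10),
    Option("8 large", 8, 2.5, 0.40),
    Option("32 small", 32, 2.0, 0.10),
]
for o in pareto_front(candidates):
    print(f"{o.name}: {o.hours} h, ${o.cost:.2f}")
# 8 small: 6.0 h, $4.80
# 16 small: 3.5 h, $5.60
# 32 small: 2.0 h, $6.40
```

With these made-up numbers, "8 large" drops out because "32 small" is both faster and cheaper. Choosing among the remaining options depends on how much the user values time against money.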

18 Spectrum: At one end are Database Systems: SQL, known data-access patterns, a fixed set of operators, cost-based optimizers, and what-if engines. At the other end is Grid Computing: Python/R/Java, unknown data-access patterns, and black-box functions. Newer Data-Parallel Systems sit between these two ends.

19 Starfish: Self-Tuning Analytics on Big Data. Job-level tuning: Just-in-Time Optimizer, Profiler, Sampler. Workflow-level tuning: Workflow-aware Optimizer/Scheduler. Workload-level tuning: Workload Optimizer, Elastisizer. Supporting components: What-if Engine; Data Manager (Metadata Mgr., Intermediate Data Mgr., Data Layout & Storage Mgr.).

20 MapReduce Job Tuning in Hadoop: True Surface vs. Estimated Surface
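Comparing a true surface with an estimated one raises the question of how to predict performance at settings that were never run. The sketch below uses inverse-distance weighting over a few measured points. This is a deliberately simple, swapped-in technique for illustration, not how Starfish's What-if Engine builds its estimates. The normalized parameter axes (reduce tasks, sort buffer size) and the measured runtimes are hypothetical.

```python
import math

def idw_estimate(samples, query, power=2.0):
    """Inverse-distance-weighted estimate of the response at `query`.

    samples: list of ((x, y), measured_runtime) pairs from experiments
    already run, with both parameters normalized to [0, 1] so that
    neither axis dominates the distance.
    """
    num = den = 0.0
    for point, value in samples:
        d = math.dist(point, query)
        if d == 0.0:
            return value  # exact hit on a measured point
        w = 1.0 / d ** power  # nearer experiments get more weight
        num += w * value
        den += w
    return num / den

# Hypothetical runtimes (seconds) measured at the four corners of a
# normalized (reduce tasks, sort buffer size) grid.
samples = [((0.0, 0.0), 300.0), ((1.0, 0.0), 200.0),
           ((0.0, 1.0), 260.0), ((1.0, 1.0), 180.0)]

# Estimate untried settings instead of running more experiments.
print(round(idw_estimate(samples, (0.5, 0.5)), 1))  # 235.0 (equidistant, so the plain mean)
print(round(idw_estimate(samples, (0.9, 0.8)), 1))
```

An interpolator like this can only blend the measurements it is given, so it misses sharp features of the true surface between sample points. Capturing such features needs either more experiments or a model that understands how the job actually executes.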

21 Summary: Three perspectives: Developer, User, and Administrator. Two problems: Automated Experiment-driven System Management, and Data-Parallel Computing for the Masses.