Three Perspectives & Two Problems
Shivnath Babu, Duke University

Outline
o I want to highlight two problems / thoughts
o First, some context

Three Perspectives
[Diagram: System Designers / Developers; Users of the System; Administrators]
o The Cloud era is ringing in interesting changes
o Increasingly overlapping roles
  o Joe Schmoe can now provision a 100-node Hadoop cluster in minutes
  o Administrators in traditional roles are getting laid off

Three Perspectives
[Diagram: System Designers / Developers; Users of the System; Administrators]
o The Cloud era is ringing in interesting changes
o Software abstractions / packaging / release cycles have changed
o More visibility into how users use the software

Problem 1: Automated Experiment-driven System Management

Taking the (Next) Bite Out of System Administration
o Cloud has automated some system administration tasks
o Can we automate others?
  o System tuning (configuration parameters, SQL queries, MapReduce jobs)
  o Detecting and repairing data corruption (disaster recovery)
  o Software / service testing

Database Performance Tuning
[Figure: 2-dim projection of an 11-dim surface]

MapReduce Job Tuning in Hadoop
[Figure: 2-dim projection of a 13-dim surface]

Data Corruption
o Stored data becomes different from what it is supposed to be
  o Bugs in software / firmware
  o Alpha particles, bit rot
  o Human mistakes
o Bad things have happened
  o Data loss
  o System unavailability
  o Incorrect results
[Diagram: Stored Data; Applications; File-System; Storage; Database]
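One common way to notice that stored data has drifted from what it is supposed to be is to compare checksums recorded at write time with checksums recomputed later. The slides do not name a specific technique, so the following is only a minimal sketch under that assumption; the block ids, the `find_corrupted` helper, and the simulated bit flip are all hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a block as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_corrupted(blocks, expected):
    """Return ids of blocks whose current checksum differs from the recorded one."""
    return [
        block_id
        for block_id, data in sorted(blocks.items())
        if checksum(data) != expected[block_id]
    ]


# Hypothetical stored blocks and the checksums recorded when they were written.
blocks = {"b0": b"alpha", "b1": b"beta", "b2": b"gamma"}
expected = {block_id: checksum(data) for block_id, data in blocks.items()}

# Simulate bit rot: flip a single bit in block b1.
damaged = bytearray(blocks["b1"])
damaged[0] ^= 0x01
blocks["b1"] = bytes(damaged)

print(find_corrupted(blocks, expected))
```

A check like this only detects corruption; repairing it would additionally need a known-good copy, such as a snapshot or replica.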

Key Insight: Need to Run Experiments
o System tuning: running the workload under various system settings
o Detecting data corruption: running integrity checks to verify data correctness
o Software / service testing: running the tests
o Challenge: Where / How / When to run experiments?
[Diagram: Stored Data; Applications; File-System; Storage; Database]
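To make "running the workload under various system settings" concrete, here is a minimal sketch of an exhaustive experiment loop. It is not the method from the talk: `run_workload` is a hypothetical stand-in that returns a synthetic running time, and the parameter names `buffer_mb` and `parallelism` are made up for illustration. A real setup would launch the workload on a test instance and measure it.

```python
import itertools


def run_workload(settings):
    """Hypothetical experiment: return a synthetic running time in seconds.

    The synthetic cost surface has its minimum at buffer_mb=256, parallelism=8.
    """
    buffer_mb = settings["buffer_mb"]
    parallelism = settings["parallelism"]
    return 100.0 + abs(buffer_mb - 256) / 8.0 + 5.0 * abs(parallelism - 8)


def grid_search(param_grid, experiment):
    """Run one experiment per combination of settings and keep the fastest."""
    names = sorted(param_grid)
    best_settings, best_time = None, float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        settings = dict(zip(names, values))
        elapsed = experiment(settings)
        if elapsed < best_time:
            best_settings, best_time = settings, elapsed
    return best_settings, best_time


# Hypothetical candidate values for two configuration parameters.
param_grid = {"buffer_mb": [64, 128, 256, 512], "parallelism": [2, 4, 8, 16]}
best_settings, best_time = grid_search(param_grid, run_workload)
print(best_settings, best_time)
```

An exhaustive grid needs one experiment per combination, which grows quickly with the number of parameters; with surfaces of 11 or 13 dimensions, choosing which experiments to run becomes the main difficulty.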

Cloud is Part of the Answer
o Take snapshots of production data at low overhead
o Fire up production-like instances of the system
  o Pay-as-you-go, elasticity
o Run the experiments
[Diagram: production stack (Production Data; Applications; File-System; Storage; Database) next to an experiment stack (Applications; File-System; Storage; Database; data on system for doing experiments)]

Power of Experiments to the People
o Declarative language
o Plan an optimized sequence of experiments
o Conduct experiments automatically
o Declarative benchmarking & tuning
o Protecting against data corruption
[Diagram label: Resources]

Problem 2: Data-Parallel Computing for the Masses

Challenges
o Joe Schmoe can now provision a 100-node Hadoop cluster in minutes. Is that enough?
o Joe may need answers to:
  o How many reduce tasks should MapReduce job J use to get the best performance on my 8-node production cluster?
  o My current cluster needs more than 6 hours to process 1 day's worth of data. I want to reduce that to under 3 hours. How many and what type of Amazon EC2 nodes should I use?
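The cluster-sizing question above can be illustrated with a back-of-the-envelope what-if calculation. This is only a sketch under strong assumptions that the slides do not make: the current cluster is taken to have 8 nodes and to need exactly 6 hours, a fixed fraction of the work is assumed not to parallelize, and the rest is assumed to scale perfectly with node count. All function names and numbers are hypothetical.

```python
def estimated_hours(nodes, base_nodes, base_hours, serial_fraction):
    """Estimate running time on `nodes` nodes from one measured baseline.

    Assumes the serial part never shrinks and the parallel part scales
    perfectly with the number of nodes (an idealized model).
    """
    serial = base_hours * serial_fraction
    parallel = base_hours * (1.0 - serial_fraction)
    return serial + parallel * base_nodes / nodes


def min_nodes_for_deadline(base_nodes, base_hours, serial_fraction,
                           deadline_hours, max_nodes=1000):
    """Return the smallest node count whose estimate beats the deadline, or None."""
    for nodes in range(base_nodes, max_nodes + 1):
        estimate = estimated_hours(nodes, base_nodes, base_hours, serial_fraction)
        if estimate < deadline_hours:
            return nodes
    return None


# Assumed baseline: 8 nodes, 6 hours, 20% of the work does not parallelize.
needed = min_nodes_for_deadline(base_nodes=8, base_hours=6.0,
                                serial_fraction=0.2, deadline_hours=3.0)
print(needed)
```

Under this model, halving the running time takes well over twice the nodes, and no node count helps once the serial part alone exceeds the deadline. Real jobs rarely follow such a simple curve, which is why the answer is not obvious to Joe.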

Performance Vs. Price Tradeoff
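One way to read a performance versus price tradeoff is as a Pareto frontier: the cluster options that no other option beats on both running time and cost. The sketch below assumes that reading. The option names, hours, and dollar costs are invented for illustration and are not real Amazon EC2 instance types or prices.

```python
def pareto_frontier(options):
    """Keep options that no other option beats on both time and cost.

    Each option is a (name, hours, cost) tuple; lower is better for both.
    """
    frontier = []
    for name, hours, cost in options:
        dominated = any(
            other_hours <= hours and other_cost <= cost
            and (other_hours < hours or other_cost < cost)
            for _, other_hours, other_cost in options
        )
        if not dominated:
            frontier.append((name, hours, cost))
    return sorted(frontier, key=lambda option: option[1])


# Hypothetical cluster options: (name, running time in hours, total cost in dollars).
options = [
    ("4 x small", 6.0, 2.4),
    ("8 x small", 3.5, 2.8),
    ("8 x large", 2.0, 6.4),
    ("16 x small", 2.5, 4.0),
    ("4 x large", 3.6, 5.8),
]

for name, hours, cost in pareto_frontier(options):
    print(name, hours, cost)
```

Here "4 x large" drops out because "8 x small" is both faster and cheaper; choosing among the remaining options depends on how much a shorter running time is worth.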

Spectrum
o Database Systems
  o SQL
  o Known data-access patterns
  o Fixed set of operators
  o Cost-based optimizers, what-if engines
o Grid Computing
  o Python / R / Java
  o Unknown data-access patterns
  o Black-box functions
o Newer Data-Parallel Systems

Starfish: Self-Tuning Analytics on Big Data
[Architecture diagram; components as labeled]
o Job-level tuning: Just-in-Time Optimizer, Profiler, Sampler
o Workflow-level tuning: Workflow-aware Optimizer/Scheduler
o Workload-level tuning: Workload Optimizer, Elastisizer
o What-if Engine
o Data Manager: Metadata Mgr., Intermediate Data Mgr., Data Layout & Storage Mgr.

MapReduce Job Tuning in Hadoop
[Figures: True Surface; Estimated Surface]

Summary
o Three perspectives: Developer, User, & Administrator
o Two problems:
  o Automated Experiment-driven System Management
  o Data-Parallel Computing for the Masses
