1
Spark Debugger Ankur Dave, Matei Zaharia, Murphy McCauley, Scott Shenker, Ion Stoica UC BERKELEY
2
Motivation
- Debugging distributed programs is hard
- Debuggers for general distributed systems incur high overhead
- The Spark model enables debugging with almost zero overhead
3
Spark Programming Model
Example: find how many Wikipedia articles match a search term.
Resilient Distributed Datasets (RDDs) are built by deterministic transformations:
HDFS file
  -> map(_.split('\t')(3)) -> articles
  -> filter(_.contains("Berkeley")) -> matches
  -> count() -> 10,000
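The pipeline on this slide can be sketched in plain Python (a stand-in for Spark's Scala API; the sample lines and field layout are illustrative):

```python
# Simulate the slide's RDD pipeline on a list of tab-separated lines.
# Each "transformation" is deterministic, so any stage can be recomputed.

lines = [
    "1\t2011\ten\tBerkeley is a city in California.",
    "2\t2011\ten\tSpark is a cluster computing framework.",
    "3\t2011\ten\tUC Berkeley developed Spark.",
]

# map(_.split('\t')(3)): extract the article text (the 4th field)
articles = [line.split("\t")[3] for line in lines]

# filter(_.contains("Berkeley")): keep matching articles
matches = [a for a in articles if "Berkeley" in a]

# count()
print(len(matches))  # 2
```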
4
Debugging a Spark Program
Debug the individual transformations instead of the whole system:
- Rerun tasks
- Recompute RDDs
Debugging a distributed program is now as easy as debugging a single-threaded one. The same approach also applies to MapReduce and Dryad.
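The idea of recomputing any RDD from its lineage can be sketched as follows (an illustrative miniature, not Spark's actual classes):

```python
# Minimal RDD-style lineage sketch: each dataset remembers its parent
# and the deterministic function that produced it, so any stage can be
# recomputed on demand for debugging.

class RDD:
    def __init__(self, parent=None, transform=None, data=None):
        self.parent, self.transform, self.data = parent, transform, data

    def map(self, f):
        return RDD(self, lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        return RDD(self, lambda rows: [r for r in rows if pred(r)])

    def compute(self):
        # Walk the lineage back to the source and re-apply each transform.
        if self.parent is None:
            return list(self.data)
        return self.transform(self.parent.compute())

source = RDD(data=["a1", "b2", "a3"])
result = source.filter(lambda s: s.startswith("a")).map(str.upper)
print(result.compute())  # ['A1', 'A3']
```

Because every transformation is deterministic, `compute()` reproduces the same intermediate data each time, which is what makes single-machine re-debugging of one stage possible.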
5
Approach
As the Spark program runs, workers report key events back to the master, which logs them:
- Performance stats
- Exceptions
- RDD checksums
(Diagram: workers send these events to the master, which appends them to an event log.)
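The logging step can be sketched like this (the field names and event kinds are illustrative, not the debugger's actual format):

```python
# Sketch of the logging approach: as tasks run, workers report key
# events (performance stats, exceptions, RDD checksums) to the master,
# which appends them to an event log.
import hashlib

event_log = []  # kept on the master

def report(worker, kind, **details):
    event_log.append({"worker": worker, "kind": kind, **details})

# A worker finishing a task reports stats plus a checksum of its output.
partition = ["Berkeley", "Spark"]
digest = hashlib.md5("\n".join(sorted(partition)).encode()).hexdigest()
report("worker-1", "task_end", task=0, runtime_ms=12, checksum=digest)

# Another worker hits an exception and reports that too.
report("worker-2", "exception", task=1, error="java.lang.ArithmeticException")

print(len(event_log))  # 2
```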
6
Approach
Later, the user can re-execute from the event log to debug in a controlled environment. (Diagram: a debugger attached to the master replays the logged events across the workers.)
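Replay works because the transformations are deterministic: re-running a logged task in a single local process reproduces its original behavior, where an ordinary debugger can be attached. A minimal sketch (the log entries and task body are illustrative):

```python
# Sketch of re-execution from the event log. Each entry records enough
# to rerun one task locally; the deterministic transform guarantees the
# replay matches the original cluster run.

event_log = [
    {"task": 0, "input": ["1\tBerkeley"]},
    {"task": 1, "input": ["2\tSpark"]},
]

def run_task(event):
    # The same deterministic transformation the cluster originally ran.
    return [line.split("\t")[1] for line in event["input"]]

replayed = [run_task(e) for e in event_log]
print(replayed)  # [['Berkeley'], ['Spark']]
```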
7
Detecting Nondeterministic Transformations
Re-running a nondeterministic transformation may yield different results.
We can use RDD checksums to detect nondeterminism and alert the user.
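Checksum-based detection can be sketched as follows (the checksum scheme and the transforms are illustrative, not the debugger's exact ones):

```python
# Hash each RDD's contents, then compare the checksum from a re-execution
# against the one recorded at run time; a mismatch flags nondeterminism.
import hashlib

def checksum(partition):
    # Sort so the hash is insensitive to element order within a partition.
    return hashlib.md5("\n".join(sorted(map(str, partition))).encode()).hexdigest()

def deterministic(x):
    return x * 2

calls = 0
def nondeterministic(x):
    # Stand-in for a transform whose output depends on hidden state
    # (e.g. random numbers or wall-clock time).
    global calls
    calls += 1
    return x + calls

data = [1, 2, 3]

# Deterministic transform: re-running reproduces the same checksum.
assert checksum([deterministic(x) for x in data]) == \
       checksum([deterministic(x) for x in data])

# Nondeterministic transform: the checksum changes between runs,
# so the debugger can alert the user.
first = checksum([nondeterministic(x) for x in data])
second = checksum([nondeterministic(x) for x in data])
print(first == second)  # False
```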
8
Demo
Example app: PageRank on the Wikipedia dataset
9
Performance
Event logging introduces minimal overhead.
10
Future Plans
- Culprit determination
- GC monitoring
- Memory monitoring
11
Ankur Dave
ankurd@eecs.berkeley.edu
http://ankurdave.com
The Spark debugger is in development at https://github.com/mesos/spark, branch event-log.
Try Spark at http://spark-project.org!