What is the Cost of Determinism?

Cedomir Segulja, Tarek S. Abdelrahman
University of Toronto


Non-Determinism
Same program + same input ≠ same output

This is bad for:
- Testing: too many interleavings to test
- Debugging: hard to debug when behavior is not repeatable
- Selling: CAD tool users expect each run to produce the same circuit
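To give a sense of why there are "too many interleavings to test", the sketch below counts the distinct interleavings of a few threads under a simplifying assumption that is not from the slides: every thread executes a fixed number of atomic operations in program order. The function name `count_interleavings` is illustrative.

```python
from math import factorial


def count_interleavings(num_threads: int, ops_per_thread: int) -> int:
    """Count interleavings of threads that each run `ops_per_thread` atomic ops.

    This is the multinomial coefficient (n*k)! / (k!)^n, which assumes each
    thread's own operations stay in program order.
    """
    total_ops = num_threads * ops_per_thread
    return factorial(total_ops) // factorial(ops_per_thread) ** num_threads


if __name__ == "__main__":
    # Even tiny programs have a rapidly growing number of interleavings.
    for threads, ops in [(2, 3), (4, 5), (8, 10)]:
        print(f"{threads} threads x {ops} ops: {count_interleavings(threads, ops)}")
```

Two threads with three operations each already allow 20 interleavings; the count grows factorially from there.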

Deterministic Schedulers
Determinism is good, but costly.

Deterministic scheduler                        Maximum slowdown*
DMP [Devietti et al. 2009]                     1.7x
Kendo [Olszewski et al. 2009]                  1.6x
Grace [Berger et al. 2009]                     3.6x
CoreDet [Bergan et al. 2010]                   10x
Calvin [Hower et al. 2011]
RCDC [Devietti et al. 2011]
Dthreads [Liu et al. 2011]                     4x
Conversion [Merrifield and Eriksson 2013]      5x
Parrot [Cui et al. 2013]                       3.8x
RFDet [Lu et al. 2014]                         2.6x

Source: [Bergan et al. 2011] and the respective papers.
*Only to show that determinism comes at a cost; not to be used for a direct comparison (different features, benchmarks, numbers of threads, etc.).

1. What is the fundamental cost of determinism?
2. What is this cost across various execution environments (“determinism in the field”)?

What is Determinism?
A property that requires observing the same output whenever the program runs with the same input.

SyncOrder determinism [Lu and Scott 11]
- Requires the same program result and the same order of synchronization
- More flexible than internal determinism
- Still greatly eases testing [Cui et al. 13]

We assume data-race freedom
- All data races are bugs [Boehm 2008, S. Adve 2010, Marino et al. 2010, Lucia et al. 2010, …]
- Data races in general do not help performance [Boehm 12]

Determinism during debugging is needed, but the cost of determinism matters the most in production.

[Figure: range of determinism definitions: External, SyncOrder, Internal]
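As a rough illustration of what "the same order of synchronization" refers to, the sketch below records the order in which threads acquire a single lock. The names `RecordingLock` and `run_once` are hypothetical and are not the authors' recorder. The program result (the counter) is the same on every run, while the recorded acquisition order may differ from run to run unless something enforces it.

```python
import threading


class RecordingLock:
    """A lock that logs which thread acquired it, in acquisition order."""

    def __init__(self, trace):
        self._lock = threading.Lock()
        self._trace = trace  # shared list, appended to only while the lock is held

    def __enter__(self):
        self._lock.acquire()
        self._trace.append(threading.current_thread().name)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def run_once(num_threads=4, acquisitions=3):
    """Run a small data-race-free program; return (result, synchronization order)."""
    trace = []
    lock = RecordingLock(trace)
    state = {"counter": 0}

    def worker():
        for _ in range(acquisitions):
            with lock:  # every access to the counter is protected
                state["counter"] += 1

    threads = [threading.Thread(target=worker, name=f"t{i}") for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"], trace


if __name__ == "__main__":
    for _ in range(3):
        result, trace = run_once()
        print(result, trace)  # result is fixed; the trace is not guaranteed to repeat
```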

What is the impact of enforcing a fixed synchronization order on program execution time?

Schedule-Record-Replay Framework
[Figure: framework diagram with two numbered steps. Components: application (thread1, thread2), scheduler (serial, round-robin, dynamic-A, dynamic-S, hybrid), recorder, schedule, perturber (idle, small perturbations, background processes, DVFS, NUMA, architectures), replayer.]

Replayer
- Forces threads to wait only when absolutely necessary under the schedule
- Does so with as little overhead as possible
- Comparison: non-deterministic execution vs. non-deterministic execution with the replayer’s overhead
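The idea of making a thread wait only when the schedule requires it can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' replayer: the schedule is assumed to be a list of thread names, one entry per synchronization operation, and the names `Replayer`, `wait_for_turn`, `done`, and `replay` are hypothetical.

```python
import threading


class Replayer:
    """Enforce a recorded order of synchronization operations."""

    def __init__(self, schedule):
        self._schedule = list(schedule)  # thread names, one per sync operation
        self._next = 0                   # index of the next operation to allow
        self._cond = threading.Condition()

    def wait_for_turn(self, name):
        # Block only if the next scheduled operation belongs to another thread.
        with self._cond:
            self._cond.wait_for(
                lambda: self._next < len(self._schedule)
                and self._schedule[self._next] == name
            )

    def done(self):
        # Advance the schedule and wake threads waiting for their turn.
        with self._cond:
            self._next += 1
            self._cond.notify_all()


def replay(schedule):
    """Run one thread per name in `schedule`; return the observed sync order."""
    replayer = Replayer(schedule)
    observed = []

    def worker(name, num_ops):
        for _ in range(num_ops):
            # Work between synchronization operations would run here, unhindered.
            replayer.wait_for_turn(name)
            observed.append(name)  # stands in for the synchronization operation
            replayer.done()

    threads = [
        threading.Thread(target=worker, args=(name, schedule.count(name)))
        for name in sorted(set(schedule))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    recorded = ["t1", "t0", "t1", "t2", "t0", "t2"]
    print(replay(recorded) == recorded)
```

The sketch assumes the schedule covers every synchronization operation the threads perform; a thread whose operation is missing from the schedule would wait forever.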

Deterministic Schedulers
Schedules

Deterministic scheduler                        Schedule
Grace [Berger et al. 2009]                     serial
Dthreads [Liu et al. 2011]                     round-robin
Conversion [Merrifield and Eriksson 2013]      round-robin
Parrot [Cui et al. 2013]                       round-robin
Kendo [Olszewski et al. 2009]                  dynamic
RCDC [Devietti et al. 2011]                    dynamic
RFDet [Lu et al. 2014]                         dynamic
DMP [Devietti et al. 2009]                     hybrid
CoreDet [Bergan et al. 2010]                   hybrid
Calvin [Hower et al. 2011]                     hybrid

When does a thread pass its turn?
- serial: at the end
- round-robin: after each synchronization operation
- dynamic-A / dynamic-S: after each instruction / after each store
- hybrid: after N instructions (N = 100,000; no “reduced serial mode”)
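The turn-passing rules above can be compared with a small sequential model. The sketch below is a deliberately simplified simulation and not how any of the cited systems is implemented: it assumes a single turn that moves between threads in thread-id order, treats every operation as one instruction, and uses a tiny quantum for the hybrid schedule instead of N = 100,000. The operation labels (`"load"`, `"store"`, `"sync"`) and the function `sync_order` are illustrative.

```python
def make_policy(name, quantum):
    """Return a predicate deciding whether the running thread passes its turn.

    The predicate receives the operation just executed and the number of
    operations executed during the current turn.
    """
    policies = {
        "serial": lambda op, executed: False,              # only at thread end
        "round-robin": lambda op, executed: op == "sync",  # after each sync op
        "dynamic-A": lambda op, executed: True,            # after each instruction
        "dynamic-S": lambda op, executed: op == "store",   # after each store
        "hybrid": lambda op, executed: executed >= quantum,  # after N instructions
    }
    return policies[name]


def sync_order(threads, policy, quantum=3):
    """Simulate turn passing; return sync ops as (thread id, op index) pairs."""
    should_pass = make_policy(policy, quantum)
    pcs = [0] * len(threads)              # per-thread program counters
    remaining = sum(len(t) for t in threads)
    order = []
    current = 0
    while remaining:
        executed = 0
        while pcs[current] < len(threads[current]):
            op = threads[current][pcs[current]]
            if op == "sync":
                order.append((current, pcs[current]))
            pcs[current] += 1
            remaining -= 1
            executed += 1
            if should_pass(op, executed):
                break
        current = (current + 1) % len(threads)  # hand the turn to the next thread
    return order


if __name__ == "__main__":
    program = [
        ["load", "sync", "store", "sync"],  # thread 0
        ["sync", "load", "load", "sync"],   # thread 1
    ]
    for policy in ["serial", "round-robin", "dynamic-A", "dynamic-S", "hybrid"]:
        print(policy, sync_order(program, policy))
```

Each policy yields a fixed synchronization order for a given program, but the orders differ between policies, which is why the choice of schedule affects how long threads wait for their turn.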

Platform
- 8-core Xeon E5-2660
- 24 SPLASH-2 and PARSEC benchmarks, 8 threads

$$\text{deterministic slowdown} = \frac{\text{deterministic execution time}}{\text{non-deterministic execution time}}$$

Data races in general do not help performance [Boehm 12]
- 15 benchmarks had races; performance degradation in only 3: barnes (11%), radiosity (5%), raytrace_parsec (8%)

Deterministic slowdown by schedule. Columns: serial, round-robin, dynamic-S, dynamic-A, hybrid. Not every cell survives in this transcript; each row lists the surviving values in their original order.

splash
barnes             1.10 0.98 0.95 0.96 0.99
cholesky           3.39 2.39 1.07 1.05
fft                4.36 1.02 1.01
fmm                6.34 1.33 1.16 1.13 1.19
lu_cb              1.00
lu_ncb
ocean_cp
ocean_ncp
radiosity          7.58 3.04 1.09 1.08 2.67
radix
raytrace           7.72 2.93 1.88
volrend            6.12 1.91 1.67
water_nsquared
water_spatial

parsec
blackscholes
bodytrack          5.87 1.04
dedup              5.04 1.77 1.63 1.34
facesim            6.19
ferret             3.19 1.58 1.23 1.25
fluidanimate       1.81 0.97 7.26 1.52 1.06
streamcluster
swaptions
vips               7.61 5.27 1.31

average slowdown   3.61 1.60 1.17
maximum slowdown

For this set of benchmarks and our platform, and with implementation overhead set aside, the fundamental cost of determinism is small.

What is the performance cost of insisting on the same schedule across different environments?

Schedule-Record-Perturb-Replay Framework
[Figure: framework diagram with two numbered steps, now including the perturber. Components: application (thread1, thread2), scheduler (serial, round-robin, dynamic-A, dynamic-S, hybrid), recorder, schedule, perturber (idle, small perturbations, background processes, DVFS, NUMA, architectures), replayer.]

Perturber
- Small perturbations (context switches, thread migrations, page faults): simulate first-order effects by inserting small delays (μs and ms)
- Background processes: spawn additional threads and control their work-to-sleep ratio
- Dynamic voltage and frequency scaling (DVFS): use Linux’s cpufreq system to explore different DVFS policies
- Non-uniform memory access (NUMA): spread threads over two NUMA nodes
- Asymmetric architectures: use DVFS to create asymmetry [Shelepov et al. 2009]
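Two of these perturbations, small inserted delays and background threads with a controlled work-to-sleep ratio, can be sketched in a few lines. This is an illustration under assumptions, not the authors' perturber: all function names and parameters (`maybe_delay`, `split_period`, `background_worker`, `run_with_background`, the period and ratio values) are hypothetical, and in CPython the busy loops compete for the interpreter lock rather than for cores.

```python
import random
import threading
import time


def maybe_delay(rng, probability, max_delay_s):
    """With the given probability, sleep for a short random delay.

    Returns the injected delay in seconds (0.0 if none was injected).
    """
    if rng.random() < probability:
        delay = rng.uniform(0.0, max_delay_s)
        time.sleep(delay)
        return delay
    return 0.0


def split_period(period_s, work_to_sleep_ratio):
    """Split one period into (work, sleep) durations with the given ratio."""
    work_s = period_s * work_to_sleep_ratio / (1.0 + work_to_sleep_ratio)
    return work_s, period_s - work_s


def background_worker(stop_event, work_s, sleep_s):
    """Alternate between busy work and sleep until asked to stop."""
    while not stop_event.is_set():
        deadline = time.perf_counter() + work_s
        while time.perf_counter() < deadline:
            pass  # busy work
        stop_event.wait(sleep_s)  # sleep, waking early if stop is requested


def run_with_background(workload, num_background, period_s, ratio):
    """Run `workload()` while background threads are active; return its result."""
    stop = threading.Event()
    work_s, sleep_s = split_period(period_s, ratio)
    workers = [
        threading.Thread(target=background_worker, args=(stop, work_s, sleep_s))
        for _ in range(num_background)
    ]
    for w in workers:
        w.start()
    try:
        return workload()
    finally:
        stop.set()
        for w in workers:
            w.join()


if __name__ == "__main__":
    rng = random.Random(0)
    print(maybe_delay(rng, probability=0.5, max_delay_s=0.001))
    print(run_with_background(lambda: sum(range(100_000)), 2, 0.002, 1.0))
```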

Metric

$$\text{deterministic slowdown} = \frac{\text{deterministic execution time}}{\text{non-deterministic execution time}}$$

Same conditions during both runs, for example:

$$\frac{\text{deterministic execution time with background processes}}{\text{non-deterministic execution time with background processes}}$$
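The metric is a plain ratio, with both runs measured under the same condition. A minimal sketch follows; the timings in the example are made up for illustration and are not measurements from the talk, and the function names are hypothetical.

```python
def deterministic_slowdown(det_time_s, nondet_time_s):
    """Ratio of deterministic to non-deterministic execution time."""
    if nondet_time_s <= 0:
        raise ValueError("non-deterministic execution time must be positive")
    return det_time_s / nondet_time_s


def overhead_percent(slowdown):
    """Express a slowdown as a percentage overhead, e.g. 1.25 -> 25%."""
    return (slowdown - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical timings in seconds: (deterministic, non-deterministic).
    # Both runs in each pair are taken under the same condition.
    timings = {
        "quiet": (10.5, 10.0),
        "background processes": (19.5, 13.0),
    }
    for condition, (det, nondet) in timings.items():
        slowdown = deterministic_slowdown(det, nondet)
        print(f"{condition}: {slowdown:.2f}x ({overhead_percent(slowdown):.0f}% overhead)")
```

Pairing the runs this way keeps the slowdown from absorbing the cost of the perturbation itself, since that cost appears in both the numerator and the denominator.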

Deterministic slowdown by execution environment. Column headers: Quiet, Small perturbations, Background proc., DVFS, NUMA, Asym. Arch.; sub-column headers: balanced, unbalanced, auto, manual, 4/4, 1/7. Not every cell survives in this transcript; each row lists the surviving values in their original order.

splash
barnes             0.96 0.95 0.97 0.92 0.91 0.94
cholesky           1.05 1.06 1.25 1.02 1.08 1.03 1.09
fft                1.01 1.07 1.00
fmm                1.13 1.19 1.24 1.14 1.15
lu_cb              0.99 0.98
lu_ncb
ocean_cp
ocean_ncp
radiosity          1.94 1.11 1.46 1.71
radix
raytrace           1.92 1.44 1.69
volrend            1.38 1.55
water_nsquared
water_spatial

parsec
blackscholes
bodytrack          1.04 1.51 1.33 1.56
dedup              1.35 1.31 1.29 1.32 1.64
facesim
ferret             1.23 1.21 1.37 1.10
fluidanimate       1.77 1.39 1.63
streamcluster
swaptions
vips               1.43 1.53

avg. slowdown      1.17
max. slowdown

Insisting on the same schedule in the presence of skewed conditions can slow down execution by almost 2x.

Conclusions
- Employed the schedule-record-replay framework to divorce implementation overhead from the fundamental cost of enforcing deterministic execution
  - The fundamental cost of determinism is small (4% on avg., 33% max.)
  - There is room for lowering overheads in current deterministic systems
- Measured this fundamental cost across a range of execution environments
  - The cost rises to almost 2x when threads face skewed conditions
  - Do we need a more relaxed definition of determinism?
- Quantified various sources of non-determinism
  - Deterministic logical clocks are not deterministic (and not only because of performance counter imperfections [Weaver et al. 2013])

Thank you!
