
Slide 1. University of Massachusetts Amherst, School of Computer Science
PREDATOR: Predictive False Sharing Detection
Tongping Liu*, Chen Tian, Ziang Hu, Emery Berger*
*University of Massachusetts Amherst; Huawei US Research Center

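Slide 2. Parallelism: Expectation is Awesome
[Chart: runtime (s) of the parallel program; expectation]

    int count[8];
    int W;

    void increment(int S) {
        for (int in = S; in < S + W; in++)
            for (int j = 0; j < 1000000; j++)   // 1M iterations per counter
                count[in]++;
    }

    int main(int THREADS) {          // pseudocode: THREADS is the number of threads
        W = 8 / THREADS;             // counters handled by each thread
        for (int i = 0; i < 8; i += W)
            spawn(increment, i);     // pseudocode: start a thread running increment(i)
    }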

Slide 3. Parallelism: Reality is Awful
[Chart: runtime (s) of the same parallel program as on slide 2; expectation vs. reality]
False sharing slows the program by 13X: the threads update different elements of count[], but those elements lie in the same cache line.

Slide 4. False Sharing in Real Applications
False sharing slows MySQL by 50%.

Slide 5. False Sharing vs. True Sharing
[Figure: a cache line]

Slide 6. False Sharing vs. True Sharing
[Figure: false sharing, where Tasks 1 to 4 access different parts of the same cache line, vs. true sharing, where Task 1 and Task 2 access the same data]

Slide 7. Resource Contention at the Cache Line Level

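Slide 8. False Sharing Causes Performance Problems
[Figure: Thread 1 on Core 1 and Thread 2 on Core 2 each cache a copy of the same line from main memory; an update on one core invalidates the other core's copy]
The cache line is the basic unit of data transfer.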

Slide 9. False Sharing Causes Performance Problems
[Figure: the same two threads and cores as on slide 8]
Interleaved accesses cause cache invalidations.

Slide 10. False Sharing is Everywhere

    me = 1; you = 1;                   // globals
    me = new Foo; you = new Bar;       // heap
    class X { int me; int you; };      // fields
    array[me] = 12; array[you] = 13;   // array indices

Slide 11. False Sharing is Hard to Diagnose
Multiple experts worked together to diagnose a MySQL scalability issue (1.5M LOC).

Slide 12. Problems of Existing Tools
– No precise information, or false positives: WIBA'09, VEE'11, EuroSys'13, SC'13
– Accurate and precise: OOPSLA'11 (cannot detect read-write false sharing)
– Shared problem: they only detect false sharing that actually occurs in the observed run

Slide 13. False Sharing Causes Performance Problems
[Figure: Task 1 on Core 1 and Task 2 on Core 2 share a cache line from main memory, causing cache invalidations]
Interleaved accesses cause cache invalidations, and cache invalidations cause performance problems.
Goal: detect the false sharing that causes performance problems by finding cache lines with many cache invalidations.

Slide 14. Find Lines with Many Invalidations
Track cache invalidations on each cache line of global and heap memory.

Slide 15. Track Cache Invalidations
– Hardware-based approach: needs hardware support; not portable
– Simulation-based approach: needs hardware information such as the cache hierarchy and cache capacity; very slow
– Conservative assumptions: each thread runs on a different core with its own private cache; cache capacity is infinite
PREDATOR: tracks invalidations based on the memory access history of each cache line

Slide 16. Track Cache Invalidations
[Animated figure: reads (r) and writes (w) by threads T1 and T2 to one cache line over time; as the line's history table is updated, the number of invalidations rises from 0 to 3]
Each history entry: {Thread ID, Access Type}

Slide 17. PREDATOR Components
– Compiler instrumentation: instruments every memory read and write access
– Runtime system: collects memory accesses and reports false sharing

Slide 18. Detect Problems Correctly and Precisely
– Correctly: no false alarms. Track memory accesses on each word to tell false sharing apart from true sharing.
[Figure: false sharing vs. true sharing, as on slide 6]
– Precisely: covers global variables and heap objects; for heap objects, pinpoint the line of code that allocated the memory

Slide 19. PREDATOR's Report

Slide 20. Why do we need prediction?

Slide 21. Necessity of False Sharing Prediction
[Figure: Thread 1 and Thread 2 data laid out across cache lines 1 and 2; under a different memory layout, the same data falls into cache line 1 and causes false sharing]

Slide 22. Properties Affecting False Sharing Occurrence
– Change of memory layout:
  – 32-bit vs. 64-bit platform
  – different memory allocator
  – different compiler or optimization
  – different allocation order caused by code changes, e.g., adding a printf
– Running on hardware with a different cache line size

Slide 23. Example of False Sharing Sensitivity
[Figure: the same objects placed in memory at offsets 0, 8, and 56 bytes; colors represent threads; cache line size = 64 bytes]

Slide 24. Example of False Sharing Sensitivity
PREDATOR predicts false sharing problems even when they do not occur in the observed run.

Slide 25. Prediction Based on Virtual Cache Lines
[Figure: the real case, with Thread 1 and Thread 2 accessing cache lines 1 and 2; predictions 1 and 2, where virtual cache lines 1 and 2 are shifted relative to the real lines and false sharing appears within a virtual cache line]

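Slide 26. Track Invalidations on Virtual Cache Lines
[Figure: hot accesses X and Y at distance d; the tracked virtual line leaves (sz-d)/2 bytes on each side of the span from X to Y; other candidate virtual lines are not tracked]
Conditions:
– d < sz, the cache line size
– X and Y come from different threads, and at least one of them is a write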

Slide 27. Benchmark Results
(Columns: Unknown Problem, Without Prediction, With Prediction, Improvement)
– Histogram: ✔✔✔, 46%
– Linear_regression: ✔, 1207%
– Reverse_index: ✔✔, 0.09%
– Word_count: ✔✔, 0.14%
– Streamcluster-1: ✔✔✔, 4.77%
– Streamcluster-2: ✔✔, 7.52%

Slide 28. Real Applications Results
– MySQL: false sharing occurs when different threads update a shared bitmap simultaneously; performance improves by 180% after the fix.
– Boost library: "there will be 16 spinlocks per cache line"; performance improves by about 100%.

Slide 29. Performance Overhead of PREDATOR: 5.6X

Slide 30. [Recap figures: compiler instrumentation and runtime system; cache invalidations between Thread 1 on Core 1 and Thread 2 on Core 2; a precise report; prediction with virtual cache lines (real case, prediction 1, prediction 2)]




Slide 35. Detailed Prediction Algorithm
1. Find suspected cache lines.
2. Track detailed memory accesses.
3. Predict based on hot accesses: for hot accesses X and Y at distance d, if d < sz and X and Y come from different threads, there is potential false sharing.

Slide 36. Step 4: Track Cache Invalidations on the Virtual Line
[Figure: hot accesses X and Y at distance d; the tracked virtual line leaves (sz-d)/2 bytes on each side; other virtual lines are not tracked]

