
1 Behavior of Synchronization Methods in Commonly Used Languages and Systems Yiannis Nikolakopoulos ioaniko@chalmers.se Joint work with: D. Cederman, B. Chatterjee, N. Nguyen, M. Papatriantafilou, P. Tsigas Distributed Computing and Systems Chalmers University of Technology Gothenburg, Sweden

2 Developing a multithreaded application… The boss wants .NET. The client wants speed… (C++?) Java is nice. Multicores everywhere.

3 Developing a multithreaded application… The worker threads need to access data: Concurrent Data Structures. Then we need Synchronization.

4 Implementing Concurrent Data Structures. Implementation: Coarse Grain Locking, Fine Grain Locking, Test And Set, Array Locks, and more! Performance Bottleneck.

5 Implementing Concurrent Data Structures. Implementation: Coarse Grain Locking, Fine Grain Locking, Test And Set, Array Locks, Lock Free, and more! Hardware platform. Which is the fastest/most scalable?

6 Implementing concurrent data structures

7 Problem Statement: How does the interplay of the above parameters and the different synchronization methods affect the performance and the behavior of concurrent data structures?

8 Outline: Introduction; Experiment Setup; Highlights of Study and Results; Conclusion

9 Which data structures to study? They should represent different levels of contention. Queue: 1 or 2 contention points. Hash table: multiple contention points.

10 How do we choose an implementation? Possible criteria: framework dependencies; programmability; "good" performance.

11 Interpreting "good". Throughput: the more operations completed per time unit, the better. Is this enough?

12 Non-fairness

13 What to measure? [Formula shown on the slide is not preserved in the transcript; its labeled terms are "Operations by thread i" and "Average operations per thread".]
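The slide's formula did not survive in the transcript, so the following is only a sketch of one plausible fairness score built from the two quantities the slide names: the operations completed by each thread and the average operations per thread. The function name `fairness`, the `ops` vector, and the exact min/max form are assumptions made for illustration, not necessarily the definition used in the study. The score is 1 when all threads complete the same number of operations and drops toward 0 when some thread starves or dominates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical fairness score in [0, 1] for one measured time interval.
// ops[i] is the number of operations completed by thread i (assumed input).
double fairness(const std::vector<std::uint64_t>& ops) {
    assert(!ops.empty());
    const double total = std::accumulate(ops.begin(), ops.end(), 0.0);
    if (total == 0.0) return 1.0;  // no thread made progress: all equal
    const double avg = total / static_cast<double>(ops.size());
    const auto mm = std::minmax_element(ops.begin(), ops.end());
    const double min_ops = static_cast<double>(*mm.first);
    const double max_ops = static_cast<double>(*mm.second);
    // Penalize both a starved thread (min far below the average) and a
    // dominating thread (max far above the average).
    return std::min(min_ops / avg, avg / max_ops);
}
```

Calling `fairness` once per measured interval would give one score per interval, which is the kind of per-interval view the later queue observations rely on.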

14 Implementation Parameters. Programming Environments: C++, Java, C# (.NET, Mono). Synchronization Methods, all environments: TAS, TTAS, Lock-free, Array lock. C++: PMutex, Lock-free memory management. Java: Reentrant lock, synchronized. C#: lock construct, Mutex. NUMA Architectures: Intel Nehalem, 2 x 6 core (24 HW threads); AMD Bulldozer, 4 x 12 core (48 HW threads). Do they influence fairness?
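To make the TAS/TTAS entries concrete, here is a minimal sketch of a test-and-test-and-set (TTAS) spinlock in standard C++. It is an illustration only, not the implementation evaluated in the study; the names `TTASLock` and `count_with_ttas` are hypothetical. A plain TAS lock would call `exchange` in a tight loop; TTAS first spins on a read and attempts the atomic exchange only when the lock appears free.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical TTAS spinlock built on std::atomic<bool>.
class TTASLock {
public:
    void lock() {
        for (;;) {
            // "Test": spin on a read so waiters do not issue atomic writes.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
            // "Test-and-set": try to acquire only when the lock looked free.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Illustrative helper: threads increment a shared counter under the lock.
long count_with_ttas(int num_threads, int increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    TTASLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Nothing in this lock orders waiting threads, so which thread wins the exchange is left to the hardware and scheduler; that is one reason such a lock is a natural candidate when asking whether the platform influences fairness.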

15 Experiment Parameters: different levels of contention; number of threads; measured time intervals.

16 Outline: Introduction; Experiment Setup; Highlights of Study and Results: Queue (Fairness, Intel vs AMD, Throughput vs Fairness), Hash Table (Intel vs AMD, Scalability); Conclusion

17 Observations: Queue. Fairness can change along different time intervals. 24 threads, high contention.

18 Observations: Queue Fairness. Significantly different fairness behavior in different architectures. 24 threads, high contention.

19 Observations: Queue Fairness. Significantly different fairness behavior in different architectures. 24 threads, high contention. Lock-free is less affected in this case.

20 Queue: Throughput vs Fairness. [Two plots for C++: "Fairness 0.6 s, Intel" (fairness, 0 to 1) and "Throughput" (operations per ms, in thousands), each plotted against the number of threads (2, 4, 6, 8, 12, 24, 48). Legend: TTAS, Lock-free, PMutex.]

21 Observations: Hash table. Operations are distributed in different buckets. Things get interesting when #threads > #buckets. Tradeoff between throughput and fairness: different winners and losers; contention is lowered in the linked list components.
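As an illustration of why operations spread across buckets lower contention, the sketch below shows a fixed-size hash set with one mutex per bucket (a fine-grain locking design). It is an assumption-laden example, not the hash table measured in the study; `BucketLockedSet` and its members are hypothetical names, and each bucket is a simple linked list of keys.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical hash set with one lock per bucket (fine-grain locking).
class BucketLockedSet {
public:
    explicit BucketLockedSet(std::size_t num_buckets) : buckets_(num_buckets) {
        assert(num_buckets > 0);
    }

    // Returns true if the key was newly inserted, false if already present.
    bool insert(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> guard(b.mutex);  // locks this bucket only
        for (int k : b.keys) {
            if (k == key) return false;
        }
        b.keys.push_front(key);
        return true;
    }

    bool contains(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> guard(b.mutex);
        for (int k : b.keys) {
            if (k == key) return true;
        }
        return false;
    }

private:
    struct Bucket {
        std::mutex mutex;              // guards the list below
        std::forward_list<int> keys;   // linked list component of the bucket
    };

    Bucket& bucket_for(int key) {
        return buckets_[std::hash<int>{}(key) % buckets_.size()];
    }

    std::vector<Bucket> buckets_;
};
```

Threads whose keys hash to different buckets never touch the same mutex, so they only compete when they land in the same bucket; once there are more threads than buckets, some sharing becomes unavoidable.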

22 Observations: Hash table. Fairness differences in Hash table across architectures. 24 threads, high contention.

23 Observations: Hash table. Fairness differences in Hash table across architectures. 24 threads, high contention. Lock-free is again not affected.

24 In C++, custom memory management and lock-free implementations excel in scalability and performance.

25 Conclusion. Complex synchronization mechanisms (PMutex, Reentrant lock) pay off in heavily contended hot spots. Scalability comes via more complex, inherently parallel designs and implementations. Tradeoff between throughput and fairness: LF Hash table; Reentrant lock vs Array Lock vs LF Queue. Fairness can be heavily influenced by HW, with interesting exceptions. Which is the fastest/most scalable? Is fairness influenced by NUMA?

