Published by Victoria Brooks. Modified over 9 years ago.
1
Behavior of Synchronization Methods in Commonly Used Languages and Systems
Yiannis Nikolakopoulos, ioaniko@chalmers.se
Joint work with: D. Cederman, B. Chatterjee, N. Nguyen, M. Papatriantafilou, P. Tsigas
Distributed Computing and Systems, Chalmers University of Technology, Gothenburg, Sweden
2
Developing a multithreaded application…
The boss wants .NET
The client wants speed… (C++?)
Java is nice
Multicores everywhere
3
Developing a multithreaded application…
The worker threads need to access data
Concurrent Data Structures
Then we need Synchronization.
4
Implementing Concurrent Data Structures
Implementation: Coarse Grain Locking, Fine Grain Locking, Test And Set, Array Locks, and more!
Performance Bottleneck
5
Implementing Concurrent Data Structures
Implementation: Coarse Grain Locking, Fine Grain Locking, Test And Set, Array Locks, Lock Free, and more!
Hardware platform
Which is the fastest/most scalable?
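To make two of the listed methods concrete, here is a minimal sketch of a test-and-set (TAS) lock and a test-and-test-and-set (TTAS) lock on top of std::atomic. The class names TasLock and TtasLock and the helper count_with_lock are hypothetical, chosen for illustration; this is not the code used in the study. The yield inside the spin loops is an added convenience so the sketch behaves on machines with few cores.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// TAS: every acquisition attempt is an atomic read-modify-write,
// so all spinning threads keep writing to the same cache line.
class TasLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        while (locked.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // assumption: be polite when oversubscribed
        }
    }
    void unlock() { locked.store(false, std::memory_order_release); }
};

// TTAS: spin on a plain load first and attempt the atomic exchange
// only when the lock looks free, which reduces write traffic while waiting.
class TtasLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        for (;;) {
            while (locked.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
            if (!locked.exchange(true, std::memory_order_acquire)) {
                return;  // we flipped false -> true, so we own the lock
            }
        }
    }
    void unlock() { locked.store(false, std::memory_order_release); }
};

// Hypothetical helper: several threads increment a plain (non-atomic)
// counter, protected only by the given lock type.
template <class Lock>
long count_with_lock(int num_threads, int iters_per_thread) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Neither lock gives any ordering guarantee among waiting threads, which is one reason fairness is worth measuring alongside throughput.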
6
Implementing concurrent data structures
7
Problem Statement
How does the interplay of the above parameters and the different synchronization methods affect the performance and behavior of concurrent data structures?
8
Outline
Introduction
Experiment Setup
Highlights of Study and Results
Conclusion
9
Which data structures to study?
Represent different levels of contention:
– Queue: 1 or 2 contention points
– Hash table: multiple contention points
10
How do we choose an implementation?
Possible criteria:
– Framework dependencies
– Programmability
– “Good” performance
11
Interpreting “good”
Throughput: the more operations completed per time unit, the better.
Is this enough?
12
Non-fairness
13
What to measure?
Operations by thread i
Average operations per thread
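The transcript keeps only the two labelled quantities (operations by thread i and the average operations per thread), not the formula that combined them. The following sketch is therefore an assumption, not the authors' definition: it scores an interval by how far the slowest and the fastest thread deviate from the average, giving 1 for perfectly equal shares and values near 0 when one thread starves or dominates. The function name fairness is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Assumed fairness score in [0, 1] for one measured interval:
//   min( min_i(n_i) / avg , avg / max_i(n_i) )
// where n_i is the number of operations completed by thread i.
double fairness(const std::vector<std::uint64_t>& ops_per_thread) {
    assert(!ops_per_thread.empty());
    const double total =
        std::accumulate(ops_per_thread.begin(), ops_per_thread.end(), 0.0);
    if (total == 0.0) {
        return 1.0;  // assumption: no work at all counts as equal shares
    }
    const double avg = total / ops_per_thread.size();
    const auto extremes =
        std::minmax_element(ops_per_thread.begin(), ops_per_thread.end());
    const double slowest_share = *extremes.first / avg;   // <= 1
    const double fastest_share = avg / *extremes.second;  // <= 1
    return std::min(slowest_share, fastest_share);
}
```

With counts {50, 150} the average is 100, the slowest thread reaches half of it, and the score is 0.5; with equal counts the score is 1.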
14
Implementation Parameters
Programming Environments: C++, Java, C# (.NET, Mono)
Synchronization Methods: TAS, TTAS, Lock-free, Array lock, PMutex, Lock-free memory management, Reentrant, synchronized lock construct, Mutex
NUMA Architectures:
– Intel Nehalem, 2 x 6 core (24 HW threads)
– AMD Bulldozer, 4 x 12 core (48 HW threads)
Do they influence fairness?
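Of the methods listed, the array lock is the one designed to hand out the lock in arrival order. The sketch below assumes the slide means an Anderson-style array-based queue lock; the class name ArrayLock, the 64-byte padding, and the helper count_with_array_lock are illustrative choices, not details from the study. The lock is only correct while at most max_threads threads use it at the same time.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

class ArrayLock {
    // One flag per waiting position, padded so that each waiter
    // spins on its own cache line (64 bytes is an assumption).
    struct alignas(64) Slot {
        std::atomic<bool> may_enter{false};
    };
    const std::size_t capacity;
    std::unique_ptr<Slot[]> slots;
    std::atomic<std::size_t> next_ticket{0};
public:
    explicit ArrayLock(std::size_t max_threads)
        : capacity(max_threads), slots(new Slot[max_threads]) {
        assert(max_threads > 0);
        slots[0].may_enter.store(true);  // the first ticket may enter at once
    }

    // Takes the next position in line and waits for its flag.
    // Returns the slot index, which must be passed to unlock().
    std::size_t lock() {
        const std::size_t my =
            next_ticket.fetch_add(1, std::memory_order_relaxed) % capacity;
        while (!slots[my].may_enter.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        return my;
    }

    // Resets the caller's flag for reuse and signals the next position.
    void unlock(std::size_t my) {
        slots[my].may_enter.store(false, std::memory_order_relaxed);
        slots[(my + 1) % capacity].may_enter.store(true,
                                                   std::memory_order_release);
    }
};

// Hypothetical helper: threads increment a plain counter under the lock.
long count_with_array_lock(int num_threads, int iters_per_thread) {
    ArrayLock lock(static_cast<std::size_t>(num_threads));
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                const std::size_t slot = lock.lock();
                ++counter;
                lock.unlock(slot);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Because tickets are served in order, each waiter gets its turn regardless of which core it runs on, at the cost of strictly serialized handoffs.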
15
Experiment Parameters
Different levels of contention
Number of threads
Measured time intervals
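A rough sketch of how such an experiment could be driven: a fixed number of threads repeat one operation for a measured interval, and the per-thread operation counts are collected afterwards. The names run_interval and ops_per_ms are hypothetical, and the actual benchmark harness of the study is not shown in the slides. The contention level could be varied by letting op do some thread-local work between shared accesses, which is again an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Runs `op` in a loop on `num_threads` threads for roughly `interval`
// and returns how many operations each thread completed.
std::vector<std::uint64_t> run_interval(int num_threads,
                                        std::chrono::milliseconds interval,
                                        const std::function<void()>& op) {
    assert(num_threads > 0);
    std::vector<std::uint64_t> counts(num_threads, 0);
    std::atomic<bool> start{false};
    std::atomic<bool> stop{false};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Wait so that all threads begin the interval together.
            while (!start.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            std::uint64_t local = 0;
            while (!stop.load(std::memory_order_relaxed)) {
                op();
                ++local;
            }
            counts[t] = local;  // each thread writes only its own element
        });
    }
    start.store(true, std::memory_order_release);
    std::this_thread::sleep_for(interval);
    stop.store(true, std::memory_order_relaxed);
    for (auto& w : workers) w.join();
    return counts;
}

// Throughput over the interval, in operations per millisecond.
double ops_per_ms(const std::vector<std::uint64_t>& counts,
                  std::chrono::milliseconds interval) {
    assert(interval.count() > 0);
    const std::uint64_t total =
        std::accumulate(counts.begin(), counts.end(), std::uint64_t{0});
    return static_cast<double>(total) / interval.count();
}
```

The returned per-thread counts are the raw material for both a throughput figure and a fairness figure over the same interval.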
16
Outline
Introduction
Experiment Setup
Highlights of Study and Results
Queue
– Fairness
– Intel vs AMD
– Throughput vs Fairness
Hash Table
– Intel vs AMD
– Scalability
Conclusion
17
Observations: Queue
Fairness can change across different time intervals
24 Threads, High contention
18
Observations: Queue Fairness
Significantly different fairness behavior in different architectures
24 Threads, High contention
19
Observations: Queue Fairness
Significantly different fairness behavior in different architectures
24 Threads, High contention
Lock-free is less affected in this case
20
Queue: Throughput vs Fairness
[Figure: two C++ plots against number of threads (2, 4, 6, 8, 12, 24, 48). Panel "Fairness 0.6 s, Intel": fairness from 0 to 1. Panel "Throughput": operations per ms (thousands), 0 to 16. Legend: TTAS, Lock-free, PMutex.]
21
Observations: Hash table
Operations are distributed in different buckets
Things get interesting when #threads > #buckets
Tradeoff between throughput and fairness
– Different winners and losers
– Contention is lowered in the linked list components
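To illustrate why operations spread across buckets contend less, here is a minimal sketch of a hash set in which every bucket is a linked list guarded by its own lock. The class name BucketLockedSet, the use of std::mutex, and the int key type are assumptions made for the example; the study's hash table implementations are not reproduced in the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <vector>

class BucketLockedSet {
    struct Bucket {
        std::mutex guard;     // protects only this bucket's list
        std::list<int> keys;  // the linked list component
    };
    std::vector<Bucket> buckets;

    Bucket& bucket_for(int key) {
        return buckets[std::hash<int>{}(key) % buckets.size()];
    }
public:
    explicit BucketLockedSet(std::size_t num_buckets) : buckets(num_buckets) {
        assert(num_buckets > 0);
    }

    // Returns true if the key was newly inserted.
    bool insert(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> hold(b.guard);
        if (std::find(b.keys.begin(), b.keys.end(), key) != b.keys.end()) {
            return false;
        }
        b.keys.push_back(key);
        return true;
    }

    bool contains(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> hold(b.guard);
        return std::find(b.keys.begin(), b.keys.end(), key) != b.keys.end();
    }

    // Returns true if the key was present and has been removed.
    bool remove(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> hold(b.guard);
        const auto it = std::find(b.keys.begin(), b.keys.end(), key);
        if (it == b.keys.end()) {
            return false;
        }
        b.keys.erase(it);
        return true;
    }
};
```

Two threads block each other only when their keys hash to the same bucket, so once there are more threads than buckets some of them must collide.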
22
Observations: Hash table
Fairness differences in Hash table across architectures
24 Threads, High contention
23
Observations: Hash table
Fairness differences in Hash table across architectures
24 Threads, High contention
Lock-free is again not affected
24
In C++, custom memory management and lock-free implementations excel in scalability and performance.
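As a small illustration of the lock-free style, the sketch below shows a stack whose push uses a compare-and-swap retry loop instead of a lock. It is not one of the data structures from the study (those are a queue and a hash table), and the names LockFreeStack, pop_all, and push_from_threads are hypothetical. Removing a single node safely would need a memory reclamation scheme, which is where custom memory management comes in; this sketch sidesteps that by only detaching the whole list at once.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head{nullptr};
public:
    ~LockFreeStack() { pop_all(); }  // frees any remaining nodes

    // Lock-free push: link the new node to the current head and retry
    // the compare-and-swap until no other thread got in between.
    void push(T value) {
        Node* node = new Node{std::move(value),
                              head.load(std::memory_order_relaxed)};
        while (!head.compare_exchange_weak(node->next, node,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
            // On failure node->next now holds the latest head; try again.
        }
    }

    // Detaches the whole list with one atomic exchange and returns the
    // values, most recently pushed first. No node is freed while another
    // thread could still dereference it, so no reclamation scheme is needed.
    std::vector<T> pop_all() {
        Node* node = head.exchange(nullptr, std::memory_order_acquire);
        std::vector<T> values;
        while (node != nullptr) {
            values.push_back(std::move(node->value));
            Node* next = node->next;
            delete node;
            node = next;
        }
        return values;
    }
};

// Hypothetical helper: concurrent pushes, then count what arrived.
std::size_t push_from_threads(int num_threads, int pushes_per_thread) {
    LockFreeStack<int> stack;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&stack, pushes_per_thread] {
            for (int i = 0; i < pushes_per_thread; ++i) stack.push(i);
        });
    }
    for (auto& w : workers) w.join();
    return stack.pop_all().size();
}
```

A thread that loses the compare-and-swap simply retries, so a stalled thread never prevents the others from making progress.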
25
Conclusion
Complex synchronization mechanisms (PMutex, Reentrant lock) pay off in heavily contended hot spots
Scalability via more complex, inherently parallel designs and implementations
Tradeoff between throughput and fairness
– LF Hash table
– Reentrant lock vs Array Lock vs LF Queue
Fairness can be heavily influenced by HW
– Interesting exceptions
Which is the fastest/most scalable? Is fairness influenced by NUMA?