Behavior of Synchronization Methods in Commonly Used Languages and Systems
Yiannis Nikolakopoulos, ioaniko@chalmers.se
Joint work with: D. Cederman, B. Chatterjee, N. Nguyen, M. Papatriantafilou, P. Tsigas
Distributed Computing and Systems, Chalmers University of Technology, Gothenburg, Sweden

Developing a multithreaded application…
The boss wants .NET
The client wants speed… (C++?)
Java is nice
Multicores everywhere

Developing a multithreaded application…
The worker threads need to access data
Concurrent Data Structures
Then we need Synchronization.

Implementing Concurrent Data Structures
Implementation: Coarse Grain Locking, Fine Grain Locking, Test And Set, Array Locks, and more!
Performance Bottleneck

Implementing Concurrent Data Structures
Implementation: Coarse Grain Locking, Fine Grain Locking, Test And Set, Array Locks, Lock Free, and more!
Hardware platform
Which is the fastest/most scalable?
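
As a rough illustration of one of the listed methods, here is a test-and-test-and-set (TTAS) spinlock in C++. It is a minimal sketch, not the implementation used in the study; the names `TTASLock` and `count_with_lock` are hypothetical. A plain TAS lock would call `exchange` directly in the spin loop, whereas TTAS spins on a read first, so waiting threads mostly read their cached copy of the flag.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Test-and-test-and-set spinlock (illustrative sketch).
class TTASLock {
public:
    void lock() {
        for (;;) {
            // "Test": spin on a plain load until the lock looks free.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
            // "Test-and-set": one atomic exchange; succeeds if we saw false.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: `threads` workers each increment a shared counter
// `per_thread` times, with every increment protected by the lock.
long count_with_lock(int threads, long per_thread) {
    assert(threads > 0 && per_thread >= 0);
    TTASLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;  // critical section
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

If the lock provides mutual exclusion, `count_with_lock(4, 10000)` returns 40000. Nothing in this lock orders the waiting threads, which is one reason fairness is worth measuring separately from throughput.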

Implementing concurrent data structures

Problem Statement
How does the interplay of the above parameters and the different synchronization methods affect the performance and behavior of concurrent data structures?

Outline
Introduction
Experiment Setup
Highlights of Study and Results
Conclusion

Which data structures to study?
Represent different levels of contention:
– Queue: 1 or 2 contention points
– Hash table: multiple contention points

How do we choose an implementation?
Possible criteria:
– Framework dependencies
– Programmability
– “Good” performance

Interpreting “good”
Throughput: The more operations completed per time unit, the better.
Is this enough?

Non-fairness

What to measure?
[Formula not recovered from the slide; surviving labels: "Operations by thread i", "Average operations per thread"]
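
The formula on this slide did not survive extraction; only the labels "Operations by thread i" and "Average operations per thread" remain. One plausible reading, given that the fairness plots range from 0 to 1, compares the slowest and the fastest thread against the average. The sketch below assumes that definition; it is not confirmed by the transcript, and the function name `fairness` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Assumed fairness measure for one time interval:
//   min( min_i(n_i) / avg , avg / max_i(n_i) )
// where n_i is the number of operations completed by thread i and avg is
// the average number of operations per thread.
// 1.0 means every thread did the same amount of work; values near 0 mean
// some thread was starved or some thread dominated.
double fairness(const std::vector<long>& ops) {
    assert(!ops.empty());
    assert(std::all_of(ops.begin(), ops.end(), [](long n) { return n >= 0; }));

    const double total = std::accumulate(ops.begin(), ops.end(), 0.0);
    if (total == 0.0) {
        return 1.0;  // assumption: no work at all counts as trivially fair
    }
    const double avg = total / static_cast<double>(ops.size());
    const double lo = static_cast<double>(*std::min_element(ops.begin(), ops.end()));
    const double hi = static_cast<double>(*std::max_element(ops.begin(), ops.end()));
    return std::min(lo / avg, avg / hi);
}
```

Under this reading, four threads with 100 operations each give 1.0, while counts of 50, 100 and 150 give 0.5 because the slowest thread completed half the average.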

Implementation Parameters
Programming Environments: C++, Java, C# (.NET, Mono)
Synchronization Methods:
– All environments: TAS, TTAS, Lock-free, Array lock
– C++: PMutex, Lock-free memory management
– Java: Reentrant, synchronized
– C#: lock construct, Mutex
NUMA Architectures:
– Intel Nehalem, 2 x 6 core (24 HW threads)
– AMD Bulldozer, 4 x 12 core (48 HW threads)
Do they influence fairness?
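
To make the "Array lock" entry concrete, the sketch below assumes it refers to an array-based queue lock in the style of Anderson's lock: each thread takes a ticket, spins on its own array slot, and hands the lock to the next slot on release. This is an assumption about what the slide means, not the study's code. The names `ArrayLock` and `count_with_array_lock` are hypothetical, and cache-line padding of the slots is left out for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Array-based queue lock (illustrative sketch). Threads are served in
// ticket order, so the lock is first-come, first-served.
class ArrayLock {
public:
    // `capacity` must be at least the number of threads that may contend.
    explicit ArrayLock(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
        slots_[0].go.store(true, std::memory_order_relaxed);  // first ticket may enter
    }

    // Takes a ticket, spins on that ticket's slot, and returns the slot
    // index, which the caller passes back to unlock().
    std::size_t lock() {
        const std::size_t slot = next_.fetch_add(1) % slots_.size();
        while (!slots_[slot].go.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        return slot;
    }

    void unlock(std::size_t slot) {
        // Reset our slot for reuse, then let the next ticket holder in.
        slots_[slot].go.store(false, std::memory_order_relaxed);
        slots_[(slot + 1) % slots_.size()].go.store(true, std::memory_order_release);
    }

private:
    struct Slot {
        std::atomic<bool> go{false};
    };
    std::vector<Slot> slots_;
    std::atomic<std::size_t> next_{0};
};

// Hypothetical helper: shared counter protected by the array lock.
long count_with_array_lock(int threads, long per_thread) {
    assert(threads > 0 && per_thread >= 0);
    ArrayLock lock(static_cast<std::size_t>(threads));
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i) {
                const std::size_t slot = lock.lock();
                ++counter;  // critical section
                lock.unlock(slot);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Because threads enter in ticket order, a lock of this shape trades some throughput for predictable ordering, which makes it a natural point of comparison in a fairness study.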

Experiment Parameters
Different levels of contention
Number of threads
Measured time intervals
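
The three experiment parameters can be sketched as a small measurement harness. This is a sketch under stated assumptions, not the study's benchmark: the function name `run_interval` is hypothetical, and the contention level is modeled as the amount of thread-local work (`local_work`) done between two operations on the shared structure, which is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Runs `op` from `threads` worker threads for roughly `interval` and
// returns the number of operations each thread completed.
// A smaller `local_work` means threads return to the shared structure
// sooner, i.e. higher contention (assumed model).
std::vector<long> run_interval(int threads,
                               std::chrono::milliseconds interval,
                               int local_work,
                               const std::function<void()>& op) {
    assert(threads > 0 && local_work >= 0);
    std::vector<long> counts(static_cast<std::size_t>(threads), 0);
    std::atomic<bool> start{false};
    std::atomic<bool> stop{false};
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            // Wait until all workers have been created.
            while (!start.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            long done = 0;
            volatile unsigned sink = 0;  // keeps the local work from being optimized away
            while (!stop.load(std::memory_order_relaxed)) {
                op();  // one operation on the shared data structure
                ++done;
                for (int i = 0; i < local_work; ++i) {
                    sink = sink + static_cast<unsigned>(i);
                }
            }
            counts[static_cast<std::size_t>(t)] = done;  // written once, read after join
        });
    }

    start.store(true, std::memory_order_release);
    std::this_thread::sleep_for(interval);
    stop.store(true, std::memory_order_relaxed);
    for (auto& w : workers) w.join();
    return counts;
}
```

Throughput for the interval is the sum of the returned counts divided by the interval length, and the per-thread counts are the input a fairness measure would need.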

Outline
Introduction
Experiment Setup
Highlights of Study and Results
– Queue: Fairness, Intel vs AMD, Throughput vs Fairness
– Hash Table: Intel vs AMD, Scalability
Conclusion

Observations: Queue
Fairness can change across different time intervals
24 Threads, High contention

Observations: Queue Fairness
Significantly different fairness behavior in different architectures
24 Threads, High contention

Observations: Queue Fairness
Significantly different fairness behavior in different architectures
24 Threads, High contention
Lock-free is less affected in this case

Queue: Throughput vs Fairness
[Two plots for C++ with series TTAS, Lock-free, PMutex. Panel "Fairness 0.6 s, Intel": fairness (0 to 1) vs. threads (2, 4, 6, 8, 12, 24, 48). Panel "Throughput": operations per ms (thousands, 0 to 16) vs. threads (2, 4, 6, 8, 12, 24, 48).]

Observations: Hash table
Operations are distributed in different buckets
Things get interesting when #threads > #buckets
Tradeoff between throughput and fairness
– Different winners and losers
– Contention is lowered in the linked list components
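
The bucket observation can be illustrated with a hash set that keeps one lock and one linked list per bucket. This is a minimal sketch, not one of the implementations measured in the study: `BucketLockedSet` is a hypothetical name, and `std::mutex` stands in for whichever lock (TAS, TTAS, array lock, PMutex, ...) is being compared.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <mutex>
#include <vector>

// Hash set with one lock per bucket (illustrative sketch).
// Threads only contend when their keys hash to the same bucket, so
// contention grows once there are more threads than buckets.
class BucketLockedSet {
public:
    explicit BucketLockedSet(std::size_t buckets) : buckets_(buckets) {
        assert(buckets > 0);
    }

    // Returns true if the key was inserted, false if it was already present.
    bool insert(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> guard(b.lock);
        for (int k : b.keys) {
            if (k == key) return false;
        }
        b.keys.push_front(key);
        return true;
    }

    bool contains(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> guard(b.lock);
        for (int k : b.keys) {
            if (k == key) return true;
        }
        return false;
    }

private:
    struct Bucket {
        std::mutex lock;              // protects only this bucket
        std::forward_list<int> keys;  // the bucket's linked list
    };

    Bucket& bucket_for(int key) {
        return buckets_[std::hash<int>{}(key) % buckets_.size()];
    }

    std::vector<Bucket> buckets_;
};
```

With this layout, two threads working on different buckets never wait for each other, which matches the slide's point that each linked list sees less contention than a single shared structure would.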

Observations: Hash table
Fairness differences in Hash table across architectures
24 Threads, High contention

Observations: Hash table
Fairness differences in Hash table across architectures
24 Threads, High contention
Lock-free is again not affected

In C++, custom memory management and lock-free implementations excel in scalability and performance.

Conclusion
Complex synchronization mechanisms (PMutex, Reentrant lock) pay off in heavily contended hot spots
Scalability via more complex, inherently parallel designs and implementations
Tradeoff between throughput and fairness
– LF Hash table
– Reentrant lock vs Array Lock vs LF Queue
Fairness can be heavily influenced by HW
– Interesting exceptions
Which is the fastest/most scalable?
Is fairness influenced by NUMA?
