Published by Britton Robbins. Modified over 9 years ago.
1
Conditional Memory Ordering Christoph von Praun, Harold W. Cain, Jong-Deok Choi, Kyung Dong Ryu Presented by: Renwei Yu Published in Proceedings of the 33rd International Symposium on Computer Architecture, 2006.
2
Motivation Modern multiprocessor systems need memory barrier instructions in the program to specify memory ordering. Conventionally, memory ordering is guaranteed by using locks or barriers, which leads to superfluous memory barriers in programs. We need a mechanism to reduce unnecessary memory ordering.
3
Redundancies of memory ordering in conventional locking algorithms Lock operation on lock variable l Unlock operation on lock variable l Neither private nor shared caches provide both goals
4
Sources of memory ordering redundancy Thread-confinement of lock variables. Memory ordering that occurs for lock variables that are accessed solely by a single thread is redundant. Thread locality of locking. Locality of locking is a situation where consecutive acquires of a lock variable are made by the same thread. Eager releases and repetitive acquires. CMPs change Latency-Capacity Tradeoff in two ways
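The first two sources of redundancy can be measured on a recorded trace of lock acquires. The following is a minimal sketch, assuming a hypothetical trace format of (thread id, lock id) pairs in program order; the function name and the returned fields are illustrative and do not come from the paper.

```python
from collections import defaultdict


def classify_acquires(trace):
    """Count thread-local acquires and find thread-confined locks.

    trace: iterable of (thread_id, lock_id) acquire events in program order
    (a hypothetical format chosen for this sketch).
    """
    last_owner = {}                      # lock_id -> thread that acquired it last
    threads_per_lock = defaultdict(set)  # lock_id -> all threads that ever acquired it
    total = 0
    thread_local = 0
    for thread_id, lock_id in trace:
        total += 1
        # Thread locality: the previous acquire of this lock was by the same thread.
        if last_owner.get(lock_id) == thread_id:
            thread_local += 1
        last_owner[lock_id] = thread_id
        threads_per_lock[lock_id].add(thread_id)
    # Thread confinement: the lock was only ever acquired by one thread.
    confined = sorted(l for l, ts in threads_per_lock.items() if len(ts) == 1)
    return {
        "total": total,
        "thread_local": thread_local,
        "thread_confined_locks": confined,
    }


trace = [(0, "a"), (0, "a"), (1, "b"), (0, "a"), (1, "a"), (1, "b")]
stats = classify_acquires(trace)
```

In this toy trace, half of the acquires follow an acquire of the same lock by the same thread, and lock "b" is never touched by a second thread; in both cases the ordering at the preceding release had no other thread to publish to.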
5
CMO-conditional memory ordering CMO is demonstrated on a lock algorithm that identifies those dynamic lock/unlock operations for which memory ordering is unnecessary, and speculatively omits the associated memory ordering instructions. When ordering is required, this algorithm relies on a hardware mechanism for initiating a memory ordering operation on another processor.
6
CMO-conditional memory ordering Acquire of lock l with conditional memory ordering
7
CMO-conditional memory ordering Release of lock l with conditional memory ordering
8
CMO-conditional memory ordering Memory synchronization model is different: the release synchronization is omitted at the unlock operation and “recovered” at the lock operation – only if necessary. Necessity is determined according to a release number that is communicated between the thread that unlocks l and the thread that subsequently locks l.
9
Release numbers relnum ⇐ (id & release ctr. of current proc): a value that combines a processor id with a counter of the release synchronization operations (the release counter) that the respective processor has performed up to a certain stage during the execution of a program.
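One way to picture a release number is as a single word that packs the processor id next to the release counter. The sketch below assumes a 24-bit counter field and the helper names make_relnum and split_relnum; the slides do not state the actual encoding, so both are illustrative assumptions.

```python
COUNTER_BITS = 24                      # assumed field width, not taken from the paper
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_relnum(proc_id, release_ctr):
    """Pack a processor id and its release counter into one integer."""
    return (proc_id << COUNTER_BITS) | (release_ctr & COUNTER_MASK)


def split_relnum(relnum):
    """Recover (proc_id, release_ctr) from a packed release number."""
    return relnum >> COUNTER_BITS, relnum & COUNTER_MASK


relnum = make_relnum(proc_id=3, release_ctr=17)
proc_id, release_ctr = split_relnum(relnum)
```

A packed word like this can be stored alongside the lock at unlock time and read back by the next thread that locks it.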
10
Conditional memory ordering Based on the release number, the system arranges that release synchronization is recovered at the processor that previously released the lock, but only if necessary. (sync conditional) implies (sync acquire) at the processor that issues the instruction.
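The acquire and release protocol described on the preceding slides can be sketched as a single-threaded toy model. It assumes that ordering is unnecessary when the lock was last released by the acquiring processor itself, or when the releasing processor's counter has advanced past the value stored in the release number. The class and method names (Machine, CmoLock, remote_sync) are hypothetical, and the model only counts operations; it performs no real memory ordering.

```python
class Machine:
    """Toy model: per-processor release counters plus tallies of sync decisions."""

    def __init__(self, num_procs):
        self.release_vector = [0] * num_procs
        self.remote_syncs = 0    # release synchronization recovered remotely
        self.skipped_syncs = 0   # release synchronization found unnecessary

    def remote_sync(self, proc_id):
        # Stand-in for forcing proc_id to order its earlier stores;
        # the processor's release counter advances as a result.
        self.release_vector[proc_id] += 1
        self.remote_syncs += 1

    def sync_conditional(self, my_id, relnum):
        if relnum is None:       # lock has never been released
            return
        rel_id, rel_ctr = relnum
        if rel_id == my_id or self.release_vector[rel_id] > rel_ctr:
            # Same processor, or the releaser has synchronized since the unlock.
            self.skipped_syncs += 1
        else:
            self.remote_sync(rel_id)


class CmoLock:
    def __init__(self):
        self.owner = None
        self.relnum = None

    def acquire(self, machine, my_id):
        assert self.owner is None, "toy model: uncontended acquires only"
        self.owner = my_id
        machine.sync_conditional(my_id, self.relnum)

    def release(self, machine, my_id):
        assert self.owner == my_id
        # Record who released and at which counter value; no sync is issued here.
        self.relnum = (my_id, machine.release_vector[my_id])
        self.owner = None


machine = Machine(num_procs=2)
lock_a, lock_b = CmoLock(), CmoLock()
for lock in (lock_a, lock_b):        # processor 0 uses both locks
    lock.acquire(machine, 0)
    lock.release(machine, 0)
lock_a.acquire(machine, 1)           # triggers one remote sync on processor 0
lock_b.acquire(machine, 1)           # covered by that same remote sync
```

In this run the single remote sync on processor 0 advances its counter, so the later acquire of lock_b by processor 1 sees a stale release number and skips ordering altogether.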
11
Hardware support for CMO Logical operation Release vector entry Register operand Comparison of release counters Release vector support To support low latency reads, a copy of the release vector is mirrored in local storage at each processor. Broadcast operation Release hints Instruct a processor to increment its release counter as soon as the conditions are met
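The mirrored release vector can be pictured as one local copy per processor that is kept in step by a broadcast. This is a minimal sketch under that reading of the slide; the names ReleaseVectorMirror, Interconnect and broadcast_increment are hypothetical, and real hardware would also have to deal with the timing of in-flight updates, which the model ignores.

```python
class ReleaseVectorMirror:
    """Local copy of the release vector held at one processor."""

    def __init__(self, num_procs):
        self.entries = [0] * num_procs

    def read(self, proc_id):
        # Low-latency local read: no communication with other processors.
        return self.entries[proc_id]


class Interconnect:
    """Toy broadcast fabric that updates every mirror."""

    def __init__(self, num_procs):
        self.mirrors = [ReleaseVectorMirror(num_procs) for _ in range(num_procs)]

    def broadcast_increment(self, proc_id):
        # A release synchronization on proc_id is announced to all processors.
        for mirror in self.mirrors:
            mirror.entries[proc_id] += 1


fabric = Interconnect(num_procs=4)
fabric.broadcast_increment(2)
local_views = [mirror.read(2) for mirror in fabric.mirrors]
```

Reads stay local and cheap, while the less frequent counter updates pay the cost of the broadcast.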
12
Evaluation S-CMO: A software CMO prototype The results show that CMO avoids memory ordering operations for the vast majority of dynamic acquire and release operations across a set of multithreaded Java workloads, leading to significant speedups for many. However, performance improvements in the software prototype are hindered by the high cost of remote memory ordering.
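The tension between avoided ordering operations and expensive remote ordering can be illustrated with a back-of-the-envelope cost model. This is a hypothetical model written for this transcript, not the analytical model from the paper, and every number below is an illustrative placeholder rather than a measurement.

```python
def conventional_cost(num_ops, sync_cost):
    """Every lock/unlock operation pays for a memory ordering instruction."""
    return num_ops * sync_cost


def cmo_cost(num_ops, check_cost, remote_fraction, remote_latency):
    """Every operation pays a cheap check; a fraction also pays a remote sync."""
    return num_ops * (check_cost + remote_fraction * remote_latency)


def break_even_latency(sync_cost, check_cost, remote_fraction):
    """Remote sync latency at which CMO and conventional locking cost the same."""
    return (sync_cost - check_cost) / remote_fraction


# Illustrative placeholder values (arbitrary cycle counts).
baseline = conventional_cost(num_ops=1000, sync_cost=100)
cheap_remote = cmo_cost(1000, check_cost=5, remote_fraction=0.02, remote_latency=2000)
costly_remote = cmo_cost(1000, check_cost=5, remote_fraction=0.02, remote_latency=10000)
threshold = break_even_latency(sync_cost=100, check_cost=5, remote_fraction=0.02)
```

Under these assumptions CMO wins only while the remote sync latency stays below the break-even threshold, which is the qualitative effect the slide attributes to the software prototype.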
13
Experimental Methodology Use a set of single- and multi-threaded Java benchmarks from the Java Grande and SPEC benchmark suites. Run these applications on IBM’s J9 production virtual machine. Performed on both Power4 and Power5 multiprocessor systems running AIX, with 4 and 6 processors respectively.
14
Software CMO prototype with hardware support Hardware-based (sync conditional) and (sync remote) implementation
15
Software CMO prototype with hardware support CMO performance while varying remote sync latency in high-cost (Power4) memory ordering implementation.
16
Software CMO prototype with hardware support CMO performance while varying remote sync latency in high-cost (Power5) memory ordering implementation.
17
Future Proposal Hardware Proposal Software Proposal
18
Summary The paper develops an algorithm called conditional memory ordering (CMO) that eliminates redundant memory ordering operations and thereby improves system performance. It summarizes the characteristics of synchronization and memory ordering operations in lock-intensive Java workloads and demonstrates that many memory ordering operations occur superfluously. It evaluates the performance improvement of CMO. It proposes hardware support for CMO and evaluates a hardware implementation using a software prototype and an analytical model.
19
Conclusions CMO can significantly improve the performance of multiprocessor systems. With hardware support, CMO offers significant performance benefits across our set of Java benchmarks when assuming a reasonable remote synchronization latency.
20
Thank you Questions?