Release Consistency Yujia Jin 2/27/02
Motivations
- Place a partial order on memory accesses so that parallel programs behave correctly
- Relax that partial order so that memory accesses can overlap
- Trade off programmer productivity against processor performance
Basic Assumptions
- Uniprocessor control and data dependences are respected
- Memory coherence
  - Reads and writes to the same address are serialized
Previous Consistency Models
- Sequential Consistency (SC)
  - Each processor runs in program order
  - Operations of all processors are serialized
- Processor Consistency (PC)
  - Same as SC, except that a read can bypass an earlier write before that write is performed
  - Writes are non-atomic
- Weak Consistency (WCsc)
  - Memory accesses cannot be reordered past synchronization accesses
  - Synchronization accesses are sequentially consistent
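A store-buffering litmus test makes the difference between SC and PC concrete. The sketch below is a deliberately simplified model, not a full definition of either: it enumerates every interleaving of two two-instruction programs, and it approximates PC only by letting a read move ahead of an earlier write to a different address (non-atomic writes are not modeled). The programs `P1` and `P2` and all function names are illustrative assumptions.

```python
# Each operation is (kind, address, value-or-register).
# P1: x = 1; r1 = y        P2: y = 1; r2 = x
P1 = [("W", "x", 1), ("R", "y", "r1")]
P2 = [("W", "y", 1), ("R", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "W":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return (regs["r1"], regs["r2"])


def local_orders(prog, allow_read_bypass):
    """Program order, plus (optionally) the read hoisted above the write."""
    orders = [prog]
    w, r = prog
    if allow_read_bypass and w[0] == "W" and r[0] == "R" and w[1] != r[1]:
        orders.append([r, w])
    return orders


def outcomes(allow_read_bypass):
    """Set of (r1, r2) results reachable under the chosen model."""
    results = set()
    for a in local_orders(P1, allow_read_bypass):
        for b in local_orders(P2, allow_read_bypass):
            for schedule in interleavings(a, b):
                results.add(run(schedule))
    return results
```

In this model, `outcomes(False)` (SC) never contains `(0, 0)`, while `outcomes(True)` (read bypassing write) does, which is the outcome that breaks Dekker-style mutual exclusion.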
Access Classification
Shared access
  Competing
    Synchronization
      Acquire
      Release
    Non-synchronization
  Non-competing
Key Observations
- Acquire
  - Gets permission from other processors for subsequent memory accesses
  - Previous memory accesses can be overlapped with it
- Release
  - Gives permission to other processors for previous memory accesses
  - Subsequent memory accesses can be overlapped with it
Release Consistency
- A memory access cannot start before previous acquires are performed
- A release access cannot start before previous memory accesses are performed
- Competing accesses are processor consistent (PC) with respect to one another
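The first two rules can be written as a small ordering predicate and compared with weak consistency. This is a minimal sketch under stated assumptions: accesses are reduced to three categories (`"ordinary"`, `"acquire"`, `"release"`), nsync accesses and uniprocessor data dependences are left out, and the function name `must_wait` is hypothetical.

```python
SYNC = {"acquire", "release"}
KINDS = SYNC | {"ordinary"}


def must_wait(earlier, later, model):
    """True if `later` may not start until `earlier` has performed.

    `earlier` precedes `later` in program order on the same processor.
    """
    if earlier not in KINDS or later not in KINDS:
        raise ValueError("unknown access category")
    if model == "WC":
        # Weak consistency: nothing is reordered past a synchronization
        # access, in either direction.
        return earlier in SYNC or later in SYNC
    if model == "RC":
        # Release consistency: only "acquire before what follows" and
        # "what precedes before release" are enforced.
        return earlier == "acquire" or later == "release"
    raise ValueError("unknown model")
```

For example, `must_wait("release", "ordinary", "WC")` is true but `must_wait("release", "ordinary", "RC")` is false: under RC, accesses after a release may overlap with it, and accesses before an acquire may overlap with the acquire.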
Comparison
[Figure: orderings enforced among store, load, acquire, release, and nsync accesses under SC, PC, WCsc, and RCpc]
Properly-Labeled Program
- Add enough sync_L labels that an appropriate sync_L access separates every possible pair of conflicting memory accesses from different processors in which at least one access is ordinary_L
- A less conservative labeling gives better performance
- Labels are assigned by the compiler, or by the programmer through predefined synchronization constructs
Label hierarchy
shared_L (shared access)
  special_L (competing)
    sync_L (synchronization)
      acq_L (acquire)
      rel_L (release)
    nsync_L (non-synchronization)
  ordinary_L (non-competing)
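The labeling condition can be checked mechanically on a single execution trace. The sketch below is a simplified illustration only: the real condition quantifies over all possible executions, whereas this code inspects one trace and looks only for a direct release/acquire pair on the same address, with no transitive chains. The `Access` record, the label strings, and the function names are assumptions introduced for the example.

```python
from collections import namedtuple

# label is one of "ordinary", "acq", "rel", "nsync"
Access = namedtuple("Access", "proc kind addr label")


def conflicting(a, b):
    """Different processors, same address, at least one write."""
    return a.proc != b.proc and a.addr == b.addr and "W" in (a.kind, b.kind)


def separated(trace, i, j):
    """True if trace[i].proc releases after i and trace[j].proc then
    acquires the same address before j."""
    for r in range(i + 1, j):
        if trace[r].proc == trace[i].proc and trace[r].label == "rel":
            for q in range(r + 1, j):
                if (trace[q].proc == trace[j].proc
                        and trace[q].label == "acq"
                        and trace[q].addr == trace[r].addr):
                    return True
    return False


def unseparated_pairs(trace):
    """Index pairs of conflicting accesses (at least one ordinary)
    that no release/acquire pair separates."""
    bad = []
    for i in range(len(trace)):
        for j in range(i + 1, len(trace)):
            a, b = trace[i], trace[j]
            if conflicting(a, b) and "ordinary" in (a.label, b.label):
                if not separated(trace, i, j):
                    bad.append((i, j))
    return bad


# Producer/consumer hand-off through a flag.
good = [
    Access("P1", "W", "data", "ordinary"),
    Access("P1", "W", "flag", "rel"),
    Access("P2", "R", "flag", "acq"),
    Access("P2", "R", "data", "ordinary"),
]
# Same trace with the flag accesses mislabeled as ordinary.
mislabeled = [a._replace(label="ordinary") for a in good]
```

On these traces, `unseparated_pairs(good)` is empty, while `unseparated_pairs(mislabeled)` reports both the data pair and the flag pair.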
Implementation
- Use fence operations to block memory accesses
- Blocking is conditional: a fence stalls only if some relevant memory accesses have not yet performed
- Three types of fence: full, write, and immediate
- Fence operations are flexible and can implement SC, PC, WC, RC, …
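The conditional-block idea can be sketched as a single decision function. The slide only names the three fence types, so the semantics used here are an assumption of this sketch: a full fence holds back every later access, a write fence holds back only later writes, and an immediate fence holds back only the access directly following it. The function name and parameters are hypothetical.

```python
FENCES = {"full", "write", "immediate"}


def fence_stalls(fence, next_kind, pending, first_after_fence):
    """Return True if the next access must wait at the fence.

    fence             -- "full", "write", or "immediate" (assumed semantics)
    next_kind         -- "R" or "W", the access trying to issue
    pending           -- earlier relevant accesses not yet performed
    first_after_fence -- True if this access directly follows the fence
    """
    if fence not in FENCES:
        raise ValueError("unknown fence type")
    if pending == 0:
        # Conditional block: nothing outstanding, so nothing to wait for.
        return False
    if fence == "full":
        return True
    if fence == "write":
        return next_kind == "W"
    # Immediate fence: only the very next access is held back.
    return first_after_fence
```

The `pending == 0` shortcut is the point of the design: a fence costs nothing when the accesses it guards have already performed.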
Discussion
- One step closer to message passing?
- With the additional complexity, how much improvement do we get?