Published by Juniper Fields. Modified over 8 years ago.
1
HARD: Hardware-Assisted Lockset-based Race Detection
P. Zhou, R. Teodorescu, Y. Zhou. HPCA’07
Shimin Chen, LBA Reading Group Presentation
2
Motivation
– Data race detection is important
– S/W solutions are slow (not good for production runs)
– Previous H/W solutions focus on the happens-before relation
  – Cannot detect potential races
3
Motivating Example
4
Solution: HARD (h/w lockset)
Challenges:
– How to efficiently store and maintain the lockset for each variable in hardware?
– How to efficiently perform the set operation in the lockset algorithm?
Main ideas (will be detailed later):
– H/W bloom filter
– Piggybacking on cache coherence protocols
– Reset all bloom filters after exiting a barrier
5
Outline: LockSet (refresh our memory), HARD, Evaluation, Conclusion
6
Main Lockset Algorithm
Idea: accesses to every shared variable should be protected by some common lock.
Data structures:
– Thread t’s current lock set: L(t)
– Candidate set for a variable v: C(v)
Algorithm:
– Modify L(t) upon lock acquire and release
– Initialize C(v) to the set of all locks
– When t accesses v, C(v) = C(v) ∩ L(t)
– If C(v) == ∅, then report a violation on variable v
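The bookkeeping on this slide can be sketched in a few lines of Python. This is a minimal software illustration, not HARD's hardware design; the class and method names (`LocksetChecker`, `acquire`, `release`, `access`) are hypothetical. Because C(v) starts as the set of all locks, the first intersection simply yields L(t), so the sketch stores C(v) lazily on first access.

```python
from collections import defaultdict


class LocksetChecker:
    """Basic lockset bookkeeping: L(t) per thread, C(v) per variable."""

    def __init__(self):
        self.held = defaultdict(set)  # L(t): locks currently held by thread t
        self.candidates = {}          # C(v); a missing entry means "all locks"
        self.violations = []          # variables whose candidate set became empty

    def acquire(self, thread, lock):
        self.held[thread].add(lock)

    def release(self, thread, lock):
        self.held[thread].discard(lock)

    def access(self, thread, var):
        """Refine C(v) with L(t); return False if C(v) becomes empty."""
        locks = self.held[thread]
        if var not in self.candidates:
            # "All locks" intersected with L(t) is just L(t).
            self.candidates[var] = set(locks)
        else:
            self.candidates[var] &= locks
        if not self.candidates[var]:
            self.violations.append(var)
            return False
        return True
```

Note that this basic form reports a violation even for a variable touched by a single thread without any lock, which is the kind of false positive the refinements on the next slide address.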
7
Reducing False Positives
8
Outline: LockSet (refresh our memory), HARD, Evaluation, Conclusion
9
HARD Overview
– LState: exclusive, shared, etc. (2 bits)
– BFVector: candidate lock set for the cache line (16 bits)
– Lock Register: thread’s lockset
– Counter Register: used for resolving hash collisions (more detail later) (32 bits)
10
HARD Overview: Operations
– Each lock is represented by ‘1’ bits in the bloom filter
– Fetching a line from memory: set the BFVector to all 1s and LState to exclusive
– Update BFVector and LState on accesses
– Communicate them through the coherence protocol
– Lock register: thread’s lock set
11
Bloom Filter
– Bloom filter: a bit vector that represents a set of keys
  – A key is hashed d times (e.g. d=3) and represented by d bits
– Construct: for every key in the set, set its 3 bits in the vector
– Membership test: given a key, check whether all its 3 bits are 1
  – Definitely not in the set if some bits are 0
  – May have false positives
[Figure: filter bit vector 00011100011001000001, with Bit 0 = H0(key), Bit 1 = H1(key), Bit 2 = H2(key)]
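As a rough software illustration of the construct and membership-test steps, the sketch below keeps the bit vector in a Python integer. The 20-bit width, the use of SHA-256 as a stand-in for the d hash functions, and the names `BloomFilter`, `add`, and `might_contain` are all assumptions made for this example, not details from the slides.

```python
import hashlib


class BloomFilter:
    """Bit-vector set summary with d hash functions (assumed: SHA-256 based)."""

    def __init__(self, num_bits=20, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored as an integer

    def _positions(self, key):
        # Derive one bit position per hash function by salting the key with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely not in the set"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))
```

A key that was added always tests positive; a key that was never added usually tests negative, but can test positive when other keys happen to cover all of its bits.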
12
Representing LockSet as Bloom Filter
– 4 hash functions
– Lockset intersection: bloom filter intersection (bitwise AND)
– Lockset empty: any of the four 4-bit parts is all 0s
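The following sketch shows one way to model a 16-bit vector split into four 4-bit parts, with intersection as a bitwise AND and the emptiness test described on this slide. The hash used in `lock_signature` (two lock-address bits per part) is an assumption for illustration only; the slides do not specify the hash functions, and all function names are hypothetical.

```python
PARTS = 4          # k: number of parts, one hash function per part
BITS_PER_PART = 4  # n: bits in each part
ALL_ONES = (1 << (PARTS * BITS_PER_PART)) - 1  # initial BFVector ("all locks")


def lock_signature(lock_addr):
    """Set exactly one bit in each part (assumed hash: 2 address bits per part)."""
    sig = 0
    for part in range(PARTS):
        index = (lock_addr >> (2 + 2 * part)) & (BITS_PER_PART - 1)
        sig |= 1 << (part * BITS_PER_PART + index)
    return sig


def intersect(candidate, thread_lockset):
    """Lockset intersection is a bitwise AND of the two vectors."""
    return candidate & thread_lockset


def is_empty(vector):
    """The set is empty if any part has no bit set."""
    part_mask = (1 << BITS_PER_PART) - 1
    return any(
        ((vector >> (p * BITS_PER_PART)) & part_mask) == 0 for p in range(PARTS)
    )
```

For example, with this assumed hash the locks at addresses 0x1000 and 0x1004 differ only in the first part, so intersecting their signatures leaves that part all 0s and the candidate set is reported empty.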
13
False Negative Caused by Bloom Filter
14
Prob of False Negatives
– Suppose the candidate set contains m locks
– Given a lock that is not in the set, the probability of wrongly recognizing it as a member:
  prob_whole = prob_part^k
  prob_part = 1 – (1 – 1/n)^m
  (the vector has k parts of n bits each)
– When k=4, n=4: 0.0039 (m=1), 0.037 (m=2), 0.112 (m=3)
– Paper says: “experiments show that no races were missed”
– But what if the thread currently holds multiple locks?
15
If threads hold 1 to 8 locks (not in the paper)
n = 4 bits per part, k = 4 parts; t = locks held by the thread, m = locks in the candidate set
-----------------------------------------------
t     m=1     m=2     m=3     m=4
1     0.0039  0.0366  0.1117  0.2184
2     0.0078  0.0719  0.2109  0.3891
3     0.0117  0.1059  0.2991  0.5225
4     0.0155  0.1387  0.3774  0.6267
5     0.0194  0.1702  0.4469  0.7083
6     0.0232  0.2006  0.5087  0.7720
7     0.0270  0.2299  0.5636  0.8218
8     0.0308  0.2581  0.6123  0.8607
-----------------------------------------------
16
Try another design
n = 8 bits per part, k = 8 parts; t = locks held by the thread, m = locks in the candidate set
-----------------------------------------------
t     m=1     m=2     m=3     m=4
1     0.0000  0.0000  0.0001  0.0009
2     0.0000  0.0000  0.0003  0.0017
3     0.0000  0.0000  0.0004  0.0026
4     0.0000  0.0000  0.0006  0.0034
5     0.0000  0.0000  0.0007  0.0043
6     0.0000  0.0001  0.0008  0.0051
7     0.0000  0.0001  0.0010  0.0060
8     0.0000  0.0001  0.0011  0.0069
-----------------------------------------------
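The numbers in the two tables above can be recomputed with a short script. The per-lock formula is the one given on the "Prob of False Negatives" slide. The multi-lock form, 1 – (1 – prob_whole)^t, is not stated on the slides; it is an assumption that treats the t held locks as independent, and it appears to match the tabulated values. Function names are hypothetical.

```python
def prob_member(m, n, k):
    """Chance that a lock outside an m-lock candidate set still tests as a member."""
    prob_part = 1 - (1 - 1 / n) ** m  # its bit in one n-bit part is already set
    return prob_part ** k             # this must happen in all k parts


def prob_any_of_t(m, t, n, k):
    """Chance that at least one of t held locks falsely tests as a member.

    Assumes the t locks behave independently (not stated on the slides).
    """
    return 1 - (1 - prob_member(m, n, k)) ** t


def print_table(n, k, max_t=8, max_m=4):
    """Print one row per t and one column per m, rounded to four places."""
    print("t     " + "  ".join(f"m={m}   " for m in range(1, max_m + 1)))
    for t in range(1, max_t + 1):
        row = "  ".join(
            f"{prob_any_of_t(m, t, n, k):.4f}" for m in range(1, max_m + 1)
        )
        print(f"{t:<5} {row}")
```

Calling `print_table(4, 4)` and `print_table(8, 8)` prints the two designs side by side for comparison.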
17
Unlock operation
– How to remove a bit from the bloom filter?
– 32-bit counter register: each bloom filter bit has a 2-bit counter
– Lock: increment the 2-bit counter if the bloom filter bit is already set
– Unlock: decrement the 2-bit counter; if it is 0, clear the bloom filter bit
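One possible reading of this slide is sketched below: the counter for a bit records how many additional held locks map to the same bit, so a bit is cleared only when the last lock using it is released. This interpretation, the saturation at 3 for the 2-bit counter, and the names `ThreadLockState`, `on_lock`, and `on_unlock` are assumptions; the slide does not spell out these details.

```python
class ThreadLockState:
    """Lock register (16-bit bloom filter) plus sixteen 2-bit collision counters."""

    WIDTH = 16
    COUNTER_MAX = 3  # largest value a 2-bit counter can hold (assumed to saturate)

    def __init__(self):
        self.lock_register = 0
        self.counters = [0] * self.WIDTH  # together these model the 32-bit register

    def on_lock(self, signature):
        for bit in range(self.WIDTH):
            if not (signature >> bit) & 1:
                continue
            if (self.lock_register >> bit) & 1:
                # Bit already set by another held lock: count the collision.
                self.counters[bit] = min(self.counters[bit] + 1, self.COUNTER_MAX)
            else:
                self.lock_register |= 1 << bit

    def on_unlock(self, signature):
        for bit in range(self.WIDTH):
            if not (signature >> bit) & 1:
                continue
            if self.counters[bit] > 0:
                # Another held lock still needs this bit; just decrement.
                self.counters[bit] -= 1
            else:
                self.lock_register &= ~(1 << bit)
```

With only a plain bit vector, releasing one lock would clear bits that another held lock still depends on; the counters avoid that as long as no counter saturates.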
18
Candidate Set and LState Communications
– Must broadcast changes to C(v) if the cache line is in shared state
19
Handling Barriers
– Set BFVectors to all 1s after exiting a barrier (what if t2 does not hold any lock?)
20
Three Approximations
– Bloom filter to represent lockset
– Lockset info only in cache
  – Can only detect races in a short window of execution
– Cache line granularity
  – False sharing
  – Compiler to put shared variables on different lines?
  – Removing false sharing is generally good
21
Outline: LockSet (refresh our memory), HARD, Evaluation, Conclusion
22
Methodology
– SESC: cycle-accurate execution-driven simulator (MIPS instruction set)
– Six SPLASH-2 benchmarks
– Randomly inject a data race: randomly remove a dynamic instance of a lock and the corresponding unlock
– Compare with happens-before and ideal lockset
23
Bugs detected, false alarms
– Ideal: word granularity, state kept in memory, perfect lockset
– The # of false alarms counts source code locations; the number of dynamic errors is much higher
24
Mainly a bus traffic increase
– Note that HARD requires a bloom filter operation per memory access in the processor pipeline
25
Conclusion
Main idea: bloom filter to represent lockset
Three approximations:
– Bloom filter to represent lockset
– Lockset info only in cache
– Cache line granularity
Problems:
– Lockset: false positives
– Seems hard to add operations into the processor pipeline
– Are these the right approximations for monitoring production runs?