Notary: Hardware Techniques to Enhance Signatures Luke Yen Collaborator: Prof. Stark C. Draper Advisor: Prof. Mark D. Hill University of Wisconsin, Madison.


1 Notary: Hardware Techniques to Enhance Signatures Luke Yen Collaborator: Prof. Stark C. Draper Advisor: Prof. Mark D. Hill University of Wisconsin, Madison MICRO-41 - November 11, 2008 www.cs.wisc.edu/multifacet/papers/micro08_notary.pdf

2 Executive Summary
Tackle 2 problems with hardware signatures:
Problem 1: The best signature hashing (i.e., H3) has high area & power overheads
Solution 1: Use entropy analysis to guide lower-cost hashing (Page-Block-XOR, PBX) that performs similarly to H3
–Ex: 160 gates for H3 vs 20 gates for PBX
Problem 2: Spurious signature conflicts caused by signature bits set by private memory addrs
Solution 2: Avoid inserting private stack addrs, and propose a privatization interface for higher performance
10/21/2015 University of Wisconsin-Madison 2

3 Outline
Signature background
Entropy
Entropy results & PBX
Privatization
Methodology & workloads
Results
Conclusions & Future Work

4 Signature background
Signatures (hardware Bloom filters) are used to summarize and detect conflicts with a transaction’s read- and write-sets
–Inspired by Bulk system [Ceze, ISCA’06]
–Implemented in LogTM-SE [Yen, HPCA’07]
–Can have false positives, but never false negatives
–Also proposed for non-TM purposes (e.g., SC violation detection, atomicity violation detection, race recording)
Ex: Use k Bloom filters of size m/k, with independent hash functions
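The "k Bloom filters of size m/k" organization above can be sketched in software as follows. This is a minimal illustration, not the hardware design: the class name `Signature` and the salted `hashlib.blake2b` hash are assumptions standing in for the hardware hash functions discussed on later slides.

```python
import hashlib

class Signature:
    """Parallel Bloom filter: k banks of m/k bits, one hash function per bank."""

    def __init__(self, m_bits=2048, k=2):
        assert m_bits % k == 0
        self.k = k
        self.bank_bits = m_bits // k
        self.banks = [0] * k  # each bank is a Python int used as a bit vector

    def _index(self, bank, addr):
        # Stand-in hash (assumption): a salted software hash, not a hardware circuit.
        digest = hashlib.blake2b(
            addr.to_bytes(8, "little"), digest_size=8, salt=bytes([bank])
        ).digest()
        return int.from_bytes(digest, "little") % self.bank_bits

    def insert(self, addr):
        # Set one bit in every bank.
        for b in range(self.k):
            self.banks[b] |= 1 << self._index(b, addr)

    def may_contain(self, addr):
        # True only if the addressed bit is set in all banks.
        # False positives are possible; false negatives are not.
        return all(
            (self.banks[b] >> self._index(b, addr)) & 1 for b in range(self.k)
        )
```

A conflict check against a remote access then amounts to calling `may_contain` on the read or write signature with the remote address.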

5 Signature hash functions
Which hash function is best? [Sanchez, MICRO’07]
–Bit-selection? Hash simply decodes some number of input bits
–H3? Each bit of a hash value is an XOR of (on avg.) half of the input address bits
[Figure: LogTM-SE w/ 2kb signatures]
Result: H3 is better with >=2 hash functions
However, H3 uses many multi-level XOR trees
Can we improve this?
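The two hash families compared above can be sketched in a few lines. The function names (`bit_select`, `make_h3`, `h3_hash`) and the seeded random masks are illustrative assumptions; a hardware H3 fixes its masks at design time.

```python
import random

def bit_select(addr, lo, width):
    """Bit-selection: the hash is simply `width` address bits starting at bit `lo`."""
    return (addr >> lo) & ((1 << width) - 1)

def make_h3(n_addr_bits, c_hash_bits, seed=0):
    """Draw one random n-bit mask per hash output bit (about half the bits set)."""
    rng = random.Random(seed)
    return [rng.getrandbits(n_addr_bits) for _ in range(c_hash_bits)]

def h3_hash(addr, masks):
    """Output bit i is the XOR (parity) of the address bits selected by masks[i]."""
    h = 0
    for i, mask in enumerate(masks):
        parity = bin(addr & mask).count("1") & 1
        h |= parity << i
    return h
```

Each output bit of `h3_hash` corresponds to one XOR tree over roughly half the address bits, which is where the gate cost discussed on the next slide comes from.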

6 H3 implementation
Num XOR = [formula not preserved in this transcript]
Ex: 2kb signatures, k=2, c=10, 32-bit addr = 160 XOR gates per signature
Can we reduce the total gate count?

7 Outline
Signature background
Entropy
Entropy results & PBX
Privatization
Methodology & workloads
Results
Conclusions & Future Work

8 Entropy overview
Not all address bits have equal randomness
–Ex: High-order address bits are unlikely to change if the working set size is small
Key insight: If input bits are random and those bits are used as inputs to hash functions, random hash values result
–Use entropy to measure bit randomness
Entropy – measure of the uncertainty of a random variable x

9 Entropy formally defined
Entropy = $-\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)$
p(x_i) = the probability of the occurrence of value x_i
N = number of sample values random variable x can take on
Entropy = amount of information required on average to describe outcome of variable x (in bits)
–Ex: What is the best possible lossless compression?
Entropy value of an n-bit field:
–Min (0 bits): n-bit field has constant value
–Max (n bits): all bit patterns in n-bit field equally likely
–Other cases fall in between
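As a small worked illustration of the definition, the entropy of a set of observed values can be estimated from their empirical frequencies. The helper name `entropy_bits` is an assumption for this sketch.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Empirical Shannon entropy, in bits, of a sequence of observed values."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A field that never changes yields 0 bits, and a 4-bit field that takes all 16 patterns equally often yields 4 bits, matching the min and max cases on the slide.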

10 Our measures of entropy
For our workloads, we care about:
Q1: What is the best achievable entropy?
–Global entropy – upper bound on entropy of address
Q2: How does entropy change within an address?
–Local entropy – entropy of bit-field within the address
[Figure: global entropy is measured over address bits 31-6; local entropy is measured over a bit-field within bits 31-6, positioned by NSkip]
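One way to compute the two measures over an address trace is sketched below. It assumes 64-byte blocks (so the block address starts at bit 6, as in the figure) and assumes NSkip counts the low-order block-address bits skipped before the window begins; the slides do not spell out that convention, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Empirical Shannon entropy, in bits, of a sequence of observed values."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def global_entropy(addrs, block_bits=6):
    """Entropy of the whole block address (address bits above the block offset)."""
    return entropy_bits([a >> block_bits for a in addrs])

def local_entropy(addrs, nskip, width=16, block_bits=6):
    """Entropy of a `width`-bit window starting nskip bits above the block offset."""
    lo = block_bits + nskip  # assumed meaning of NSkip
    mask = (1 << width) - 1
    return entropy_bits([(a >> lo) & mask for a in addrs])
```

Sweeping `nskip` over a trace shows where in the address the randomness lives, which is the information PBX uses to pick its bit-fields.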

11 Outline
Signature background
Entropy
Entropy results & PBX
Privatization
Methodology & workloads
Results
Conclusions & Future Work

12 Entropy results
Workloads to be described later
Global entropy is at most 16 bits
Bit-window for local entropy is 16 bits wide (NSkip from 0-10)
–Smaller windows (<16b) may not reach the global entropy value
–Larger windows (>16b) hide some fine-grain info

13 Entropy results summary
More entropy results in our MICRO paper
In summary, for our workloads entropy monotonically decreases when moving towards high-order bits
–We calculate the average entropy across the entire workload’s execution
–May miss entropy changes due to program phase behavior
Our Page-Block-XOR (PBX) hash takes advantage of this overall trend

14 Page-Block-XOR (PBX)
Motivated by 3 findings:
–(1) Lower-order bits have most entropy
   Follows from our entropy results
–(2) XORing two bit-fields produces random hash values
   From prior work on XOR hashing (e.g., data placement in caches, DRAM)
–(3) Bit-field overlaps can lead to higher false positives
   Correlation between the two bit-fields can reduce the range of hash values produced (worse for larger signatures)

15 PBX implementation
For 2kb signatures with 2 hash functions:
–20 XOR gates for PBX vs 160 XOR gates for H3!
PPN and Cache-index fields not tied to system params: Use entropy to find two non-overlapping bit-fields with high randomness
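A single PBX hash can be sketched as the XOR of two non-overlapping bit-fields. The field positions below (bits 6-15 and 16-25 of a 32-bit address) are hypothetical; the slide says the fields should be chosen from entropy measurements, and it does not say how a second hash function picks its fields. The gate count assumes one 2-input XOR per output bit, which is consistent with the 20-gate figure for k=2 and 10-bit hash values.

```python
def pbx_hash(addr, lo_a, lo_b, width):
    """XOR two `width`-bit fields of addr starting at bits lo_a and lo_b."""
    if abs(lo_a - lo_b) < width:
        raise ValueError("bit-fields overlap")
    mask = (1 << width) - 1
    return ((addr >> lo_a) & mask) ^ ((addr >> lo_b) & mask)

def xor_gate_count(k, width):
    """One 2-input XOR per output bit, per hash function."""
    return k * width

# Hypothetical field choice for a 32-bit address with 64-byte blocks:
CACHE_INDEX_LO = 6   # assumed first field: bits 6..15
PPN_LO = 16          # assumed second field: bits 16..25
```

With `width=10` each hash value indexes a 1024-bit bank, so two banks make up a 2kb signature.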

16 Summary thus far
Problem 1: H3 has high area & power overheads
Solution 1: Use entropy analysis to guide lower-cost PBX
–Ex: 160 gates for H3 vs 20 gates for PBX
Problem 2: Spurious signature conflicts caused by signature bits set by private memory addrs
Solution 2: To be described

17 Outline
Signature background
Entropy
Entropy results & PBX
Privatization
Methodology & workloads
Results
Conclusions & Future Work

18 Motivation
False conflicts caused by thread-private addrs
–Avoid conflicts if addrs not inserted in thread’s signatures

19 Privatization solutions
Two solutions proposed:
–(1) Remove private stack references from sigs.
   Very little work for programmer/compiler
   Benefits depend on fraction of stack addresses versus all transactional references
–(2) Language-level interface (e.g., private_malloc(), shared_malloc())
   Even higher performance boost
   For skilled programmer
   WARNING: Incorrectly marking shared objects as private can lead to program errors!

20 Page-based implementation
Each page is assigned a status, private or shared
–Invariant: Page is shared if any object on it is shared
If stack is private, library marks stack pages as private
If using privatization heap functions, mark heap pages accordingly

21 OS support
OS allocates different physical page frames for shared and private pages
–Sets a per-frame bit in translation entry if shared
–Reduces number of page frames used by packing objects with same status together
Signatures insert memory addresses of transactional references to shared pages
–Query page sharing bit in HW TLB & current transactional status
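The filtering step described above (insert an address only if the reference is transactional and the page is shared) can be modeled as below. The class `SignatureFilter`, the 4 kB page size, the use of an exact set in place of the Bloom filter, and the choice to treat unmarked pages as shared are all assumptions of this sketch.

```python
PAGE_BITS = 12  # assumption: 4 kB pages; the slides do not state a page size

class SignatureFilter:
    """Hypothetical model of the TLB sharing-bit check in front of a signature."""

    def __init__(self):
        self.private_pages = set()  # pages whose sharing bit is clear
        self.signature = set()      # exact set standing in for the Bloom filter

    def mark_private(self, addr):
        self.private_pages.add(addr >> PAGE_BITS)

    def mark_shared(self, addr):
        self.private_pages.discard(addr >> PAGE_BITS)

    def on_reference(self, addr, in_transaction):
        """Insert addr only for transactional references to shared pages."""
        page_is_shared = (addr >> PAGE_BITS) not in self.private_pages
        if in_transaction and page_is_shared:
            self.signature.add(addr)
            return True
        return False
```

Defaulting unmarked pages to shared is the conservative direction: a page wrongly treated as shared only costs extra signature bits, whereas a page wrongly treated as private could hide a real conflict.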

22 Outline
Signature background
Entropy
Entropy results & PBX
Privatization
Methodology & workloads
Results
Conclusions & Future Work

23 Methodology
Full-system simulation using Simics and Wisconsin GEMS timing modules
Transistor-level design for area & power of XOR gates
CACTI for Bloom filter bit array area & power
Simulated system
–Single-chip CMP
–16 single-threaded, in-order cores
–32kB, 4-way private L1 I & D, write-back
–8MB, 8-way shared L2 cache
–MESI directory protocol
–Signatures from 64b-64kb (8B-8kB) & “Perfect”

24 Workloads
Micro-benchmarks
–BTree – read and write ops on shared tree
–Sparse Matrix – algorithm from dense column vector multiplication kernel
SPLASH-2 apps
–Barnes & Raytrace – exert most signature pressure
Stanford STAMP apps
–Vacation, Genome, Delaunay, Bayes, Labyrinth
DNS server
–BIND

25 Outline
Signature background
Entropy
Entropy results & PBX
Privatization
Methodology & workloads
Results
Conclusions & Future Work

26 PBX vs H3 area & power
Area & power overheads (2kb, k=4):

Type of overhead | Bloom filter bit array | H3 hash | PBX hash | H3 sig. | PBX sig. | % savings for PBX sig.
Area (mm^2)      | 2.70e-2                | 8.10e-3 | 4.70e-4  | 3.50e-2 | 2.70e-2  | 23
Power (mW)       | 1.80e2                 | 1.04e1  | 1.02     | 1.90e2  | 1.81e2   | 4.7
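The savings column can be re-derived from the per-signature totals, assuming it is computed as the relative reduction from the H3 signature to the PBX signature. The helper name `pct_savings` is an assumption for this check.

```python
def pct_savings(h3_total, pbx_total):
    """Relative reduction, in percent, going from the H3 total to the PBX total."""
    return 100.0 * (h3_total - pbx_total) / h3_total

area_savings = pct_savings(3.50e-2, 2.70e-2)   # signature area in mm^2
power_savings = pct_savings(1.90e2, 1.81e2)    # signature power in mW
```

Both values round to the reported 23% and 4.7%. The bit array dominates both totals, which is why replacing the hash logic yields a larger relative gain in area than in power.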

27 PBX vs H3 execution time
PBX performs similarly to H3
Additional workload results in paper

28 Privatization results summary
Removing private stack references from signatures did not help much
–Most addr references are not to the stack
–Most likely because we run on the SPARC ISA; other ISAs (e.g., x86) would likely benefit more
Privatization interface helps four workloads
–The remaining workloads either do not have private heap structures or do not have a high transactional duty cycle

29 Privatization interface results

30 Outline
Signature background
Entropy
Entropy results & PBX
Privatization
Methodology & workloads
Results
Conclusions & Future Work

31 Conclusions
Tackle 2 problems with signature designs:
–(1) Area and power overheads of H3 hashing
   E.g., 160 XOR gates for H3, 20 for PBX
–(2) False conflicts due to signature bits set by private memory references
Our solutions:
–(1) Use entropy analysis to guide hashing function (PBX), a low-cost alternative that performs similarly to H3
–(2) Prevent private stack references from entering signatures, and propose a privatization interface for heap allocations
Notary can be applied to non-TM uses:
–PBX hashing can directly transfer
–Privatization may transfer if addr filtering applies

32 Future Work
Dynamic entropy calculation:
–How to adapt PBX hashing to entropy changes over time?
Dynamic privatization characteristics:
–How common is it for objects to change sharing status (i.e., from private to shared, and vice versa)?

33 BACKUP SLIDES

34 Privatization interface
Privatization function: Usage
–shared_malloc(size), private_malloc(size): Dynamic allocation of shared and private memory objects
–shared_free(ptr), private_free(ptr): Frees up memory allocated by shared or private allocators
–privatize_barrier(num_threads, ptr, size), publicize_barrier(num_threads, ptr, size): Program threads come to a common point to privatize or publicize an object. Must be used outside of transactions

35 Dynamic privatization
Dynamically switch from private to shared, and vice versa
If transitioning from private -> shared, it is safe to mark the page as shared (at cost of performance)
If transitioning from shared -> private, the default policy is to disallow if there exist other shared objects on the same page
Otherwise, trap to user software and let programmer call shared_free(), followed by private_malloc() on object

36 Bit-field overlaps harmful for PBX

37 Removing stack refs doesn’t help significantly

38 Entropy of commercial workloads

39 Signature Operation Example
[Figure: a transaction (xbegin; LD A; ST B; LD C; LD D; ST C; …) passes each address through the hash function(s) to set bits in its read (R) and write (W) signatures, which start at 00000000 and end at 00100100 and 00100010. An external ST E aliases with bits already set: false positive, conflict reported. An external ST F hits no set bits: no conflict.]

40 Type of Hash Functions
In real programs, addresses are neither independent nor uniformly distributed (key assumptions to derive P_FP(n))
But good (universal/almost universal) hash functions can generate hash values that are almost uniformly distributed and uncorrelated
Hash functions considered:
–Bit-selection (inexpensive, low quality)
–H3 [Carter, CSS79] (moderate, higher quality)
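For reference, the textbook false-positive estimate for a Bloom filter of m bits split into k banks of m/k bits, after n insertions, is given below. It holds only under the independence and uniformity assumptions the slide mentions, and it is the standard result rather than an expression taken from these slides.

```latex
P_{FP}(n) = \left(1 - \left(1 - \frac{k}{m}\right)^{n}\right)^{k}
          \approx \left(1 - e^{-kn/m}\right)^{k}
```

Each factor is the probability that the one probed bit in a bank has already been set by one of the n earlier insertions, and a false positive requires that to happen in all k banks.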

