
G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic HPCA ’08 Reading Group Presentation 02/14/2008.


2
• Taint-tracking schemes are extremely useful for security and debugging
  ◦ E.g., TaintCheck, PointerCheck
• Usually implemented in software
  ◦ Typically some form of dynamic binary instrumentation (DBI)
  ◦ Extremely versatile
  ◦ ☹ Really slow
  ◦ ☹ Problems with multithreaded apps, JIT compilation, and self-modifying code

3
• So, build hardware for it
  ◦ Multiple examples: Raksha, Minos, etc.
  ◦ Fast
  ◦ Can handle unusual code that troubles software
  ◦ ☹ Requires extensive modifications to the OoO core, caches, buses, and memory
  ◦ ☹ Limits the state that can be manipulated, usually to a few bits that hardware can easily manage
  ◦ So, who is going to implement it?
• Solution: FlexiTaint
  ◦ Use hardware to accelerate what the software is doing: common-case propagation and metadata manipulation

4 RISC ISA

5
• Taint state: 1 to 16 bits per word
• Stored in a 1-level table in the application's address space
  ◦ Protected from the application itself
  ◦ No need to widen buses, caches, etc.
  ◦ Dedicated L1-T cache for taint bits: 4 KB for 2-bit states
• L1-D is unchanged, so there is no port contention
  ◦ Taint state shares the L2 with regular data

6
• Two registers describe the table:
  ◦ MTBR (Memory Taint Base Register): start address of the table
  ◦ FTCR (FlexiTaint Configuration Register): taint bits per word
  ◦ The OS must save both on a context switch
• All loads and stores prefetch their taint state into the L1-T
• State 0..0 (all zeros) is assumed to be the safe state
• State can be manipulated directly with special instructions
  ◦ Software must use them to set state after special events, e.g., reading a file, malloc, input purging
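
A minimal sketch of how a word's taint bits could be located from MTBR and FTCR. It assumes 4-byte words and states packed densely with the least-significant bit first. The slides state neither detail, and the helper names `taint_location` and `read_state` are hypothetical, not part of FlexiTaint.

```python
# Sketch: locating a word's taint state in the 1-level taint table.
# Assumptions (not from the slides): 4-byte words, densely packed states,
# least-significant bit first inside each byte. Helper names are hypothetical.
from typing import Tuple

WORD_BYTES = 4  # assumed word size

def taint_location(addr: int, mtbr: int, bits_per_word: int) -> Tuple[int, int]:
    """Return (byte address in the taint table, bit offset inside that byte)."""
    if not 1 <= bits_per_word <= 16:
        raise ValueError("FTCR allows 1 to 16 bits per word")
    bit_index = (addr // WORD_BYTES) * bits_per_word  # position of this word's state
    return mtbr + bit_index // 8, bit_index % 8

def read_state(table: bytes, mtbr: int, addr: int, bits_per_word: int) -> int:
    """Read one word's state; `table` holds the table's bytes starting at MTBR."""
    byte_addr, bit_off = taint_location(addr, mtbr, bits_per_word)
    start = byte_addr - mtbr
    # A 16-bit state starting at bit offset 7 spans at most 3 bytes.
    chunk = int.from_bytes(table[start:start + 3], "little")
    return (chunk >> bit_off) & ((1 << bits_per_word) - 1)
```

For example, with MTBR = 0x10000000 and 2-bit states, the word at address 0x44 maps to byte 0x10000004, bit offset 2.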

7
• Taint propagation takes place after the OoO core
  ◦ It can be turned off and bypassed completely when not needed
• The normal Commit stage becomes Pre-Commit
• A software handler receives 4 arguments:
  ◦ Opcode, Reg1 state, Reg2 state, Mem state
• It returns the output state and whether an exception should be raised
• The handler address is stored in the TPCHR
  ◦ A restricted-access register
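
The following Python function stands in for a TaintCheck-style handler with 1-bit states, to make the four-in, two-out interface concrete. In FlexiTaint the handler is ordinary code at the address held in TPCHR. The opcode constants, the `Response` type, and the load policy below are illustrative assumptions.

```python
# Sketch of a TaintCheck-style software handler with 1-bit states (1 = tainted).
# Opcode values are hypothetical placeholders, not FlexiTaint's encoding.
from typing import NamedTuple

OP_ADD, OP_LOAD, OP_JR = 0x01, 0x23, 0x08  # hypothetical opcodes

class Response(NamedTuple):
    out_state: int   # taint state for the destination
    exception: bool  # True -> raise a taint exception

def taintcheck_handler(opcode: int, reg1: int, reg2: int, mem: int) -> Response:
    if opcode == OP_JR:
        # Jumping to an address held in a tainted register: flag it.
        return Response(out_state=0, exception=reg1 != 0)
    if opcode == OP_LOAD:
        # Assumed policy: the loaded value inherits the memory word's state
        # (taint of the address register is ignored).
        return Response(out_state=mem, exception=False)
    # Default (e.g., ALU ops): tainted if any input is tainted.
    return Response(out_state=reg1 | reg2 | mem, exception=False)
```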

8
• The software handler always gives the same answer for the same inputs
  ◦ So cache it in the TPCache
• 128-entry direct-mapped response cache
• Indexed by opcode, Reg1 state, Reg2 state, and Mem state (folded into 7 bits)
• Each entry stores the output state and the exception bit
• Cleared every time the TPCHR (software handler address register) changes
  ◦ Usually on a context switch
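
The response cache can be sketched as below. The 7-bit folding function (XOR of 7-bit chunks) and the full-key tag check are illustrative choices, not details given on the slide. States are assumed to fit in 16 bits, and `TPCache` and `propagate` are hypothetical names.

```python
# Sketch of a 128-entry direct-mapped response cache for the software handler.
# The fold and the tag check are assumptions; names are hypothetical.
from typing import Callable, List, Optional, Tuple

Key = Tuple[int, int, int, int]      # (opcode, reg1 state, reg2 state, mem state)
Result = Tuple[int, bool]            # (output state, exception bit)
Handler = Callable[[int, int, int, int], Result]

class TPCache:
    ENTRIES = 128  # 128-entry, direct-mapped

    def __init__(self) -> None:
        self.lines: List[Optional[Tuple[Key, Result]]] = [None] * self.ENTRIES

    def flush(self) -> None:
        """Invalidate every entry; done whenever TPCHR changes."""
        self.lines = [None] * self.ENTRIES

    @staticmethod
    def index(key: Key) -> int:
        """Hypothetical fold: pack the fields, then XOR 7-bit chunks together."""
        opcode, r1, r2, mem = key
        packed = (opcode << 48) | (r1 << 32) | (r2 << 16) | mem  # states assumed <= 16 bits
        idx = 0
        while packed:
            idx ^= packed & 0x7F
            packed >>= 7
        return idx

    def lookup(self, key: Key) -> Optional[Result]:
        entry = self.lines[self.index(key)]
        # Full-key tag check: different keys can fold to the same index.
        return entry[1] if entry is not None and entry[0] == key else None

    def fill(self, key: Key, result: Result) -> None:
        self.lines[self.index(key)] = (key, result)

def propagate(cache: TPCache, handler: Handler, key: Key) -> Result:
    """Use the cached response if present; otherwise run the handler and cache it."""
    hit = cache.lookup(key)
    if hit is not None:
        return hit
    result = handler(*key)
    cache.fill(key, result)
    return result
```

The handler is a pure function of its four inputs, so a hit can safely replace the handler call. Flushing on a TPCHR write ensures that a newly installed handler never sees another handler's answers.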

9 [Diagram: example for instructions that do not touch memory]
• Starts after the OoO core has finished with the instruction
• Sized to the Architectural Register File, NOT the physical one
• State of Reg0 hardwired to 0; reserved for instructions that touch memory
• Remember: RISC ISA
• ALARM! (exception raised)
• 128-entry, direct-mapped; cleared when TPCHR changes

10 [Diagram: example for stores]
• Suppresses silent stores (stores that do not change the stored value)

11
• Still, TPCache lookups take 1 cycle
• If dependent instructions retire in the same cycle, in-order taint propagation stalls
  ◦ This puts pressure on the physical register file and the ROB
• However, 0..00 is usually the safe state, and combining zero states yields 0..00
• Also, if only one input is non-zero, propagation is usually unary (the output simply takes that input's state)
• So, create a table that records this per opcode

12
• Stores a 2-bit value for each of the 256 opcodes
  ◦ 512 bits (64 bytes) in total; must be saved on a context switch
• Really fast lookups, allowing same-cycle propagation
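
Here is a sketch of the 64-byte Filter TPT. It assumes the 2-bit codes quoted for the lifeguards on slide 16: 10 means zero optimization plus unary propagation, and 01 means zero optimization only. It treats 00 as "no shortcut". The class and method names are hypothetical.

```python
# Sketch of the Filter TPT: one 2-bit entry per opcode, packed into 64 bytes.
# Code meanings are assumed from the lifeguard descriptions; 0b11 is not
# modelled and behaves like 0b00 here. Names are hypothetical.
from typing import Optional, Sequence

NO_FILTER      = 0b00  # always fall back to the TPCache / software handler
ZERO_ONLY      = 0b01  # all-zero inputs -> zero output
ZERO_AND_UNARY = 0b10  # ...and exactly one non-zero input -> copy it to the output

class FilterTPT:
    def __init__(self) -> None:
        self.table = bytearray(64)  # 256 opcodes x 2 bits = 512 bits = 64 bytes

    def set_entry(self, opcode: int, value: int) -> None:
        if not (0 <= opcode < 256 and 0 <= value <= 0b11):
            raise ValueError("opcode must fit in 8 bits, value in 2 bits")
        byte, shift = divmod(opcode * 2, 8)
        self.table[byte] = (self.table[byte] & ~(0b11 << shift) & 0xFF) | (value << shift)

    def get_entry(self, opcode: int) -> int:
        byte, shift = divmod(opcode * 2, 8)
        return (self.table[byte] >> shift) & 0b11

    def try_propagate(self, opcode: int, inputs: Sequence[int]) -> Optional[int]:
        """Return the output state if propagation is trivial, else None."""
        mode = self.get_entry(opcode)
        nonzero = [s for s in inputs if s != 0]
        if mode in (ZERO_ONLY, ZERO_AND_UNARY) and not nonzero:
            return 0
        if mode == ZERO_AND_UNARY and len(nonzero) == 1:
            return nonzero[0]
        return None  # not trivial: consult the TPCache / software handler
```

Two non-zero inputs (e.g., both operands tainted) deliberately fall through to the slower path, matching the slide's "only one non-zero" condition.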

13
• 4-stage in-order pipeline
  ◦ Receives only non-speculative instructions
• Stages 1 and 2: lookups
  ◦ Filter TPT
  ◦ L1-T
• Stage 3: taint propagation
  ◦ TPCache (TPC) lookup
  ◦ Or trivial propagation through the Filter TPT
• Stage 4: commit
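
The stage order can be illustrated with a toy, functional model (not cycle-accurate). Three things here are assumptions: the class name `TaintPipeline`, the plain dict standing in for the 128-entry TPCache, and the Filter TPT codes 01/10.

```python
# Toy, functional model of the 4-stage in-order taint pipeline (not cycle-accurate).
# Assumptions: Filter TPT codes 0b01/0b10 as described for the lifeguards, and a dict
# standing in for the 128-entry direct-mapped TPCache. Names are hypothetical.
from typing import Callable, Dict, List, Tuple

Handler = Callable[[int, int, int, int], Tuple[int, bool]]
ZERO_ONLY, ZERO_AND_UNARY = 0b01, 0b10

class TaintPipeline:
    def __init__(self, num_arch_regs: int, filter_modes: Dict[int, int], handler: Handler) -> None:
        self.regs: List[int] = [0] * num_arch_regs  # one state per ARCHITECTURAL register
        self.filter_modes = filter_modes            # opcode -> Filter TPT code
        self.tpcache: Dict[Tuple[int, int, int, int], Tuple[int, bool]] = {}
        self.handler = handler
        self.last_path = ""                         # which mechanism produced the result

    def retire(self, opcode: int, src1: int, src2: int, dst: int, mem_state: int = 0) -> bool:
        """Propagate taint for one non-speculative instruction; return the exception bit."""
        # Stages 1-2: look up the Filter TPT entry and the input states
        # (mem_state stands in for the L1-T lookup; 0 for non-memory instructions).
        mode = self.filter_modes.get(opcode, 0)
        inputs = (self.regs[src1], self.regs[src2], mem_state)
        nonzero = [s for s in inputs if s != 0]
        # Stage 3: trivial propagation via the Filter TPT, else TPCache, else handler.
        if mode in (ZERO_ONLY, ZERO_AND_UNARY) and not nonzero:
            out, exc, self.last_path = 0, False, "filter"
        elif mode == ZERO_AND_UNARY and len(nonzero) == 1:
            out, exc, self.last_path = nonzero[0], False, "filter"
        else:
            key = (opcode,) + inputs
            self.last_path = "tpcache" if key in self.tpcache else "handler"
            if key not in self.tpcache:
                self.tpcache[key] = self.handler(*key)
            out, exc = self.tpcache[key]
        # Stage 4: commit the output state; Reg0's state stays hardwired to 0.
        if dst != 0:
            self.regs[dst] = out
        return exc
```

The filter check comes first because it needs only the opcode's 2-bit entry. That is what lets trivial cases avoid the 1-cycle TPCache lookup.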

14
• Summary of what the OS must save on a context switch:
  ◦ TPCHR (handler address)
  ◦ FTCR (state size)
  ◦ MTBR (shadow state address)
  ◦ Filter TPT contents (64 bytes)
• The TPCache can simply be discarded
• The remaining state lives in the application's address space
  ◦ So swapping, virtualization, etc. work normally

15
• Data and metadata are accessed in 2 different cycles
  ◦ Potential consistency issues
• Solution for loads:
  ◦ Prefetch the state when the data address is resolved
  ◦ If the state does not hit in the L1-T a few cycles later, replay the load
• Solution for stores:
  ◦ Prefetch the state (same as for loads)
  ◦ Write only when data and metadata both hit in their L1 caches
• Thanks to the prefetch, the L1-T usually hits
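
The load and store rules can be modelled roughly as below. This toy only tracks which 64-byte taint lines are present in the L1-T and does not model timing (the "few cycles" before the check). `L1T`, `load_check`, and `store_may_write` are hypothetical names.

```python
# Toy model of the load/store consistency rules; names are hypothetical.
from typing import Set

LINE_BYTES = 64  # 64-byte L1-T blocks, as in the evaluated configuration

class L1T:
    """Only tracks which taint lines are present."""
    def __init__(self) -> None:
        self.lines: Set[int] = set()

    def prefetch(self, taint_addr: int) -> None:
        # Issued as soon as the load/store data address is resolved.
        self.lines.add(taint_addr // LINE_BYTES)

    def evict(self, taint_addr: int) -> None:
        self.lines.discard(taint_addr // LINE_BYTES)

    def hit(self, taint_addr: int) -> bool:
        return taint_addr // LINE_BYTES in self.lines

def load_check(l1t: L1T, taint_addr: int) -> str:
    """Checked some cycles after the prefetch: complete on an L1-T hit, else replay."""
    return "complete" if l1t.hit(taint_addr) else "replay"

def store_may_write(l1d_hit: bool, l1t: L1T, taint_addr: int) -> bool:
    """Write data and taint only when both are present in their L1 caches."""
    return l1d_hit and l1t.hit(taint_addr)
```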

16
• 1st: TaintCheck, with a 1-bit state per word
  ◦ Allows maximum optimization: 10 in the Filter TPT (unary propagation and zero optimization)
  ◦ The TPCache and the software handler deal with cases such as XOR R1,R1,R1 (whose result is always 0)
• 2nd: 1-bit PointerCheck
  ◦ Tracks which words hold valid heap pointers
  ◦ Useful for leak detection
  ◦ Something Raksha cannot handle
  ◦ Filter TPT: 01 (non-pointers produce non-pointers)
• 3rd: a combination of the two, with 2-bit states
  ◦ Filter TPT: 01 (untainted non-pointers produce untainted non-pointers)

17 [Tables: TaintCheck rules; 1-bit heap PointerCheck rules]

18
• SESC simulator
• 8-core system
• 4-issue OoO superscalar cores @ 2.93 GHz
• L1-D: 32 KB, 8-way set associative, dual-ported, 64-byte blocks
• L2: 4 MB, 16-way set associative, single-ported, 64-byte blocks
  ◦ Small for an 8-core system
• L1-T: 4 KB, 4-way set associative, dual-ported, 64-byte blocks
• Bus: 64 bits wide @ 1333 MHz

19
• Overhead: ~1% for SPEC 2K and 4% for Splash-2
• Splash-2 is worse due to false sharing of metadata


22
• Smaller cache lines → less false sharing of metadata
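
A back-of-the-envelope calculation suggests why line size matters. Let W be the word size in bytes, b the state size in bits, and B the L1-T block size in bytes. B = 64 and b = 2 come from the evaluated configuration. W = 4 is an assumption, since the slides do not state the word size. One metadata line then covers this much data:

```latex
D_{\text{line}} = \frac{8B}{b}\,W
               = \frac{8 \cdot 64}{2} \cdot 4\ \text{bytes}
               = 1024\ \text{bytes}
               = 16 \times 64\text{-byte data lines}
```

Under these assumptions, writes from different cores to any of those 16 data lines land in the same metadata line. Halving B halves that range.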

23
• 4 KB L1-T: ~1% overhead for SPEC 2K
• 8 KB: minimal gains
• 2 KB: 2.8% overhead
• Conclusion: 4 KB is sufficient for 1- and 2-bit states
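
As a rough plausibility check, assume W = 4-byte words (the slides do not state this) and b-bit states. An L1-T of C bytes then holds the taint state for this much data:

```latex
D_{\text{L1-T}} = \frac{8C}{b}\,W
\\
C = 4096,\ b = 2:\quad D_{\text{L1-T}} = \frac{8 \cdot 4096}{2} \cdot 4 = 65536\ \text{bytes} = 64\ \text{KB}
\\
C = 4096,\ b = 1:\quad D_{\text{L1-T}} = 131072\ \text{bytes} = 128\ \text{KB}
```

Under that assumption, a 4 KB L1-T covers two to four times the 32 KB L1-D's capacity. This is consistent with 4 KB being enough for 1- and 2-bit states.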

24
• Use FlexiTaint to emulate previously proposed hardware
• And implement the lifeguard that hardware could not handle (1-bit heap PointerCheck)
• Unsurprisingly, FlexiTaint comes out ahead

25
• A versatile scheme that handles most lifeguards with low overhead
• Caching the software handler's answers is a nice idea
• Overall, a good idea
  ◦ Though it has limitations (e.g., LockSet)
• Questions?
