Published by Dwayne Marshall. Modified over 9 years ago.
1
G. Venkataramani, I. Doudalis, Y. Solihin, M. Prvulovic
HPCA ’08
Reading Group Presentation, 02/14/2008
2
Tainting schemes are extremely useful for security and debugging
◦ E.g. TaintCheck, PointerCheck
Implemented in software
◦ Usually some kind of DBI (dynamic binary instrumentation)
◦ Extremely versatile
◦ Really slow
◦ Problems with multithreaded apps, JIT compilation, and self-modifying code
3
So, make hardware for it
◦ Multiple examples: Raksha, Minos, etc.
◦ Fast
◦ Can deal with the unusual code that troubles S/W
◦ Requires extensive modifications to the OoO core, caches, buses, and memories
◦ Limits the state that can be manipulated, usually to a few bits that H/W can manage easily
◦ So, who is going to implement it?
Solution: FlexiTaint
◦ Use H/W to accelerate what the S/W is doing: common-case propagation and metadata manipulation
4
RISC ISA
5
Taint state: 1 to 16 bits per word
1-level table in the application address space
◦ Protected from the application
◦ No need to widen buses, caches, etc.
◦ L1-T cache for taint bits: 4 KB for 2-bit states
◦ No changes to the L1-D, no port contention
◦ Taint state shares the L2
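A minimal sketch of how a flat taint table like the one above could be indexed, and how much data a 4 KB L1-T would cover. The slides only say "bits per word" and "start of the table"; the 32-bit word size, the linear indexing, and the function names below are assumptions for illustration, not the paper's exact address computation.

```python
WORD_BYTES = 4  # assumption: 32-bit words (the slides only say "per word")


def taint_location(mtbr, data_addr, bits_per_word):
    """Return (byte address, bit offset) of the taint state for data_addr.

    Assumes a flat table starting at mtbr, with one bits_per_word-wide
    entry per data word and a power-of-two bits_per_word no wider than a
    byte, so an entry never straddles a byte boundary.
    """
    word_index = data_addr // WORD_BYTES
    bit_index = word_index * bits_per_word
    return mtbr + bit_index // 8, bit_index % 8


def data_covered_by_l1t(l1t_bytes, bits_per_word):
    """Bytes of application data whose taint state fits in the L1-T."""
    words = (l1t_bytes * 8) // bits_per_word
    return words * WORD_BYTES


if __name__ == "__main__":
    print(taint_location(0x1000, 16, 2))
    print(data_covered_by_l1t(4096, 2))
```

Under these assumptions, a 4 KB L1-T with 2-bit states holds the taint of 64 KB of data, twice the capacity of a 32 KB L1-D.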
6
2 registers for that
◦ MTBR: Memory Taint Base Register: start of the table
◦ FTCR: FlexiTaint Configuration Register: bits per word
◦ Both must be saved by the O/S on a context switch
All loads/stores prefetch taint state into the L1-T
State 0..0 is assumed to be the safe one
State can be manipulated directly by special instructions
◦ Taint must be set explicitly after special events: reading a file, malloc, input purging, etc.
7
Takes place after the OoO core
◦ Can be turned off and completely bypassed if unnecessary
The normal commit becomes pre-commit
A software handler receives 4 arguments:
◦ Opcode, Reg1 state, Reg2 state, Mem state
And returns the output state and whether an exception should be raised
Handler address stored in TPCHR
◦ Restricted-access register
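The handler interface described above can be sketched as a pure function of four inputs. The opcode numbers and the policy here (OR the sources, raise on a jump through a tainted register) are hypothetical, chosen only to show the shape of the interface; they are not the rules used in the paper.

```python
from typing import NamedTuple


class TaintResult(NamedTuple):
    out_state: int         # taint state written to the destination
    raise_exception: bool  # True if the instruction should trap


# Hypothetical opcode numbers, for illustration only.
OP_ADD, OP_LOAD, OP_JUMP_REG = 0x01, 0x02, 0x03


def taint_handler(opcode, reg1_state, reg2_state, mem_state):
    """Illustrative 1-bit policy: the result depends only on the 4 inputs."""
    if opcode == OP_JUMP_REG:
        # Jumping through a tainted register is treated as an attack.
        return TaintResult(0, reg1_state != 0)
    if opcode == OP_LOAD:
        # A load copies the taint of the memory word it reads.
        return TaintResult(mem_state, False)
    # Default: the destination is tainted if either source register is.
    return TaintResult(reg1_state | reg2_state, False)
```

Because the function has no hidden state, the same inputs always give the same answer.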
8
The answer of the S/W handler for the same inputs will be the same
◦ Cache it
128-entry direct-mapped response cache
Indexed by opcode, Reg1 state, Reg2 state, Mem state (folded into 7 bits)
Stores the output state and the exception bit
Cleared every time the TPCHR (software handler address register) is changed
◦ Usually on a context switch
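A rough software model of such a response cache, assuming an XOR fold of the packed inputs into a 7-bit index. The slides do not specify the fold function, so `_index` is an assumption, as are the class and method names.

```python
class TPCache:
    """Toy model of a 128-entry direct-mapped cache of handler answers."""

    ENTRIES = 128  # 7-bit index

    def __init__(self, handler):
        self.handler = handler
        self.handler_calls = 0
        self.lines = [None] * self.ENTRIES

    @staticmethod
    def _index(key):
        # Assumption: pack the four inputs (states up to 16 bits each),
        # then XOR 7-bit chunks together. The real fold is not given.
        opcode, r1, r2, mem = key
        packed = (((((opcode << 16) | r1) << 16) | r2) << 16) | mem
        index = 0
        while packed:
            index ^= packed & 0x7F
            packed >>= 7
        return index

    def lookup(self, opcode, r1, r2, mem):
        key = (opcode, r1, r2, mem)
        idx = self._index(key)
        line = self.lines[idx]
        if line is not None and line[0] == key:
            return line[1]  # hit: no software handler invocation
        # Miss: call the software handler and overwrite the line.
        self.handler_calls += 1
        result = self.handler(opcode, r1, r2, mem)
        self.lines[idx] = (key, result)
        return result

    def flush(self):
        """Model of clearing the cache when TPCHR changes."""
        self.lines = [None] * self.ENTRIES
```

Storing the full key as a tag keeps the model correct when two input combinations fold to the same index: the second one simply evicts the first.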
9
After the OoO core has finished
Size of the architectural register file, NOT the physical one
State of Reg0 hardwired to 0
Reserved for instructions that touch memory
Example: instructions that do not touch memory
◦ Remember: RISC ISA
ALARM!
128-entry, direct mapped
Cleared when TPCHR changes
10
Suppresses silent stores
Example: stores
11
Still, TPCache lookups take 1 cycle
If dependent instructions retire in the same cycle, the in-order taint propagation will stall
◦ Puts pressure on the physical register file and the ROB
Usually 0..00 is the safe state, and combining zeroes gives 0..00
Also, if only one source is non-zero, propagation is usually unary
Create a table to store that
12
Stores a 2-bit value for each opcode (256 opcodes)
◦ 512 bits total, must be saved on a context switch
Really fast lookups, allowing same-cycle propagation
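A sketch of how a 2-bit filter entry could short-circuit the TPCache lookup. The meanings of 01 (zero optimization) and 10 (zero optimization plus unary propagation) follow the example configurations later in the deck; treating 00 as "always look up" and leaving 11 unhandled are assumptions, as are the names.

```python
FILTER_ALWAYS_LOOKUP = 0b00  # assumption: no trivial propagation allowed
FILTER_ZERO = 0b01           # all-zero sources produce a zero result
FILTER_ZERO_OR_UNARY = 0b10  # as 01, plus copy a single non-zero source


def filter_propagate(filter_bits, src_states):
    """Return the output state if it is trivial, else None.

    None means the instruction still needs a TPCache lookup
    (and possibly the software handler).
    """
    nonzero = [s for s in src_states if s != 0]
    if filter_bits in (FILTER_ZERO, FILTER_ZERO_OR_UNARY) and not nonzero:
        return 0
    if filter_bits == FILTER_ZERO_OR_UNARY and len(nonzero) == 1:
        return nonzero[0]
    return None


def filter_tpt_bytes(num_opcodes=256, bits_per_entry=2):
    """Size of the table the O/S must save on a context switch."""
    return num_opcodes * bits_per_entry // 8
```

With 256 opcodes and 2 bits each, `filter_tpt_bytes()` gives 64 bytes, i.e. the 512 bits quoted above.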
13
4-stage in-order pipeline
◦ Receives non-speculative instructions
First 2 stages: lookup
◦ Filter TPT
◦ L1-T
3rd stage: taint propagation
◦ TPCache lookup
◦ Or trivial propagation through the Filter TPT
4th stage: commit
14
Summary of what the O/S needs to save on context switches
◦ TPCHR (handler address)
◦ FTCR (state size)
◦ MTBR (shadow state address)
◦ Filter TPT content (64 bytes)
The TPCache can simply be discarded
All state is in the address space of the application
◦ So swapping, virtualization, etc. work normally
15
Data and metadata are accessed in 2 different cycles
◦ Potential consistency issues
Solution for loads:
◦ Prefetch the state when the data address is resolved
◦ If the state does not hit in the L1-T a few cycles later, replay the load
Solution for stores:
◦ Prefetch the state (same as for loads)
◦ Write only when data and metadata both hit in the L1
The L1-T usually hits thanks to the prefetch
16
1st: TaintCheck, 1-bit state per word
◦ Allows for maximum optimization: 10 in the Filter TPT (unary propagation and zero optimization)
◦ TPCache and S/W will consider cases such as XOR R1,R1,R1
2nd: 1-bit PointerCheck
◦ Stores which words are valid heap pointers
◦ Good for leak detection
◦ And something that Raksha cannot handle
◦ Filter TPT: 01 (non-pointers produce non-pointers)
3rd: a combination with 2-bit states
◦ Filter TPT: 01 (untainted non-pointers produce untainted non-pointers)
17
TaintCheck Rules
1-bit Heap PointerCheck
18
SESC simulator
8-core system
4-issue OoO superscalar cores @ 2.93 GHz
L1-D: 32 KB, 8-way set associative, dual-ported, 64-byte blocks
L2: 4 MB, 16-way set associative, single-ported, 64-byte blocks
◦ Small for an 8-core system
L1-T: 4 KB, 4-way set associative, dual-ported, 64-byte blocks
Bus: 64 bits wide @ 1333 MHz
19
~1% overhead for SPEC 2K and 4% for Splash-2
Splash-2 is worse due to false sharing of metadata
22
Smaller Cache line → Less false sharing of Metadata
23
4 KB: ~1% overhead for SPEC 2K
8 KB: minimal gains
2 KB: 2.8% overhead
Conclusion: 4 KB is fine for 1-bit and 2-bit states
24
Use FlexiTaint to simulate previously proposed hardware
And implement the lifeguard that they couldn’t handle (1-bit Heap PointerCheck)
Obviously FlexiTaint proves better
25
Versatile scheme to handle most lifeguards with low overhead
Nice idea to cache the answer of the software handler
In general, a good idea
◦ With its limitations, though (LockSet)
Questions?