Efficient Memory Shadowing for 64-bit Architectures
Qin Zhao (MIT), Derek Bruening (VMware), Saman Amarasinghe (MIT)
ISMM 2010, Toronto, Canada, June 6, 2010
Dynamic Program Analysis
Understand program behavior:
– Optimization
– Debugging
– Security
– Memory management
Shadow memory tools:
– Maintain meta-data for every memory location
– Update meta-data on every memory operation
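As a minimal illustration of the two duties above, the sketch below keeps one meta-data byte per byte of a small simulated application memory and updates it on every instrumented store and load. The "initialized" meta-data and the names `on_store` and `on_load` are hypothetical, chosen only to show the pattern (loosely in the spirit of definedness tracking), not any particular tool's design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated application memory and its shadow:
 * one meta-data byte per application byte. */
#define APP_SIZE 256
static uint8_t app_mem[APP_SIZE];
static uint8_t shadow_mem[APP_SIZE]; /* 0 = never written, 1 = initialized */

/* Instrumented store: perform the write, then update the meta-data. */
static void on_store(size_t addr, uint8_t value) {
    assert(addr < APP_SIZE);
    app_mem[addr] = value;
    shadow_mem[addr] = 1;
}

/* Instrumented load: consult the meta-data before using the value.
 * Returns false (leaving *out untouched) on a read of uninitialized memory. */
static bool on_load(size_t addr, uint8_t *out) {
    assert(addr < APP_SIZE);
    if (shadow_mem[addr] == 0)
        return false; /* a real tool would report an error here */
    *out = app_mem[addr];
    return true;
}
```

Every memory operation pays for an extra shadow access, which is why the cost of locating the meta-data (the address translation) matters so much.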
Examples
Memory error detection:
– MemCheck [VEE’07]
– Purify [USENIX’92]
– Dr. Memory
Dynamic information flow tracking:
– LIFT [MICRO’39]
– TaintTrace [ISCC’06]
Multi-threaded program analysis:
– Eraser [TCS’97]
– Helgrind
Memory usage analysis:
– CETS [ISMM’10]
– Staleness
Shadow Memory System
Shadow memory manager:
– Meta-data for application memory
– Memory mapping scheme (addr_A → addr_S)
  – DMS (Direct Mapping Scheme)
  – SMS (Segmented Mapping Scheme)
Instrumentor:
– For every memory operation:
  – Address calculation
  – Meta-data update
– Expensive
  – MemCheck: ~25x
  – ~12x for the addr_A → addr_S translation
[Figure: application memory (a.out, heap, libc, stack) and its shadow memory]
Direct Mapping Scheme (DMS)
– A single shadow memory region for the entire address space
– Translation: addr_S = addr_A + disp
  lea [addr] -> %r1
  add %r1, disp -> %r1
– Issue: address conflicts between mem_A and mem_S
[Figure: application and shadow regions; slowdown relative to native execution]
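A minimal sketch of DMS, assuming a single fixed displacement and application regions described as half-open [start, end) ranges (the region values and helper names are hypothetical). Translation is one addition, matching the `lea`/`add` pair above; the conflict check shows how a single displacement can fail when some region's shadow image lands on another application region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A half-open address range [start, end). */
typedef struct { uint64_t start, end; } region_t;

/* DMS translation: every application address uses the same displacement. */
static inline uint64_t dms_translate(uint64_t addr_a, uint64_t disp) {
    return addr_a + disp;
}

static bool overlaps(region_t a, region_t b) {
    return a.start < b.end && b.start < a.end;
}

/* True if shadowing `apps` with `disp` would place some shadow region on top
 * of an application region (wrap-around at the top of memory is ignored). */
static bool dms_has_conflict(const region_t *apps, size_t n, uint64_t disp) {
    for (size_t i = 0; i < n; i++) {
        region_t shadow = { apps[i].start + disp, apps[i].end + disp };
        for (size_t j = 0; j < n; j++)
            if (overlaps(shadow, apps[j]))
                return true;
    }
    return false;
}
```

With regions [0x1000, 0x2000) and [0x5000, 0x6000), a displacement of 0x10000 is conflict-free, while 0x4000 maps the first region's shadow onto the second application region.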
Segmented Mapping Scheme (SMS)
– One shadow segment per application segment
– Translation:
  – Segment lookup (address indexing)
  – Address translation
  lea [addr] -> %r1
  mov %r1 -> %r2
  shr %r2, 16 -> %r2
  add %r1, disp[%r2] -> %r1
[Figure: segment table mapping addr_A in App 1/App 2 to addr_S in Shd 1/Shd 2; slowdown relative to native execution]
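The instruction sequence above can be mirrored in C. This sketch assumes 32-bit application addresses split into 64 KB segments (the `shr 16`) and a flat table holding one displacement per segment; `disp_table` and `sms_map_segment` are hypothetical names. Displacements are stored modulo 2^32, so shadow segments may lie above or below their application segments.

```c
#include <assert.h>
#include <stdint.h>

#define SEG_SHIFT 16                       /* 64 KB segments, as in `shr %r2, 16` */
#define NUM_SEGS  (1u << (32 - SEG_SHIFT)) /* 65536 entries for a 32-bit space */

/* One displacement per application segment (hypothetical global table). */
static uint32_t disp_table[NUM_SEGS];

/* Shadow the 64 KB segment starting at seg_base with the one at shadow_base. */
static void sms_map_segment(uint32_t seg_base, uint32_t shadow_base) {
    disp_table[seg_base >> SEG_SHIFT] = shadow_base - seg_base; /* mod 2^32 */
}

/* Segment lookup (address indexing) followed by address translation. */
static uint32_t sms_translate(uint32_t addr_a) {
    return addr_a + disp_table[addr_a >> SEG_SHIFT];
}
```

The extra `mov`/`shr` and the indexed memory load are what make SMS translation more expensive than the single `add` of DMS.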
Shadow Memory Mapping
Scaling to 64-bit architectures:
– DMS: infeasible due to the memory layout
[Figure: 64-bit address space with user space (a.out, stack), unusable space, kernel space, and vsyscall]
Shadow Memory Mapping
Scaling to 64-bit architectures:
– DMS: infeasible due to the memory layout
– Single-level SMS: too big (~4 billion entries)
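The entry count follows from the address-space size, assuming the same 64 KB segments as the 32-bit SMS (a 16-bit shift) and, for the memory estimate, a hypothetical 8-byte displacement per entry:

$$\frac{2^{48}}{2^{16}} = 2^{32} \approx 4.3 \times 10^{9}\ \text{entries}, \qquad 2^{32} \times 8\ \text{bytes} = 2^{35}\ \text{bytes} = 32\ \text{GiB}.$$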
Shadow Memory Mapping
Scaling to 64-bit architectures:
– DMS: infeasible due to the memory layout
– Single-level SMS: too big (~4 billion entries)
– Multi-level SMS: even more expensive
[Figure: slowdown relative to native execution]
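A two-level variant can be sketched as follows, assuming (hypothetically) that a 48-bit address is split into a 16-bit top-level index, a 16-bit second-level index, and a 16-bit offset within a 64 KB segment. Each translation needs two dependent table loads plus a test for a missing second-level table, which is where the extra cost comes from. The names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LVL_BITS 16
#define LVL_SIZE (1u << LVL_BITS)

/* Top level: pointers to lazily allocated second-level displacement tables. */
static int64_t *top_level[LVL_SIZE];

static unsigned top_index(uint64_t a) { return (unsigned)(a >> 32) & (LVL_SIZE - 1); }
static unsigned mid_index(uint64_t a) { return (unsigned)(a >> 16) & (LVL_SIZE - 1); }

/* Record the displacement for the 64 KB segment containing addr_a.
 * Returns -1 if a second-level table cannot be allocated. */
static int ms_map_segment(uint64_t addr_a, int64_t disp) {
    int64_t **slot = &top_level[top_index(addr_a)];
    if (*slot == NULL && (*slot = calloc(LVL_SIZE, sizeof **slot)) == NULL)
        return -1;
    (*slot)[mid_index(addr_a)] = disp;
    return 0;
}

/* Two dependent loads per translation; returns 0 for an unmapped region. */
static uint64_t ms_translate(uint64_t addr_a) {
    const int64_t *second = top_level[top_index(addr_a)];
    if (second == NULL)
        return 0;
    return addr_a + (uint64_t)second[mid_index(addr_a)];
}
```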
Umbra (CGO’10)
Scaling to 64-bit architectures:
– The single-level SMS table is too big, but sparse
Umbra (CGO’10):
– Eliminate empty entries
– Compact table
– Walk the table to find the entry
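A compact table can be sketched as a short array holding entries only for segments in use, searched linearly. The 4 GB tag granularity (`TAG_MASK`), the fixed capacity, and the type names are assumptions for illustration, not Umbra's actual data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_MASK 0xffffffff00000000ull /* hypothetical: 4 GB segments */

typedef struct {
    uint64_t tag;  /* addr_A & TAG_MASK for an allocated application segment */
    int64_t  disp; /* addr_S - addr_A for that segment */
} seg_entry_t;

/* Compact table: only non-empty segments have entries. */
typedef struct {
    seg_entry_t entries[64];
    size_t      count;
} compact_table_t;

/* Walk the table to find the entry for addr_a; NULL if it is not mapped. */
static const seg_entry_t *table_walk(const compact_table_t *t, uint64_t addr_a) {
    uint64_t tag = addr_a & TAG_MASK;
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].tag == tag)
            return &t->entries[i];
    return NULL;
}
```

The table stays small because a 64-bit process touches few segments, but a walk on every memory access would still be too slow, which motivates caching the result.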
Umbra (CGO’10): Reference Uni-Cache
– Software cache per instruction per thread
  – Holds a segment tag and displacement
– Check the uni-cache before the table walk
  – 99.97% hit ratio
tag = addr_A & mask;
if (cache->tag != tag) {
    … // table walk
}
addr_S = addr_A + cache->disp;
[Figure: slowdown relative to native execution]
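The check above can be fleshed out as a per-instruction cache in front of a table walk. This is a sketch under assumptions: `TAG_MASK`, `INVALID_TAG`, the `unicache_t` type, and `slow_lookup` (a fixed two-segment stand-in for a real table walk) are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_MASK    0xffffffff00000000ull /* hypothetical segment granularity */
#define INVALID_TAG 1ull                  /* no real tag has its low bits set */

/* One uni-cache per instrumented instruction (per thread). */
typedef struct {
    uint64_t tag;   /* segment tag of the last address this instruction touched */
    int64_t  disp;  /* displacement for that segment */
    unsigned walks; /* slow-path table walks, counted for illustration */
} unicache_t;

/* Hypothetical stand-in for the table walk: two segments, fixed displacements. */
static int64_t slow_lookup(uint64_t tag) {
    return tag == 0 ? 0x200000000ll : -0x100000000ll;
}

static uint64_t translate(unicache_t *cache, uint64_t addr_a) {
    uint64_t tag = addr_a & TAG_MASK;
    if (cache->tag != tag) { /* miss: walk the table and refill */
        cache->tag  = tag;
        cache->disp = slow_lookup(tag);
        cache->walks++;
    }
    return addr_a + (uint64_t)cache->disp; /* addr_S = addr_A + cache->disp */
}
```

Because most instructions keep touching the same segment, nearly every call takes only the compare-and-add path.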
EMS64: Key Idea
Umbra:
tag = addr_A & mask;
if (cache->tag != tag) {
    … // table walk (0.03%)
}
addr_S = addr_A + cache->disp;
EMS64:
– Speculatively use a displacement without the check
– Smart shadow memory placement
  – An incorrect displacement is reported by a memory access violation fault
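The effect of dropping the check can be simulated in slot units. This sketch assumes a hypothetical toy address space of 16 slots in which application slots 0, 7 and 11 are shadowed at 2, 6 and 10 (displacements 2 and -1). An instruction always adds its cached displacement; an access to an unmapped slot stands in for the hardware access-violation fault, whose handler looks up the correct displacement and updates the cache. Real EMS64 relies on an actual fault and signal handler; every name here is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 16
#define NAPPS  3

/* Hypothetical toy layout: application slot i is shadowed at shadow_slot[i]. */
static const int app_slot[NAPPS]    = {0, 7, 11};
static const int shadow_slot[NAPPS] = {2, 6, 10};

typedef struct {
    int      disp;   /* displacement cached for this instruction */
    unsigned faults; /* simulated access-violation faults */
} instr_cache_t;

static int wrap(int slot) { return ((slot % NSLOTS) + NSLOTS) % NSLOTS; }

/* Only unmapped slots fault. Landing in any mapped slot (application memory
 * or the wrong shadow slot) would go unnoticed, so placement must prevent it. */
static bool is_mapped(int slot) {
    for (int i = 0; i < NAPPS; i++)
        if (app_slot[i] == slot || shadow_slot[i] == slot)
            return true;
    return false;
}

/* Slow path run by the simulated fault handler. */
static int lookup_disp(int slot_a) {
    for (int i = 0; i < NAPPS; i++)
        if (app_slot[i] == slot_a)
            return shadow_slot[i] - app_slot[i];
    assert(!"not an application slot");
    return 0;
}

/* Fast path: add the cached displacement with no tag check. */
static int ems_translate(instr_cache_t *c, int slot_a) {
    int slot_s = wrap(slot_a + c->disp);
    if (!is_mapped(slot_s)) { /* stands in for the hardware fault */
        c->faults++;
        c->disp = lookup_disp(slot_a);
        slot_s = wrap(slot_a + c->disp);
    }
    return slot_s;
}
```

An instruction cached with displacement 2 translates slot 0 to 2 directly; when it later touches slot 7, slot 9 faults, the handler switches it to -1, and subsequent accesses to slot 11 reach slot 10 without another fault.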
EMS64: Example
[Figure: address space divided into 16 slots. Application: 0 (A0), 7 (A1), 11 (A2). Shadow: 2 (S0), 6 (S1), 10 (S2). Unavailable: 12, 13, 14, 15 (13 and 15 also marked Unavailable/Reserved), plus one further Reserved slot. Displacements: {-1, 2}.]
EMS64: Potential Problem
[Figure: the same 16-slot layout (application slots 0, 7, 11; shadow slots 2, 6, 10; displacements {-1, 2}), with slots 12 and 14 unavailable, 13 and 15 unavailable/reserved, and one further reserved slot.]
EMS64: Final Solution
[Figure: the same 16-slot layout (application slots 0, 7, 11; shadow slots 2, 6, 10; displacements {-1, 2}). Reserved: 1, 4, 5, 8, and one further slot. Unavailable/Reserved: 12, 13, 15. Unavailable: 14.]
Slot Finding Problem
Given n slots:
– k application slots (A-slots)
– x empty slots (E-slots)
– y reserved slots (R-slots)
Find k shadow slots (S-slots) such that:
– For each slot $A_i$, there is one associated slot $S_i$ with displacement $d_i = S_i - A_i$.
– For each slot $A_i$ and each existing displacement $d_j \neq d_i$, slot $(A_i + d_j) \bmod n$ is an R-slot or an E-slot.
– For each S-slot $S$ and each existing valid displacement $d_i$, slot $(S + d_i) \bmod n$ is an R-slot or an E-slot.
[Figure: example layout with slots A0, A1, S0, S1, E0–E4, R0–R2. Legend: $A_i$ application slot, $S_i$ shadow slot, $E_i$ empty slot, $R_i$ reserved slot.]
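The last two conditions determine which slots must stay free (reserved or empty) once the shadow slots are chosen. The sketch below computes that set as a bitmask for up to 32 slots; `required_free_slots` is a hypothetical name. Applied to the 16-slot EMS64 example (application slots 0, 7, 11 with shadow slots 2, 6, 10, an assignment read off the example figures), it yields slots 1, 4, 5, 8, 9, 12, 13 and 15.

```c
#include <assert.h>
#include <stdint.h>

/* Bit s of the result is set when slot s must be an R-slot or an E-slot.
 * Assumes n <= 32, k <= 32, and all slot numbers in [0, n). */
static uint32_t required_free_slots(int n, int k, const int a[], const int s[]) {
    assert(n > 0 && n <= 32 && k >= 0 && k <= 32);
    int d[32];
    for (int i = 0; i < k; i++)
        d[i] = ((s[i] - a[i]) % n + n) % n; /* d_i = S_i - A_i (mod n) */

    uint32_t mask = 0;
    for (int i = 0; i < k; i++)
        for (int j = 0; j < k; j++) {
            if (d[j] != d[i])                    /* A_i + d_j, other displacements */
                mask |= 1u << ((a[i] + d[j]) % n);
            mask |= 1u << ((s[i] + d[j]) % n);   /* S_i + d_j, every displacement */
        }
    return mask;
}
```

If any required slot were itself an application or shadow slot, a wrong displacement could silently reach mapped memory instead of faulting.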
Slot Finding Problem
Given n slots:
– k application slots
– x empty slots
– y reserved slots
Can we find k S-slots?
– It depends on the layout!
– A solution is guaranteed for a 48-bit address space if application memory < 250 GB
– Proof sketch: $x \ge 8k^2 + 2k + 1$ suffices; we can always find an $S_i$ for $A_i$ if #E-slots > #conflicts
Legend: $A_i$ application slot, $S_i$ shadow slot, $E_i$ empty slot, $R_i$ reserved slot
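The sufficient condition can be turned into a rough capacity estimate. This sketch rests on two assumptions not stated on the slide: slots are 4 GB, and only the 47-bit (128 TB) user-space half of the 48-bit address space is usable, giving n = 32768. It finds the largest k for which k application slots plus the $8k^2 + 2k + 1$ required empty slots fit in n (reserved slots ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Empty slots sufficient for k application slots: x >= 8k^2 + 2k + 1. */
static uint64_t empty_slots_needed(uint64_t k) {
    return 8 * k * k + 2 * k + 1;
}

/* Largest k such that k application slots plus the required empty
 * slots fit in n slots (reserved slots ignored for simplicity). */
static uint64_t max_app_slots(uint64_t n) {
    uint64_t k = 0;
    while (k + 1 + empty_slots_needed(k + 1) <= n)
        k++;
    return k;
}
```

Under these assumptions the result is 63 slots, about 252 GB, in line with the ~250 GB figure.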
Implementation & Optimization
Implementation:
– Shadow memory allocation
– Add a signal handler
– Remove the reference uni-cache check
  lea [addr] -> %r1
  add %r1, unicache_disp -> %r1
Optimization:
– Restore uni-cache checks for instructions that access multiple segments, e.g., references from memcpy
  – When the number of access violations exceeds 2
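The optimization rule can be sketched as a per-instruction counter. Only the threshold (more than 2 violations) comes from the slide; the `instr_state_t` type and `on_access_violation` are hypothetical, and a real implementation would re-instrument the instruction rather than flip a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VIOLATIONS 2 /* restore the check once faults exceed this */

typedef struct {
    unsigned violations;   /* access-violation faults caused by this instruction */
    bool     use_unicache; /* false: speculative add only; true: checked path */
} instr_state_t;

/* Called from the fault handler for the faulting instruction. */
static void on_access_violation(instr_state_t *st) {
    st->violations++;
    if (st->violations > MAX_VIOLATIONS)
        st->use_unicache = true; /* e.g., memcpy-style code touching many segments */
}
```

Instructions that fault rarely keep the cheap unchecked path, while those that keep switching segments fall back to the uni-cache instead of taking a fault each time.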
Experimental Results
[Figure: slowdown relative to native execution]
Thank You
Download –
Q & A
Slot Finding Example
Can we always find a solution? No.
Slots: A0, A1, E0–E4, R0 (legend: $A_i$ application slot, $S_i$ shadow slot, $E_i$ empty slot, $R_i$ reserved slot)
E-slot | S0 (disp) | Conflict | S1 (disp) | Conflict
E0 | ✗ (1) | E→A1 | ✗ (7) | E→A0
E1 | ✓ (3) | | ✓ (1) |
E2 | ✗ (4) | E→A0 | ✗ (2) | A→A1
E3 | ✗ (5) | E→A1 | ✗ (3) | E→A0
E4 | ✗ (6) | A→A0 | ✗ (4) | E→A1
(E→A1: the candidate slot plus the displacement lands on A1; A→A0: the other application slot plus the displacement lands on A0.)
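The example can be checked exhaustively. The sketch assumes the 8-slot layout implied by the displacements in the table (A0 at slot 0, E0 at 1, A1 at 2, E1 to E4 at 3 to 6, R0 at 7), tries every pair of slots for S0 and S1 against the slot finding conditions, and counts the valid pairs. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 8
#define NAPPS  2

enum slot_kind { SLOT_A, SLOT_S, SLOT_E, SLOT_R };

/* Assumed layout: A0 E0 A1 E1 E2 E3 E4 R0 (slots 0..7). */
static const enum slot_kind base_layout[NSLOTS] = {
    SLOT_A, SLOT_E, SLOT_A, SLOT_E, SLOT_E, SLOT_E, SLOT_E, SLOT_R
};
static const int app[NAPPS] = {0, 2}; /* A0, A1 */

static int wrap(int x) { return ((x % NSLOTS) + NSLOTS) % NSLOTS; }
static bool is_free(enum slot_kind k) { return k == SLOT_R || k == SLOT_E; }

/* Checks shadow slots s[0], s[1] against the slot finding conditions. */
static bool valid_assignment(const int s[NAPPS]) {
    enum slot_kind layout[NSLOTS];
    int d[NAPPS];
    for (int i = 0; i < NSLOTS; i++)
        layout[i] = base_layout[i];
    for (int i = 0; i < NAPPS; i++) {
        if (layout[s[i]] != SLOT_E)
            return false;           /* shadows must occupy distinct E-slots */
        layout[s[i]] = SLOT_S;
        d[i] = wrap(s[i] - app[i]); /* d_i = S_i - A_i (mod n) */
    }
    for (int i = 0; i < NAPPS; i++)
        for (int j = 0; j < NAPPS; j++) {
            if (d[j] != d[i] && !is_free(layout[wrap(app[i] + d[j])]))
                return false;       /* A_i + d_j must be an R- or E-slot */
            if (!is_free(layout[wrap(s[i] + d[j])]))
                return false;       /* S_i + d_j must be an R- or E-slot */
        }
    return true;
}

/* Counts assignments (S0, S1) that satisfy every condition. */
static int count_solutions(void) {
    int count = 0;
    for (int s0 = 0; s0 < NSLOTS; s0++)
        for (int s1 = 0; s1 < NSLOTS; s1++) {
            int s[NAPPS] = {s0, s1};
            if (valid_assignment(s))
                count++;
        }
    return count;
}
```

The count is zero: E1 is the only conflict-free candidate for both S0 and S1, and they cannot share it. This is consistent with the guarantee not applying here, since k = 2 would require at least $8 \cdot 4 + 4 + 1 = 37$ empty slots.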