Department of Computer Sciences ISMM
No Bit Left Behind: The Limits of Heap Data Compression
Jennifer B. Sartor*, Martin Hirzel†, Kathryn S. McKinley*
*University of Texas at Austin, †IBM Watson
Current State
  - Managed languages are ubiquitous
  - Embedded devices
  - Multicore
  - Need memory efficiency!
  [Figure: two CPUs, each with its own L1 and L2 cache]
Memory Efficiency of Managed Languages
  COST
  - 8-94% information content in heap in 37 benchmarks [Mitchell & Sevitsky, OOPSLA 07]
  - Boxed objects
  - Trailing zeros in arrays
  - Redundant objects
  - Extra bit-width
  - Data structure backbones
  - bzip2: 86%
  OPPORTUNITY
  - Memory layout abstraction: (location + size) != identity
Related Work
  Ananian & Rinard, LCTES 03              Dominant-value field hashing
  Appel & Goncalves, Tech Report 93       Equal object sharing, constant field elision, bit-width reduction
  Chen, Kandemir & Irwin, VEE 05          Dominant-value field elision
  Chen, et al., OOPSLA 03                 Zero compression, trailing zero trimming
  Cooprider & Regehr, PLDI 07             Value set indirection
  Marinov & O’Callahan, OOPSLA 03         Equal object sharing
  Stephenson, Babb & Amarasinghe, PLDI 00 Constant field elision, bit-width reduction
  Titzer, et al., PLDI 07                 Value set indirection
  Zilles, ISMM 07                         Bit-width reduction
Limit Study
  - Quantitatively compare heap data compression
  - Surveyed literature
  - Savings equations
  - Methodology for evaluation
  - Apples-to-apples comparison
  - Future work: implementation
  - Hybrid techniques
  - Findings: array & hybrid compression, 58%
Hybrid Array Compression
  - Redundancy: equal array sharing
  [Figure: two example arrays holding the same hexadecimal values]
Equal Object Sharing
  Marinov & O’Callahan, OOPSLA 03; Appel & Goncalves, Tech Report 93
  - Two objects are equal if they have the same class and all fields have the same value
  - Strictly-equal: pointer fields are identical
  - Deep-equal: the objects' pointer targets are equal
  - The JVM stores only one copy, in a hashtable
  - 14%
  - Class C, N objects, D distinct; save:
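The strictly-equal case above can be sketched as a small counting model. This is only an illustration, not the authors' analysis: the function name `strict_sharing_savings`, the tuple encoding of objects, and the byte sizes are all hypothetical, pointer fields are assumed to be represented by their addresses, and the cost of the hashtable itself is ignored. Under those assumptions, a class with N objects of which D are distinct saves (N - D) times the object size.

```python
def strict_sharing_savings(objects):
    """Estimate bytes saved by keeping one copy of each strictly-equal object.

    objects: iterable of (class_name, field_values, size_bytes) tuples.
    Pointer fields are assumed to appear in field_values as addresses,
    so two objects match only if their pointer fields are identical.
    Returns (bytes_saved, total_bytes). Hashtable overhead is ignored.
    """
    seen = set()   # canonical copies, keyed by class and field values
    saved = 0
    total = 0
    for class_name, fields, size in objects:
        total += size
        key = (class_name, fields)
        if key in seen:
            saved += size      # duplicate: could share the canonical copy
        else:
            seen.add(key)      # first occurrence: becomes the canonical copy
    return saved, total


# Hypothetical heap: four Point objects (two distinct) and one Pair.
heap = [
    ("Point", (1, 2), 16),
    ("Point", (1, 2), 16),
    ("Point", (3, 4), 16),
    ("Pair", (1, 2), 16),
    ("Point", (1, 2), 16),
]
saved, total = strict_sharing_savings(heap)
print(f"saved {saved} of {total} bytes")
```

Here Point has N = 4 and D = 2, so the model reports (4 - 2) * 16 = 32 bytes saved; the Pair is not shared with any Point because its class differs.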
Hybrid Array Compression
  - Redundancy: equal array sharing
  - Redundancy: value set indirection
  [Figure: two example arrays holding the same hexadecimal values, plus a dictionary of their distinct values]
Value Set Indirection & Caching
  Cooprider & Regehr, PLDI 07; Titzer, et al., PLDI 07
  - For object fields or array elements with a large range of values
  - Dictionary (or cache) of the 256 most frequent values; each instance stores a small 1-byte index
  - 14%
  - If there are more than 256 values: 255 go in the dictionary, and the 256th index says to store the rest (M) in a hashtable keyed by object ID
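A minimal sketch of the dictionary-plus-overflow scheme described above, under stated assumptions: the names `encode`, `decode`, and `ESCAPE` are hypothetical, the array position stands in for the object ID, and a Python dict stands in for the hashtable. It is meant to show the encoding idea, not the cited systems' implementations.

```python
from collections import Counter

ESCAPE = 255  # hypothetical: the 256th index means "look in the overflow table"


def encode(values):
    """Replace each value by a 1-byte index into a dictionary of frequent values."""
    counts = Counter(values)
    if len(counts) <= 256:
        dictionary = [v for v, _ in counts.most_common()]
    else:
        # Keep only 255 entries so that index 255 is free to act as the escape.
        dictionary = [v for v, _ in counts.most_common(255)]
    index_of = {v: i for i, v in enumerate(dictionary)}

    indices = bytearray()
    overflow = {}  # position (standing in for object ID) -> full value
    for pos, v in enumerate(values):
        i = index_of.get(v)
        if i is None:
            indices.append(ESCAPE)
            overflow[pos] = v
        else:
            indices.append(i)
    return dictionary, bytes(indices), overflow


def decode(dictionary, indices, overflow):
    """Recover the original values from the indices, dictionary, and overflow table."""
    return [overflow[pos] if pos in overflow else dictionary[i]
            for pos, i in enumerate(indices)]


# Hypothetical data: one very frequent value plus 300 rare ones (301 distinct).
values = [7] * 10 + list(range(1000, 1300))
dictionary, indices, overflow = encode(values)
assert decode(dictionary, indices, overflow) == values
print(len(dictionary), len(indices), len(overflow))
```

With 301 distinct values, 255 fit in the dictionary and the remaining 46 land in the overflow table; every element still costs exactly one index byte in the instance.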
Hybrid Array Compression 2
  - Remove zeros: trim trailing zeros, bit-width reduce, zero compress
  [Figure: an example array of hexadecimal values shrinking step by step as each technique is applied]
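The first two steps named on this slide, trailing zero trimming and bit-width reduction, can be illustrated with a short sketch. The function names, the 16-bit original element width, and the example values are assumptions chosen for illustration; elements are assumed to be non-negative, and the bookkeeping needed to recover the original length and width is not counted.

```python
def trim_trailing_zeros(values):
    """Drop the run of zero elements at the end of the array."""
    end = len(values)
    while end > 0 and values[end - 1] == 0:
        end -= 1
    return values[:end]


def min_bit_width(values):
    """Bits needed to store the largest (non-negative) element."""
    return max((v.bit_length() for v in values), default=0)


def hybrid_bits(values, original_width=16):
    """Return (original_bits, compressed_bits) after trimming and narrowing."""
    trimmed = trim_trailing_zeros(values)
    width = min_bit_width(trimmed)
    return len(values) * original_width, len(trimmed) * width


# Hypothetical array of 16-bit elements with two trailing zeros.
array = [0xA0, 0x73, 0x02, 0x01, 0x00, 0x00]
original_bits, compressed_bits = hybrid_bits(array)
print(original_bits, compressed_bits)
```

In this example the array shrinks from six 16-bit elements (96 bits) to four 8-bit elements (32 bits), since the largest remaining value, 0xA0, needs 8 bits.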
Zero-based Object Compression
  Chen, et al., OOPSLA 03
  - Remove bytes that are entirely zero
  - Per-object bitmap: 1 bit per byte
  - Store only non-zero bytes
  - 45%
  - Savings:
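The per-object bitmap scheme above can be sketched as follows. This is an illustrative model rather than the cited implementation: `zero_compress` and `zero_decompress` are hypothetical names, the bit order within each bitmap byte is an arbitrary choice, and the savings estimate (original size minus bitmap bytes minus non-zero bytes) is an assumption, not the slide's equation.

```python
def zero_compress(data: bytes):
    """Return (bitmap, nonzero): 1 bitmap bit per byte, plus the non-zero bytes."""
    bitmap = bytearray((len(data) + 7) // 8)
    nonzero = bytearray()
    for i, b in enumerate(data):
        if b != 0:
            bitmap[i // 8] |= 1 << (i % 8)   # mark byte i as present
            nonzero.append(b)
    return bytes(bitmap), bytes(nonzero)


def zero_decompress(bitmap: bytes, nonzero: bytes, length: int) -> bytes:
    """Rebuild the original bytes; unmarked positions are zero."""
    out = bytearray(length)
    stored = iter(nonzero)
    for i in range(length):
        if bitmap[i // 8] & (1 << (i % 8)):
            out[i] = next(stored)
    return bytes(out)


# Hypothetical 16-byte object that is mostly zero.
obj = bytes([0, 0, 0, 5, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0])
bitmap, nonzero = zero_compress(obj)
assert zero_decompress(bitmap, nonzero, len(obj)) == obj
saved = len(obj) - (len(bitmap) + len(nonzero))
print(saved)
```

For this object the bitmap costs 2 bytes and 3 non-zero bytes remain, so the model saves 11 of 16 bytes. An object with few zero bytes would grow instead, because the bitmap is paid regardless.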
Methodology
  [Figure: pipeline from a program run, to a series of heap dumps (one snapshot per garbage collection), to an analysis representation, to Model 1 through Model n, to limit savings]
Experimental Details
  - Jikes Research Virtual Machine
    - Java-in-Java
  - DaCapo benchmarks + pseudojbb
  - Heap snapshots per benchmark
  - MarkSweep with 2x min heap
  - Analysis
    - Per class
    - Objects and arrays separated
    - JVM+app vs. application (separated in paper)
    - Per heap snapshot, and over all snapshots
  Technique                        Class  Array  GC/Run
  Lempel-Ziv compression           X             GC
  Strictly-equal object sharing    Obj    Type   GC
  Deep-equal object sharing        Obj    Type   GC
  Zero-based object compression    Obj    Inst   GC
  Trailing zero array trimming            Inst   GC
  Bit-width reduction              Fld    Inst   GC/Run
  Dominant-value field hashing     Fld           GC
  Lazy invariant computation       Fld           GC
  Value set indirection            Fld    Type   GC
  Value set caching                Fld    Type   GC
  Constant field elision           Fld           Run
  Dominant-value field elision     Fld           Run
[Figure: results for Value Indirection & Cache, Deep Equal Sharing, Zero Compression, and Hybrid Compression]
Stability of Savings
  [Figure: fop, snapshots over time]
Conclusions
  - Limit study: an apples-to-apples comparison of heap data compression techniques
  - Potential to reduce memory inefficiencies in managed languages
    - Arrays
    - Hybrids
  - Future: save space
  - Challenge: efficient detection & recovery
  Thank you!