No Bit Left Behind: The Limits of Heap Data Compression. Jennifer B. Sartor*, Martin Hirzel†, Kathryn S. McKinley*. *U Texas at Austin, †IBM Watson
Current State: Managed languages ubiquitous. Embedded devices. Multicore. Need memory efficiency!
Memory Efficiency of Managed Languages. COST: 8-94% information content in heap in 37 benchmarks [Mitchell & Sevitsky, OOPSLA 07]; boxed objects; trailing zeros in arrays; redundant objects; extra bit-width; data structure back-bones. bzip2: 86%. OPPORTUNITY: memory layout abstraction; (location + size) != identity.
Related Work Ananian & Rinard. LCTES 03 Equal obj sharing Appel & Goncalves. Tech Report 93 Dom value field hash, const field elide, Bit-width Chen, Kandemir & Irwin. VEE 05 Dom value field elide Chen, et al. OOPSLA 03 Zero compr, Trail zero trim Cooprider & Regehr. PLDI 07 Value set indirection Marinov & O’Callahan. OOPSLA 03 Eql obj sharing Stephenson, Babb & Amarasinghe. PLDI 00 Const field elide, Bit-width reduction Titzer, et al. PLDI 07 Zilles. ISMM 07 Bit-width reduction
Limit Study: Quantitatively compare heap data compression. Surveyed literature; savings equations; methodology for evaluation; apples-to-apples comparison. Future work: implementation. Hybrid techniques. 58%. Findings: array & hybrid compression.
Compression Example: Redundancy. x0001 x0058 x0004 x0000 x0001 x0058. Equal array sharing.
Equal Object Sharing (Marinov & O’Callahan, OOPSLA 03; Appel & Goncalves, Tech Report 93). Two objects are equal if both have the same class and all fields have the same value. Strictly-equal: pointer fields are identical. Deep-equal: the objects' pointer targets are equal. The JVM stores only 1 copy in a hashtable. 14%. Class C, N objects, D distinct; save:
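A minimal sketch of strictly-equal object sharing, to make the N-objects, D-distinct counting concrete. The heap model (a list of `(class_name, field_values)` pairs), the function name `strict_equal_sharing_savings`, and the per-class size table are all hypothetical. The count `(N - D) * size` is a simplifying assumption that ignores hashtable overhead, so it is not necessarily the savings equation used in the study.

```python
from collections import Counter

def strict_equal_sharing_savings(objects, size_of):
    """Estimate bytes saved if equal objects of a class share one copy.

    objects: iterable of (class_name, field_values) pairs, where
             field_values is a tuple; pointer fields are modeled as plain
             ids, so "pointer fields identical" is ordinary equality.
    size_of: dict mapping class_name to bytes per instance (assumed uniform).
    """
    total = Counter()   # N per class
    distinct = {}       # set of distinct field tuples per class (size D)
    for cls, fields in objects:
        total[cls] += 1
        distinct.setdefault(cls, set()).add(fields)
    # Assumption: each duplicate saves one full instance; no table overhead.
    return sum((n - len(distinct[cls])) * size_of[cls]
               for cls, n in total.items())

heap = [
    ("Point", (1, 2)),
    ("Point", (1, 2)),   # duplicate of the first Point
    ("Point", (3, 4)),
    ("Color", (255,)),
    ("Color", (255,)),   # duplicate of the first Color
]
sizes = {"Point": 16, "Color": 12}
saved = strict_equal_sharing_savings(heap, sizes)
```

With this toy heap, one duplicate `Point` and one duplicate `Color` can be shared, so the estimate is 16 + 12 bytes.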
Compression Example: Redundancy. Equal array sharing; value set indirection. Dictionary: x0001 x0058 x0004 x0000. 1 2 3
Value Set Indirection & Caching (Cooprider, Regehr ’07; Titzer, Palsberg ’07). For object fields or array elements with a large range of values. A dictionary holds up to 256 distinct values, and each instance stores a small 1-byte index. If there are more than 256 values, 255 go in the dictionary and the 256th index says the rest (M) are stored in a hashtable keyed by objectID. 14%
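The dictionary-plus-overflow scheme can be sketched as follows. This is an illustration under stated assumptions, not the cited papers' implementation: the names `encode`, `decode`, `DICT_SIZE`, and `ESCAPE` are hypothetical, the element position stands in for the objectID, and keeping the 255 most frequent values in the dictionary is an assumed policy.

```python
from collections import Counter

DICT_SIZE = 256          # entries addressable by a 1-byte index
ESCAPE = DICT_SIZE - 1   # on overflow, index 255 means "look in the hashtable"

def encode(values):
    """Return (dictionary, 1-byte indices, overflow hashtable)."""
    counts = Counter(values)
    if len(counts) <= DICT_SIZE:
        dictionary = list(counts)            # every distinct value fits
    else:
        # Assumed policy: keep the 255 most frequent values.
        dictionary = [v for v, _ in counts.most_common(DICT_SIZE - 1)]
    index_of = {v: i for i, v in enumerate(dictionary)}
    indices = bytearray()
    overflow = {}                            # position (stand-in for objectID) -> value
    for pos, v in enumerate(values):
        if v in index_of:
            indices.append(index_of[v])
        else:
            indices.append(ESCAPE)
            overflow[pos] = v
    return dictionary, bytes(indices), overflow

def decode(dictionary, indices, overflow):
    """Rebuild the original values from the three parts."""
    return [overflow[pos] if pos in overflow else dictionary[i]
            for pos, i in enumerate(indices)]

few = [70000, 5, 70000, 5, 9]
d_few, idx_few, ov_few = encode(few)
many = [1_000_000 + (i % 300) for i in range(1000)]
d_many, idx_many, ov_many = encode(many)
```

In the first case every element shrinks to one byte plus a shared three-entry dictionary; in the second, 300 distinct values force some elements into the overflow table.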
Compression Example: x00A0 x0073 x0002 x0001 x0101 x0000 x00A0 x0073. Trim trailing zeros: x00A0 x0073 x0002 x0001 x0101, 8 5. Bit width reduce: x0A0 x073 x002 x001 x101, 8 5. Zero compress: x0A x73 x2 x001 x101, 8 5, xAF (10101111).
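The first two steps of the example, trailing zero trimming and bit-width reduction, can be sketched as below. The input array is an assumption: an 8-element array of 16-bit values whose last three elements are zero, chosen to match the trimmed row and the "8 5" labels. The helper names and the `granularity` parameter are hypothetical, and values are assumed unsigned.

```python
def trim_trailing_zeros(arr):
    """Drop trailing zero elements; return (trimmed list, original length)."""
    end = len(arr)
    while end > 0 and arr[end - 1] == 0:
        end -= 1
    return arr[:end], len(arr)

def bits_needed(values, granularity=1):
    """Smallest width (rounded up to `granularity` bits) holding every value."""
    widest = max((v.bit_length() for v in values), default=0)
    widest = max(widest, 1)                       # store at least one bit
    return -(-widest // granularity) * granularity  # ceiling to a multiple

# Assumed input: three trailing zeros after the five values on the slide.
original = [0x00A0, 0x0073, 0x0002, 0x0001, 0x0101, 0, 0, 0]
trimmed, original_length = trim_trailing_zeros(original)
exact_width = bits_needed(trimmed)                   # tightest width
nibble_width = bits_needed(trimmed, granularity=4)   # whole hex digits
bits_before = 16 * len(original)
bits_after = nibble_width * len(trimmed)             # excludes length metadata
```

Rounding to whole nibbles gives three hex digits per element, matching the `x0A0 x073 ...` row; the original length has to be kept alongside the trimmed data so reads past the stored elements can return zero.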
Zero-based Object Compression (Chen, et al. OOPSLA 03). Remove bytes that are entirely zero. Per-object bit-map: 1 bit per byte. Store only non-zero bytes. Savings: 45%
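A minimal sketch of the bitmap scheme described on this slide: one bitmap bit per byte, with only the non-zero bytes stored. The function names and the most-significant-bit-first bitmap layout are assumptions of this sketch, not details taken from Chen et al.

```python
def zero_compress(data: bytes):
    """Return (bitmap, nonzero_bytes); bitmap bit i is 1 if data[i] != 0."""
    bitmap = bytearray((len(data) + 7) // 8)   # 1 bit per byte, rounded up
    nonzero = bytearray()
    for i, b in enumerate(data):
        if b != 0:
            bitmap[i // 8] |= 0x80 >> (i % 8)  # assumed MSB-first layout
            nonzero.append(b)
    return bytes(bitmap), bytes(nonzero)

def zero_decompress(bitmap: bytes, nonzero: bytes, length: int) -> bytes:
    """Rebuild the original bytes; unset bitmap bits become zero bytes."""
    out = bytearray(length)
    stored = iter(nonzero)
    for i in range(length):
        if bitmap[i // 8] & (0x80 >> (i % 8)):
            out[i] = next(stored)
    return bytes(out)

# Four 16-bit values laid out big-endian: x00A0 x0073 x0002 x0001
raw = bytes([0x00, 0xA0, 0x00, 0x73, 0x00, 0x02, 0x00, 0x01])
bitmap, nonzero = zero_compress(raw)
bytes_saved = len(raw) - (len(bitmap) + len(nonzero))
```

Here the 8 input bytes shrink to a 1-byte bitmap plus 4 stored bytes. The bitmap is pure overhead for objects with few zero bytes, which is why the net savings depend on how zero-heavy the heap is.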
Methodology: Program run; garbage collection; heap dump series (snapshots over time t); analysis representation; Model 1 ... Model n; limit savings.
Experimental Details: Jikes Research Virtual Machine (Java-in-Java); DaCapo benchmarks + pseudojbb; 20-25 heap snapshots per benchmark; MarkSweep with 2x min heap. Analysis: per class; objects and arrays separated; JVM+app vs application (separated in paper); per heap snapshot, and over all snapshots.
Technique Class Array GC/Run Lempel-Ziv compression X GC Strictly-equal object sharing Obj Type Deep-equal object sharing Zero-based object compression Inst Trailing zero array trimming Bit-width reduction Fld Dominant-value field hashing Lazy invariant computation Value set indirection Value set caching Constant field elision Run Dominant-value field elision
Savings (average over all benchmarks)
Stability of Savings fop: snapshots over time
Conclusions: Limit study compares heap data compression techniques apples-to-apples. Potential to reduce memory inefficiencies in managed languages: arrays, hybrids. Future: save space. Challenge: efficient detection & recovery. Thank you!
Value Indirection & Cache Deep Equal Sharing Zero Compression Hybrid Compression