Moshovos © 1 Memory State Compressors for Gigascale Checkpoint/Restore Andreas Moshovos
Gigascale Checkpoint/Restore
Several Potential Uses:
- Debugging
- Runtime Checking
- Reliability
- Gigascale Speculation
[Figure: an instruction stream with a checkpoint, followed many instructions later by a restore trigger]
Key Issues & This Study
- Track and Restore Memory State
- I/O?
- This Work: Memory State Compression
- Goals:
  - Minimize On-Chip Resources
  - Minimize Performance Impact
- Contributions:
  - Uses Value Prediction to simplify compression hardware
  - Fast, Simple, and Inexpensive
  - Beneficial whether used alone or in combination with another compressor
Outline
- Gigascale Checkpoint/Restore
- Compressor Architecture: Challenges
- Value-Prediction-Based Compressors
- Evaluation
Our Approach to Gigascale CR (GCR)
- Checkpoint:
  - The prior contents of the memory blocks that were written into
- Current Memory State + Checkpoint = Previous Memory State
- Checkpoints can be large (Mbytes), and we may want many
[Figure: timeline from checkpoint begin to restore trigger. Each memory block is checkpointed on its first write; on restore, all checkpointed memory blocks are restored.]
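The checkpoint-on-first-write scheme on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `CheckpointedMemory`, its method names, and the 16-word block size are hypothetical choices made for illustration, and the model ignores compression, buffering, and timing entirely.

```python
class CheckpointedMemory:
    """Toy block-granularity memory with first-write checkpointing.

    Hypothetical illustration only: names and sizes are assumptions,
    not taken from the slides.
    """

    def __init__(self, num_blocks, block_words=16):
        self.blocks = [[0] * block_words for _ in range(num_blocks)]
        # Maps block index -> copy of the block as it was at checkpoint begin.
        self.log = {}

    def begin_checkpoint(self):
        # Start a new checkpoint interval with an empty log.
        self.log = {}

    def write(self, block, word, value):
        # Save the old block contents only on the first write to that block.
        if block not in self.log:
            self.log[block] = list(self.blocks[block])
        self.blocks[block][word] = value

    def restore(self):
        # Current memory state + checkpoint = previous memory state.
        for block, old in self.log.items():
            self.blocks[block] = list(old)
        self.log = {}


mem = CheckpointedMemory(num_blocks=4)
mem.write(0, 0, 5)        # state before the checkpoint begins
mem.begin_checkpoint()
mem.write(0, 0, 7)        # first write to block 0: old block is logged
mem.write(0, 1, 9)        # second write to block 0: nothing new is logged
mem.write(2, 3, 1)        # first write to block 2: old block is logged
logged_blocks = len(mem.log)
mem.restore()
```

The log grows with the number of distinct blocks written in the interval, not with the number of writes, which is why the checkpoint size depends on the interval length and the write footprint.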
Checkpoint Storage Requirements
[Figure: Max. Checkpoint Size in Bytes vs. Checkpoint Interval in Instructions]
Architecture of a GCR Compressor
[Figure: blocks shown are L1 Data Cache, in-buffer, Compressor, Alignment Network, out-buffer, Main Memory]
- Size: Resources & Performance
- Previous work: Compressor = Dictionary-Based
  - Relatively Slow, Complex Alignment, on the order of 10K Transistors
  - 64K In-Buffer, ~3.7% Avg. Slowdown
Our Compression Architecture
- Standalone:
  - ~Compression, -Resources
- In Combination:
  - -Resources (in-buffer), +Compression, +Performance
[Figure: blocks shown are L1 Data Cache, VP Compressor, Simple Alignment, in-buffer, Dictionary Compressor, Alignment Network, out-buffer, Main Memory; labels: "VP stage", "Optional"]
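Value-Predictor-Based Compression
[Figure: each value in the input stream is checked against a Value Predictor. The output stream carries a one-bit flag (0/1) marking the value as predicted or mispredicted, plus the value itself when mispredicted.]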
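A minimal software sketch of this idea, assuming a last-outcome predictor (predict that the next value equals the previous one). The function names, the token representation, and the flag polarity (1 = predicted, 0 = mispredicted) are assumptions made for illustration; the slides do not fix them.

```python
def vp_compress(values):
    """Compress a value stream with a single last-outcome predictor.

    Emits (1,) when the prediction is correct and (0, value) otherwise.
    The flag polarity is an assumption of this sketch.
    """
    last = 0  # predictor state: the last value seen
    out = []
    for v in values:
        if v == last:
            out.append((1,))       # predicted: one flag bit, no value
        else:
            out.append((0, v))     # mispredicted: flag bit plus the value
            last = v               # train the predictor
    return out


def vp_decompress(tokens):
    """Rebuild the stream by running an identical predictor."""
    last = 0
    values = []
    for t in tokens:
        if t[0] == 0:
            last = t[1]            # take the transmitted value
        values.append(last)        # otherwise reuse the predicted value
    return values


stream = [22, 22, 22, 7, 7]
tokens = vp_compress(stream)
```

Decompression works because both sides update the predictor in the same way, so the decompressor can always regenerate the values that were omitted.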
Example
[Figure: worked example of values passing through the VP over time]
Block VP-Based Compressor
- Shown is Last-Outcome Predictor
- Studied Others (four combinations per word)
[Figure: the input stream is a cache block (address, word 0 ... word 15) feeding single-entry predictors (VP); the output stream is a header (one word) followed by the mispredicted words; half-word alignment]
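The block-level scheme can be sketched in software as follows. Several details here are assumptions made for illustration and are not confirmed by the slide: that the address gets its own predictor alongside the 16 words, that header bit i set means field i was predicted, and the class names `BlockVPCompressor` and `BlockVPDecompressor`. Half-word alignment of the output is not modeled.

```python
WORDS_PER_BLOCK = 16  # words per cache block, as drawn on the slide


class BlockVPCompressor:
    """One single-entry last-outcome predictor per field of a block.

    Field 0 is the block address (an assumption of this sketch);
    fields 1..16 are the data words.
    """

    def __init__(self):
        self.last = [0] * (WORDS_PER_BLOCK + 1)

    def compress(self, address, words):
        fields = [address] + list(words)
        header = 0      # bit i set: field i was predicted (assumed polarity)
        payload = []    # mispredicted fields, in field order
        for i, v in enumerate(fields):
            if v == self.last[i]:
                header |= 1 << i
            else:
                payload.append(v)
                self.last[i] = v
        return header, payload


class BlockVPDecompressor:
    """Mirrors the compressor's predictors to rebuild each block."""

    def __init__(self):
        self.last = [0] * (WORDS_PER_BLOCK + 1)

    def decompress(self, header, payload):
        it = iter(payload)
        for i in range(WORDS_PER_BLOCK + 1):
            if not (header >> i) & 1:
                self.last[i] = next(it)   # mispredicted: read from payload
        return self.last[0], list(self.last[1:])


comp, decomp = BlockVPCompressor(), BlockVPDecompressor()
block = [0] * WORDS_PER_BLOCK
block[3] = 9
h1, p1 = comp.compress(0x100, block)   # address and word 3 mispredict
h2, p2 = comp.compress(0x140, block)   # only the address mispredicts
r1 = decomp.decompress(h1, p1)
r2 = decomp.decompress(h2, p2)
```

With 17 flag bits the header fits in a single 32-bit word, and each predictor is just one register plus a comparator, which is the sense in which the hardware is simple.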
Evaluation
- Compression Rates
  - Compared with LZW
- Performance
  - As a function of in-buffer size
Methodology
- SimpleScalar v3
- SPEC CPU 2000 with reference inputs
- Ignore first checkpoint to avoid artificially skewing the results
- Simulated up to:
  - 80 billion instructions (compression rates)
  - 5 billion instructions (performance)
- 8-way OOO Superscalar
- 64K L1D, L1I, 1M UL2
Compression Rate vs. LZW
[Figure: compression rate compared with LZW at a 256M-instruction checkpoint interval; an arrow marks the "better" direction]
Performance Degradation
- LZW + 64K buffer = ~3.7% slowdown
- LZW + LO (last-outcome VP stage) + 1K buffer = 1.6% slowdown
[Figure: performance degradation per configuration; an arrow marks the "better" direction]
Summary
- Memory State Compression for Gigascale CR
- Many Potential Applications
- Used Simple Value-Prediction Compressors
  - Few Resources
  - Low Complexity
  - Fast
- Can be Used Alone
- Can be Combined with Dictionary-based Compressors
  - Reduced on-chip buffering
  - Better Performance
- Main memory compression?