CAFO: Cost Aware Flip Optimization for Asymmetric Memories
Rakan Maddah*, Seyed Mohammad Seyedzadeh and Rami Melhem
Computer Science Department, University of Pittsburgh
HPCA 2015
Introduction
DRAM and NAND Flash are facing physical limitations that put their scalability into question:
◦ DRAM: decreasing cell reliability and increasing power consumption
◦ NAND Flash: endurance degradation and a growing number of transient and hard errors
Phase-Change Memory (PCM) and Spin-Transfer Torque Random Access Memory (STT-RAM) are promising alternatives:
◦ Scalability, low access latency and close-to-zero leakage power
◦ Initial assessments and evaluations are encouraging
Challenges
PCM and STT-RAM have a number of challenges that need to be dealt with before they can be deployed in functional systems:
◦ PCM suffers from limited endurance
◦ STT-RAM suffers from a high write bit error rate
Solution: bit-flip minimization
◦ Service write requests while flipping as few bits as possible
◦ Preserves PCM’s endurance and improves STT-RAM’s write reliability
Previous Work
◦ Differential Write: compares the old data against the new data and flips only the differing cells.
◦ Flip-N-Write: encodes the write data in either its regular or its inverted form and picks the encoding that yields fewer flips relative to the old data.
◦ Flip-Min: encodes the write data into a set of candidate data vectors and picks the vector that yields the fewest flips relative to the old data.
Figure: old vs. new data for each scheme, saving 2, 3 and 4 bit flips respectively; Flip-Min picks among the candidate vectors New, New 2 and New 3.
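A minimal Python sketch of the first two schemes, assuming a word is a list of 0/1 cells and that the Flip-N-Write flag is stored next to the word, with its own update counted as a flip. The function names and the example word are hypothetical; Flip-Min's candidate vectors are not reproduced here.

```python
def differential_write_flips(old, new):
    """Differential write: only cells whose old and new values differ are written."""
    return sum(o != n for o, n in zip(old, new))


def flip_n_write(old, new, old_flag=0):
    """Flip-N-Write sketch: store `new` as-is (flag 0) or inverted (flag 1),
    whichever needs fewer flips; the flag cell's own flip is counted too."""
    inverted = [1 - b for b in new]
    plain_flips = differential_write_flips(old, new) + (old_flag != 0)
    inv_flips = differential_write_flips(old, inverted) + (old_flag != 1)
    if inv_flips < plain_flips:
        return inverted, 1, inv_flips
    return list(new), 0, plain_flips


old = [0, 0, 0, 0, 0, 0, 0, 0]
new = [1, 1, 1, 1, 1, 1, 0, 0]
print(differential_write_flips(old, new))  # 6
print(flip_n_write(old, new))              # ([0, 0, 0, 0, 0, 0, 1, 1], 1, 3)
```

Here inversion wins because only two data cells (plus the flag) change instead of six.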
Write Asymmetries
◦ PCM: the RESET state is more detrimental to endurance than the SET state
◦ STT-RAM: anti-parallel magnetization is more prone to write errors than parallel magnetization
Figure: PCM programming pulses (power vs. time) for SET (“1”) and RESET (“0”); STT-RAM cell stack (free layer, oxide layer, reference layer) in parallel (“0”) and anti-parallel (“1”) magnetization.
Contribution
Observation: existing schemes fail to exploit the write asymmetry.
Figure: two encodings of the same write, one saving 1 bit flip and the other saving 3 bit flips.
Writing a “0” is 4 times more detrimental to endurance than writing a “1”, yet the number of bit flips is oblivious to this write asymmetry!
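To make the observation concrete, here is a small hypothetical example (not the slide's figure) using the 4:1 ratio above: an encoding with fewer flips can still be the more expensive one. The cost constants and the two encodings are illustrative assumptions.

```python
# Illustrative PCM-like costs (the slide's 4:1 ratio): writing a "0" costs 4
# units, writing a "1" costs 1 unit; unchanged cells are not written.
COST_WRITE_0 = 4
COST_WRITE_1 = 1


def flips(old, new):
    """Number of cells that change value."""
    return sum(o != n for o, n in zip(old, new))


def write_cost(old, new):
    """Asymmetric cost: only changed cells are charged, by the value written."""
    return sum(COST_WRITE_1 if n == 1 else COST_WRITE_0
               for o, n in zip(old, new) if o != n)


old = [0, 0, 0, 1, 1, 1]
enc_a = [1, 1, 1, 1, 1, 1]  # three 0 -> 1 flips
enc_b = [0, 0, 0, 0, 1, 1]  # one 1 -> 0 flip

print(flips(old, enc_a), write_cost(old, enc_a))  # 3 3
print(flips(old, enc_b), write_cost(old, enc_b))  # 1 4
```

A flip-minimizing scheme would prefer `enc_b`, while a cost-aware one would prefer `enc_a`.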
Contribution
Observation: existing schemes fail to exploit the write asymmetry; focusing solely on the number of bit flips ignores it.
Proposal: move from the concept of “bit-flip reduction” to “cost reduction” with Cost Aware Flip Optimization (CAFO):
◦ Cost model: captures the write asymmetry and assigns a cost to a given write operation
◦ Coding engine: encodes the write data into a form that results in an overall cost reduction
Cost Model
Compare the write data to the currently stored data and associate a cost with each cell according to its transition:
◦ a: 0 → 1, b: 1 → 0, c: 0 → 0, d: 1 → 1
The costs “a”, “b”, “c” and “d” depend on the technology being modeled and on the optimization objective (endurance, energy, error rate).
Figure: currently stored data, new data, and the cost of writing each cell (a c d b a b d b).
With a write cost, we can define a gain among different encodings.
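A minimal sketch of the cost model, assuming cells are plain 0/1 values; `label_cells`, `write_cost` and the example bit patterns are hypothetical (the bits were chosen to reproduce the label row in the figure). The two cost tables are the ones listed on the later Benchmark Data and STT-RAM slides, showing how the same write is priced differently per technology.

```python
# Transition label for each cell, keyed by (currently stored bit, new bit).
LABELS = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}


def label_cells(old, new):
    """Return the per-cell transition labels as a string, e.g. 'acdb...'."""
    return "".join(LABELS[(o, n)] for o, n in zip(old, new))


def write_cost(old, new, costs):
    """Total write cost: sum of the per-label costs over all cells."""
    return sum(costs[label] for label in label_cells(old, new))


old = [0, 0, 1, 1, 0, 1, 1, 1]
new = [1, 0, 1, 0, 1, 0, 1, 0]
print(label_cells(old, new))  # acdbabdb

benchmark_costs = {"a": 1, "b": 2, "c": 0, "d": 0}
stt_ram_costs = {"a": 1, "b": 0, "c": 0, "d": 0}
print(write_cost(old, new, benchmark_costs), write_cost(old, new, stt_ram_costs))  # 8 2
```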
Gain Calculation
Costs: a = 1, b = 2, c = 0, d = 0 (a: 0 → 1, b: 1 → 0, c: 0 → 0, d: 1 → 1)
◦ Cost of writing the new data (a c d b a b d b): $C = 2a + 3b + 1c + 2d = 8$
◦ Cost of writing the encoded data (c a b d c d b d): $C_{\text{encoded}} = 1a + 2b + 2c + 3d = 5$
◦ Gain: $G = C - C_{\text{encoded}} = 8 - 5 = 3$
A positive gain implies that it is less costly to write the data in encoded form.
How do we encode the data?
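The arithmetic above can be reproduced with a short sketch. It assumes, as the labels imply (a ↔ c and b ↔ d), that the encoded form is the new data with every bit inverted; the bit patterns and function names are hypothetical.

```python
LABELS = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}
COSTS = {"a": 1, "b": 2, "c": 0, "d": 0}  # label costs from the slide


def cost(old, new):
    """Sum of per-cell transition costs for writing `new` over `old`."""
    return sum(COSTS[LABELS[(o, n)]] for o, n in zip(old, new))


def inversion_gain(old, new):
    """Return (C, C_encoded, G) with the encoded form taken to be `new` inverted."""
    encoded = [1 - b for b in new]
    c, c_encoded = cost(old, new), cost(old, encoded)
    return c, c_encoded, c - c_encoded


old = [0, 0, 1, 1, 0, 1, 1, 1]  # together with `new`: labels a c d b a b d b
new = [1, 0, 1, 0, 1, 0, 1, 0]
print(inversion_gain(old, new))  # (8, 5, 3)
```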
Encoding
Auxiliary bits serve as inversion flags (one per row and one per column).
Coding steps:
1. Compute the gain of each row
2. Flip all rows with a positive gain
3. Compute the gain of each column
4. Flip all columns with a positive gain
5. Repeat until all rows and columns show a zero or negative gain
Alternating between row and column flips yields additional cost reduction.
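The coding steps can be sketched as follows, assuming a 2D block stored as lists of 0/1 rows, with a cell stored as `data XOR row_flag XOR col_flag` (matching the decoding rule). This is an illustrative reading of the steps, not the authors' implementation: `encode`, `total_cost` and the example are hypothetical, and the auxiliary bits' own cost is ignored here.

```python
LABELS = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}


def cell_cost(old_bit, stored_bit, costs):
    return costs[LABELS[(old_bit, stored_bit)]]


def total_cost(old, stored, costs):
    """Cost of writing a 2D block `stored` over `old` (aux bits not counted)."""
    return sum(cell_cost(o, s, costs)
               for orow, srow in zip(old, stored) for o, s in zip(orow, srow))


def encode(old, data, costs):
    """Alternate row and column inversions until no flip has a positive gain.

    Returns the stored block plus the row and column inversion flags.
    """
    n, m = len(data), len(data[0])
    rows, cols = [0] * n, [0] * m

    def gain(i, j):
        # Cost saved by inverting cell (i, j) relative to its current encoding.
        s = data[i][j] ^ rows[i] ^ cols[j]
        return cell_cost(old[i][j], s, costs) - cell_cost(old[i][j], 1 - s, costs)

    changed = True
    while changed:
        changed = False
        # Steps 1-2: rows are disjoint, so flipping them one at a time
        # gives the same result as flipping all positive-gain rows at once.
        for i in range(n):
            if sum(gain(i, j) for j in range(m)) > 0:
                rows[i] ^= 1
                changed = True
        # Steps 3-4: the same for columns.
        for j in range(m):
            if sum(gain(i, j) for i in range(n)) > 0:
                cols[j] ^= 1
                changed = True
    stored = [[data[i][j] ^ rows[i] ^ cols[j] for j in range(m)] for i in range(n)]
    return stored, rows, cols


old = [[0, 0], [0, 0]]
data = [[1, 1], [1, 0]]
flip_costs = {"a": 1, "b": 1, "c": 0, "d": 0}
stored, rows, cols = encode(old, data, flip_costs)
print(stored, rows, cols)  # [[0, 0], [1, 0]] [1, 0] [0, 0]
print(total_cost(old, data, flip_costs), total_cost(old, stored, flip_costs))  # 3 1
```

Every accepted flip strictly lowers the total cost, so no encoding state repeats and the loop terminates.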
Encoding example
Costs: a = 1, b = 1, c = 0, d = 0 (a “1” marks a cell that is to be flipped, a “0” otherwise)
Rows with a positive gain are flipped first, then columns with a positive gain.
Encoding terminates as no row or column shows a positive gain.
Figure: encoding grid annotated with flip counts (33 flips).
Row-only Inversion (FNW)
Figure: encoding with row inversions only, annotated with flip counts (25 flips).
Can we do better?
Encoding Optimization
Write cost can be further reduced even if no row or column shows a positive gain.
Figure: a row and a column flipped together, annotated with gains and flip counts (3 flips).
Encoding Optimization
◦ Flipping both a row and a column leaves their intersecting cell un-inverted
◦ The local gain $g_{r+c}$ of the intersecting cell is counted in both the row gain $G_r$ and the column gain $G_c$, so it has to be subtracted from each
◦ Flipping the row and the column together yields a gain if $G_r + G_c - 2g_{r+c} > 0$
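The condition can be checked numerically. In this hypothetical 4×4 block (old data all zeros, flip-count costs as in the example), no single row or column shows a positive gain, yet flipping row 0 and column 0 together saves 2; the brute-force flip agrees with $G_r + G_c - 2g_{r+c}$. All names are illustrative.

```python
LABELS = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}
COSTS = {"a": 1, "b": 1, "c": 0, "d": 0}  # flip-count costs, as in the example


def cost(old, stored):
    return sum(COSTS[LABELS[(o, s)]]
               for orow, srow in zip(old, stored) for o, s in zip(orow, srow))


def local_gains(old, stored):
    """g[i][j]: cost saved by inverting only cell (i, j)."""
    return [[COSTS[LABELS[(o, s)]] - COSTS[LABELS[(o, 1 - s)]]
             for o, s in zip(orow, srow)] for orow, srow in zip(old, stored)]


def joint_gain(g, r, c):
    """Gain of flipping row r and column c together: G_r + G_c - 2 g_{r+c}."""
    g_row = sum(g[r])
    g_col = sum(row[c] for row in g)
    return g_row + g_col - 2 * g[r][c]


def flip_row_and_col(stored, r, c):
    """Brute force: invert row r and column c; cell (r, c) is inverted twice."""
    out = [row[:] for row in stored]
    for j in range(len(out[r])):
        out[r][j] ^= 1
    for i in range(len(out)):
        out[i][c] ^= 1
    return out


old = [[0] * 4 for _ in range(4)]
stored = [[0, 1, 1, 0],
          [1, 0, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]
g = local_gains(old, stored)
print(sum(g[0]), sum(row[0] for row in g))  # 0 0  (row 0 and column 0 alone)
print(joint_gain(g, 0, 0))                  # 2
print(cost(old, stored) - cost(old, flip_row_and_col(stored, 0, 0)))  # 2
```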
Encoding Optimization (cont.)
Generalize to flipping one column together with multiple rows (and vice versa).
Figure: 2 rows and 2 columns flipped together, annotated with gains and flip counts (4 flips).
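By the same double-counting argument as for a single row and column (every intersection cell is inverted twice and stays un-inverted), the condition plausibly extends to a set of rows $R$ flipped together with a set of columns $C$. This is our own restatement, not a formula quoted from the slides:

```latex
G_{R,C} \;=\; \sum_{r \in R} G_r \;+\; \sum_{c \in C} G_c \;-\; 2 \sum_{r \in R} \sum_{c \in C} g_{r+c} \;>\; 0
% Example: rows r_1, r_2 and columns c_1, c_2 flipped together
G \;=\; G_{r_1} + G_{r_2} + G_{c_1} + G_{c_2}
   \;-\; 2\left(g_{r_1+c_1} + g_{r_1+c_2} + g_{r_2+c_1} + g_{r_2+c_2}\right)
```

With $|R| = |C| = 1$ this reduces to $G_r + G_c - 2g_{r+c} > 0$.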
Aux. Bits Cost
The cost of updating the auxiliary bits can easily be incorporated into the gain calculation.
Costs: a = 1, b = 2, c = 0, d = 0 (a: 0 → 1, b: 1 → 0, c: 0 → 0, d: 1 → 1)
◦ Writing the new data (a c d b a b d c); the old aux bit has to be flipped to “0” (b): $C = 2a + 3b + 2c + 2d = 8$
◦ Writing the inverted data (c a b d c d b a); the old aux bit stays the same (d): $C_{\text{inverted}} = 2a + 2b + 2c + 3d = 6$
◦ Gain: $G = C - C_{\text{inverted}} = 8 - 6 = 2$
Decoding
Simple: XOR the corresponding vertical and horizontal aux bits.
◦ Output of “1”: read the cell value inverted
◦ Output of “0”: read the cell value un-inverted
Figure: encode and decode.
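The decoding rule fits in a few lines. This sketch assumes per-row flags `rows` and per-column flags `cols` (hypothetical names); since XOR is its own inverse, applying the same flags to the data gives the stored block. Cell (0, 1) below sits where a flipped row meets a flipped column and is therefore stored un-inverted.

```python
def decode(stored, rows, cols):
    """Read cell (i, j) inverted when rows[i] XOR cols[j] == 1, as-is otherwise."""
    return [[s ^ (rows[i] ^ cols[j]) for j, s in enumerate(srow)]
            for i, srow in enumerate(stored)]


def encode_with_flags(data, rows, cols):
    """Apply given inversion flags; XOR is its own inverse, so this mirrors decode."""
    return [[d ^ rows[i] ^ cols[j] for j, d in enumerate(drow)]
            for i, drow in enumerate(data)]


data = [[1, 1, 0], [0, 1, 1]]
rows, cols = [1, 0], [0, 1, 0]
stored = encode_with_flags(data, rows, cols)
print(stored)                              # [[0, 1, 1], [0, 0, 1]]
print(decode(stored, rows, cols) == data)  # True
```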
Evaluation
◦ Compare against Flip-Min and Flip-N-Write (FNW)
◦ Experiment with various block sizes at matching space overhead
◦ Compute the average cost reduction achieved by every scheme relative to differential write
◦ Experiment with a random input stream and with memory traces collected from various SPEC benchmark programs
◦ Model both PCM and STT-RAM by setting the cost labels to match the underlying technology
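For a rough feel for the overhead figures, a sketch assuming a square k × k block with one aux bit per row and per column, giving 2k aux bits per k² data bits. The 128B → 6.25% pairing matches the Benchmark Data slide; the 32B and 512B pairings are our own inference, and the actual layouts may differ.

```python
import math


def square_layout_overhead(block_bytes):
    """Aux-bit overhead of a k x k block with one flag per row and per column."""
    bits = block_bytes * 8
    k = math.isqrt(bits)
    if k * k != bits:
        raise ValueError("sketch assumes the block forms a square k x k grid")
    return (2 * k) / bits


for size in (32, 128, 512):
    # 32 B -> 12.500%, 128 B -> 6.250%, 512 B -> 3.125%
    print(size, "B:", f"{square_layout_overhead(size):.3%}")
```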
Cost Reduction vs. Cost-Oblivious FNW and Flip-Min
Figure panels: overhead 3.125%, 6.25% and 12.5%.
Cost Reduction vs. Cost-Aware FNW and Flip-Min
Figure panels: overhead 3.125%, 6.25% and 12.5%.
The cost model improves FNW and Flip-Min.
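One plausible way to make FNW cost aware, assumed here rather than taken from the slides, is to keep its two candidates (as-is and inverted) but rank them by the cost model instead of by flip count. In this hypothetical example, the two rankings disagree.

```python
LABELS = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}


def flips(old, new):
    return sum(o != n for o, n in zip(old, new))


def cost(old, new, costs):
    return sum(costs[LABELS[(o, n)]] for o, n in zip(old, new))


def fnw_choice(old, old_flag, new, metric):
    """Pick the as-is (flag 0) or inverted (flag 1) form of `new`, whichever
    scores lower under `metric`; the flag cell is scored with the data."""
    candidates = [(list(new), 0), ([1 - b for b in new], 1)]
    return min(candidates, key=lambda cand: metric(old + [old_flag], cand[0] + [cand[1]]))


# Hypothetical asymmetric costs: writing a "0" costs 4, writing a "1" costs 1.
costs = {"a": 1, "b": 4, "c": 0, "d": 0}
old, new = [0, 0, 0, 0, 1, 1], [1, 1, 1, 1, 1, 1]

print(fnw_choice(old, 0, new, flips))                           # ([0, 0, 0, 0, 0, 0], 1)
print(fnw_choice(old, 0, new, lambda o, n: cost(o, n, costs)))  # ([1, 1, 1, 1, 1, 1], 0)
```

Flip counting picks the inverted form (3 flips vs. 4), while the cost model keeps the data as-is (cost 4 vs. 9).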
Cost Model Improvement
Optimization Isolation
At least 15% cost reduction is achieved even without the encoding optimization.
STT-RAM Cost Reduction
Costs: a = 1, b = 0, c = 0, d = 0
Figure panels: overhead 3.125%, 6.25% and 12.5%.
Benchmark Data
Costs: a = 1, b = 2, c = 0, d = 0
Block size: 128B (6.25% overhead)
Conclusion
Bit-flip minimization techniques are oblivious to write asymmetries.
Move from the concept of bit-flip minimization to cost reduction.
CAFO
◦ Cost model that captures the asymmetry in the write cost
◦ 2D encoder that minimizes the overall cost of write operations
Questions?