CML Cache Vulnerability Equations for Protecting Data in Embedded Processor Caches from Soft Errors. † Aviral Shrivastava, € Jongeun Lee, † Reiley Jeyapaul.

Presentation transcript:

Cache Vulnerability Equations for Protecting Data in Embedded Processor Caches from Soft Errors
† Aviral Shrivastava, € Jongeun Lee, † Reiley Jeyapaul
† Compiler and Microarchitecture Lab, Arizona State University, USA
€ High Performance Computing Lab, UNIST, Ulsan, South Korea
LCTES 2010, Stockholm, Sweden
6/28/

Phenomenon of Soft Error
□ Transient faults
□ Random and spontaneous bit changes in the system
□ Can be caused by
  □ Circuit noise
  □ Cross-talk
  □ More than 50% due to radiation strikes

Masking Effects
Logic Masking
Electrical Masking
Latching Window Masking
Microarchitectural Masking
Software Masking

Growing Problem
Soft error rate is currently about 1 per year
Increasing exponentially with technology scaling
Projected to become 1 per day in a decade
Will soon become a problem in earth-bound electronics

Caches most vulnerable
Temporal masking is very effective
Caches occupy the majority of chip area
Much higher % of transistors
– More than 80% of the transistors in Itanium 2 are in caches
Caches are operated at low voltage levels for higher speed and low power
– Even low-energy particles can cause errors
ECC is not enough
– Has high power and performance overheads for the L1 cache
– ECC is used up in manufacturing error correction

Cache Vulnerability
A cache location is vulnerable if
–It will be read by the processor, or it will be committed to memory
–AND it is dirty
Note: Non-dirty data is not vulnerable
–Can always re-read non-dirty data from the lower level of memory
Instantaneous (cache) vulnerability (in bytes) is the number of cache locations that are vulnerable [Mukherjee 2003]
Total (cache) vulnerability of a program (in bytes * cycles) is the summation of cache vulnerability in each cycle of program execution.
[Figure: timeline of R, W, and CE events on a cache location]
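
The definition above can be turned into a small trace-based calculation. The following is only a sketch under stated assumptions: the trace format, the event names ("R", "W", "CE" for an eviction with write-back), and the function name `total_vulnerability` are hypothetical and not taken from the slides. It also assumes that an interval ending in an overwrite is not vulnerable, because the old value is never read or committed.

```python
def total_vulnerability(trace, size_bytes=1):
    """Sum the vulnerable cycles of one cache location, scaled by its size.

    trace: list of (cycle, op) pairs in increasing cycle order, where op is
    "R" (read), "W" (write) or "CE" (eviction with write-back).
    Assumption: dirty data is vulnerable only over intervals that end in a
    read or a write-back; an interval ending in an overwrite is not counted.
    """
    dirty = False
    last_cycle = None
    vulnerable_cycles = 0
    for cycle, op in trace:
        if op == "W":
            dirty = True  # the new value starts a fresh interval
        elif op == "R":
            if dirty:
                vulnerable_cycles += cycle - last_cycle
        elif op == "CE":
            if dirty:
                vulnerable_cycles += cycle - last_cycle  # committed to memory
            dirty = False
        else:
            raise ValueError(f"unknown operation: {op}")
        last_cycle = cycle
    return vulnerable_cycles * size_bytes


# Hypothetical trace: the location is clean until the write at cycle 2.
example = [(0, "R"), (2, "W"), (5, "R"), (6, "R"), (9, "R"), (12, "CE"), (15, "W")]
print(total_vulnerability(example))                # vulnerable cycles for 1 byte
print(total_vulnerability(example, size_bytes=4))  # bytes * cycles for a 4-byte word
```

In this trace only the span from the write at cycle 2 to the write-back at cycle 12 contributes; the leading read of clean data and the trailing write add nothing.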

Existing Schemes
Hardened memory cells
–8T, 10T designs, add cross resistance
–High power and performance overhead
Error Correction Codes
–Single Error Correction and Double Error Detection (SECDED)
–Need about log2(n) check bits to protect n bits
–Most popular, but high overhead for the L1 cache
–Increases power consumption by >25% [Phelan 2003]
–ECC is used up in covering manufacturing defects
Write-through cache
–Zero vulnerability, but high cache-memory traffic
Periodically write back all dirty lines
–Simple, but not very smart: less protection for high overhead
Need an efficient technique for vulnerability reduction
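
The "log2 n" figure for ECC is an approximation. As a sketch, assuming a Hamming-based SECDED code (the slides do not say which construction is meant), the exact check-bit count is the smallest r with 2^r ≥ n + r + 1, plus one overall parity bit. The function name `secded_check_bits` is hypothetical.

```python
def secded_check_bits(data_bits):
    """Check bits for a Hamming-based SECDED code over data_bits bits.

    The Hamming part needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall parity bit adds double-error detection.
    """
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


for n in (8, 32, 64):
    bits = secded_check_bits(n)
    print(f"{n} data bits: {bits} check bits ({100 * bits / n:.1f}% storage overhead)")
```

For 32-bit and 64-bit words this gives the familiar (39,32) and (72,64) codes, which shows why the relative storage cost is highest for small protection granularity.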

Explore Compiler Techniques
Need to reduce the amount of time data is vulnerable in the cache
Vulnerability depends on the access pattern of data

    for ( i : 0 ≤ i < N ) {
      for ( k : 0 ≤ k < N ) {
        for ( j : 0 ≤ j < N ) {
          A[i][k] += B[i][j] * C[j][k]
        }
      }
    }
Completely compute A[i][k] in the innermost loop

    for ( i : 0 ≤ i < N ) {
      for ( j : 0 ≤ j < N ) {
        for ( k : 0 ≤ k < N ) {
          A[i][k] += B[i][j] * C[j][k]
        }
      }
    }
Need A[i][k] across iterations of outermost loop

Low Vulnerability but also High Runtime
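
To illustrate how loop order changes the time that dirty elements of A wait between a write and the next read, here is a rough sketch. It is not the authors' method: it assumes an unbounded cache (no evictions), counts time in loop iterations rather than cycles, and ignores the final write-back. The function name `dirty_reuse_iterations` is hypothetical.

```python
from itertools import product


def dirty_reuse_iterations(order, n):
    """Sum, over all elements of A, the iterations between a write to A[i][k]
    and the next read of it, for a loop nest listed from outermost to innermost.

    Assumptions: unbounded cache, one time unit per iteration, and the
    update A[i][k] += ... reads and then writes the element.
    """
    last_write = {}
    total = 0
    for t, idx in enumerate(product(range(n), repeat=3)):
        it = dict(zip(order, idx))  # e.g. {"i": 0, "k": 1, "j": 2}
        key = (it["i"], it["k"])
        if key in last_write:
            total += t - last_write[key]  # dirty value waited this long to be read
        last_write[key] = t
    return total


n = 4
print("i,k,j order:", dirty_reuse_iterations("ikj", n))
print("i,j,k order:", dirty_reuse_iterations("ijk", n))
```

With j innermost, each A[i][k] is rewritten in consecutive iterations, so its dirty value waits one iteration at a time; with k innermost, each A[i][k] waits N iterations between updates.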

MatMul Loop Interchange
[Figure: Loop Interchange on Matrix Multiplication]
Interesting configurations exist, with low vulnerability and low runtime.
Vulnerability trend is not the same as performance.
Opportunities may exist to trade off a little runtime for large savings in vulnerability.
96% variation in vulnerability for 16% variation in runtime.

How to Exploit the Trade-off?
Need to compute the vulnerability
–Can be done by simulation
–Run the application with different data access patterns, and pick the one with the least vulnerability
May be applicable for extremely embedded systems
Runtime may be an issue
–Some programs run indefinitely
Number of configurations to run is too large
–E.g., array padding
How to scale the results to a slightly different configuration?
–E.g., increased cache size
Need an efficient method of computing vulnerability

Outline
Growing threat of soft errors
Efficient techniques needed for L1 cache protection
Need efficient techniques to estimate vulnerability
Cache Miss Equations
Vulnerability Calculations
Experiments

Access and Cache Space
Access Space: every point is an iteration of the loop

    for (i=0; i < N; i++)
      for (j=0; j < N; j++)
        for (k=0; k < N; k++)
          A[i][k] += B[i][j] * C[j][k]
    endFor

MemAddr: Iteration → Memory Address
–AF(1,2,4) = C + N^2 + 4N + 2
CacheAddr: Memory Address → Cache Address
–Cache Line = (MemAddr/L), L: # lines in the cache
Reference & Access
[Figure: access space with axes i, j, k and point (1,4,2); memory space with axes x, y and element C(4,2); cache space with axes m, n and line 2]
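
The two mappings (iteration to memory address, memory address to cache line) can be sketched as below. This is an illustration under assumptions that are not in the slides: row-major arrays, 8-byte elements, 32-byte lines, a hypothetical base address for C, and the textbook direct-mapped formula (address divided by line size, modulo the number of lines) rather than the shorthand shown on the slide. The 1024-line default corresponds to a 32KB cache with 32-byte lines.

```python
def mem_addr(base, row, col, n, elem_size=8):
    """Byte address of element [row][col] of a row-major n x n array."""
    return base + (row * n + col) * elem_size


def cache_line(addr, line_size=32, num_lines=1024):
    """Line index of a byte address in a direct-mapped cache."""
    return (addr // line_size) % num_lines


N = 8
C_BASE = 0x2000        # hypothetical base address of array C
i, j, k = 1, 4, 2      # iteration point (1,4,2) from the access space
addr = mem_addr(C_BASE, j, k, N)   # reference C[j][k]
print(addr, cache_line(addr))
# Addresses exactly one cache size apart map to the same line and can conflict.
print(cache_line(addr + 32 * 1024) == cache_line(addr))
```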

Data Reuse
Access Space: every point is an iteration of the loop

    for (i=0; i < N; i++)
      for (j=0; j < N; j++)
        for (k=0; k < N; k++)
          A[i][k] += B[i][j] * C[j][k]
    endFor

When the same data, C(4,2), is accessed from iteration i1 = (0,4,2) and iteration i2 = (1,4,2), we say there is data reuse in direction (1,0,0).
[Figure: access space with axes i, j, k and points i1 (0,4,2), i2 (1,4,2), iN (N,4,2); data space with axes x, y and element C(4,2)]
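
A minimal sketch of the reuse idea: two iterations reuse data when a reference touches the same array element in both, and the reuse direction is the difference of the two iteration points. The helper names `ref_c` and `reuse_vector` are hypothetical.

```python
def ref_c(it):
    """Element of C touched by the reference C[j][k] at iteration (i, j, k)."""
    i, j, k = it
    return (j, k)


def reuse_vector(ref, it_a, it_b):
    """Difference it_b - it_a if both iterations touch the same element, else None."""
    if ref(it_a) != ref(it_b):
        return None
    return tuple(b - a for a, b in zip(it_a, it_b))


print(reuse_vector(ref_c, (0, 4, 2), (1, 4, 2)))  # same element C(4,2)
print(reuse_vector(ref_c, (0, 4, 2), (0, 4, 3)))  # different elements
```

Because C[j][k] does not depend on i, any two iterations that differ only in i reuse the same element.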

Cache Miss

    for (i=0; i < N; i++)
      for (j=0; j < N; j++)
        for (k=0; k < N; k++)
          A[i][k] += B[i][j] * C[j][k]
    endFor

Another iteration accesses data of array B, mapped to the same cache location, causing a cache miss.
The element of array C is evicted from the cache and replaced by an element from array B.
[Figure: access space with axes i, j, k, points p(0,4,2) and i(1,4,2), reuse vector (1,0,0); memory space with elements C(4,2) and B(0,7); cache space with axes m, n and line 2]
6/28/

Cache Misses
Cache Miss Equation
–Returns 1 if the reuse in reference r along the reuse vector v was not realized at iteration j due to a conflict by reference q at iteration k.
[Figure: iteration points (j−v, r), (k, q), and (j, r)]

Cache Misses
Miss Iterations
–Iterations at which the reference r misses, along the reuse vector v, due to interference with another reference q.
Hit: no k exists
Miss: because k exists

Cache Misses
Miss Iterations due to multiple references
–There is a miss at iteration j if there is a miss due to any reference
Miss: because of reference q
Miss: because of reference s
[Figure: interference points (k1, q) and (k2, s)]

Cache Miss
Miss Iterations due to multiple reuse vectors
–There will be a miss at iteration j if there is a miss along all the reuse vectors
Miss: because of the smallest reuse vector
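
The miss conditions on the preceding slides can be checked by brute force on a tiny loop nest. The sketch below is not the Cache Miss Equations themselves, which describe these sets symbolically. It assumes a direct-mapped cache with one array element per line, hypothetical base addresses, and it ignores interference from accesses inside the two endpoint iterations. All names are hypothetical.

```python
from itertools import product

N = 2                                  # tiny N x N x N nest
BASE = {"A": 0, "B": 4, "C": 8}        # hypothetical bases, in element units


def ref_a(it):
    i, j, k = it
    return BASE["A"] + i * N + k       # A[i][k]


def ref_b(it):
    i, j, k = it
    return BASE["B"] + i * N + j       # B[i][j]


def ref_c(it):
    i, j, k = it
    return BASE["C"] + j * N + k       # C[j][k]


REFS = [ref_a, ref_b, ref_c]


def miss_along(j, ref, v, refs, num_lines):
    """True if the reuse of ref along v is not realized at iteration j."""
    p = tuple(a - b for a, b in zip(j, v))
    if any(c < 0 or c >= N for c in p) or ref(p) != ref(j):
        return True                    # no earlier access to reuse from
    target = ref(j)
    for k in product(range(N), repeat=3):
        if p < k < j:                  # tuples compare in execution order
            for q in refs:             # a conflict by ANY reference evicts
                other = q(k)
                if other != target and other % num_lines == target % num_lines:
                    return True
    return False


def is_miss(j, ref, reuse_vectors, refs, num_lines):
    """A miss at j requires a miss along ALL reuse vectors."""
    return all(miss_along(j, ref, v, refs, num_lines) for v in reuse_vectors)


print(is_miss((1, 0, 0), ref_c, [(1, 0, 0)], REFS, num_lines=4))   # conflicting cache
print(is_miss((1, 0, 0), ref_c, [(1, 0, 0)], REFS, num_lines=16))  # conflict-free cache
```

With 4 lines, B[0][0] maps onto the line holding C[0][0] between the two accesses, so the reuse is lost; with 16 lines every element has its own line and the reuse is realized.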

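Computing Vulnerability

State   Access       Read   Write
Dirty   Hit          (1)    None
Dirty   Repl. Miss   (2)    (2)
Dirty   Cold Miss    None   None
Clean   Any          None   None

(1) Hit Vul. [Figure: timeline with points p = j−v and j]
(2) Miss Vul. [Figure: timeline with points p = j−v, k*, k, and j]
6/28/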

Challenges in Vul. Estimation
Miss(j): I → {0,1}
–Miss at iteration j is a Boolean function
Vul(j): I → I+
–Vulnerability at iteration j is an integer function
–How to represent an integer function as a set?
Much more complexity:
–Misses are in iterations, while vulnerability is in cycles
–Only dirty blocks are vulnerable

Computing Vulnerability
Suppose a variable is accessed several times
–Cold miss
–Incremental Vul.
–Post-access Vul.
Incremental Vul.
–Compute vulnerability from the last access
–Total Vul. = Sum of Incremental Vul.
[Figure: access timeline from the cold miss to the last access]

Computing Vulnerability
Two key ideas:
1. If vulnerability at iteration j = l
–Make l copies of vector j
2. Compute non-vulnerability
–And then subtract it from the total possible vulnerability

Computing Vulnerability
Access Non Vulnerability
If no k exists
–ANV = ∅
[Figure: hit, with points j−v and j]

Computing Vulnerability
Access Non Vulnerability
If a k exists
–Then ANV = {(j,1), (j,2), …, (j,|j|-|k|)}
ANV contains all the points on the red line
[Figure: miss, with points j−v and j]

Computing Vulnerability
Access Non Vulnerability
If multiple k exist
–Then ANV = {(j,1), (j,2), …, (j,|j|-|k*|)}
–Where k* is the smallest k
[Figure: miss, with points j−v, k*, k, k, and j]

Computing Vulnerability
Access Non Vulnerability across references
–ANV for multiple references is the maximum of the individual ANVs
[Figure: miss, with points j−v, k*, (k1, q), (k2, s), and j]

Computing Vulnerability
Access Vulnerability
–AV = Total possible vulnerability - ANV
[Figure: miss, with points j−v, k*, and j]
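
The set construction on these slides can be sketched for a single access as follows. The assumptions are mine, not the slides': |x| is read as the position of iteration x in execution order of an n×n×n nest, vulnerability is counted in iterations for one dirty element, and the interfering iterations per reference are given as plain lists. The function names are hypothetical.

```python
def linear(it, n):
    """Position |it| of iteration (i, j, k) in execution order of an n x n x n nest."""
    i, j, k = it
    return (i * n + j) * n + k


def anv(j, interferers, n):
    """Access non-vulnerability of j for one reference, as a set of copies (j, l)."""
    if not interferers:
        return set()                   # hit: ANV is empty
    k_star = min(interferers)          # smallest (earliest) interfering iteration
    return {(j, l) for l in range(1, linear(j, n) - linear(k_star, n) + 1)}


def access_vulnerability(j, v, interferers_by_ref, n):
    """AV = total possible vulnerability - |ANV|, in iterations."""
    p = tuple(a - b for a, b in zip(j, v))
    total = linear(j, n) - linear(p, n)
    # The per-reference ANV sets are nested, so their union is the largest one.
    nonvul = set().union(*(anv(j, ks, n) for ks in interferers_by_ref))
    return total - len(nonvul)


j, v, n = (1, 0, 0), (1, 0, 0), 4
by_ref = [[(0, 1, 0), (0, 2, 0)], [(0, 3, 0)]]   # hypothetical interferers of q and s
print(access_vulnerability(j, v, by_ref, n))      # miss: vulnerable only until k*
print(access_vulnerability(j, v, [[], []], n))    # hit: vulnerable for the whole reuse
```

Working with non-vulnerability means the combination across references is a plain set union, and the count |k*| − |j−v| falls out of the subtraction.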

Why not compute AV directly?
We computed [equation]
What if we compute [equation]
[Figure: points j−v, k1, k2, and j]

Other Issues
Identifying cold misses
Computing post-access vulnerability
Cache block effect
Translating from iterations to cycles
Derived reuse vectors
Computing no. of elements in a set

Experimental Setup
Simplify CVEs in Omega
–Output: set containing the vulnerability of the loop
Count the number of elements with Barvinok
Benchmark kernels from SPEC2000 and multimedia kernels
SimpleScalar configured as a single-issue in-order processor with a 32KB direct-mapped data cache and a 25-cycle L1 miss penalty

Interesting Trade-off exists!
6/28/ % vulnerability reduction for 16% runtime trade-off
55% vulnerability reduction for 6.5% runtime improvement

Validation
High correlation between ACV and CV
Variation in CV: 19X
Variation in runtime: 1.7X
→ Can trade off a lot of vulnerability with little performance impact
Min Vul: ikj
Min Runtime: ijk
→ Not the same trend
Min Vul with only 5.7% runtime penalty

Application of CVE (case study)
Cache vulnerability calculated for varying array placement offsets on swim

Conclusion
Soft errors are soon to become a major concern even in terrestrial computing systems
Caches are most vulnerable, and for the L1 cache:
–ECC is costly
–ECC may not be enough
Need nimble techniques to reduce vulnerability without much power and performance overhead
Compiler techniques can change the read/write access pattern of data
–and can therefore affect the vulnerability of a program
Interesting trade-offs between vulnerability and runtime may exist in code generation
Exploiting them using simulation may not be feasible
–Need efficient techniques to estimate vulnerability
Proposed reuse-vector-based analysis to estimate vulnerability
–Starting point for compiler support

Questions?

Hit Vulnerability
Reuse Direction: direction along which the data element is reused.
Access Iterations: iterations accessing the array element.
Cache Miss Iterations: iterations at which reuse is not realized due to reference X (same or different).
Vulnerable Accesses (Cache Hits): iterations at which the reuse is realized (hits).
Vulnerable Iterations (Read Vulnerability): iterations between successive reuses.
[Figure: access space with axes i, j, k, marking access iterations and cache miss iterations]

Miss Vulnerability
Cache Interference Points (CIP): the set of possible interference points { j }
Intermediate Iterations: the set of intermediate iterations { v }
–The set of points between any existing j and the iteration i.
–All v points are greater than the first CIP for every iteration i.
[Figure: points j1 to j4 and q on x, y axes, marking vulnerable iterations (VI) and vulnerability]