Modeling of Digital Systems

CS 812: High Level Design & Modeling of Digital Systems
MEMORY SYNTHESIS
Bhuvan Middha (csu98133), Arun Kejariwal (eeu98172)

Presentation Plan
- Motivation
- Impact of Memory Architecture Decisions
- Optimizations in Memory Synthesis
- Memory Assignment of array variables
- Scratch-Pad Memory
- Conclusion
- References

Motivation
The rate of performance improvement is different for CPU and memory.
[Figure: speed versus year, with one curve for the CPU and one for memory]

Impact on Processor Pipeline
[Figure: three overlapped instructions, each passing through the stages IF, Dec, ALU, MEM, WB]
The clock cycle is determined by the slowest pipeline stage.

Impact of Memory Architecture Decisions
- Area: 50-70% of an ASIC/ASIP may be memory
- Performance: 10-90% of system performance may be memory related
- Power: 25-40% of system power may be memory related

Issues in Memory Synthesis
- Number of distributed registers
- Number of register files
- Number of register file ports
- On-chip or off-chip memory
- Cache parameters
- Cache vs. scratch pad
- Number of memory ports
- Memory bus bandwidth
- Data organization and partitioning

Optimizations in Memory Synthesis
Code Optimizations
- R-M-W mode
- Clustering of scalar variables
- Reordering
- Hoisting
- Loop transformations
- Memory assignment of array variables
Hardware Optimizations
- Scratch pad
- Banking

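Storing Multi-dimensional Arrays: Row-major
    int X[4][4];
[Figure: row-major storage; the logical 4x4 array is mapped row by row onto physical memory locations 0 to 15]

Storing Multi-dimensional Arrays: Column-major
    int X[4][4];
[Figure: column-major storage; the logical 4x4 array is mapped column by column onto physical memory locations 0 to 15]

Storing Multi-dimensional Arrays: Tile-based
    int X[4][4];
[Figure: tile-based storage; the logical 4x4 array is mapped tile by tile onto physical memory locations 0 to 15]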

Array Layout and Data Cache
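    int a[1024];
    int b[1024];
    int c[1024];
    ...
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
[Figure: arrays a, b and c laid out one after another in memory; a[i], b[i] and c[i] all map to the same location of the data cache (direct-mapped, 512 words)]
Problem: every access leads to a cache miss, because each array is a multiple of the cache size in length, so the three elements used in one iteration keep evicting each other.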

Data Alignment
    int a[1024];
    int b[1024];
    int c[1024];
    ...
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
[Figure: memory holds a, DUMMY, b, DUMMY, c; with the padding, a[i], b[i] and c[i] map to different locations of the data cache (direct-mapped, 512 words)]
Data alignment avoids cache conflicts.

Data Layout Transformation
- Splitting structs into individual arrays
- Account for pointer arithmetic and dereferencing
- Clustering of arrays

Motivating Example
Arrays:
    struct x {
        int a;
        int b;
    } p[1000];
    int q[1000];
    ...
Loop 1:
    avg = 0;
    for (i = 0; i < 1000; i++)
        avg = avg + p[i].a;
    avg = avg / 1000;
Loop 2:
    for (i = 0; i < 1000; i++) {
        p[i].b = p[i].b + avg;
        q[i] = p[i].b + 1;
    }

Cache Performance: Loop 1
Data cache: direct-mapped, 4 lines, 2 words/line. Code: the motivating example (struct x p[1000], int q[1000], Loop 1 and Loop 2).
[Figure: cache contents during Loop 1. Line 0: p[0].a, p[0].b. Line 1: p[1].a, p[1].b. Line 2: p[2].a, p[2].b. Line 3: p[3].a, p[3].b. The p[i].b words are useless data for Loop 1.]

Cache Performance: Loop 2
Data cache: direct-mapped, 4 lines, 2 words/line. Code: the motivating example (struct x p[1000], int q[1000], Loop 1 and Loop 2).
[Figure: cache contents during Loop 2. The lines hold p[0].a, p[0].b; p[1].a, p[1].b; and q[0], q[1]. The p[i].a words are useless data for Loop 2.]

Cache Performance
- Loop 1: 1000 cache misses for p[i].a
- Loop 2: 1500 cache misses (1000 misses for p[i].b, 500 misses for q[i])
- Cache miss rate: 62.5%

Transformed Data Layout
Original arrays:
    struct x {
        int a;
        int b;
    } p[1000];
    int q[1000];
Transformed arrays:
    struct y {
        int q;  // originally q
        int b;  // originally x.b
    } r[1000];
    int a[1000];  // originally x.a
    ...
Loop 1:
    avg = 0;
    for (i = 0; i < 1000; i++)
        avg = avg + a[i];
    avg = avg / 1000;
Loop 2:
    for (i = 0; i < 1000; i++) {
        r[i].b = r[i].b + avg;
        r[i].q = r[i].b + 1;
    }

Cache Performance: Loop 1 (transformed layout)
Data cache: direct-mapped, 4 lines, 2 words/line. Code: the transformed layout (struct y r[1000], int a[1000]).
[Figure: cache contents during Loop 1. Line 0: a[0], a[1]. Line 1: a[2], a[3]. No useless data in the cache.]

Cache Performance: Loop 2 (transformed layout)
Data cache: direct-mapped, 4 lines, 2 words/line. Code: the transformed layout (struct y r[1000], int a[1000]).
[Figure: cache contents during Loop 2. Line 0: r[0].q, r[0].b. Line 1: r[1].q, r[1].b. No useless data in the cache.]

Cache Performance
- Loop 1: 500 cache misses
- Loop 2: 1000 cache misses
- Cache miss rate: 37.5%

Clustering of Arrays
    int a[16], b[16], c[16]
    For i = 0 to 7
        a[i] = b[i+3] + 3
    For j = 0 to 15
        a[j] = b[j] * c[j]
[Figure: arrays a, b and c, 16 words each; annotations "8 + 16" and "24"]

Scratch Pad Memory
- Data memory residing on chip
- Address space disjoint from off-chip memory
- Same address and data bus as that for off-chip memory
- Guaranteed small access time, as there are no read/write misses

Memory Address Space
[Figure: the CPU sees one memory address space of N words. Addresses 0 to P-1 map to the on-chip memory, accessed in 1 cycle. Addresses P to N-1 map to the off-chip memory, accessed through the on-chip data cache: 1 cycle for the cache, 10-20 cycles for the off-chip memory.]

Scratch Pad Model
Organization of scratch pad memory:
- No tag comparison is needed, unlike in a cache
- A priori knowledge of the memory objects is an added advantage
- Scratch pad memory consists of the data array unit, the decoder unit and the peripheral unit

Why Scratchpad?
- Unordered array variables and scalars lead to a large number of conflict misses in the cache
- Accesses are data dependent, so data layout techniques are ineffective
Example:
    char BrightnessLevel[512][512]
    int Hist[256]
    for i = 0 to 511
        for j = 0 to 511
            level = BrightnessLevel[i][j]
            Hist[level] = Hist[level] + 1

Data Partitioning
Minimize the interference between different variables in the data cache.
Partitioning of variables is governed by the following code characteristics:
- scalar variables and constants
- size of arrays
- lifetime of variables
- access frequency of variables
- loop conflicts

Access Frequency of Variables and Loop Conflicts
- Variable Access Count (VAC)
- Interference Access Count (IAC)
- Interference Factor (IF): IF(u) = VAC(u) + IAC(u)
- Map variables with high IF values into the scratch pad memory
- Loop Conflict Factor (LCF)
- Map variables with a high LCF into the scratch pad memory

Formulation of Partitioning problem
- Total Conflict Factor (TCF): TCF(u) = IF(u) + LCF(u)
- Given a set of n arrays with corresponding TCF values, find an optimal subset such that total size <= size of SRAM and the total TCF value is maximized
- Similar to the knapsack problem, except that several arrays with non-intersecting lifetimes can share the same SRAM space

Conclusion
- Z-buffering - graphics
- Stream buffers - data pre-fetching
- Stride prediction tables - predict memory references
- Inter-array windowing - multi-dimensional arrays

References
Books
- P. Panda, N. Dutt, A. Nicolau, Memory Issues in Embedded Systems-on-Chip: Optimization and Exploration, Kluwer Academic Publishers, 1999
- F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, A. Vandecappelle, Custom Memory Management Methodology, Kluwer Academic Publishers, 1998
Survey Paper
- P. Panda, F. Catthoor, N. Dutt, K. Danckaert, E. Brockmeyer, C. Kulkarni, A. Vandecappelle, Data and Memory Optimization Techniques for Embedded Systems, ACM Transactions on Design Automation of Electronic Systems, April 2001
