Applications of Systolic Arrays - Signal and image processing: FIR and IIR filtering, 1-D convolution, 2-D convolution and correlation, discrete Fourier transform, interpolation, 1-D and 2-D median filtering, geometric warping.
Applications of Systolic Arrays - Matrix arithmetic: matrix-vector multiplication, matrix-matrix multiplication, matrix triangularization (solution of linear systems, matrix inversion), QR decomposition (eigenvalue and least-squares computation), solution of triangular linear systems.
Non-numeric applications: data structures, graph algorithms, language recognition, dynamic programming, encoders (polynomial division), relational database operations.
Matrix Multiplication Recurrences: $C_{ij}^{(1)} = 0$, $C_{ij}^{(k+1)} = C_{ij}^{(k)} + a_{ik} b_{kj}$, $C_{ij} = C_{ij}^{(n+1)}$. For band matrices with bandwidths $w_a$ and $w_b$, the total number of steps is $3n + \min(w_a, w_b)$.
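The recurrence can be checked with a short sequential sketch. This is not the systolic schedule itself, only the arithmetic that each cell accumulates; the function names `matmul_recurrence` and `band_steps` are hypothetical, and the inner term is taken to be $a_{ik} b_{kj}$, the standard matrix product.

```python
def matmul_recurrence(a, b):
    """Compute C = A * B for n x n lists using the accumulation recurrence."""
    n = len(a)
    c = [[0] * n for _ in range(n)]      # C_ij^(1) = 0
    for k in range(n):                   # k = 1..n in the slide's notation
        for i in range(n):
            for j in range(n):
                # C_ij^(k+1) = C_ij^(k) + a_ik * b_kj
                c[i][j] += a[i][k] * b[k][j]
    return c                             # C_ij = C_ij^(n+1)


def band_steps(n, w_a, w_b):
    """Step count quoted on the slide for band matrices: 3n + min(w_a, w_b)."""
    return 3 * n + min(w_a, w_b)
```

For example, `matmul_recurrence([[1, 2], [3, 4]], [[5, 6], [7, 8]])` gives `[[19, 22], [43, 50]]`.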
Systolic array multiplier of numbers: An n × n multiplier requires (3n+1)n/2 cells. Before saturation, a multiplication takes 3n clock cycles. After saturation, one product is output on every clock cycle.
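The two figures above can be turned into a small cost model. This is a minimal sketch with hypothetical function names; the formula for a stream of m products assumes the first product appears after 3n cycles and each later one follows one cycle apart, which is how the saturation statement reads.

```python
def multiplier_cells(n):
    """Cells needed by an n x n systolic multiplier: (3n + 1) * n / 2."""
    return (3 * n + 1) * n // 2   # always an integer: n or 3n + 1 is even


def cycles_for_products(n, m):
    """Clock cycles to obtain m products (assumed: 3n to fill, then 1 each)."""
    return 3 * n + (m - 1)
```

Under these assumptions an 8-bit multiplier needs `multiplier_cells(8) == 100` cells, and ten back-to-back products take `cycles_for_products(8, 10) == 33` cycles.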
Systolic array multiplier of numbers, Basic Cell: The main idea is to calculate the partial products and direct them to the appropriate places.
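The partial-product idea can be illustrated in software. The sketch below is not the cell-level design from the slides; it only shows, under the assumption of unsigned n-bit operands, how each one-bit product x_i AND y_j is routed to the column of weight 2^(i+j) and how the columns are then reduced with carries. The names `partial_product_columns` and `multiply` are hypothetical.

```python
def partial_product_columns(x, y, n):
    """Count the one-bit partial products that land in each weight column."""
    cols = [0] * (2 * n)
    for i in range(n):
        for j in range(n):
            bit = (x >> i) & (y >> j) & 1   # partial product x_i AND y_j
            cols[i + j] += bit              # route it to column i + j
    return cols


def multiply(x, y, n):
    """Reduce the columns from least to most significant, propagating carries."""
    result, carry = 0, 0
    for weight, count in enumerate(partial_product_columns(x, y, n)):
        total = count + carry
        result |= (total & 1) << weight     # low bit stays in this column
        carry = total >> 1                  # the rest moves to the next column
    return result
```

With 4-bit operands, `multiply(13, 11, 4)` returns 143.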
Systolic array multiplier of numbers [Figure: multiplier structure, showing basic cells and delay elements]
Performance of 8-bit multiplier
On-the-fly least-squares solutions using one- and two-dimensional systolic arrays, with p = 4. Triangular architecture for solving triangular linear systems.
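For reference, the computation that a triangular-system solver performs is forward substitution. The following sequential sketch assumes a lower triangular matrix with a nonzero diagonal; it shows only the arithmetic, not the cell layout or timing of the architecture in the figure, and the name `solve_lower_triangular` is hypothetical.

```python
def solve_lower_triangular(a, b):
    """Solve A x = b by forward substitution, A lower triangular."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(i):
            acc += a[i][k] * x[k]        # accumulate a_ik * x_k for k < i
        x[i] = (b[i] - acc) / a[i][i]    # requires a_ii != 0
    return x
```

For `a = [[2, 0, 0], [1, 3, 0], [4, 5, 6]]` and `b = [2, 7, 32]`, the solution is `[1.0, 2.0, 3.0]`.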
Systolic Organization for future (nano) technologies To effectively utilize a given technology, the constraints of that technology must be well understood. System designers must consider the limitations of the technology to design a system where those limitations do not impact the overall performance significantly.
Systolic Organization: requirements. The organization should be reconfigurable, to exploit application-dependent parallelism; programmable in a high-level language, to provide task control and flexibility; scalable, to extend the architecture easily to many applications; and capable of supporting SIMD organizations for vector operations and MIMD organizations for non-homogeneous parallelism requirements.
Systolic Organization is the future: Systolic operation and organization is a design philosophy that aims to satisfy the architectural constraints imposed by advances in silicon technology. This design philosophy is becoming even more important for new nanotechnologies. It offers simplicity, regularity, modularity, and localized communication.
Principle of Local Communication: Systolic arrays are typically characterized by intensive local communication and computation, with decentralized parallelism in a compact package. They capitalize on processes that require intensive repetitive computation and can be performed in a regular, modular, rhythmic, synchronous, and concurrent manner.
New concept in Computer Architecture: Systolic arrays were originally proposed for fixed or special-purpose applications. However, the concept has been extended to more general-purpose SIMD and MIMD architectures.
Systolic Characteristics The systolic cells are synchronized by a single global clock. The input data streams are fed to the systolic array only at its boundaries. Different data streams can flow in different directions at different speeds through the array.
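These characteristics can be seen in a small cycle-by-cycle simulation of a linear array computing a FIR filter. The arrangement below is one possible design chosen for illustration, not one taken from the slides: the weights stay in the cells, the x stream and the y stream both move left to right, and x moves at half the speed of y (two x registers per cell, one y register per cell). The name `systolic_fir` is hypothetical.

```python
def systolic_fir(weights, samples):
    """Simulate a K-cell linear systolic array computing y[t] = sum_k w[k] * x[t-k]."""
    k = len(weights)
    x_line = [0] * (2 * k - 1)    # x registers; cell j reads tap 2*j
    y_reg = [0] * k               # one y register per cell
    outputs = []
    # Append k-1 zeros so the last partial results are flushed out of the array.
    for x_in in list(samples) + [0] * (k - 1):
        # One tick of the global clock: x enters only at the left boundary.
        x_line = [x_in] + x_line[:-1]
        # Every cell updates at once, using its left neighbour's previous y.
        y_prev = y_reg
        y_reg = [
            (y_prev[j - 1] if j > 0 else 0) + weights[j] * x_line[2 * j]
            for j in range(k)
        ]
        outputs.append(y_reg[-1])  # results leave only at the right boundary
    return outputs[k - 1:]         # drop the k-1 ticks of pipeline fill


# Usage: a 3-tap filter applied to four samples.
print(systolic_fir([1, 2, 3], [1, 2, 3, 4]))  # [1, 4, 10, 16]
```

Each cell communicates only with its immediate neighbour, and the host touches the array only at its two ends, which is the property the slide describes.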
Systolic different than pipelined: Systolic architectures differ from pipelined systems because most of the stages are identical, input data is not consumed, input data streams can flow in different directions, and modules may be organized in a two-dimensional (or higher) configuration.
Systolic different than array processors: Systolic architectures differ from arrays of processors because the processors in a systolic organization are synchronized by a single global clock but are locally controlled: different systolic cells can perform different operations at the same time.
Systolic Characteristics: Systolic architectures allow higher throughput through the concurrent operation of a large number of processing cells. They can increase the execution speed of compute-bound applications without increasing the I/O requirements, because input data is reused.
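A rough way to quantify data reuse is the ratio of arithmetic operations to words crossing the array boundary. The sketch below uses n × n matrix multiplication as the example and assumes an idealized model in which every element of A and B is read once and every element of C is written once; the name `matmul_ops_per_io_word` is hypothetical.

```python
def matmul_ops_per_io_word(n):
    """Multiply-accumulate operations per I/O word for n x n matrix multiply."""
    mac_ops = n ** 3          # one multiply-accumulate per (i, j, k) triple
    io_words = 3 * n * n      # assumed: read A and B once, write C once
    return mac_ops / io_words
```

In this model the ratio is n/3, so the work done per word of I/O grows with the problem size, which is why a compute-bound kernel can be sped up without a matching increase in I/O bandwidth.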
Automatic Design? Algorithms and Mapping: Designers must be intimately familiar with the algorithms that they are implementing on systolic arrays. The heuristic design of a systolic array from an algorithm is slow and error-prone, requires simulation for verification, and often results in a non-optimal solution. Automatic array synthesis is an active research area; however, most array designs are still based on heuristics.
Integration into Existing Systems: Generally, systolic processors are integrated into an existing host as a back-end processor.
Systolic Issues, Integration into Existing Systems: System integration is often nontrivial because of the array's high I/O requirements. Often, an additional memory subsystem is added between the existing host and the systolic array to support data access and data multiplexing and demultiplexing, since the existing I/O channel of the host rarely satisfies the bandwidth required by the systolic array.
Systolic Issues, Cell Granularity: Low-level or high-level cell granularity will directly affect the array's throughput, flexibility, and the set of algorithms that can be executed efficiently.
Systolic Issues, Cell Granularity: The basic operation performed in each cycle by each cell can range from logical or bitwise operations, to word-level multiplication and addition, to a complete program. Granularity is subject to technology capabilities and limitations as well as design goals. Packaging also introduces input/output pin restrictions.
Systolic Issues Extensibility: Since systolic arrays are built around the cellular building blocks, the cell design should be sufficiently flexible to allow it to be used in a wide variety of topologies implemented in a wide variety of substrate technologies.
Systolic Issues, Clock Synchronization: Clock lines of different lengths within integrated chips, as well as external to the chips, can introduce skews. The risk of clock skew is greater when data flow within the systolic array is bidirectional. Wavefront arrays reduce the clock skew problem by introducing more complicated asynchronous intercellular communication.
Systolic Issues, Reliability: As integrated circuits grow larger, inherent fault-tolerance capabilities must be added if the same degree of reliability is to be maintained. Diagnostics should also be built in at design time so that proper operation can be verified more easily.