System Development. Numerical Techniques for Matrix Inversion.


System Development

Numerical Techniques for Matrix Inversion

The Elementary Technique: Matrix Inversion using Co-Factors

Inversion using Co-Factors? Not Suitable Computationally!! This technique is a very poor contender for implementation.
Complexity: N! (N x (N-1) x (N-2) x … x 3 x 2 x 1), evaluated for SIMD machines.
A recursive algorithm may lend an elegant solution, but it:
– Devours memory resources with extreme greed
– Drags the processor out of the Grand Prix and into a traffic jam
It is therefore a computationally extremely expensive algorithm with enormous memory requirements. Above all, an SPS (Special-Purpose System) hardware architecture for this technique is a distant reality because of the irregular global communication requirements lent to it by its recursive algorithm. A minimal sketch of that recursion appears below.
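To make the factorial blow-up concrete, here is a minimal sketch (my illustration, not from the slides) of determinant evaluation by co-factor expansion, the kernel of co-factor inversion. Each call on an N x N matrix spawns N recursive calls on (N-1) x (N-1) minors, which is exactly the N! growth cited above.

# Minimal sketch: determinant by co-factor expansion (O(N!) recursion).
def det_cofactor(a):
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det_cofactor(minor)
    return total

# Naive co-factor inversion then needs one such determinant per matrix
# entry: inverse[i][j] = cofactor[j][i] / det.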

Any Alternatives? Fortunately, YES! A technique which employs LU Decomposition and Triangular Matrix Inversion for its solution.
Complexity: N³ (evaluated for SIMD machines).
What are these numerical techniques? (We'll soon get to learn them.) The distinct advantage of these techniques is that their solution mimics the Gaussian Elimination procedure, which in turn is an excellent contender for systolic implementation.

To the Computationally Efficient Numerical Techniques: Matrix Inversion using LU Decomposition and Triangular Matrix Inversion

Upper Triangular Matrix

Lower Triangular Matrix

A Systolic Architecture for Triangular Matrix Inversion (Matrix Order: 4 x 4)

Regular Cells

Boundary Cells

The architecture's abstract computational working is illustrated using the upper triangular matrix. The same architecture, after some rearrangement of the data, can be employed to invert a lower triangular matrix. A sequential sketch of the underlying recurrence follows.
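As a reference for the dataflow, here is a minimal sequential sketch (an assumption on my part, not taken from the slides) of the recurrence a triangular-inversion array evaluates; presumably the boundary cells perform the reciprocals and the regular cells the multiply-accumulates.

# Upper-triangular inversion: X = U^-1 is also upper triangular, with
#   X[i][i] = 1 / U[i][i]
#   X[i][j] = -(sum_{k=i+1..j} U[i][k] * X[k][j]) / U[i][i]  for j > i
def invert_upper(u):
    n = len(u)
    x = [[0.0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        x[i][i] = 1.0 / u[i][i]           # boundary-cell operation
        for j in range(i + 1, n):
            s = sum(u[i][k] * x[k][j] for k in range(i + 1, j + 1))
            x[i][j] = -s / u[i][i]        # regular-cell operations
    return x

For example, invert_upper([[2, 4], [0, 8]]) returns [[0.5, -0.25], [0.0, 0.125]], and multiplying back by U gives the identity.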

Array for LU Decomposition? Left for you to practice! Try to develop an idea of its dataflow independently and without any help. It will lend you an excellent understanding of systolic data flows. (A plain sequential reference follows; deriving the dataflow is the exercise.)
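For reference only, here is a sequential Doolittle LU sketch (an assumption: no pivoting, as is usual in systolic formulations). Mapping these recurrences onto a systolic dataflow is the exercise above.

# Doolittle LU: A = L * U with unit diagonal on L (no pivoting).
def lu_decompose(a):
    n = len(a)
    l = [[float(i == j) for j in range(n)] for i in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):              # row i of U
            u[i][j] = a[i][j] - sum(l[i][k] * u[k][j] for k in range(i))
        for j in range(i + 1, n):          # column i of L
            l[j][i] = (a[j][i] - sum(l[j][k] * u[k][i] for k in range(i))) / u[i][i]
    return l, u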

A Systolic System for the Complete Matrix Inversion Algorithm
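The complete system composes the pieces above. A sketch of that composition (assuming lu_decompose and invert_upper from the earlier sketches are in scope, and no zero pivots): since A = L U, A^-1 = U^-1 L^-1, and the lower-triangular inverse can reuse the upper-triangular routine via transposition.

def transpose(m):
    return [list(col) for col in zip(*m)]

def invert_lower(l):
    # L^-1 = ((L^T)^-1)^T, and L^T is upper triangular.
    return transpose(invert_upper(transpose(l)))

def invert(a):
    l, u = lu_decompose(a)                 # stage 1: LU decomposition
    u_inv = invert_upper(u)                # stage 2: triangular inversions
    l_inv = invert_lower(l)
    n = len(a)
    return [[sum(u_inv[i][k] * l_inv[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]   # stage 3: A^-1 = U^-1 L^-1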

Mapping Mapping is a procedure through which we can achieve Resource Reuse. Mapping means that two or more algorithms use the same hardware architecture for their execution. It turns out that the best contenders for Resource Reuse are Arithmetic Blocks, or as in our case, the Processing Elements. Usually, before mapping algorithms onto the same set of Processing Elements, we need to develop a Scheduling Algorithm. A Scheduling Algorithm decides at which time interval a particular processing element executes which data, for each algorithm out of the given set of algorithms to be mapped onto the system. A toy schedule appears below.
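As a toy illustration of such a schedule (mine, not from the slides): two algorithms time-share a single multiply-accumulate PE on alternating cycles; a real scheduling algorithm produces a table like this for every PE in the array.

def mac(acc, a, b):           # the shared processing element
    return acc + a * b

streams = {"alg_A": [(1, 2), (3, 4)], "alg_B": [(5, 6), (7, 8)]}
acc = {"alg_A": 0, "alg_B": 0}
for step in range(4):
    alg = "alg_A" if step % 2 == 0 else "alg_B"   # the schedule
    a, b = streams[alg][step // 2]
    acc[alg] = mac(acc[alg], a, b)
print(acc)   # {'alg_A': 14, 'alg_B': 86}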

An Example of Mapping: The Square Matrix Multiplication Array onto the Band Matrix Multiplication Array

The Array for Band Matrices

The Array for Square Matrices

The Combined or "Mapped" Array The 'maroon' lines represent the connections common to both arrays

The control signal, in essence, performs the scheduling of operations

In my experience, the Multiplexer is arguably the single most important logic element for Datapath design. Its use is especially imperative for resource-efficient system design, as well as for devising the data flow (data routing) between the various devices within the system. Therefore, learning to utilize and eventually control multiplexers in system interconnection is essential for system design. I urge you to develop a keen understanding of this device, as expertise with it will facilitate your design process and help you grow into excellent 'Special-Purpose-System' Datapath Designers. A Sincere Advice!!

General Framework for Datapath Development involving Processing Elements which require various Data Sources

A procedure that can be adopted for routing the data of various algorithms and tasks that may be utilizing the same Processing Elements
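As a sketch of that procedure (names are hypothetical, not from the slides): a select signal from the controller steers one of several data sources into the shared processing element each cycle.

def mux(select, sources):
    # Multiplexer: forwards exactly one of its data sources.
    return sources[select]

def pe(x):
    # Placeholder processing element: any shared arithmetic block.
    return x * x

sources = {"band_array": 3, "square_array": 4, "memory": 5}  # hypothetical
for select in ("band_array", "memory"):   # control-signal sequence
    print(select, "->", pe(mux(select, sources)))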

The Do-Yourself Thing

Resource Efficiency 'Mapping' is a technique that results in reduced Logic Resource Consumption. Another effective technique for Area Optimization is developing 'Partially-Parallel/Semi-Parallel Architectures' from the Fully-Parallel Algorithm Datapath. This is considered a 'Time-to-Area Tradeoff' approach, and it is valid only as long as it satisfies the real-time requirements of the Special-Purpose System being developed. A parameterized sketch of the tradeoff follows.
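Here is a parameterized sketch of that tradeoff (my illustration, not from the slides) for C = A x B with N x N matrices: P multiply-accumulate PEs each compute one output entry per time step, so P = 1 is the single-PE approach, P = N*N the fully parallel array, and anything in between a semi-parallel architecture taking about N*N/P passes.

def semi_parallel_matmul(a, b, p):
    n = len(a)
    c = [[0] * n for _ in range(n)]
    entries = [(i, j) for i in range(n) for j in range(n)]
    steps = 0
    for start in range(0, len(entries), p):     # one pass per time step
        for i, j in entries[start:start + p]:   # P PEs work in parallel
            c[i][j] = sum(a[i][k] * b[k][j] for k in range(n))
        steps += 1
    return c, steps

For N = 4, choosing P = 4 completes in 4 steps against 16 for P = 1: a 4x reduction in arithmetic area bought with 4x more time.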

I'll shed light on SPS Semi-Parallel Architectures using the Matrix Multiplication problem

The Single Processing Element Approach

The Fully Parallel (Simple and Systolic) Architecture for Matrix Multiplication

The Semi-Parallel (Simple and Systolic) Architecture for Matrix Multiplication

Towards Complete Systems

Kalman Filter Equations
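The slide's figure is not reproduced in this transcript; for reference, the standard discrete-time Kalman filter equations (state estimate \hat{x}, covariance P, process and measurement noise covariances Q, R) are:

% Predict
\hat{x}_{k|k-1} = A\,\hat{x}_{k-1|k-1} + B\,u_k
P_{k|k-1} = A\,P_{k-1|k-1}\,A^{\top} + Q
% Update
K_k = P_{k|k-1}\,H^{\top}\left(H\,P_{k|k-1}\,H^{\top} + R\right)^{-1}
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(z_k - H\,\hat{x}_{k|k-1}\right)
P_{k|k} = \left(I - K_k H\right)P_{k|k-1}

Note the matrix inversion inside the gain K_k: this is precisely where the inversion techniques of this lecture enter a complete Kalman system.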

Extended Kalman Filter Equations
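Likewise, for reference, the standard extended Kalman filter equations, with nonlinear models f, h linearized through their Jacobians F_k, H_k:

% Predict
\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_k)
P_{k|k-1} = F_k\,P_{k-1|k-1}\,F_k^{\top} + Q
% Update
K_k = P_{k|k-1}\,H_k^{\top}\left(H_k\,P_{k|k-1}\,H_k^{\top} + R\right)^{-1}
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(z_k - h(\hat{x}_{k|k-1})\right)
P_{k|k} = \left(I - K_k H_k\right)P_{k|k-1}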

The Local Control These are usually state machines or counters. In this particular example they are used to:
– Generate addresses and read/write signals for the data storages
– Specify the function to be performed by the processing elements of the array
– Select the data inputs of multiplexers for data transfer between the arrays, and drive the set, reset and load operations of various registers
A minimal sketch of such a controller follows.
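Here is a minimal local-controller sketch (signal names are hypothetical, purely illustrative): a counter that sweeps read addresses through a data store, tells the PEs which function to apply, and finally asserts a write for the result.

def local_control(n):
    # Read phase: one address per cycle, PEs multiply-accumulate.
    for addr in range(n):
        yield {"addr": addr, "rw": "read", "pe_func": "mac", "mux_sel": 0}
    # Write-back phase: store the accumulated result.
    yield {"addr": 0, "rw": "write", "pe_func": "idle", "mux_sel": 1}

for signals in local_control(3):
    print(signals)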

The Global Control These are usually wait-assert or interrupt-based state machines. Again, this may be a state machine or a counter (at times rather large and complex), or it may even be a Programmable State Machine! A Programmable State Machine? These are like small microcontrollers that can be programmed through software.

HW/SW Co-Design HW/SW stands for Hardware/Software co-design. The concept is to solve the problem partially in software and the rest in hardware. Why software? Because sequential problems are better suited to software solutions. Let's understand this through the particular example of Kalman/H-Infinity Filter design using the Xilinx 8-bit PicoBlaze, or KCPSM (Constant Coded Programmable State Machine).

A Glance at the PicoBlaze Architecture

But Why? Why PicoBlaze?

Application of Wait-Assert type Global Control in Kalman System Design

Down Memory Lane: Remember and Relate!!

Q & As