PMLAB, IECS, FCU Designing Efficient Matrix Transposition on Various Interconnection Networks Using Tensor Product Formulation Presented by Chin-Yi Tsai.

Presentation transcript:

Outline
– Introduction
– Tensor Product Notation
– Matrix Transposition
– Designing Matrix Transposition on Various Interconnection Networks
– Conclusions and Future Work

Introduction
Matrix transposition is a simple but important computational problem. A matrix is a two-dimensional data structure that is stored in one-dimensional computer memory. A simple double-loop transposition program performs poorly on modern computer architectures with a memory hierarchy.
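As a rough illustration of the double-loop program mentioned above, the following Python sketch transposes a matrix held in a flat row-major list. The function names and the tile size are hypothetical, and the tiled variant is a common general remedy for poor locality, not the method developed in this presentation. Python itself will not show the cache effect; the sketch only makes the access pattern visible.

```python
def transpose_naive(a, rows, cols):
    """Double-loop transpose of a rows x cols matrix stored row-major in a flat list."""
    out = [0] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            # Reads walk `a` contiguously, but each write jumps by `rows` elements.
            out[j * rows + i] = a[i * cols + j]
    return out


def transpose_blocked(a, rows, cols, tile=2):
    """Same result, visiting the matrix tile by tile (tile size is an assumption)."""
    out = [0] * (rows * cols)
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            # Within one tile, both reads and writes stay in a small index range.
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    out[j * rows + i] = a[i * cols + j]
    return out
```

Both functions return the row-major buffer of the transposed matrix; they differ only in the order in which memory is touched.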

Introduction (cont'd)
We develop matrix transposition algorithms on various interconnection networks, including omega, baseline, and hypercube networks. The tensor product has been used successfully to design block recursive algorithms such as the FFT, Strassen's matrix multiplication, the parallel prefix algorithm, the Hilbert space-filling curve, and Karatsuba's multiplication. Tensor product formulas are also suitable for specifying interconnection networks.

Introduction (cont'd)
Different interconnection networks have their own architectural characteristics and properties. Distributed-memory algorithms and VLSI circuit design. A major goal of this study is to provide an effective way to design VLSI circuits for DSP algorithms.

Tensor Product Notation
Let A and B be two matrices of size … and …, respectively. [Equation: tensor product definition]
Stride permutation [Equation: stride permutation definition]
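The formulas on this slide did not survive in the transcript, so the sketch below falls back on the standard textbook definitions rather than the slide's own notation. It assumes the usual Kronecker product for the tensor product, and one common convention for the stride permutation in which a vector of length m*n is gathered at stride n. The names `kron` and `stride_permute` are hypothetical.

```python
def kron(A, B):
    """Tensor (Kronecker) product of two matrices given as lists of rows.

    Block (i, j) of the result is A[i][j] * B.
    """
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def stride_permute(x, n):
    """Gather the elements of x at stride n (assumed convention).

    With m = len(x) // n, the result is
    [x[0], x[n], x[2n], ..., x[1], x[n+1], ...].
    """
    m = len(x) // n
    return [x[i * n + j] for j in range(n) for i in range(m)]
```

For example, `stride_permute([0, 1, 2, 3, 4, 5], 3)` reads the vector as a 2 x 3 row-major matrix and returns the row-major buffer of its 3 x 2 transpose.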

Matrix Transposition
Matrix transposition can be viewed as reordering the elements from row-major order to column-major order.
Matrix A is stored in computer memory; the index scheme of an element:
– Row-major order
– Column-major order
Various matrix transposition algorithms can be designed by manipulating the stride permutation:
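The index formulas are missing from the transcript. Under the usual 0-based convention (an assumption here), element (i, j) of an m x n matrix sits at offset i*n + j in row-major order and at offset j*m + i in column-major order. The sketch below, with hypothetical helper names, checks the claim that the column-major buffer of A is the row-major buffer of its transpose.

```python
def pack_row_major(A):
    """Flatten a matrix (list of rows): element (i, j) lands at offset i*n + j."""
    return [x for row in A for x in row]


def pack_col_major(A):
    """Flatten a matrix column by column: element (i, j) lands at offset j*m + i."""
    m, n = len(A), len(A[0])
    return [A[i][j] for j in range(n) for i in range(m)]


def transpose(A):
    """Plain transpose on the nested-list form, used only as a reference."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3],
     [4, 5, 6]]
# Storing A column-major gives the same buffer as storing its transpose row-major.
same_buffer = pack_col_major(A) == pack_row_major(transpose(A))
```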

Matrix Transposition (cont'd)
Step 1: arrange the matrix into blocks with qs elements in each block
Step 2: perform the transposition of the matrix within each of the pr blocks
Step 3: transpose the block matrix, treating each block of qs elements as a unit
Step 4: convert the block-structured order (blocks with qs elements each) to the row-major order of the transposed matrix
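The four steps above can be sketched as follows. Because the slide's formulas are lost, this is one plausible reading rather than the presentation's exact formulation: the matrix is assumed to be pq x rs, viewed as a p x r grid of q x s blocks, and stored row-major in a flat list. All function names are hypothetical.

```python
def to_blocks(flat, p, q, r, s):
    """Step 1: row-major (pq x rs) buffer -> p x r grid of q x s blocks."""
    cols = r * s
    return [[[flat[(bi * q + i) * cols + bj * s + j]
              for i in range(q) for j in range(s)]
             for bj in range(r)]
            for bi in range(p)]


def transpose_block(block, q, s):
    """Step 2: transpose one q x s block stored row-major (result is s x q)."""
    return [block[i * s + j] for j in range(s) for i in range(q)]


def transpose_grid(grid):
    """Step 3: transpose the grid of blocks, moving each block as a unit."""
    return [list(col) for col in zip(*grid)]


def from_blocks(grid, grid_rows, grid_cols, bh, bw):
    """Step 4: grid of bh x bw blocks -> row-major buffer of the full matrix."""
    return [grid[bi][bj][i * bw + j]
            for bi in range(grid_rows) for i in range(bh)
            for bj in range(grid_cols) for j in range(bw)]


def block_transpose(flat, p, q, r, s):
    """Transpose a pq x rs row-major matrix via the four block steps."""
    grid = to_blocks(flat, p, q, r, s)
    grid = [[transpose_block(b, q, s) for b in row] for row in grid]
    grid = transpose_grid(grid)            # now an r x p grid of s x q blocks
    return from_blocks(grid, r, p, s, q)   # row-major rs x pq result
```

Calling `block_transpose(list(range(24)), 2, 3, 2, 2)` transposes a 6 x 4 matrix and yields the same buffer as a direct element-by-element transpose.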

Designing Matrix Transposition on Various Interconnection Networks
We consider two kinds of networks:
– multistage interconnection networks,
– direct interconnection networks.
The basic component of a multistage interconnection network is a switching element. A direct interconnection network is a set of processors connected by a set of links.
[Figure: switching elements with inputs x0, x1 and outputs y0, y1]

Designing Matrix Transposition on Various Interconnection Networks
Suppose that N = 2^n.
– Omega network
– Baseline network
– Hypercube network


Derivation of Algorithm on Omega Interconnection Network

Omega Interconnection Network
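The figure for this slide is not in the transcript. As general background, and not the tensor product formula derived in the presentation, a textbook omega network on N = 2^n lines consists of n stages, each a perfect shuffle followed by a column of 2x2 switching elements. The sketch below uses hypothetical names and the standard destination-tag routing rule to trace one message through the stages.

```python
def perfect_shuffle(x):
    """Interleave the two halves of x: [x0, x_{N/2}, x1, x_{N/2+1}, ...]."""
    half = len(x) // 2
    return [x[i * half + j] for j in range(half) for i in range(2)]


def rotate_left(pos, n):
    """Rotate an n-bit line number left by one bit (where the shuffle sends it)."""
    mask = (1 << n) - 1
    return ((pos << 1) | (pos >> (n - 1))) & mask


def omega_route(src, dst, n):
    """Trace a message from line src to line dst with destination-tag routing.

    At each stage the shuffle rotates the line number, then the switching
    element sets the lowest bit to the next destination bit (MSB first).
    Returns the line number after each stage, starting with src.
    """
    path = [src]
    pos = src
    for k in reversed(range(n)):
        pos = rotate_left(pos, n)
        pos = (pos & ~1) | ((dst >> k) & 1)
        path.append(pos)
    return path
```

After n stages every original address bit has been shifted out and replaced by a destination bit, so the last entry of the returned path equals `dst`.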

Derivation of Algorithm on Baseline Interconnection Network
– Bit-reversal operation
– Partial bit-reversal operation
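The definitions of the two operations are not preserved in the transcript. The sketch below shows the standard bit reversal of an n-bit index. For the partial variant it assumes, purely for illustration, that only the k low-order bits are reversed while the high-order bits stay in place; the slide's actual definition may differ. Function names are hypothetical.

```python
def bit_reverse(i, n):
    """Reverse the n-bit binary representation of index i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)  # append the current lowest bit of i
        i >>= 1
    return r


def partial_bit_reverse(i, n, k):
    """Reverse only the k low-order bits of an n-bit index (assumed definition)."""
    high = i >> k
    low = i & ((1 << k) - 1)
    return (high << k) | bit_reverse(low, k)
```

For instance, with n = 3 the index 6 (binary 110) maps to 3 (binary 011) under full bit reversal.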

Baseline Interconnection Network

Hypercube Interconnection Network
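The slide's figure is missing. As generic background, not the algorithm derived in the presentation, the nodes of an n-dimensional hypercube carry n-bit addresses and two nodes are linked when their addresses differ in exactly one bit. The sketch below assumes one matrix element per node of a 2^k x 2^k matrix, with the node address formed from the row bits followed by the column bits, and uses dimension-order (e-cube) routing. All names are hypothetical.

```python
def neighbors(node, n):
    """Nodes directly linked to `node` in an n-dimensional hypercube."""
    return [node ^ (1 << d) for d in range(n)]


def transpose_destination(node, k):
    """Node holding element (i, j) sends it to the node for (j, i).

    The 2k-bit address is assumed to be the k row bits followed by the k column bits.
    """
    i, j = node >> k, node & ((1 << k) - 1)
    return (j << k) | i


def ecube_route(src, dst, n):
    """Dimension-order route: fix differing address bits from lowest to highest."""
    path = [src]
    cur = src
    for d in range(n):
        if (cur ^ dst) & (1 << d):
            cur ^= 1 << d
            path.append(cur)
    return path
```

The number of hops on such a route equals the Hamming distance between the source and destination addresses.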

Derivation of Algorithm on Hypercube Interconnection Network

Hypercube Interconnection Network (cont'd)

Conclusions and Future Work
We use the tensor product as the framework to design matrix transposition algorithms on various interconnection networks. Stride permutation operations are manipulated to fit the networks. Future work: VLSI circuit design for DSP and image processing algorithms on various interconnection networks.