Parallel Implementation of the Inversion of Polynomial Matrices Alina Solovyova-Vincent March 26, 2003 A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science with a major in Computer Science.

Acknowledgments I would like to thank Dr. Harris for his generous help and support. I would like to thank my committee members, Dr. Kongmunvattana and Dr. Fadali for their time and helpful comments.

Overview
- Introduction
- Existing algorithms
- Busłowicz's algorithm
- Parallel algorithm
- Results
- Conclusions and future work

Definitions
A polynomial matrix is a matrix which has polynomials in all of its entries:

    H(s) = H_n s^n + H_(n-1) s^(n-1) + H_(n-2) s^(n-2) + ... + H_0,

where the H_i are constant r x r matrices, i = 0, ..., n.

Definitions
Example:

    H(s) = | s + 2    s^3 + 3s^2 + s |
           | s^3      s^2 + 1        |

n = 3 – degree of the polynomial matrix
r = 2 – size of the matrix H

    H_0 = | 2  0 |     H_1 = | 1  1 |     ...
          | 0  1 |           | 0  0 |
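As a concrete data-structure sketch (not the thesis code), such a matrix can be stored as n+1 constant coefficient matrices, with H[k][i][j] holding the coefficient of s^k in entry (i,j); the values below are taken from the example above, and entry_at() is an illustrative helper:

    #define N 3   /* degree of the polynomial matrix */
    #define R 2   /* size of the (square) matrix     */

    static const double H[N + 1][R][R] = {
        { { 2, 0 }, { 0, 1 } },   /* H_0 */
        { { 1, 1 }, { 0, 0 } },   /* H_1 */
        { { 0, 3 }, { 0, 1 } },   /* H_2 */
        { { 0, 1 }, { 1, 0 } },   /* H_3 */
    };

    /* Evaluate entry (i,j) of H(s) at a numeric point s. */
    double entry_at(int i, int j, double s) {
        double v = 0.0, p = 1.0;
        for (int k = 0; k <= N; k++) { v += H[k][i][j] * p; p *= s; }
        return v;
    }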

Definitions
H^(-1)(s) – the inverse of the matrix H(s). One of the ways to calculate it:

    H^(-1)(s) = adj H(s) / det H(s)
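For r = 2 this formula can be written out completely, which also checks against the example above: if

    H(s) = | a(s)  b(s) |
           | c(s)  d(s) |

then

    adj H(s) = |  d(s)  -b(s) |      det H(s) = a(s) d(s) - b(s) c(s).
               | -c(s)   a(s) |

For the example, det H(s) = (s + 2)(s^2 + 1) - s^3 (s^3 + 3s^2 + s) = -s^6 - 3s^5 - s^4 + s^3 + 2s^2 + s + 2.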

Definitions
A rational matrix can be expressed as a ratio of a numerator polynomial matrix and a denominator scalar polynomial.

Who Needs It???
- Multivariable control systems
- Analysis of power systems
- Robust stability analysis
- Design of linear decoupling controllers
- ... and many more areas

Existing Algorithms
- Leverrier's algorithm (1840): builds the resolvent matrix [sI - H]^(-1)
- Exact algorithms
- Approximation methods

The Selection of the Algorithm
Before Busłowicz's algorithm (1980):
- high-degree polynomial operations
- lengthy calculations
- not very general
After:
- some improvements, but at the cost of increased computational complexity

Busłowicz's Algorithm
Benefits:
- more general than methods proposed earlier
- only requires operations on constant matrices
- suitable for computer programming
Drawback:
- the irreducible form cannot be ensured in general

Details of the Algorithm
Available upon request

Challenges Encountered (sequential)
Several inconsistencies in the original paper.

Challenges Encountered (parallel)
Dependent loops – O(n^2 r^4):

    for (i = 2; i < r + 1; i++) {
        for (k = 0; k < n * i + 1; k++) {
            /* calculations requiring R[i-1][k] */
        }
    }

Iteration i of the outer loop needs results produced in iteration i-1, so the outer loop cannot simply be split across processors.
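One way around this, shown here as a minimal OpenMP sketch rather than the thesis's actual code, is to keep the i loop sequential and share out the independent k iterations; update_R() is a hypothetical stand-in for the real loop body:

    /* Sketch only: iteration i reads R[i-1], so the i loop stays
       sequential; the k iterations are independent of each other. */
    extern void update_R(double **R, int i, int k);   /* hypothetical */

    void compute_R(double **R, int n, int r) {
        for (int i = 2; i < r + 1; i++) {
            #pragma omp parallel for
            for (int k = 0; k < n * i + 1; k++)
                update_R(R, i, k);   /* uses R[i-1][k] */
            /* implicit barrier: all of R[i] is done before i+1 begins */
        }
    }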

Challenges Encountered (parallel)
Loops of variable length – the bound of the inner loop varies with k:

    for (k = 0; k < n * i + 1; k++) {
        for (ll = 0; ll < min + 1; ll++) {
            /* main calculations */
        }
    }
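Because the trip count of the ll loop changes with k, equal-sized blocks of k iterations carry unequal work. A standard remedy, again only a sketch with a hypothetical helper min_for() for the k-dependent bound, is dynamic scheduling:

    /* Sketch: schedule(dynamic) hands out k iterations on demand,
       so threads that draw short inner loops simply take more k's. */
    #pragma omp parallel for schedule(dynamic)
    for (int k = 0; k < n * i + 1; k++) {
        for (int ll = 0; ll < min_for(k) + 1; ll++)
            main_calculation(k, ll);   /* hypothetical work routine */
    }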

Shared and Distributed Memory
Main difference – synchronization of the processes:
- shared memory: barrier
- distributed memory: data exchange

    for (i = 2; i < r + 1; i++) {
        /* calculations requiring R[i-1] */
        /* synchronization point */
    }
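In the distributed-memory version the synchronization point becomes an explicit exchange. A minimal MPI sketch, assuming each rank owns one contiguous block of R[i] (the layout and the compute_local_block() helper are assumptions, not the thesis code):

    /* Each rank computes its block of R[i] from the full R[i-1],
       then all ranks exchange blocks so R[i] is complete everywhere. */
    compute_local_block(local_R, R[i - 1], rank);      /* hypothetical */
    MPI_Allgather(local_R, block_len, MPI_DOUBLE,
                  R[i],    block_len, MPI_DOUBLE, MPI_COMM_WORLD);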

Platforms
Distributed memory platforms:
- SGI O2 NOW (MIPS R…, … MHz)
- P IV NOW (1.8 GHz)
- P III Cluster (1 GHz)
- P IV Cluster (Xeon, 2.2 GHz)

Platforms
Shared memory platforms:
- SGI Power Challenge (MIPS R10000)
- SGI Origin 2000 (MIPS R…, … MHz)

Understanding the Results
- n – degree of the polynomial (<= 25)
- r – size of the matrix (<= 25)
- Sequential algorithm: O(n^2 r^5)
- Times are averages of multiple runs
- Unloaded platforms
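To put the complexity in concrete terms: at the largest size used here (n = r = 25), n^2 r^5 = 625 x 9,765,625, on the order of 6 x 10^9 elementary operations, which is the scale behind the sequential times below.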

Sequential Run Times (n = 25, r = 25)

    Platform               Time (sec)
    SGI O2 NOW             …
    P IV NOW               22.94
    P III Cluster          26.10
    P IV Cluster           18.75
    SGI Power Challenge    …
    SGI Origin 2000        …

Results – Distributed Memory
Speedup:
- SGI O2 NOW: slowdown
- P IV NOW: minimal speedup
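A slowdown on a network of workstations is the expected signature of communication costs outweighing the fairly small constant-matrix computations each node performs between exchanges.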

Speedup (P III & P IV Clusters)

Results – Shared Memory
Excellent results!

Speedup (SGI Power Challenge)

Speedup (SGI Origin 2000) Superlinear speedup!
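Superlinear speedup on shared-memory machines is most commonly a cache effect: once each processor's share of the working set fits in its cache, each operation also runs faster than in the single-processor baseline.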

Run times (SGI Power Challenge) 8 processors

Run times (SGI Origin 2000) n = 25

Run times (SGI Power Challenge) r = 20

Efficiency

    Platform               (columns: increasing processor counts)
    P III Cluster          89.7%   76.5%   61.3%   58.5%   40.1%   25.0%
    P IV Cluster           88.3%   68.2%   49.9%   46.9%   26.1%   15.5%
    SGI Power Challenge    99.7%   98.2%   97.9%   95.8%   n/a
    SGI Origin 2000        …       98.7%   99.0%   98.2%   93.8%   n/a
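Efficiency here is presumably speedup per processor, E = T_1 / (p * T_p): the shared-memory machines stay near the ideal 100% as p grows, while the clusters fall away as communication begins to dominate.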

Conclusions
- We performed an exhaustive survey of the available algorithms;
- We implemented the sequential version of Busłowicz's algorithm;
- We implemented two versions of the parallel algorithm (shared and distributed memory);
- We tested the parallel algorithm on 6 different platforms;
- We obtained excellent speedup and efficiency in a shared memory environment.

Future Work
- Study the behavior of the algorithm for larger problem sizes (distributed memory).
- Re-evaluate message passing in the distributed memory implementation.
- Extend Busłowicz's algorithm to inverting multivariable polynomial matrices H(s_1, s_2, ..., s_k).

Questions