Presented by: Tim Olson, Architect

Slides:

Advertisements

Similar presentations

Clare Smtih SHARC Presentation1 The SHARC Super Harvard Architecture Computer.

Advertisements

Is There a Real Difference between DSPs and GPUs?

Vectors, SIMD Extensions and GPUs COMP 4611 Tutorial 11 Nov. 26,

Lecture 4 Introduction to Digital Signal Processors (DSPs) Dr. Konstantinos Tatas.

Slides Prepared from the CI-Tutor Courses at NCSA By S. Masoud Sadjadi School of Computing and Information Sciences Florida.

Computer Abstractions and Technology

CSC457 Seminar YongKang Zhu December 6 th, 2001 About Network Processor.

Slide 1 Freescale Semiconductor. Freescale™ and the Freescale logo are trademarks of Freescale Semiconductor, Inc. All other product or service names are.

Multiprocessors ELEC 6200: Computer Architecture and Design Instructor : Agrawal Name: Nam.

Computational Astrophysics: Methodology 1.Identify astrophysical problem 2.Write down corresponding equations 3.Identify numerical algorithm 4.Find a computer.

A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian.

VIRAM-1 Architecture Update and Status Christoforos E. Kozyrakis IRAM Retreat January 2000.

Hitachi SR8000 Supercomputer LAPPEENRANTA UNIVERSITY OF TECHNOLOGY Department of Information Technology Introduction to Parallel Computing Group.

1 Fast Communication for Multi – Core SOPC Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab.

Copyright © 2006, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners Intel® Core™ Duo Processor.

1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures.

Emotion Engine A look at the microprocessor at the center of the PlayStation2 gaming console Charles Aldrich.

Computer performance.

Motivation Mobile embedded systems are present in: –Cell phones –PDA’s –MP3 players –GPS units.

Semiconductor Memory 1970 Fairchild Size of a single core –i.e. 1 bit of magnetic core storage Holds 256 bits Non-destructive read Much faster than core.

1 Design of an SIMD Multimicroprocessor for RCA GaAs Systolic Array Based on 4096 Node Processor Elements Adaptive signal processing is of crucial importance.

One-Chip TeraArchitecture 19 martie 2009 One-Chip TeraArchitecture Gheorghe Stefan

Implementation of Parallel Processing Techniques on Graphical Processing Units Brad Baker, Wayne Haney, Dr. Charles Choi.

Company LOGO High Performance Processors Miguel J. González Blanco Miguel A. Padilla Puig Felix Rivera Rivas.

Real-Time HD Harmonic Inc. Real Time, Single Chip High Definition Video Encoder! December 22, 2004.

C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.

Multiprocessing. Going Multi-core Helps Energy Efficiency William Holt, HOT Chips 2005 Adapted from UC Berkeley "The Beauty and Joy of Computing"

© 2007 SET Associates Corporation SAR Processing Performance on Cell Processor and Xeon Mark Backues, SET Corporation Uttam Majumder, AFRL/RYAS.

3/15/2002CSE Final Remarks Concluding Remarks SOAP.

TI DSPS FEST 1999 Implementation of Channel Estimation and Multiuser Detection Algorithms for W-CDMA on Digital Signal Processors Sridhar Rajagopal Gang.

SkauConfidential and Proprietary Honeywell Page 1 Algorithm Flow: GMTI ECCM Beamforming (Jammer Nulling) 3.4 GOPS Adaptive Weights 0.12 GOPS Matrix Formation.

A few issues on the design of future multicores André Seznec IRISA/INRIA.

Hardware Benchmark Results for An Ultra-High Performance Architecture for Embedded Defense Signal and Image Processing Applications September 29, 2004.

Chapter 1 Introduction to the Systems Approach

Copyright © 2004, Dillon Engineering Inc. All Rights Reserved. An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs  Architecture optimized.

Playstation2 Architecture Architecture Hardware Design.

Fundamentals of Programming Languages-II

Design of A Custom Vector Operation API Exploiting SIMD Intrinsics within Java Presented by John-Marc Desmarais Authors: Jonathan Parri, John-Marc Desmarais,

WorldScape Defense Company, L.L.C. Company Proprietary Slide 1 An Ultra-High Performance Scalable Processing Architecture for HPC and Embedded Applications.

Presented by Jeremy S. Meredith Sadaf R. Alam Jeffrey S. Vetter Future Technologies Group Computer Science and Mathematics Division Research supported.

1 Design of an MIMD Multimicroprocessor for DSM A Board Which turns PC into a DSM Node Based on the RM Approach 1 The RM approach is essentially a write-through.

“Processors” issues for LQCD January 2009 André Seznec IRISA/INRIA.

IBM Cell Processor Ryan Carlson, Yannick Lanner-Cusin, & Cyrus Stoller CS87: Parallel and Distributed Computing.

HPEC-1 SMHS 7/7/2016 MIT Lincoln Laboratory Focus 3: Cell Sharon Sacco / MIT Lincoln Laboratory HPEC Workshop 19 September 2007 This work is sponsored.

Feeding Parallel Machines – Any Silver Bullets? Novica Nosović ETF Sarajevo 8th Workshop “Software Engineering Education and Reverse Engineering” Durres,

Sridhar Rajagopal Bryan A. Jones and Joseph R. Cavallaro

Itanium® 2 Processor Architecture

Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Parallel Hardware Dr. Xiao Qin Auburn.

CMSC 611: Advanced Computer Architecture

Distributed Processors

Morgan Kaufmann Publishers

Reconfigurable Computing MONARCH/MCHIP

How does an SIMD computer work?

Laxmi Narayan Bhuyan SIMD Architectures Laxmi Narayan Bhuyan

Many-core Software Development Platforms

عمارة الحاسب.

Introduction to Digital Signal Processors (DSPs)

Array Processor.

Chapter 1 Introduction.

Computer Organization

Introduction and History of Cray Supercomputers

Coe818 Advanced Computer Architecture

Chapter 1 Introduction.

Hybrid Programming with OpenMP and MPI

Chapter 4 Multiprocessors

Chip&Core Architecture

What Choices Make A Killer Video Processor Architecture?

CSE 502: Computer Architecture

Presentation transcript:

Presented by: Tim Olson, Architect An Innovative High-Performance Architecture for Vector and Matrix Math Algorithms Presented by: Tim Olson, Architect HPEC 2002 – September 24, 2002 Authors: Veeraraghavan Anantha, Ph.D.; Christophe Harlé, Ph.D.; Tim Olson, George Yost, Ph.D. © 2002 Intrinsity, Inc. Intrinsity, the Intrinsity logo, the Intrinsity dot logo, Advanced Signal Processor, and FastMATH are trademarks of Intrinsity, Inc. MIPS is among the registered trademarks and MIPS32 is among the trademarks of MIPS Technology, Inc. RapidIO and the RapidIO logo are trademarks of the RapidIO Trade Association. All other trademarks are for reference purposes only and are the property of their respective owners.

Intrinsity FastMATH™ Vector and Matrix Math Processor Optimized for real-time and adaptive signal processing needs: Innovative architecture: On-chip matrix coprocessor and MIPS32™ ISA RISC core 4 × 4 array of processors, each with sixteen 32-bit registers, two 40-bit MACs 64 GOPS (peak) Matrix and vector math native instructions: 1-, 8-, 16-, 32-bit support; convenient complex math Descriptor-based DMA controller 1 Mbyte on-chip cache-coherent L2 cache 2 GHz SIMD 4 × 4 matrix engine with multiprocessor scalability due to high bandwidth RapidIO™ interfaces Fixed-point math High-level (e.g., C) language programmable Compiler built-in matrix intrinsics Vector/matrix library Speed plus an architecture designed for parallel computations

Intrinsity FastMATH Vector and Matrix Math Processor 2 GHz MIPS® scalar engine: dual issue instructions 2 GHz interconnected 4 × 4 matrix processor with 16 registers RapidIO ports balance I/O and processor speed

FastMATH Example: Applications and Performance FFTs OFDM Fast correlation/convolution Smart Antenna Beamforming using matrix-matrix multiplication Parallel weight matrix calculation over 16 users CDMA Multi-User Detection Jacobi iteration via matrix-matrix multiplication Calculation distributed over FastMATH processors via RapidIO channels