KERRY BARNES WILLIAM LUNDGREN JAMES STEED

Slides:



Advertisements
Similar presentations
WATERLOO ELECTRICAL AND COMPUTER ENGINEERING 20s: Computer Hardware 1 WATERLOO ELECTRICAL AND COMPUTER ENGINEERING 20s Computer Hardware Department of.
Advertisements

Accelerators for HPC: Programming Models Accelerators for HPC: StreamIt on GPU High Performance Applications on Heterogeneous Windows Clusters
8. Code Generation. Generate executable code for a target machine that is a faithful representation of the semantics of the source code Depends not only.
Multi-core systems System Architecture COMP25212 Daniel Goodman Advanced Processor Technologies Group.
Computer Abstractions and Technology
Implementation of 2-D FFT on the Cell Broadband Engine Architecture William Lundgren Gedae), Kerry Barnes (Gedae), James Steed (Gedae)
1 BGL Photo (system) BlueGene/L IBM Journal of Research and Development, Vol. 49, No. 2-3.
ELEC Fall 05 1 Very- Long Instruction Word (VLIW) Computer Architecture Fan Wang Department of Electrical and Computer Engineering Auburn.
CS 300 – Lecture 2 Intro to Computer Architecture / Assembly Language History.
Chapter 6: An Introduction to System Software and Virtual Machines
0 HPEC 2010 Automated Software Cache Management.
Chapter 6: An Introduction to System Software and Virtual Machines Invitation to Computer Science, C++ Version, Fourth Edition ** Re-ordered, Updated 4/14/09.
Presenter MaxAcademy Lecture Series – V1.0, September 2011 Introduction and Motivation.
Programming the Cell Multiprocessor Işıl ÖZ. Outline Cell processor – Objectives – Design and architecture Programming the cell – Programming models CellSs.
Advances in Language Design
Computer Architecture Lecture 01 Fasih ur Rehman.
1 Lecture 2 : Computer System and Programming. Computer? a programmable machine that  Receives input  Stores and manipulates data  Provides output.
Computer System Architectures Computer System Software
1.1 1 Introduction Foundations of Computer Science  Cengage Learning.
1 CSC 1401 S1 Computer Programming I Hamid Harroud School of Science and Engineering, Akhawayn University
Shared memory systems. What is a shared memory system Single memory space accessible to the programmer Processor communicate through the network to the.
Codeplay CEO © Copyright 2012 Codeplay Software Ltd 45 York Place Edinburgh EH1 3HP United Kingdom Visit us at The unique challenges of.
Gedae, Inc. Implementing Modal Software in Data Flow for Heterogeneous Architectures James Steed, Kerry Barnes, William Lundgren Gedae, Inc.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
1 Chapter 1 Parallel Machines and Computations (Fundamentals of Parallel Processing) Dr. Ranette Halverson.
COMPUTER SCIENCE &ENGINEERING Compiled code acceleration on FPGAs W. Najjar, B.Buyukkurt, Z.Guo, J. Villareal, J. Cortes, A. Mitra Computer Science & Engineering.
Gedae Portability: From Simulation to DSPs to the Cell Broadband Engine James Steed, William Lundgren, Kerry Barnes Gedae, Inc
Introduction Computer Organization and Architecture: Lesson 1.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
Programming Examples that Expose Efficiency Issues for the Cell Broadband Engine Architecture William Lundgren Gedae), Rick Pancoast.
4.2.1 Programming Models Technology drivers – Node count, scale of parallelism within the node – Heterogeneity – Complex memory hierarchies – Failure rates.
Copyright © 2007 Addison-Wesley. All rights reserved.1-1 Reasons for Studying Concepts of Programming Languages Increased ability to express ideas Improved.
Frank Casilio Computer Engineering May 15, 1997 Multithreaded Processors.
Guiding Principles. Goals First we must agree on the goals. Several (non-exclusive) choices – Want every CS major to be educated in performance including.
Part 1.  Intel x86/Pentium family  32-bit CISC processor  SUN SPARC and UltraSPARC  32- and 64-bit RISC processors  Java  C  C++  Java  Why Java?
Areas of Computing Study. Artificial Intelligence Databases and Data Science Human-Centered Computing Networking Information Security System Software.
Group May Bryan McCoy Kinit Patel Tyson Williams.
Computer Architecture 2 nd year (computer and Information Sc.)
Overview of Operating Systems Introduction to Operating Systems: Module 0.
Gedae, Inc. Gedae: Auto Coding to a Virtual Machine Authors: William I. Lundgren, Kerry B. Barnes, James W. Steed HPEC 2004.
CPSC 171 Introduction to Computer Science System Software and Virtual Machines.
1 chapter 1 Computer Architecture and Design ECE4480/5480 Computer Architecture and Design Department of Electrical and Computer Engineering University.
CDP Tutorial 3 Basics of Parallel Algorithm Design uses some of the slides for chapters 3 and 5 accompanying “Introduction to Parallel Computing”, Addison.
… begin …. Parallel Computing: What is it good for? William M. Jones, Ph.D. Assistant Professor Computer Science Department Coastal Carolina University.
DR. SIMING LIU SPRING 2016 COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF NEVADA, RENO Session 2 Computer Organization.
Lecture 27 Multiprocessor Scheduling. Last lecture: VMM Two old problems: CPU virtualization and memory virtualization I/O virtualization Today Issues.
Parallel IO for Cluster Computing Tran, Van Hoai.
Uses some of the slides for chapters 3 and 5 accompanying “Introduction to Parallel Computing”, Addison Wesley, 2003.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 28, 2005 Session 29.
Gedae Portability: From Simulation to DSPs to the Cell Broadband Engine James Steed, William Lundgren, Kerry Barnes Gedae, Inc
1 Potential for Parallel Computation Chapter 2 – Part 2 Jordan & Alaghband.
Computer Architecture Furkan Rabee
Gedae, Inc. Implementing Modal Software in Data Flow for Heterogeneous Architectures James Steed, Kerry Barnes, William Lundgren Gedae, Inc.
Computer Organization Exam Review CS345 David Monismith.
Computer Organization and Architecture Lecture 1 : Introduction
Yuanrui Zhang, Mahmut Kandemir
For Massively Parallel Computation The Chaotic State of the Art
A Common Machine Language for Communication-Exposed Architectures
课程名 编译原理 Compiling Techniques
Multi-Processing in High Performance Computer Architecture:
Multi-Processing in High Performance Computer Architecture:
Foundations of Computer Science
Digital Processing Platform
How Efficient Can We Be?: Bounds on Algorithm Energy Consumption
Chapter 1 Introduction.
COMP60621 Fundamentals of Parallel and Distributed Systems
EE 4xx: Computer Architecture and Performance Programming
Principles of Programming Languages
COMP60611 Fundamentals of Parallel and Distributed Systems
Presentation transcript:

KERRY BARNES WILLIAM LUNDGREN JAMES STEED AMON ARNON FRIEDMANN, PH.D. (TI, HECTOR RIVERA (TI, 0 HPEC 2011 Implementation of 2DFFT Processing and Large Dense Matrix Multiply on the TMS320C6678 Processor

HPEC Presentation History 1 Graphical Programming and Autocoding for Multiprocessor Systems The Expressiveness of the GEDAE Graph Language – 1999 Implementation of the Intelligent Detector-Tracker Algorithm on Embedded Hardware Connected to a Local Area Network – 2002 Gedae Runtime Kernel Performance Characterization – 2004 Gedae: Auto Coding to a Virtual Machine – 2005 How Code Generation Will Save Moore’s Law – 2006 The Impact of Programming Difficulty on Hardware Obsolescence – 2007 Programming Examples that Expose Efficiency Issues for the Cell Broadband Engine Architecture – 2008 Simple, Efficient, Portable Decomposition of Large Data Sets – 2009 Implementation of 2-D FFT on the Cell Broadband Engine Architecture – 2010 Automated Software Cache Management – 2010

2 "When we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. I would be panicked if I were in industry." John Hennessy, Stanford President, Computing Pioneer “A Conversation with Hennessy & Patterson,” ACM Queue Magazine, 4:10, 1/07 Gedae's technology was developed from first principles to target parallel computers. The effort spanned a quarter century.

The Concept is Familiar 3 Idea™ Compiler Gedae Architecture Class Gedae von Neumann FORTRAN & Others Compiler von Neumann Architecture

The Language Enables the Compiler 4 Idea™ Compiler Gedae Architecture Class The schedule of distributed execution is important because it dramatically effects resource use and ultimately throughput and power consumption. Gedae’s Idea language allows the compiler to reason about concurrency, resource use and the schedule of execution to produce efficient parallel code that runs on architectures with N CPUs, M memories, K interconnects

TMS320C6678 Architecture 5 Key Characteristics: Maximum of 160 gflops Maximum memory IO of 16 gB/s Power consumption about 10 watts

Summary of Results 6 Theoretical † Practical † Measured CPUMemoryCPUMemoryCPUMemory 2D FFT (512x512) 1.48e-46.55e-43.58e-47.28e-4TBD Matrix Mult (4K x 4K) 8.56e-11.57e e-2TBD Summary of Processing and IOTimes TheoreticalPracticalMeasured 2D FFT (512x512) TBD Matrix Mult (4K x 4K) TBD Summary of GFLOP Sustained † practical is the rate achievable based on available optimized vector functions theoretical is the rate achievable based on the balance between adds and multiplies