A Common Machine Language for Communication-Exposed Architectures

A Common Machine Language for Communication-Exposed Architectures
Bill Thies, Michal Karczmarek, Michael Gordon, David Maze and Saman Amarasinghe
MIT Laboratory for Computer Science
HPCA Work-in-Progress Session, February 2002

A Common Machine Language for Communication-Exposed Architectures
"Language Designers Have Been Ignoring Architects"
Bill Thies, Michal Karczmarek, Michael Gordon, David Maze and Saman Amarasinghe
MIT Laboratory for Computer Science
HPCA Work-in-Progress Session, February 2002

Back in the Good Old Days…
Architecture: the simple von Neumann machine
"Common Machine Language": C
- Abstracts away idiosyncratic differences: instruction set, pipeline depth, cache configuration, register layout
- Exposes common properties: program counter, arithmetic instructions, monolithic memory
- Efficient implementations on many machines
- Portable: everyone uses it

Programming Language Evolution

Languages Have Not Kept Up
C matched the von Neumann machine, but it does not match modern architectures.
Architects are left with two choices:
- Develop a cool architecture with a complicated, ad-hoc language
- Bend over backwards to support old languages like C/C++

Evidence: Superscalars
Huge effort has gone into improving the performance of a sequential instruction stream:
- Pipelining
- Branch prediction
- Speculative execution
- Out-of-order execution
- Renaming
- Prefetching
- Value prediction
Complexity has grown unmanageable. Even with 1 billion transistors on a chip, what more can be done?

A New Era of Architectures
Facing new design parameters:
- Transistors are in excess
- Wire delays will dominate
"Communication-exposed" architectures:
- Explicitly parallel hardware
- Compiler-controlled communication
- e.g., RAW, Smart Memories, TRIPS, Imagine, the Grid Processor, Blue Gene

A New Common Machine Language
Should expose shared properties:
- Explicit parallelism (multiple program counters)
- Regular communication patterns
- Distributed memory banks
- No global clock
Should hide small differences:
- Granularity of computation elements
- Topology of the network interconnect
- Interface to memory units
C does not qualify!

The StreamIt Language
A high-level language for communication-exposed architectures.
Computation is expressed as a hierarchical composition of independent filters.
Features:
- High-bandwidth channels
- Low-bandwidth messaging
- Re-initialization
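To make the filter-composition model concrete, here is a small Python sketch of filters connected by FIFO channels. This is not StreamIt syntax, and the filter names and rates are invented for illustration; it only models the idea that each filter consumes and produces a fixed number of items per firing.

```python
from collections import deque

class Filter:
    """A filter consumes `pop` items and produces items per firing."""
    def __init__(self, pop, work):
        self.pop, self.work = pop, work

    def fire(self, inp, out):
        # Consume `pop` items from the input channel, apply the work
        # function, and push the results onto the output channel.
        items = [inp.popleft() for _ in range(self.pop)]
        out.extend(self.work(items))

def run_pipeline(filters, source):
    # One FIFO channel before each filter, plus one for the final output.
    chans = [deque(source)] + [deque() for _ in filters]
    for f, inp, out in zip(filters, chans, chans[1:]):
        while len(inp) >= f.pop:
            f.fire(inp, out)
    return list(chans[-1])

# Hypothetical two-stage pipeline: scale each value, then sum pairs.
scale = Filter(pop=1, work=lambda xs: [2 * xs[0]])
pair_sum = Filter(pop=2, work=lambda xs: [xs[0] + xs[1]])
result = run_pipeline([scale, pair_sum], [1, 2, 3, 4])
# result == [6, 14]  (scaled: 2, 4, 6, 8; pair sums: 6, 14)
```

Because each filter declares its input rate and touches only its own channels, the composition is hierarchical and the filters are independent, which is what lets a compiler map them onto separate tiles of a communication-exposed machine.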

The StreamIt Compiler
We have a compiler for a uniprocessor; it performs comparably to a C++ runtime system.
Working on a backend for RAW:
- Fission and fusion transformations
- Many optimizations in progress
Goal: a high-performance, portable language for communication-exposed architectures.
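As an illustration of what a fission transformation accomplishes, the Python sketch below splits a stateless per-item filter into parallel copies fed round-robin and rejoins their outputs in order. This is only a model of the idea (the function names are invented, and it is not how the StreamIt compiler is implemented).

```python
def fission(work, ways, stream):
    """Split a stateless per-item filter into `ways` parallel copies.

    Items are distributed round-robin, each copy applies `work`
    independently (so the copies could run on separate tiles),
    and the outputs are joined back in the original order.
    """
    # Split: round-robin distribution into `ways` sub-streams.
    splits = [stream[i::ways] for i in range(ways)]
    # Parallel work: each copy processes only its own sub-stream.
    partials = [[work(x) for x in s] for s in splits]
    # Join: round-robin merge restores the original item order.
    joined = []
    for i in range(len(stream)):
        joined.append(partials[i % ways][i // ways])
    return joined

# A stateless filter fissed 3 ways gives the same result as running it serially.
double = lambda x: 2 * x
assert fission(double, 3, [1, 2, 3, 4, 5]) == [double(x) for x in [1, 2, 3, 4, 5]]
```

The key requirement is that the filter is stateless: because no firing depends on a previous one, the copies can execute concurrently and only the split/join order must be preserved. Fusion is the inverse trade-off, merging adjacent filters into one to cut communication overhead.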

For more information, see: http://cag.lcs.mit.edu/streamit/
Thank you!