Enhancing Commodity Scalar Processors with Vector Components for Increased Scientific Productivity.

Vectors on Commodity Components
Find the minimum set of vector-derived modifications to commodity microprocessors that improves their efficiency
Deconstructing vectors (segregating the features of vector architectures)
  – ISA, memory bandwidth, addressing modes, vector registers
The commodity market has focused on increasing peak flops with SIMD/vector-like features
  – High peak flops for little silicon area, but hard to keep fed with operands
  – Does not improve efficiency for scientific applications
The high end (IBM Power series) demonstrates that high memory bandwidth (in bytes/flop) is not an exclusive feature of vectors
Efficient use of bandwidth requires deep pipelining of memory requests (a natural ability of vector register loads/stores)
  – Required by Little's Law (e.g., Power5 requires 3k of requests in flight)
  – Shift the focus from the vector ISA (e.g., SIMD) to vector registers and addressing modes
  – Examples include ViVA-2, the PERCS programmable cache, and the IBM Cell processor
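Little's Law, as invoked above, says the amount of data a processor must keep in flight equals sustained bandwidth times memory latency. The Python sketch below works that arithmetic through and adds the bytes/flop balance metric the slide mentions. The function names and every parameter value (16 GB/s, 100 ns latency, 128-byte lines, 64-element vector registers, 8 Gflop/s peak) are hypothetical illustrations, not Power5 or other measured figures.

```python
import math

# Hypothetical helpers illustrating Little's Law for memory systems.
# Units: bandwidth in GB/s (1e9 bytes/s), latency in ns (1e-9 s),
# so their product comes out directly in bytes.

def bytes_in_flight(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Little's Law: concurrency = throughput x latency."""
    return bandwidth_gb_s * latency_ns

def outstanding_requests(bandwidth_gb_s: float, latency_ns: float,
                         request_bytes: int) -> int:
    """Independent requests of request_bytes each that must be in flight
    at once to sustain the given bandwidth (rounded up)."""
    return math.ceil(bytes_in_flight(bandwidth_gb_s, latency_ns) / request_bytes)

def bytes_per_flop(bandwidth_gb_s: float, peak_gflops: float) -> float:
    """Machine balance: sustainable memory bytes per peak floating-point op."""
    return bandwidth_gb_s / peak_gflops

# Hypothetical machine parameters (illustrative only).
BW, LAT = 16.0, 100.0   # GB/s, ns
LINE = 128              # bytes per cache-line request
VREG = 64 * 8           # one 64-element register of 8-byte doubles

print(bytes_in_flight(BW, LAT))             # 1600.0 bytes must be in flight
print(outstanding_requests(BW, LAT, LINE))  # 13 concurrent cache-line misses
print(outstanding_requests(BW, LAT, VREG))  # 4 concurrent vector-register loads
print(bytes_per_flop(BW, 8.0))              # 2.0 bytes/flop at 8 Gflop/s peak
```

Under these assumed numbers, a scalar core must track 13 independent cache misses at once, while four vector-register loads cover the same concurrency; that is the sense in which deep request pipelining comes naturally to vector loads/stores.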

Investigation
Collect data on real DOE scientific codes running on vector architectures
  – Evaluate where the deconstructed features of vector architectures benefit these codes (and where they fail)
  – Compare with results on microprocessors (of particular interest: new processors that match vector bytes/flop, and vector systems such as the X1E that have lower bytes/flop than microprocessors)
Develop parameterized architectural probes that mimic the behavior of full codes
  – Allows us to run on architectural simulators and test systems to assess the impact of new vector-like architectural features on scientific codes
Work with vendor partners to develop architectural features better suited to improving the efficiency of scientific applications
  – Past work on Tera, ViRAM, IMAGINE, DIVA
  – ViVA-2, PERCS, Sun HERO, Impulse
  – Move toward collaborations with industry to push science-driven advances in processor technology that can also leverage mainstream mass-market components (e.g., IBM Power processor customization centers)
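As a rough illustration of what a parameterized probe might look like, the sketch below models one kernel knob, the access stride (plus an indexed, gather-style mode), and reports what fraction of fetched cache-line bytes the kernel actually uses. It is a hypothetical trace-level model, not one of the authors' probes: the function names, 8-byte elements, 128-byte lines, and the assumption that every touched line is fetched exactly once (no reuse, no capacity misses) are all illustrative choices. A real probe would instead time the kernel on hardware or run it in a simulator.

```python
# Hypothetical toy model of a parameterized memory-access probe.
# Assumptions: aligned 8-byte elements, 128-byte cache lines, and each
# distinct line touched is fetched from memory exactly once.

def strided_addresses(n: int, stride: int, elem_bytes: int = 8, base: int = 0) -> list[int]:
    """Byte addresses touched by x[0], x[stride], ..., x[(n-1)*stride]."""
    return [base + i * stride * elem_bytes for i in range(n)]

def indexed_addresses(indices: list[int], elem_bytes: int = 8, base: int = 0) -> list[int]:
    """Byte addresses touched by a gather x[idx[0]], x[idx[1]], ..."""
    return [base + i * elem_bytes for i in indices]

def line_efficiency(addresses: list[int], elem_bytes: int = 8, line_bytes: int = 128) -> float:
    """Useful bytes divided by bytes moved as whole cache lines."""
    lines = {a // line_bytes for a in addresses}
    used = len(addresses) * elem_bytes
    fetched = len(lines) * line_bytes
    return used / fetched

def stride_sweep(strides: list[int], n: int = 1024) -> dict[int, float]:
    """Vary one probe parameter (the stride) and report efficiency for each value."""
    return {s: line_efficiency(strided_addresses(n, s)) for s in strides}

print(stride_sweep([1, 2, 16]))                         # {1: 1.0, 2: 0.5, 16: 0.0625}
print(line_efficiency(indexed_addresses([0, 16, 32])))  # 0.0625
```

Under this model, unit stride uses every fetched byte, while a stride of 16 elements (one element per 128-byte line) uses only 1/16 of it; sweeping such parameters is one way a probe could expose how sensitive a code is to its memory-access pattern.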