Compilers and Computers: Partners in Performance
Fran Allen (IBM Fellow Emerita)
T. J. Watson Research Center
Yorktown Heights, NY 10598

Components of Performance Latency Reduction Data and Instructions in the right place at the right time Fast Computations Concurrency

Talk will Cover Three high performance systems: : Stretch-Harvest System : Advanced Computing System : 801 System (RISC) And how the Compiler-Computer partnership for performance evolved

Some Context : Fortran I : Stretch-Harvest System : Advanced Computing System : 801 System (RISC)

FORTRAN I Spectacular object code!! Some features: Formal parsing techniques (beginnings) Intermediate language form for optimization Control flow graphs Common sub-expression elimination Generalized register allocation - for only 3 registers!

Some Context : Fortran I : Stretch-Harvest System : Advanced Computing System : 801 System (RISC)

Stretch ( ) Goal: 100 x faster than 704 Main performance limitation (identified in 1955): Memory Access Time

Stretch Memory
  1 MB magnetic core memory
  Memory word length: 64 bits + 8 check bits
  Memory organization:
    8 core storage units of 16K words each
    addresses interleaved across units
    each unit independently connected via memory bus unit to CPU, I/O, disk
    2.1 µs cycle time per unit
  Up to 6 storage accesses could be underway at the same time!
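To make the interleaving idea concrete, here is a minimal sketch of how consecutive word addresses could be spread across the 8 storage units. It assumes plain low-order interleaving across all units, which is an illustrative choice and not necessarily the mapping Stretch actually used; the names `locate` and `total_data_bytes` are hypothetical. The sizes (8 units, 16K words, 64 data bits per word) come from the slide.

```python
# Sizes taken from the slide; the mapping itself is an assumption.
NUM_UNITS = 8                  # core storage units
WORDS_PER_UNIT = 16 * 1024     # 16K words per unit
DATA_BITS_PER_WORD = 64        # check bits are not counted as data


def locate(word_address):
    """Map a word address to (unit, offset within that unit).

    Assumes simple low-order interleaving: consecutive addresses
    fall in consecutive units, so sequential accesses can overlap.
    """
    if not 0 <= word_address < NUM_UNITS * WORDS_PER_UNIT:
        raise ValueError("address out of range")
    return word_address % NUM_UNITS, word_address // NUM_UNITS


def total_data_bytes():
    """Total data capacity implied by the slide's figures."""
    return NUM_UNITS * WORDS_PER_UNIT * DATA_BITS_PER_WORD // 8


if __name__ == "__main__":
    for address in range(10):
        print(address, locate(address))
    print(total_data_bytes(), "bytes of data storage")
```

Under this mapping, a run of sequential addresses touches a different unit on each access, which is what lets several slow (2.1 µs) core units serve overlapping requests.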

Stretch Concurrency Instruction Lookahead up to 11 successive instructions executing in cpu at the same time lookahead unit of virtual registers buffered instructions and data between memory and cpu elaborate backout system to assure sequential consistency when interrupted Pipelining

Stretch Concurrency (cont’d) Overlapped storage references I/O and disk operations Multiprogramming to compensate for slow I/O not shipped due to schedule

A Few Other Stretch Innovations Generalized interrupt system Memory protection Bytes [8 bits] Variable word length operands Multiple forms of floating point arithmetic Coupling two computers to a single memory

A Programmer's Dream 735 instructions (including modes) Bit addressable List walk took 2 instructions Multiple modes of arithmetic Registers and control functions part of addressable memory Word-level storage protection traps

A Compiler Writer's Nightmare! Too many ways of doing the same thing FORTRAN could not use some features, e.g. multiple forms of arithmetic Organizing storage Scheduling instructions Etc.

Compiler as Part of the System A Stretch Objective: "The objective of economic efficiency was understood to imply minimizing the cost of answers, not just the cost of hardware.... A consequent objective was to make programming easier -- not necessarily for trivial problems but for problems worthy of the computer, problems whose coding in machine language would usually be generated automatically by a compiler from statements in the user's language." Fred Brooks in "Planning a Computer System", 1962

HARVEST ( ) Hosted by Stretch Designed for NSA for code breaking Streaming data computation model Seven instructions, e.g. Sort, SBBB Unbounded single operation times Only system with balanced I/O and computational speeds (per conversation with Jim Pomerene 11/00)

Harvest System

Harvest Streaming Unit

Alpha Language for Harvest Language for cryptologists Alphabet definition capability Result descriptors by implication Matched the Harvest instructions but hid all details.

Stretch-Harvest Compiler

Stretch Retrospective Stretch machine missed 100 x goal Stretch compiler for Fortran replaced with simpler, faster compiler But “Stretch defined the limits of the possible for later generations of computer designers and users.” (Dag Spicer - Curator Computer History Museum)

Harvest Retrospective Harvest heavily used for 14 years Hardware was very successful ALPHA and Compiler use is unknown Nov NSA report: “Recently HARVEST scanned 7,075,315 messages of approximately 500 characters each -- examining every possible offset -- to see if they contained any of 7,000 different words or phrases. This... required three hours and 50 minutes to complete -- an average of over 30,000 messages a minute.”
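The throughput quoted in the NSA report can be checked with a few lines of arithmetic. This sketch uses only the figures in the quotation; the 500-character message length is the report's own approximation, and the character rate ignores the extra work of testing every offset against 7,000 phrases.

```python
# Figures from the quoted NSA report.
MESSAGES = 7_075_315
CHARS_PER_MESSAGE = 500            # "approximately", per the report
ELAPSED_MINUTES = 3 * 60 + 50      # three hours and 50 minutes


def messages_per_minute():
    """Average messages scanned per minute."""
    return MESSAGES / ELAPSED_MINUTES


def chars_per_second():
    """Average raw characters streamed per second (input volume only)."""
    return MESSAGES * CHARS_PER_MESSAGE / (ELAPSED_MINUTES * 60)


if __name__ == "__main__":
    print(f"{messages_per_minute():,.0f} messages/minute")
    print(f"{chars_per_second():,.0f} characters/second")
```

The first figure comes out just above 30,000, consistent with the report's "over 30,000 messages a minute".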

Next Step : Fortran I : Stretch-Harvest System : Advanced Computing System : 801 System (RISC)

What We Had Learned Design compiler & computer together No instructions the compiler can’t use Keep the instruction set simple Keep the compiler simple A lot about building compilers and computers

ACS System ( ) Goal: To build the fastest scientific computer feasible Compiler built early to drive hardware design

ACS Computer ( ) Single instruction counter Superscalar: up to 7 ops per cycle; Pipelined Branch prediction > 50 insts in execution concurrently Programmable condition codes

ACS Compiler Early design used to establish: Branch prediction strategies Performance bottlenecks Instruction scheduling techniques Code was sometimes faster than the best handcode

Some Compiler Results
  A theoretical basis for program analysis and optimization
  A Catalogue of Optimizations including:
    Procedure integration
    Loop transformations: unrolling, jamming, unswitching
    Redundant subexpression elimination, code motion, constant folding, dead code elimination, strength reduction, linear function test replacement, carry optimization, anchor pointing
    Instruction scheduling
    Register allocation
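Two of the catalogued optimizations, code motion and strength reduction, can be illustrated by applying them by hand to a small loop. This is only a source-level sketch in Python with hypothetical function names; the ACS compiler applied such transformations to its own intermediate form, not to code like this.

```python
def offsets_naive(values, rows, cols):
    """Straightforward loop, before optimization."""
    out = []
    for i in range(len(values)):
        stride = rows * cols                     # loop-invariant, recomputed every time
        out.append(values[i] * stride + i * 4)   # multiply by the induction variable
    return out


def offsets_optimized(values, rows, cols):
    """Same computation after hand-applied code motion and strength reduction."""
    stride = rows * cols          # code motion: invariant hoisted out of the loop
    out = []
    i_times_4 = 0                 # strength reduction: i * 4 becomes a running add
    for value in values:
        out.append(value * stride + i_times_4)
        i_times_4 += 4
    return out


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5, 9]
    print(offsets_naive(sample, 2, 3) == offsets_optimized(sample, 2, 3))
```

Both versions return the same list; the second simply does less work per iteration, which is the point of the transformations.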

ACS ACS was cancelled in May 1968 Too costly, too big, …..

Next Step : Fortran I : Stretch-Harvest System : Advanced Computing System : 801 System (RISC)

The 801 System (RISC) Goal: High Performance and Low Cost Simple, 1-cycle instructions Hardware, compiler, and new programming language (PL.8) developed simultaneously

Some 801 Project Results Graph coloring register allocator Influenced Berkeley RISC Project IBM’s PowerPC family of computers Optimizer became the core of IBM’s XL family of retargetable compilers for multiple source languages.
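The idea behind a graph coloring register allocator can be sketched briefly: variables that are live at the same time are joined by an edge in an interference graph, and registers are the colors. The code below is a simplified simplify/select sketch under stated assumptions, not the PL.8 compiler's implementation. It assumes a symmetric interference graph given as a dict of sets, picks spill candidates by highest degree (a hypothetical heuristic), and does no coalescing or spill-code generation.

```python
def color_interference_graph(graph, num_registers):
    """Assign register numbers so that interfering variables differ.

    graph: dict mapping each variable to the set of variables it
           interferes with (assumed symmetric).
    Returns (assignment, spilled), where spilled variables get no register.
    """
    work = {node: set(neighbors) for node, neighbors in graph.items()}
    stack, spilled = [], []

    # Simplify: repeatedly remove a node with fewer than k neighbors.
    while work:
        node = next((n for n in sorted(work) if len(work[n]) < num_registers), None)
        if node is None:
            # No trivially colorable node: mark the highest-degree one as spilled.
            node = max(sorted(work), key=lambda n: len(work[n]))
            spilled.append(node)
        else:
            stack.append(node)
        for neighbor in work.pop(node):
            work[neighbor].discard(node)

    # Select: pop nodes and give each the lowest register its neighbors lack.
    assignment = {}
    while stack:
        node = stack.pop()
        used = {assignment[n] for n in graph[node] if n in assignment}
        assignment[node] = next(r for r in range(num_registers) if r not in used)
    return assignment, spilled


if __name__ == "__main__":
    interference = {
        "a": {"b", "c"},
        "b": {"a", "c"},
        "c": {"a", "b", "d"},
        "d": {"c"},
    }
    print(color_interference_graph(interference, 3))
    print(color_interference_graph(interference, 2))
```

With three registers the example graph colors without spilling; with two, the triangle a-b-c forces one variable to be spilled.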

More Steps : Fortran I : Stretch-Harvest System : Advanced Computing System : 801 System (RISC) Parallel Systems C Java Etc…..

Challenges The memory wall is getting worse Caches Locality Parallelism Programming languages Compilers for the New Millennium!

BACKUP CHARTS

Stretch Machine
  speed: ~ 500 KIPS (code dependent)
  base machine cycle: 300 ns (3.3MHz)
  transistors: 169,200
  disk: 2MW and 8Mbps
  tape drives: 12 *IBM 729 IV units
  card reader: 1000 cpm
  printer: 600 lpm
  card punch: 250 cpm
  cpu size: 900 sq. ft. (30 x 6 x 5)
  cpu power req: 21 KW
  total size: 2,500 sq. ft.
  weight: 40,000 lbs.

Tractor Tape for Harvest
  Tape library system with automatic retrieval & storage
  3 cartridge handling units
  160 cartridges/handler
  90 million characters/tape
  1,128,000 characters/sec transfer rate
  Automatic checking and error correction
  The 3 cartridge handlers could execute in parallel

Tractor Tape The max read/write transfer rate was > 3.3MB/sec, matching the rate of the streaming unit!
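One way to reconcile the "> 3.3MB/sec" figure with the 1,128,000 characters/sec rate on the previous chart is to assume that rate is per cartridge handler, that the three handlers' rates add when run in parallel, and that one character corresponds to one byte. All three are assumptions made for this sketch, not statements from the slides.

```python
# Figures from the Tractor Tape charts; how they combine is an assumption.
CHARS_PER_SEC_PER_HANDLER = 1_128_000
HANDLERS = 3
CARTRIDGES_PER_HANDLER = 160
CHARS_PER_TAPE = 90_000_000


def aggregate_chars_per_sec():
    """Combined transfer rate if all handlers stream in parallel."""
    return CHARS_PER_SEC_PER_HANDLER * HANDLERS


def library_capacity_chars():
    """Total characters held, assuming one tape per cartridge."""
    return HANDLERS * CARTRIDGES_PER_HANDLER * CHARS_PER_TAPE


if __name__ == "__main__":
    print(f"{aggregate_chars_per_sec():,} characters/sec in aggregate")
    print(f"{library_capacity_chars():,} characters in the library")
```

Under these assumptions the aggregate rate lands just above 3.3 million characters per second, which matches the slide's claim.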