CIS 570 Advanced Computer Systems University of Massachusetts Dartmouth Instructor: Dr. Michael Geiger Fall 2008 Lecture 1: Fundamentals of Computer Design.

Presentation transcript:

CIS 570 Advanced Computer Systems University of Massachusetts Dartmouth Instructor: Dr. Michael Geiger Fall 2008 Lecture 1: Fundamentals of Computer Design

Outline
- Syllabus & course policies
- Changes in computer architecture
- What is computer architecture?
- Design principles

Syllabus notes
- Course web site (still under construction): cis570/f08.htm
- TA: to be determined
- My info:
  - Office: Science & Engineering, 221C
  - Office hours: M 1:30-2:30, T 2-3:30, Th 2:30-4
- Course text: Hennessy & Patterson, Computer Architecture: A Quantitative Approach, 4th ed.

Course objectives
- To understand the operation of modern microprocessors at an architectural level
- To understand the operation of memory and I/O subsystems and their relation to overall system performance
- To understand the benefits of multiprocessor systems and the difficulties in designing and utilizing them
- To gain familiarity with simulation techniques used in computer architecture research

Course policies
- Prereqs: CIS 273 & 370 or equivalent
- Academic honesty:
  - All work is individual unless explicitly stated otherwise (e.g., final projects)
  - You may discuss concepts (e.g., how Tomasulo's algorithm works) but not solutions
  - Plagiarism is also considered cheating
  - Any assignment or portion of an assignment violating this policy will receive a grade of 0
  - More severe or repeat infractions may incur additional penalties, up to and including a failing grade in the class

Grading policies
- Assignment breakdown:
  - Problem sets: 20%
  - Simulation exercises: 10%
  - Research project (including report & presentation): 20%
  - Midterm exam: 15%
  - Final exam: 25%
  - Quizzes & participation: 10%
- Late assignments: 10% penalty per day

Topic schedule
- Computer design fundamentals
- Basic ISA review
- Architectural simulation
- Uniprocessor systems
  - Advanced pipelining: exploiting ILP & TLP
  - Memory hierarchy design
  - Storage & I/O
- Multiprocessor systems
  - Memory in multiprocessors
  - Synchronization
  - Interconnection networks

Changes in computer architecture
- Old Conventional Wisdom: power is free, transistors are expensive
  New Conventional Wisdom: "power wall": power is expensive, transistors are free (you can put more on a chip than you can afford to turn on)
- Old CW: keep increasing instruction-level parallelism via compilers and hardware innovation (out-of-order execution, speculation, VLIW, ...)
  New CW: "ILP wall": law of diminishing returns on more hardware for ILP
- Old CW: multiplies are slow, memory access is fast
  New CW: "memory wall": memory is slow, multiplies are fast (~200 clock cycles to DRAM, 4 clocks for a multiply)
- Old CW: uniprocessor performance doubles every 1.5 years
  New CW: power wall + ILP wall + memory wall = brick wall
  - Uniprocessor performance now doubles roughly every 5(?) years
  - Sea change in chip design: multiple "cores" (2x processors per chip roughly every 2 years); more, simpler processors are more power-efficient

Uniprocessor performance
[Figure: growth in uniprocessor performance over time, from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006]
- VAX: 25%/year, 1978 to 1986
- RISC + x86: 52%/year, 1986 to 2002
- RISC + x86: ??%/year, 2002 to present

Chip design changes
- Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip
- RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip
- Today, a 125 mm² chip in deep sub-micron CMOS holds the equivalent of 2312 copies of RISC II + FPU + I-cache + D-cache

From ILP to TLP & DLP
- (Almost) all microprocessor companies are moving to multiprocessor systems
  - The embedded domain is the lone holdout
- Single processors gain performance by exploiting instruction-level parallelism (ILP)
- Multiprocessors exploit either:
  - Thread-level parallelism (TLP), or
  - Data-level parallelism (DLP)
- What's the problem?

From ILP to TLP & DLP (cont.)
- We've got tons of infrastructure for single-processor systems
  - Algorithms, languages, compilers, operating systems, architectures, etc.
  - These don't exactly scale well
- Multiprocessor design: not as simple as creating a chip with 1000 CPUs
  - Task scheduling/division
  - Communication
  - Memory issues
  - Even programming: moving from 1 to 2 CPUs is extremely difficult
- Not strictly computer architecture, but it can't happen without architects

CIS 570 Approach
How are we going to address this change?
- Start by going through single-processor systems
  - Study ILP and ways to exploit it
  - Delve into memory hierarchies for single processors
  - Talk about storage and I/O systems
  - We may touch on embedded systems at this point
- Then, we'll look at multiprocessor systems
  - Discuss TLP and DLP
  - Talk about how multiprocessors affect memory design
  - Cover interconnection networks

What is computer architecture?
- Classical view: the instruction set architecture (ISA)
  - Boundary between hardware and software
  - Provides abstraction at both high level and low level
[Figure: the instruction set as the interface between software (above) and hardware (below)]

ISA vs. Computer Architecture
- Modern issues aren't in instruction set design
  - "Architecture is dead" ... or is it?
- Computer architecture now encompasses a larger range of technical issues
- Modern view: ISA + design of computer organization & hardware to meet goals and functional requirements
  - Organization: high-level view of the system
  - Hardware: specifics of a given system
- The function of the complete system is now the issue

The roles of computer architecture
(... as David Patterson sees it, anyway)
- Other fields borrow ideas from architecture
- Anticipate and exploit advances in technology
- Develop well-defined, thoroughly tested interfaces
- Quantitative comparisons to determine when goals are reached
- Quantitative principles of design

Goals and requirements
What goals might we want to meet?
- Performance
- Power
- Price
- Dependability
We'll talk about how to quantify these as needed throughout the semester
- Primarily focus on performance (both uniprocessor & multiprocessor systems) and dependability (mostly storage systems)

Design principles
1. Take advantage of parallelism
2. Principle of locality
3. Focus on the common case
4. Amdahl's Law
5. Generalized processor performance

Take advantage of parallelism
- Increase the throughput of a server via multiple processors or multiple disks
- Detailed hardware design:
  - Carry-lookahead adders use parallelism to speed up computing sums from linear to logarithmic in the number of bits per operand
  - Multiple memory banks are searched in parallel in set-associative caches
- Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence (see the sketch after this list)
  - Not every instruction depends on its immediate predecessor, so instructions can be executed completely or partially in parallel
  - Classic 5-stage pipeline: 1) instruction fetch (Ifetch), 2) register read (Reg), 3) execute (ALU), 4) data memory access (Dmem), 5) register write (Reg)
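To make the pipelining arithmetic concrete, here is a minimal Python sketch (not from the slides) of the ideal case: a k-stage pipeline with no stalls or hazards retires one instruction per cycle once it fills, so speedup over an unpipelined datapath approaches k for long instruction sequences.

```python
# Minimal sketch (not from the slides): ideal pipelining arithmetic.
# Assumes no stalls or hazards, which real programs rarely achieve.

def unpipelined_cycles(n_instructions, n_stages):
    # Each instruction occupies the whole datapath for n_stages cycles.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    # Fill the pipeline (n_stages cycles), then retire one instruction per cycle.
    return n_stages + (n_instructions - 1)

n, k = 1_000_000, 5  # hypothetical instruction count, classic 5-stage pipeline
speedup = unpipelined_cycles(n, k) / pipelined_cycles(n, k)
print(f"Ideal speedup with {k} stages: {speedup:.2f}")  # approaches k for large n
```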

Principle of locality
- The principle of locality: programs access a relatively small portion of the address space at any instant of time
- Two different types of locality:
  - Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- For the last 30 years, hardware has relied on locality for memory performance
  - Guiding principle behind caches
  - To some degree, it guides instruction execution, too (90/10 rule)
[Figure: processor, cache ($), and memory]
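The effect of spatial locality is easy to observe even from a high-level language. The sketch below (not from the slides, and interpreter/machine dependent) sums the same 2-D data twice: row by row, which touches neighboring elements, and column by column, which strides across memory.

```python
# Minimal sketch (not from the slides): spatial locality in practice.
# Timing differences depend on the machine and the Python runtime, but the
# row-major traversal typically wins because it touches adjacent data.
import time

N = 2000
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def row_major_sum(m):
    total = 0
    for row in m:            # consecutive elements of one row: good locality
        for x in row:
            total += x
    return total

def col_major_sum(m):
    total = 0
    for j in range(N):       # jumps to a different row on every access
        for i in range(N):
            total += m[i][j]
    return total

for f in (row_major_sum, col_major_sum):
    start = time.perf_counter()
    f(matrix)
    print(f"{f.__name__}: {time.perf_counter() - start:.3f} s")
```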

Focus on the common case
- In making a design trade-off, favor the frequent case over the infrequent case
  - E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first
  - E.g., if a database server has 50 disks per processor, storage dependability dominates system dependability, so optimize it first
- The frequent case is often simpler and can be handled faster than the infrequent case
  - E.g., overflow is rare when adding two numbers, so improve performance by optimizing the more common case of no overflow
  - This may slow down overflow, but overall performance improves by optimizing for the normal case
- What is the frequent case, and how much does performance improve by making that case faster? => Amdahl's Law

Amdahl's Law
Speedup_overall = ExTime_old / ExTime_new
                = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
Best you could ever hope to do (the enhanced fraction takes no time at all):
Speedup_maximum = 1 / (1 - Fraction_enhanced)
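As a quick way to play with the formula, here is a minimal Python sketch (not from the slides); the function name and example numbers are made up for illustration.

```python
# Minimal sketch (not from the slides): Amdahl's Law as a function.
# fraction_enhanced: fraction of original execution time the enhancement covers.
# speedup_enhanced: how much faster that fraction becomes.

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Example: speed up 40% of a program by 10x -> overall speedup of about 1.56x.
print(amdahl_speedup(0.4, 10))   # ~1.5625
# Best you could ever hope to do if that 40% took no time at all:
print(1.0 / (1.0 - 0.4))         # ~1.67
```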

Processor performance
CPU time = Seconds / Program
         = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

Which factors affect each term:

               Inst Count   CPI   Clock Rate
Program            X
Compiler           X        (X)
Inst. Set          X         X
Organization                 X        X
Technology                            X
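To see the equation in action, here is a minimal Python sketch (not from the slides) with made-up example numbers.

```python
# Minimal sketch (not from the slides): the CPU time equation.
# seconds = instructions x (cycles / instruction) x (seconds / cycle)

def cpu_time(instruction_count, cpi, clock_rate_hz):
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 2 billion instructions, average CPI of 1.5, 2 GHz clock.
print(cpu_time(2e9, 1.5, 2e9))  # 1.5 seconds
```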

Next week
- Review of ISAs (Appendix B)
- Review of pipelining basics (Appendix A)
- Discussion of architectural simulation

Acknowledgements
- This lecture borrows heavily from David Patterson's lecture slides for EECS 252: Graduate Computer Architecture at the University of California, Berkeley
- Many figures and other information are taken from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th ed., unless otherwise noted