Published by Dulcie Hamilton. Modified over 9 years ago.
1
CS 352H: Computer Systems Architecture
Lecture 1: What is Computer Architecture and why should I care?
Professor Emmett Witchel
University of Texas at Austin
witchel@cs.utexas.edu
2
Goals
Understand the “how” and “why” of computer system organization
  – Instruction Set Architecture
  – System Organization (processor, memory, I/O)
  – Microarchitecture
  – Virtualization
Learn methods of evaluating performance
  – Metrics & benchmarks
Learn how to make systems go fast
  – Pipelining, caching
  – Parallelism (ILP, DLP, TLP)
  – Application specific architectures (graphics, signal proc.)
Preview of where architecture is heading
3
Logistics
Lectures: T/Th 12:30-2:00pm, PAI 3.14
Instructor: Prof. Emmett Witchel, W 1:15-2:15
TA: Shalini Sahoo, MW 11:30-1:00pm, PAI 5.38 Desk 1
Grading: see web page
Texts: Hennessy & Patterson, Computer Organization and Design (Fourth Edition), including CD
  – Revised Fourth Edition preferred, not required
4
CS352H Online
URL: www.cs.utexas.edu/users/witchel/CS352H
I will occasionally email you via Blackboard and at your registered email address. I expect this channel to be reliable and timely.
Discussion group: via Blackboard, login at courses.utexas.edu
  – General, Homeworks, Project
Computer Architecture Seminar Series: www.cs.utexas.edu/users/cart/arch
5
Assignment for Next Tuesday
Turn in student survey forms, if you want
Read the Moore paper (see webpage)
  – Write a review of 1/2-1 page (see syllabus)
  – Review should include
      Summary of content of paper
      Your observations on the most interesting/important aspects
      Your observations on its relevance today
  – Be prepared to discuss on Tuesday in class
6
Discussion
Are you interested in taking this course?
One question about computer science
One question about computer architecture
CS352H Fall 2007, Lecture 1
7
Arch vs. µarch
Levels of abstraction, from top to bottom:
Specification: compute the Fibonacci sequence
Program: for(i=2; i<100; i++) { a[i] = a[i-1]+a[i-2]; }
ISA (Instruction Set Architecture): load r1, a[i]; add r2, r2, r1;
microArchitecture: registers
Logic
Transistors
Physics/Chemistry
[Figure: circuit diagrams for the logic and transistor levels]
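A minimal Python sketch of the "Program" level above, using the same recurrence as the slide's loop. The seed values a[0] = 0 and a[1] = 1 and the helper name `fib_table` are assumptions; the slide does not show how the array is initialized.

```python
def fib_table(n):
    """Return a list a[0..n-1] where a[i] = a[i-1] + a[i-2] for i >= 2."""
    a = [0] * n
    if n > 1:
        a[1] = 1  # assumed seed; the slide does not show initialization
    for i in range(2, n):
        a[i] = a[i - 1] + a[i - 2]
    return a


a = fib_table(100)
print(a[:10])        # first ten entries
print(a[99] > 2**64) # the last entry does not fit in 64 bits
```

Python integers grow as needed, so every entry is exact. With fixed-width 64-bit registers, as at the ISA level, the additions would overflow before i reaches 99, which is one small example of how a lower layer constrains the one above it.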
8
CS352H Topics
Technology Trends
Instruction set architectures
Pipelining
Modern pipelined architectures
  – Dynamic ILP machines
  – Static ILP machines
Cache memory systems
Virtual memory
Multiprocessors
Computer system implementation
9
Making This Class Work For You
Plus and minus grades
Clickers
10
What is Computer Architecture?
[Figure labels: Technology; Applications; Computer Architect; Interfaces; Machine Organization; Measurement & Evaluation; ISA, API, Link, I/O Chan, Regs, IR]
11
Technology Constraints
Yearly improvement
  – Semiconductor technology
      60% more devices per chip (doubles every 18 months)
      15% faster devices (doubles every 5 years)
      Slower wires
  – Magnetic disks
      60% increase in density
  – Circuit boards
      5% increase in wire density
  – Cables
      no change
[Figure: chip photos by year (1989, 1992, 1995, 1998, 2002, 2006) and process (1000nm, 800nm, 350nm, 250nm, 130nm, 90nm). >100x more devices since 1989; 10x faster devices.]
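The doubling times quoted for semiconductor technology follow from compound growth. A short Python sketch, assuming growth compounds once per year (the function names are illustrative, not from the lecture):

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_factor(annual_growth, years):
    """Total multiplier after compounding for the given number of years."""
    return (1 + annual_growth) ** years


devices_months = doubling_time_years(0.60) * 12  # 60% more devices per year
speed_years = doubling_time_years(0.15)          # 15% faster devices per year
ten_year_devices = growth_factor(0.60, 10)

print(f"device count doubles every {devices_months:.1f} months")
print(f"device speed doubles every {speed_years:.1f} years")
print(f"ten years of 60% growth: {ten_year_devices:.0f}x")
```

Under that assumption, 60% per year doubles in just under 18 months, 15% per year doubles in about 5 years, and a decade of 60% growth already exceeds 100x.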
12
Changing Technology leads to Changing Architecture
1970s
  – multi-chip CPUs
  – semiconductor memory very expensive
  – microcoded control
  – complex instruction sets (good code density)
1980s
  – single-chip CPUs, on-chip RAM feasible
  – simple, hard-wired control
  – simple instruction sets
  – small on-chip caches
1990s
  – lots of transistors
  – complex control to exploit instruction-level parallelism
2000s
  – even more transistors
  – Power wall
  – Transition to CMPs
  – Multi-level caches
2010s
  – Embedded vs. Desktop vs. Data center (cloud)
  – New storage (PCM, flash)
  – Simpler cores and lots of them
  – Optimizing for power
13
Intel 4004 - 1971
The first microprocessor
2,300 transistors
108 kHz
10 µm process
14
Some Recent Chips!
Intel Pentium IV
    42 million transistors
    4GHz
    0.13 µm process
    Could fit ~15,000 4004s on this chip!
NVidia - GeForce 6800
    222 million transistors
    400MHz
    0.13 µm process
Intel Itanium II (Montecito)
    1.7 billion transistors
    1.6 GHz
    90nm process
IBM Cell
    8 vector processors + 1 PPC
    4 GHz
    90nm process
Intel’s net revenue was around $35 billion a year for most of the aughts
    R&D about $5 billion a year
15
Any Architecture You Want (as long as it is x86)
16
Application Constraints
Applications drive machine ‘balance’
  – Numerical simulations
      floating-point performance
      main memory bandwidth
  – Transaction processing
      I/Os per second
      integer CPU performance
  – Decision support
      I/O bandwidth
  – Embedded control
      I/O timing, power
  – Media processing
      low-precision ‘pixel’ arithmetic
17
Application-Driven Architectures
General purpose - good performance on “all” programs
  – x86 family, ARM, PowerPC, etc.
Application specificity can focus on:
  – Types of concurrency available
  – Domain of deployment (server, handheld, desktop)
Today - overview of graphics processors
  – Interface (instruction set architecture - ISA)
  – Processor organization
  – Concurrent elements
18
Apple’s iPad/iPhone4
Powered by A4 Chip
A4 is a modified ARM Cortex running at 1GHz
  – Integrated processor, graphics, memory controller
Among other claims, ARM says the processor gets a nearly "25 percent processing power boost, even at same processor speed, from the use of a new instruction pipelining system."
  – We will cover pipelining in this class.
Claim: 10 hours of 1024x768 video on 25 Wh
Let’s look at the Freescale i.MX51
19
Performance: Latency and Throughput
Latency: time to complete an operation
Throughput: work completed per unit time
Consider plumbing
  – Low latency: turn on faucet and water comes out
  – High bandwidth: lots of water (e.g., to fill a pool)
What is “High speed Internet?”
  – Low latency: needed for interactive gaming
  – High bandwidth: needed for downloading large files
  – Marketing departments like to conflate latency and bandwidth…
20
Relationship between Latency and Throughput
Latency and bandwidth are only loosely coupled
  – Henry Ford: assembly lines increase bandwidth without reducing latency
My factory takes 1 day to make a Model-T Ford.
  – But I can start building a new car every 10 minutes
  – At 24 hrs/day, I can make 24 * 6 = 144 cars per day
  – A special order for 1 green car still takes 1 day
  – Throughput is increased, but latency is not.
Latency reduction is difficult
Often, one can buy bandwidth
  – E.g., more memory chips, more disks, more computers
  – Big server farms (e.g., Google) are high bandwidth
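The assembly-line arithmetic can be checked with a few lines of Python. This is only a sketch of the slide's example; the function name `assembly_line` and the dictionary keys are illustrative.

```python
def assembly_line(latency_hours, start_interval_minutes, hours_per_day=24):
    """Latency and throughput of a line that starts one car per interval."""
    starts_per_hour = 60 / start_interval_minutes
    return {
        "latency_hours": latency_hours,                    # time for one car
        "cars_per_day": starts_per_hour * hours_per_day,   # throughput
        "cars_in_progress": starts_per_hour * latency_hours,
    }


ford = assembly_line(latency_hours=24, start_interval_minutes=10)
faster_starts = assembly_line(latency_hours=24, start_interval_minutes=5)

print(ford["cars_per_day"])           # 24 * 6 cars per day
print(faster_starts["cars_per_day"])  # twice the throughput
print(faster_starts["latency_hours"]) # a single car still takes a day
```

Halving the start interval doubles throughput, yet the special-order green car still takes a full day: bandwidth was bought, latency was not.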
21
What is cloud computing?
Cloud computing is where dynamically scalable and often virtualized resources are provided as a service over the Internet (thanks, Wikipedia!)
Infrastructure as a service (IaaS)
  – Amazon’s EC2 (Elastic Compute Cloud)
Platform as a service (PaaS)
  – Google Gears
  – Microsoft Azure
Software as a service (SaaS)
  – Gmail
  – Facebook
  – Flickr
22
Thanks, James Hamilton, amazon
23
Graphics has dedicated chip in PCs
[Block diagram: CPU, Memory, Input/Output]
CPU: Intel “Kentsfield” quad core, QX6700, 65nm, two dies, 8MB L2$, 582 million transistors
Graphics Processor: GeForce 8800, 90nm, 681 million transistors (AGP, PCIe)
Memory Controller Chip (“North Bridge”): Memory
Glue Chip (“South Bridge”): Disk, Keyboard, PCIe, etc.
24
GPU/CPU Performance comparison
[Chart: GFLOPS]
G80 = GeForce 8800 GTX
G71 = GeForce 7900 GTX
G70 = GeForce 7800 GTX
NV40 = GeForce 6800 Ultra
NV35 = GeForce FX 5950 Ultra
NV30 = GeForce FX 5800
IBM Cell ~200 GFLOPS *
Core 2 Quad 3GHz, 96 GFLOPS *
Source: NVIDIA (except * CELL and Core2 Quad)
25
Why a dedicated processing chip?
1) Specialization – becoming less important with time
2) Parallelism – becoming more important
Graphics processors are the only highly-parallel processors in every desktop machine.
128 “processors” * 2 FLOPs per cycle @ 1.35 GHz
You can program them!
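Multiplying out the figures on this slide gives a peak rate for the GPU. The sketch below assumes the "2 FLOPS" means two floating-point operations per processor per clock cycle, and it reuses the 96 GFLOPS Core 2 Quad figure quoted on the comparison slide; `peak_gflops` is an illustrative name.

```python
def peak_gflops(units, flops_per_cycle, clock_ghz):
    """Peak rate in GFLOPS: units * FLOPs per cycle per unit * clock in GHz."""
    return units * flops_per_cycle * clock_ghz


gpu = peak_gflops(units=128, flops_per_cycle=2, clock_ghz=1.35)
core2_quad = 96  # GFLOPS, as quoted on the GPU/CPU comparison slide

print(f"GPU peak: {gpu:.1f} GFLOPS")
print(f"GPU / CPU peak ratio: {gpu / core2_quad:.1f}x")
```

These are peak numbers: they say how much arithmetic the hardware could do if every unit were busy every cycle, not what a real program achieves.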
26
Graphics requires programmability
Every application does something a bit different.
Example Cg “shader” program (invoked like a “callback” function):
void normalmapped(float2 normalMapTexCoord : TEXCOORD0,
                  …
                  out float4 color : COLOR,
                  uniform float ambient,
                  …)
{
  float3 normalTex, …;
  normalTex = tex2D(normalMap, normalMapTexCoord).xyz;
  …
  diffuse = saturate(dot(normal, normLightDir));
  …
  color = Kd * (ambient + diffuse) + Ks * pow(specular, specularExponent);
}
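The lighting arithmetic in the Cg shader can be sketched for a single pixel in plain Python. This is not the lecture's code: the slide elides how `specular` is computed, so the half-vector formula below is an assumption, and Kd and Ks are treated as scalars for simplicity.

```python
import math


def saturate(x):
    """Clamp to [0, 1], like the Cg intrinsic of the same name."""
    return max(0.0, min(1.0, x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def shade(normal, light_dir, half_vec, kd, ks, ambient, specular_exponent):
    """One pixel: Kd * (ambient + diffuse) + Ks * specular ** exponent."""
    n = normalize(normal)
    diffuse = saturate(dot(n, normalize(light_dir)))
    # Assumption: specular term from the normal and a half vector.
    specular = saturate(dot(n, normalize(half_vec)))
    return kd * (ambient + diffuse) + ks * specular ** specular_exponent


lit = shade((0, 0, 1), (0, 0, 1), (0, 0, 1), 0.5, 0.5, 0.1, 8)
unlit = shade((0, 0, 1), (0, 0, -1), (0, 0, -1), 0.5, 0.5, 0.1, 8)
print(lit, unlit)
```

A GPU runs a function like this once per pixel, with many pixels in flight at the same time, which is where the parallelism discussed earlier comes from.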
27
GeForce 8800
28
Next Time
Performance evaluation
Basic computer organization
How chips are made
Start in on instruction set review/overview
Always check web page for assignments