ECE 4436 ECE 5367
Introduction to Computer Architecture and Design
Ji Chen
Section: T TH 1:00PM – 2:30PM
Prerequisites: ECE 4436

Instructor: Ji Chen
Tel: (713)
Office: W328
Office Hour: T TH 2:30-3:30 or by appointment
TA: None

Course Contents
1. Introduction, basic computer organization
2. Instruction formats, instruction sets and their design
3. ALU design: adders, subtractors, logic operations
4. Multiplication, division, floating point arithmetic
5. Datapath design
6. Control design: hardwired control, microprogrammed control
7. Pipelining
8. Memory systems
9. I/O

Grading
HW/Quiz/Lab   10%
Project       15%
Exam 1        25%
Exam 2        25%
Exam 3        25%
Web: Academic Honesty Statement

Computer Organization and Design: The Hardware/Software Interface, by David A. Patterson and John L. Hennessy, 3rd edition
Required NOT REQUIRED

Homework/quiz: There will be several graded homework/lab assignments. Homework turned in late will be accepted only under extraordinary circumstances.
Labs: Laboratory assignments may be worked in teams of two (2); however, there should be no collaboration between teams. Lab assignments turned in late will be penalized 25 points for each calendar day. Both students in a team will receive the same grade for the project.
Projects: Teams of four (4): describe the computer architecture of a modern technology.
Exams: Two mid-term exams and one final exam. A missed exam will result in a grade of zero. Let me know immediately if any situation arises.
Final Exam: TBD
Grading: Your final grade will be computed as follows:
HW/Quiz/Lab   10%
Project       15%
Exam 1        25%
Exam 2        25%
Exam 3        25%

Since 1946 all computers have had 5 components
- Processor: Control, Datapath
- Memory
- Input
- Output

TI SuperSPARC(tm) TMS390Z50 in Sun SPARCstation20
[Block diagram; labels: Floating-point Unit, Integer Unit, Inst Cache, Ref MMU, Data Cache, Store Buffer, Bus Interface, SuperSPARC, L2 $, CC, MBus Module, Message Bus (MBus), L64852 MBus control, M-S Adapter, SBus, DRAM Controller, SBus DMA, SCSI, Ethernet, STDIO (serial, kbd, mouse, audio, RTC, Floppy), SBus Cards]

Computer Architecture
- Coordination of many levels of abstraction
- Under a rapidly changing set of forces
- Design, Measurement, and Evaluation
[Figure: levels of abstraction; labels: Application, Operating System, Compiler, Firmware, Instruction Set Architecture, Instr. Set Proc., I/O system, Datapath & Control, Digital Design, Circuit Design, Layout]

Forces on Computer Architecture
[Figure: Computer Architecture surrounded by the forces acting on it: Technology, Programming Languages, Operating Systems, History, Applications, Cleverness]

Mixed-Signal

Where are We Going?
ECE 5367 Spring 08
- Arithmetic
- Single/multicycle Datapaths
- Pipelining [figure: four overlapped instructions, each passing through the stages IFetch, Dcd, Exec, Mem, WB]
- Memory Systems
- I/O

Purchasing perspective
- Given a collection of machines, which has the
  - Best performance?
  - Least cost?
  - Best performance / cost?
Design perspective
- Faced with design options, which has the
  - Best performance improvement?
  - Least cost?
  - Best performance / cost?
Both require
- basis for comparison
- metric for evaluation
Our goal: understand cost & performance implications of architectural choices

Two Notions of “Performance”

Plane        Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747   610 mph    6.5 hours     470          286,700
Concorde     1350 mph   3 hours       132          178,200

Which has higher performance?
- Time to do the task (Execution Time): execution time, response time, latency
- Tasks per day, hour, week, sec, ns... (Performance): throughput, bandwidth
Response time and throughput often are in opposition

Definitions
Performance is in units of things-per-second
- bigger is better
If we are primarily concerned with response time
- performance(x) = 1 / execution_time(x)
"X is n times faster than Y" means
- n = Performance(X) / Performance(Y)

Example
Time of Concorde vs. Boeing 747?
- Concorde is 1350 mph / 610 mph = 2.2 times faster = 6.5 hours / 3 hours
Throughput of Concorde vs. Boeing 747?
- Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
- Boeing is 286,700 pmph / 178,200 pmph = 1.61 “times faster”
Boeing is 1.6 times (“60%”) faster in terms of throughput
Concorde is 2.2 times (“120%”) faster in terms of flying time
We will focus primarily on execution time for a single job
Lots of instructions in a program => Instruction throughput important!

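The comparison above can be checked with a few lines of Python. This is only a sketch: the helper name `times_faster` is hypothetical, and the flight times and pmph figures are the ones quoted in the example.

```python
def times_faster(perf_x: float, perf_y: float) -> float:
    """Return n such that X is n times faster than Y (ratio of performances)."""
    return perf_x / perf_y


# Flying time: performance is 1 / execution time, so the time ratio flips.
boeing_hours, concorde_hours = 6.5, 3.0
time_ratio = times_faster(1 / concorde_hours, 1 / boeing_hours)

# Throughput in passenger-miles per hour (pmph), as quoted in the example.
boeing_pmph, concorde_pmph = 286_700, 178_200
throughput_ratio = times_faster(boeing_pmph, concorde_pmph)

print(f"Concorde is {time_ratio:.1f} times faster in flying time")
print(f"Boeing is {throughput_ratio:.1f} times faster in throughput")
```

The same function serves both metrics because "n times faster" is always a ratio of performances; only the definition of performance (1 / time versus throughput) changes.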
CPU Performance
CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

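As a minimal sketch of how the three factors combine, the function below multiplies instruction count, cycles per instruction, and cycle time. The function name and the workload numbers (one billion instructions, CPI of 2.2, 1 GHz clock) are made up for illustration and do not come from the slides.

```python
def cpu_time(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = instructions x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instructions * cpi * seconds_per_cycle


# Hypothetical workload: 1e9 instructions, CPI of 2.2, 1 GHz clock.
t = cpu_time(1e9, 2.2, 1e9)
print(f"CPU time: {t:.2f} s")

# Halving CPI (same program, same clock) halves the CPU time.
t_half_cpi = cpu_time(1e9, 1.1, 1e9)
print(f"With half the CPI: {t_half_cpi:.2f} s")
```

Because the three terms multiply, an improvement in any one of them helps only if it does not make the others worse.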
Amdahl's Law
Speedup due to enhancement E:
  Speedup(E) = ExTime w/o E / ExTime w/ E = Performance w/ E / Performance w/o E
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
  ExTime(with E) = ((1-F) + F/S) x ExTime(without E)
  Speedup(with E) = 1 / ((1-F) + F/S)

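A short sketch of Amdahl's Law in Python, using the same F and S as the formula above. The function name and the example values (F = 0.5, S = 2) are illustrative assumptions, not taken from the slides.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the task is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Illustrative values: half of the task runs twice as fast.
example = amdahl_speedup(0.5, 2.0)
print(f"F=0.5, S=2: speedup = {example:.3f}")

# Even an enormous S cannot push the speedup past 1 / (1 - F).
limit = amdahl_speedup(0.5, 1e12)
print(f"F=0.5, S very large: speedup = {limit:.3f}")
```

The second call shows the practical reading of the law: the unaffected fraction (1 - F) bounds the achievable speedup no matter how large S becomes.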
Typical Mix, Base Machine

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
Total                    2.2

How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
How does this compare with using branch prediction to save a cycle off the branch time?
What if two ALU instructions could be executed at once?

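One way to work the three questions is to recompute the average CPI for each change and compare it with the base value. The sketch below assumes the instruction count and clock cycle stay fixed, so speedup is just the CPI ratio. The 5-cycle load latency is the value consistent with the 2.2 total CPI, and "two ALU instructions at once" is modeled as an effective ALU cost of 0.5 cycles, which is an assumption. All names are hypothetical.

```python
# Instruction mix: op -> (frequency, cycles per instruction).
# Load = 5 cycles is the value consistent with the 2.2 total CPI.
BASE_MIX = {
    "ALU": (0.50, 1.0),
    "Load": (0.20, 5.0),
    "Store": (0.10, 3.0),
    "Branch": (0.20, 2.0),
}


def average_cpi(mix: dict) -> float:
    """Average CPI = sum over ops of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def speedup_with(op: str, new_cycles: float) -> float:
    """Speedup over the base machine when one op's cycle count changes."""
    changed = dict(BASE_MIX)
    freq, _ = changed[op]
    changed[op] = (freq, new_cycles)
    return average_cpi(BASE_MIX) / average_cpi(changed)


base_cpi = average_cpi(BASE_MIX)
cache_speedup = speedup_with("Load", 2.0)      # better data cache
branch_speedup = speedup_with("Branch", 1.0)   # branch prediction saves a cycle
dual_alu_speedup = speedup_with("ALU", 0.5)    # two ALU ops per cycle (assumed model)

print(f"Base CPI: {base_cpi:.2f}")
print(f"Better data cache: {cache_speedup:.3f}x")
print(f"Branch prediction: {branch_speedup:.3f}x")
print(f"Dual ALU issue:    {dual_alu_speedup:.3f}x")
```

Under these assumptions the data cache gives the largest gain (CPI 2.2 to 1.6, about 1.38x), ahead of dual ALU issue (about 1.13x) and branch prediction (1.10x), because loads account for the largest share of execution time.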