Instant replay


Instant replay
The semester was split into roughly four parts:
- The 1st quarter covered instruction set architectures: the connection between software and hardware.
- In the 2nd quarter of the course we discussed processor design. We focused on pipelining, which is one of the most important ways of improving processor performance.
- The 3rd quarter focused on large and fast memory systems (via caching), virtual memory, and I/O.
- Finally, we discussed performance tuning, including profiling and exploiting data parallelism via SIMD and multi-core processors.
We also introduced many performance metrics to estimate the actual benefits of all of these fancy designs.
[Diagram: Processor, Memory, Input/Output]
August 7, 2018 CS232 Summary

Some recurring themes
There were several recurring themes throughout the semester:
- Instruction set and processor designs are intimately related.
- Parallel processing can often make systems faster.
- Performance analysis and Amdahl's Law quantify performance limitations.
- Hierarchical designs combine different parts of a system.
- Hardware and software depend on each other.

Instruction sets and processor designs
The MIPS instruction set was designed for pipelining:
- All instructions are the same length, to make instruction fetch and jump and branch address calculations simpler.
- Opcode and operand fields appear in the same place in each of the three instruction formats, making instruction decoding easier.
- Only relatively simple arithmetic and data transfer instructions are supported.
These decisions have multiple advantages:
- They lead to shorter pipeline stages and higher clock rates.
- They result in simpler hardware, leaving room for other performance enhancements like forwarding, branch prediction, and on-die caches.
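One payoff of the fixed field positions is that decoding needs no parsing at all, just shifts and masks. As an illustrative sketch (a hypothetical helper, not code from the course), here is how the fields of a 32-bit MIPS R-format instruction can be extracted:

```python
def decode_r_format(word):
    """Split a 32-bit MIPS R-format instruction into its six fields.

    Every field sits at the same bit position in every R-format
    instruction, so decoding is just shifts and masks.
    """
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs":     (word >> 21) & 0x1F,  # bits 25..21
        "rt":     (word >> 16) & 0x1F,  # bits 20..16
        "rd":     (word >> 11) & 0x1F,  # bits 15..11
        "shamt":  (word >> 6)  & 0x1F,  # bits 10..6
        "funct":  word & 0x3F,          # bits 5..0
    }

# "add $t0, $t1, $t2" encodes as 0x012A4020:
fields = decode_r_format(0x012A4020)
# fields["rd"] is 8 ($t0) and fields["funct"] is 0x20 (add)
```

A hardware decoder does the same thing with fixed wiring, which is exactly why uniform formats keep the decode stage simple and fast.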

Parallel processing
One way to improve performance is to do more processing at once. There were several examples of this in our CPU designs:
- Multiple functional units can be included in a datapath to let single instructions execute faster. For example, we can calculate a branch target while reading the register file.
- Pipelining allows us to overlap the executions of several instructions.
- SIMD performs operations on multiple data items simultaneously.
- Multi-core processors enable thread-level parallel processing.
Memory and I/O systems also provide many good examples:
- A wider bus can transfer more data per clock cycle.
- Memory can be split into banks that are accessed simultaneously.
- Similar ideas may be applied to hard disks, as with RAID systems.
- A direct memory access (DMA) controller performs I/O operations while the CPU does compute-intensive tasks instead.
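To make the SIMD idea concrete, here is a small sketch (using NumPy, which was not necessarily the tool used in the course) contrasting an element-at-a-time loop with a single whole-array operation that compiled, typically SIMD-vectorized, code carries out:

```python
import numpy as np

n = 1_000_000
a = np.arange(n, dtype=np.float64)
b = np.arange(n, dtype=np.float64)

# Scalar view: one addition per loop iteration
# (first 10 elements only, for brevity).
c_scalar = [a[i] + b[i] for i in range(10)]

# Data-parallel view: a single call hands the whole addition to
# compiled code, which typically uses SIMD instructions internally
# to add several elements per machine instruction.
c = a + b
```

The result is the same either way; the difference is how many data items each underlying instruction touches.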

Performance and Amdahl's Law
First Law of Performance: Make the common case fast! But performance is limited by the slowest component of the system. We've seen this in regard to cycle times in our CPU implementations:
- Single-cycle clock times are limited by the slowest instruction.
- Pipelined cycle times depend on the slowest individual stage.
Amdahl's Law also holds true outside the processor itself:
- Slow memory or bad cache designs can hamper overall performance.
- I/O-bound workloads depend on the I/O system's performance.
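The slide states the limitation qualitatively; the quantitative form of Amdahl's Law (the standard formula, not transcribed from the slide) says that if a fraction p of runtime is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Speeding up 80% of a program by 4x yields only 2.5x overall:
print(amdahl_speedup(0.8, 4))  # 1 / (0.2 + 0.2) = 2.5
# Even an infinite speedup on that 80% caps out at 1 / 0.2 = 5x,
# because the untouched 20% still takes its full time.
```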

Hierarchical designs
Hierarchies separate fast and slow parts of a system, and minimize the interference between them.
- Caches are fast memories which speed up access to frequently-used data and reduce traffic to slower main memory. (Registers are even faster.)
- Buses can also be split into several levels, allowing higher-bandwidth devices like the CPU, memory and video card to communicate without affecting or being affected by slower peripherals.

Architecture and Software
Computer architecture plays a vital role in many areas of software.
- Compilers are critical to achieving good performance. They must take full advantage of a CPU's instruction set. Optimizations can reduce stalls and flushes, or arrange code and data accesses for optimal use of system caches.
- Operating systems interact closely with hardware. They should take advantage of CPU features like support for virtual memory and I/O capabilities for device drivers. The OS handles exceptions and interrupts together with the CPU.

Five things that I hope you will remember
- Abstraction: the separation of interface from implementation. ISAs specify what the processor does, not how it does it.
- Locality:
  - Temporal locality: "if you used it, you'll use it again"
  - Spatial locality: "if you used it, you'll use something near it"
- Caching: buffering a subset of something nearby, for quicker access. Typically used to exploit locality.
- Indirection: adding a flexible mapping from names to things. Virtual memory's page table maps virtual to physical addresses.
- Throughput vs. latency: (# things / time) vs. (time to do one thing). Improving one does not necessitate improving the other.
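Locality and caching fit together, and a toy model shows how. The sketch below is a direct-mapped cache with made-up parameters (8 lines of 16-byte blocks, purely illustrative): after a miss brings a block in, nearby addresses hit because of spatial locality.

```python
# Toy direct-mapped cache: 8 lines, 16-byte blocks (illustrative sizes only).
NUM_LINES, BLOCK_SIZE = 8, 16
cache = [None] * NUM_LINES  # each line remembers the tag of its resident block

def access(addr):
    """Return True on a hit, False on a miss (which fills the line)."""
    block = addr // BLOCK_SIZE
    index = block % NUM_LINES   # which line this block maps to
    tag = block // NUM_LINES    # distinguishes blocks that share that line
    if cache[index] == tag:
        return True
    cache[index] = tag          # miss: fetch the block from main memory
    return False

access(0)        # miss: block 0 is loaded into line 0
hit = access(4)  # hit: address 4 lies in the same 16-byte block
```

The index/tag split is also a small example of indirection: the address alone names the data, and the cache maps that name to wherever a copy currently lives.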

Where to go from here?
- CS433 Advanced Computer Architecture: all of the techniques used in modern processors that I didn't talk about (out-of-order execution, superscalar, advanced branch prediction, prefetching, ...). Homework-oriented.
- CS431 Embedded Systems: how hardware/software gets used in things we don't think of as computers (e.g., anti-lock brakes, pacemakers, GPS). Lab-oriented.
- CS498-DP Parallel Programming: how to write parallel programs.
- CS498-MG Program Optimization: how to make a program run really fast (like the 4th quarter of 232, but more so). Project-oriented.
- ECE 498 AL Programming Massively Parallel Processors: how to write general-purpose programs to run on a GPU using CUDA.
- ECE411 Computer Organization and Design: some content overlap with CS232 and CS433, but you actually build the hardware. Lab-oriented.
- CS426 Compiler Construction: how a compiler translates a programming language down to assembly and optimizes it. Project-oriented.

Good luck on your exams and have a great summer!

Good luck on your exams and have a great break!
Friday's lecture is optional (i.e., it will not be covered on the final). I will give an overview of techniques used in modern processors, including (probably) a brief description of my research.