1 “Today, less than $500 will purchase a personal computer that has more performance, more main memory, and more disk storage than a computer bought in 1985 for 1 million dollars.” section 1.1

2 Figure 1.1 Growth in processor performance since the mid-1980s section 1.1

3 Why such rapid growth?
Innovation in computer design
- Reduced Instruction Set Computers
- Exploitation of instruction-level parallelism
- Use of caches
Advances in the technology used to build computers
- Integrated circuit technology that supported mass production of microprocessors (allowed a CPU on one chip)
Elimination of assembly language programming, and vendor-independent operating systems (Unix, Linux)
- Made it less risky for vendors to introduce a new architecture
section 1.1

4 But the huge performance gains are over
- Annual processor performance improvement has dropped from 50% (sustained since the mid-1980s) to about 20% today
- No instruction-level parallelism left to exploit
- Memory latency almost unchanged
section 1.1

5 Where is industry headed?
Thread-level parallelism (TLP)
- A thread is a separate process with its own instructions and data, typically part of a parallel program
Data-level parallelism (DLP)
- The same operation applied to multiple pieces of data simultaneously
Note: unlike instruction-level parallelism, which is exploited without programmer intervention, TLP and DLP require the programmer to write the parallel code (see the sketch below)
section 1.1
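A minimal C sketch contrasting the two forms of parallelism on the same array computation; POSIX threads, the array names, and the thread count are illustrative assumptions, not from the slides.

    /* dlp_tlp.c — contrast data-level and thread-level parallelism.
       Build with: cc dlp_tlp.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    #define NTHREADS 2

    static double a[N], b[N], c[N];

    /* DLP: one operation (add) applied to many data elements at once;
       a vectorizing compiler or SIMD hardware can run several
       iterations of this loop simultaneously. */
    static void dlp_add(void) {
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    /* TLP: the programmer splits the work across threads, each a
       separate instruction stream working on its own slice. */
    static void *tlp_worker(void *arg) {
        int id = *(int *)arg;
        int chunk = N / NTHREADS;
        for (int i = id * chunk; i < (id + 1) * chunk; i++)
            c[i] = a[i] + b[i];
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        int ids[NTHREADS];

        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

        dlp_add();                        /* data-level parallelism  */

        for (int i = 0; i < NTHREADS; i++) {
            ids[i] = i;
            pthread_create(&t[i], NULL, tlp_worker, &ids[i]);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);     /* thread-level parallelism */

        printf("c[7] = %g\n", c[7]);      /* expect 21 */
        return 0;
    }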

6 Changing face of computing
1960s – large mainframes
- Millions of dollars
- Multiple operators
- Data processing and scientific computing
1970s – minicomputers
- Time-sharing
- Terminals
section 1.2

7 Changing face of computing
1980s – desktop computing via microprocessors
- Servers that provide file storage and access, larger memory, and more computing power
1990s (and beyond!)
- Internet and World Wide Web (need for powerful servers)
- Handheld computing devices (PDAs)
- Digital consumer electronics (video games, DVD players, TiVo, satellite receivers)
section 1.2

8 Three computing markets
Desktop computing
- $500 PCs to $5,000 workstations
- Market driven to optimize price-performance (how much bang for the buck)
- Consumers are interested in high performance and low price
section 1.2

9 Three computing markets
Servers
- Provide file storage and access, and enough memory and computing power to support many simultaneous users
- The emergence of the World Wide Web increased their popularity
- Key requirements: availability, scalability, efficient throughput
section 1.2

10 Three computing markets
Embedded computers
- Computers lodged in other devices – microwaves, washing machines, printers, networking switches, PDAs, ATMs, etc.
- Very wide range of processing power and cost: from 8-bit processors that cost less than a dime to high-end processors for game systems that cost hundreds of dollars
- Real-time performance requirement – an absolute maximum execution time
- Need to minimize memory and power (to contain costs)
section 1.2

11 What about supercomputers?
- Supercomputer – a machine with significant floating-point performance, designed for specific scientific applications; costs millions of dollars
- Clusters of desktop computers have largely overtaken conventional supercomputers because they are cheaper and more scalable
section 1.2

12 Instruction Set Architecture (ISA)
The portion of the computer visible to the programmer or compiler writer
What is visible:
- Instruction types and formats
- General-purpose registers
- Addressing modes
- Memory addressing
section 1.3

13 Classifying ISAs
ISAs are classified by the location of operands; five possibilities:
- Stack architecture
- Accumulator architecture
- Memory-memory architecture
- Register-memory architecture
- Register-register architecture
section 1.3

14 Stack architecture
Operands are implicitly on top of the stack
Code sequence for C = A + B:
    Push A
    Push B
    Add
    Pop C
Each of these instructions modifies a TOS register (TOS = address of the operand on top of the stack) in addition to the obvious calculation
Example: JVM (a small simulation sketch follows)
section 1.3
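A minimal C simulation of how a stack machine evaluates C = A + B; the memory array, stack depth, and the use of array indices as addresses are illustrative assumptions, not from the slides.

    /* stack_machine.c — simulate Push A, Push B, Add, Pop C. */
    #include <stdio.h>

    static int mem[3] = { 5, 7, 0 };   /* A = mem[0], B = mem[1], C = mem[2] */
    static int stack[16];
    static int tos = -1;               /* top-of-stack index */

    static void push(int addr) { stack[++tos] = mem[addr]; }
    static void add(void)      { int b = stack[tos--]; stack[tos] += b; }
    static void pop(int addr)  { mem[addr] = stack[tos--]; }

    int main(void) {
        push(0);   /* Push A */
        push(1);   /* Push B */
        add();     /* Add: pop two operands, push their sum */
        pop(2);    /* Pop C */
        printf("C = %d\n", mem[2]);    /* prints C = 12 */
        return 0;
    }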

15 Accumulator architecture
One operand is implicitly the accumulator
Code sequence for C = A + B:
    Load A
    Add B
    Store C
Note: the accumulator can hold either an input operand or the result
section 1.3

16 Memory-memory architecture
All operands are explicit and reside in memory
Code sequence for C = A + B:
    ADD C, A, B
Not found in any modern machine
section 1.3

17 Register-memory architecture
Operands are explicit: either a register or a memory location
Any instruction can access memory
Code sequence for C = A + B:
    Load R1, A
    Add R3, R1, B
    Store R3, C
The 80x86 (x86) is an example
section 1.3

18 Register-register architecture
Only load and store instructions can access memory
Also called a load-store architecture
Code sequence for C = A + B:
    Load R1, A
    Load R2, B
    Add R3, R1, R2
    Store R3, C
Example: MIPS
section 1.3

19 Machines designed since 1980
Load-store architectures – why?
How many registers?
- Newer machines tend to have more registers than older machines
- As instruction-level parallelism increased, the number of registers needed increased
section 1.3

20 Memory Addressing
How is a memory address interpreted?
- Almost always the address is a byte address
- Bytes, halfwords, words, and often doublewords can be accessed
How are bytes ordered within a word?
- Little endian – the low-order byte has the lowest address
- Big endian – the high-order byte has the lowest address
section 1.3

21 Little endian machine
- Makes sense that the low-order byte has the lowest address
- But strings appear in reverse order in memory dumps and registers
(diagram: byte addresses 3 2 1 0 across a word; the low-order byte sits at the lowest address)
section 1.3

22 Big endian machine
- The low-order byte is the rightmost byte of the word shown, but has the highest address
(diagram: byte addresses 0 1 2 3 across a word; the high-order byte sits at the lowest address. A C test for byte order follows.)
section 1.3
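A minimal C sketch that reveals the byte order of whatever machine runs it; the test value 0x01020304 is an illustrative choice, not from the slides.

    /* endian.c — inspect how a 32-bit word is laid out in memory. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t word = 0x01020304;
        unsigned char *p = (unsigned char *)&word;

        /* Little endian stores 04 03 02 01 at increasing addresses;
           big endian stores 01 02 03 04. */
        printf("bytes at increasing addresses: %02x %02x %02x %02x\n",
               p[0], p[1], p[2], p[3]);
        printf("this machine is %s endian\n",
               p[0] == 0x04 ? "little" : "big");
        return 0;
    }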

23 Alignment
- In most computers, accesses to objects larger than a byte must be aligned
- An access to an object of size s at address A is aligned if: A mod s = 0
Why this restriction?
- Memory is typically aligned on a word or doubleword boundary
- Unaligned accesses require multiple memory accesses and are thus slower (a small alignment check appears below)
section 1.3
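A minimal C sketch of the A mod s = 0 test; casting the pointer to uintptr_t for the modulo is an implementation assumption, not from the slides.

    /* aligned.c — check the alignment rule A mod s = 0. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Returns 1 if an access of size s at address a is aligned. */
    static int is_aligned(const void *a, size_t s) {
        return (uintptr_t)a % s == 0;
    }

    int main(void) {
        double d;          /* an 8-byte object */
        char buf[16];

        printf("&d aligned for 8-byte access? %d\n",
               is_aligned(&d, sizeof d));
        printf("buf+1 aligned for 4-byte access? %d\n",
               is_aligned(buf + 1, 4));   /* typically 0: unaligned */
        return 0;
    }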

24 Addressing Modes
- A mode indicates where the operand can be found (a register, part of the instruction, or a memory location)
- Effective address calculation – uses the address field of an instruction to determine a memory location
section 1.3

25 80x86 addressing modes
- Register
- Immediate
- Absolute
- Indirect
- Base plus displacement
- Indexed (two forms)
- Scaled indexed (four forms)
(a sketch of effective-address calculation follows)
section 1.3
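A minimal C sketch of how the more elaborate effective addresses are computed from instruction fields; the field names (base, index, scale, disp) follow common x86 descriptions, and the register values are illustrative assumptions, not from the slides.

    /* ea.c — effective-address calculation for two 80x86-style modes. */
    #include <stdio.h>
    #include <stdint.h>

    static uint32_t regs[8];   /* simulated general-purpose registers */

    /* Base plus displacement: EA = R[base] + disp */
    static uint32_t ea_base_disp(int base, int32_t disp) {
        return regs[base] + disp;
    }

    /* Scaled indexed: EA = R[base] + R[index] * scale + disp,
       where scale is 1, 2, 4, or 8 on the 80x86. */
    static uint32_t ea_scaled(int base, int index, int scale, int32_t disp) {
        return regs[base] + regs[index] * scale + disp;
    }

    int main(void) {
        regs[1] = 0x1000;   /* base register  */
        regs[2] = 3;        /* index register */
        printf("EA = 0x%x\n", ea_base_disp(1, 16));     /* 0x1010 */
        printf("EA = 0x%x\n", ea_scaled(1, 2, 4, 8));   /* 0x1014 */
        return 0;
    }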

26 MIPS addressing modes
- Register
- Immediate
- Displacement
section 1.3

27 Type and Size of Operands
How is the type of an operand designated?
- As part of the operand (old technique)
- As part of the opcode (add versus addf)
section 1.3

28 Typical types for desktops/servers
- Character (8-bit ASCII or 16-bit Unicode)
- Halfword (16 bits)
- Integer (32-bit word, almost always two's complement)
- Single-precision floating point (1 word, almost always the IEEE 754 standard)
- Double-precision floating point (2 words, IEEE 754)
- The 80x86 also supports extended double precision (80 bits)
(a C mapping of these types appears below)
section 1.3
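A minimal C sketch mapping the slide's operand types onto C99 types; the exact sizes (and the 80-bit long double) are assumptions about a typical desktop ABI, not guarantees of the C standard.

    /* types.c — the slide's operand types as C objects. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        char        ch = 'A';       /* 8-bit ASCII character         */
        int16_t     hw = -123;      /* halfword (16 bits)            */
        int32_t     w  = -100000;   /* word, two's complement        */
        float       sp = 3.14f;     /* IEEE 754 single precision     */
        double      dp = 3.14;      /* IEEE 754 double precision     */
        long double ep = 3.14L;     /* often 80-bit extended on x86  */

        printf("sizes in bytes: %zu %zu %zu %zu %zu %zu\n",
               sizeof ch, sizeof hw, sizeof w,
               sizeof sp, sizeof dp, sizeof ep);
        return 0;
    }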

29 More types for desktops/servers
Packed decimal (binary-coded decimal)
- Four bits encode a single decimal digit
- Two digits per byte
- Used in business applications to get exact decimal values (0.1 in decimal is a repeating binary fraction); see the packing sketch below
section 1.3
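A minimal C sketch of packing two decimal digits into one byte, as packed decimal does; the helper names are illustrative, not from the slides.

    /* bcd.c — packed (binary-coded) decimal: two digits per byte. */
    #include <stdio.h>
    #include <stdint.h>

    /* Pack two decimal digits (0-9 each) into one byte. */
    static uint8_t bcd_pack(int high, int low) {
        return (uint8_t)((high << 4) | low);
    }

    /* Unpack a BCD byte back into its two digits. */
    static void bcd_unpack(uint8_t b, int *high, int *low) {
        *high = b >> 4;
        *low  = b & 0x0F;
    }

    int main(void) {
        uint8_t b = bcd_pack(4, 2);    /* the decimal number 42 */
        int hi, lo;
        bcd_unpack(b, &hi, &lo);
        printf("byte 0x%02x holds digits %d and %d\n", b, hi, lo);
        return 0;
    }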

30 Operations in the Instruction Set
Categories:
- Arithmetic and logical
- Data transfer
- Control
- System
- Floating point
- Decimal (packed decimal)
- String
- Graphics (pixel and vertex operations)
section 1.3

31 Instructions for Control Flow
- No consistent terminology: branch, jump, transfer
- This book: jump = unconditional; branch = conditional
Four types:
- Conditional branches
- Jumps
- Procedure calls
- Procedure returns
section 1.3

32 Addressing Modes for Control Flow Instructions
- The target is usually explicitly specified (major exception: procedure returns)
- The most common way to specify the target is as a displacement from the PC (PC-relative)
- PC-relative mode only works if the target can be calculated at compile time
- Register-indirect jump – the target address is in a register (calculated at runtime)
section 1.3

33 Why indirect jumps?
- Case/switch statements
- Virtual functions (dynamic binding of a call to a function)
- Function pointers – functions passed as arguments
- Dynamically shared libraries – a library loaded and linked at runtime
(the sketch below shows two of these cases)
section 1.3
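A minimal C sketch of two constructs that compilers commonly lower to indirect jumps; whether a given compiler emits a jump table for this switch is an assumption about typical code generation, not something the slides state.

    /* indirect.c — source constructs usually compiled to indirect jumps. */
    #include <stdio.h>

    static int add(int a, int b) { return a + b; }
    static int sub(int a, int b) { return a - b; }

    int main(void) {
        /* Function pointer: the call target lives in a register and
           is known only at runtime, so the call is an indirect jump. */
        int (*op)(int, int) = add;
        printf("%d\n", op(5, 3));          /* 8 */
        op = sub;
        printf("%d\n", op(5, 3));          /* 2 */

        /* A dense switch is often compiled to a jump table: an
           indirect jump through a table of branch targets. */
        for (int day = 0; day < 3; day++) {
            switch (day) {
            case 0: puts("Mon"); break;
            case 1: puts("Tue"); break;
            case 2: puts("Wed"); break;
            }
        }
        return 0;
    }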

34 80x86 branches
Conditional branches test condition-code bits set as a side effect of a previous arithmetic instruction
Direct jumps:
    jmp .L1       (unconditional)
    je  .L1       (conditional on the value of ZF)
Indirect jumps:
    jmp *%eax     (target is in %eax)
    jmp *(%eax)   (target is in memory, at the address held in %eax)
Procedure call:
    call sub      (pushes the return address onto the stack)
section 1.3

35 MIPS branches
Conditional branches test the contents of a register:
    beqz r3, target   (branch if r3 equals zero)
The return address is saved in a register:
    jal sub           (return address automatically saved in r31)
    jalr r3           (target is in r3; return address saved in r31)
Unconditional jumps:
    j target          (target encoded in the instruction)
    jr r3             (target is in r3)
section 1.3

36 Encoding an instruction set: competing forces
- Desire to have as many registers and addressing modes as possible
- Desire to keep instruction size, and thus program size, as small as possible
- Desire for instructions to be encoded in a way that makes them easy to pipeline
section 1.3

37 Encoding an instruction set
Fixed encoding (MIPS)
- Addressing mode implied by the opcode
- All instructions are the same size
- Easier to decode
- Easier to pipeline instruction execution
Variable encoding (80x86)
- Addressing mode is explicit in the operand
- Instructions have different lengths
- All addressing modes work with all operands
- Programs are smaller (important when memory was at a premium)
- Individual instructions vary in size and amount of work
(a sketch of the fixed MIPS encoding follows)
section 1.3
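A minimal C sketch of the fixed 32-bit MIPS R-type format, packing the six standard fields into one word; the register numbers follow slide 18's Add R3, R1, R2.

    /* encode.c — fixed 32-bit encoding of a MIPS R-type instruction. */
    #include <stdio.h>
    #include <stdint.h>

    /* R-type layout: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6) */
    static uint32_t encode_rtype(unsigned op, unsigned rs, unsigned rt,
                                 unsigned rd, unsigned shamt, unsigned funct) {
        return (op << 26) | (rs << 21) | (rt << 16)
             | (rd << 11) | (shamt << 6) | funct;
    }

    int main(void) {
        /* add r3, r1, r2 : opcode 0, funct 0x20 — every instruction
           fits in the same 32 bits, which is what makes decoding and
           pipelining easy. */
        uint32_t inst = encode_rtype(0, 1, 2, 3, 0, 0x20);
        printf("0x%08x\n", inst);   /* 0x00221820 */
        return 0;
    }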

38 Implementation technologies
- Integrated circuit logic technology – the number of transistors on a chip increases about 40-55% per year
- Semiconductor DRAM (primary memory) – density increases about 40% per year
- Magnetic disks – density has recently been improving by more than 30% per year
- Network technology – recently improving quickly because of interest in the World Wide Web
section 1.4

39 Performance trends
- Bandwidth (throughput) – the total amount of work done in a given time (e.g., megabytes per second for a disk transfer)
- Latency (response time) – the time between the start and completion of an event (e.g., milliseconds for a disk access)
- Figure 1.8 shows that bandwidth improves much more rapidly than latency
section 1.4

40 Figure 1.8 (plot: bandwidth improving much more rapidly than latency) sections 1.1, 1.2, 1.3, 1.5, 1.6, 1.7, 1.9

41 Impact of transistors and wires
Feature size – the size of a transistor or wire in either the x or y dimension
As transistors decrease in size:
- The power needed for correct operation of a transistor decreases
- Chip density increases
- Wire delay plays a greater role in chip performance
The high rate of improvement in transistor density resulted in the rapid advance from 4-bit to 64-bit processors
section 1.4

