CSCE 121:509-512 Simple Computer Model Spring 2015 Based on slides created by Bjarne Stroustrup and Jennifer Welch
1 HOW DATA IS REPRESENTED
The fundamental unit of computer storage is a bit: a data entity that can take on one of two values, 0 or 1.
Given a sequence of bits (01010001), what does it represent? It depends on the code:
Base 10 integer: 1,010,001
Base 2 integer: 1*2^0 + 1*2^4 + 1*2^6 = 81 (in base 10)
ASCII: the letter ‘Q’
You must know the code to decode a sequence. There are many other codes…
2 HOW DATA IS STORED
Bytes Make Memory
Byte: a group of 8 bits.
How many different values? 00000000, 00000001, 00000010, …, 11111110, 11111111: 2^8 = 256 possibilities.
Memory: a long sequence of bytes, numbered 0, 1, 2, 3, …
[Figure: a row of bits in memory, grouped into consecutive bytes.]
Addresses and Words
Address: the number of a byte in memory.
The contents of a byte can change.
Use consecutive locations to store longer sequences of information: 4 bytes = 1 word.
[Figure: a row of bits in memory grouped into bytes, with four consecutive bytes marked as a word.]
Limitations of Finite Data Encodings
Overflow: the number is too large. Suppose 1 byte stores integers in base 2, from 0 (00000000) to 255 (11111111). If the byte holds 255, then adding 1 to it results in 256, which is too large to be stored in the byte.
Limitations of Finite Data Encodings
Roundoff error:
insufficient precision (size of word): try to store 1/8, which is .001 in base 2, with only two bits
nonterminating expansions in current base: try to store 1/3 in base 10, which is .333…
nonterminating expansions in every base: irrational numbers such as π
Kinds of Storage
Listed in order, each kind is slower and cheaper than the one before:
Cache: super-fast
Main memory: random access, equally fast to access any address
Disk: random access, significantly slower than main memory
Tape: sequential access, significantly slower than disk
Cache and main memory are where your program is executed. Disk is where your program and data are stored.
According to the Computer History Museum, “Tape was a storage mainstay for many years and still survives, thanks to its low cost, portability, unlimited offline capacity, and standardized formats that make tapes interchangeable.” http://www.computerhistory.org/revolution/memory-storage/8/258
3 HOW DATA IS OPERATED ON
Main Players
Data in main memory cannot be operated on: it must be copied (loaded) to special locations called registers.
Once data is in registers, it can be operated on by circuitry called the arithmetic logic unit (ALU), e.g., add, multiply.
The result of an operation can then be copied (stored) back to main memory.
The procedure is organized by the control unit circuitry, which figures out what the ALU should do next and transfers data between main memory and registers.
Registers, ALU and control unit are in the CPU.
[Figure: the CPU, containing the ALU, registers, and control unit, alongside main memory.]
Machine Instructions
Goal: add the number stored in address 3 and the number stored in address 6; put the result in address 10.
The control unit does the following:
1. copies data from main memory address 3 into some register, say 1: LOAD 3,1
2. copies data in main memory address 6 into some register, say 4: LOAD 6,4
3. tells the ALU to add the contents of registers 1 and 4, and put the result in some register, say 3: ADD 1,4,3
4. copies data in register 3 into main memory address 10: STORE 3,10
LOAD, ADD and STORE are machine instructions.
How does the control unit know which instruction is next? The program!
4 HOW A PROGRAM IS STORED
Program is stored in memory the same way data is stored!
Program: a list of machine instructions using some agreed-upon coding convention.
Example:
             opcode  1st operand  2nd operand  3rd operand
instruction  ADD     1            4            3
bits         0010    0001         0100         0011
5 HOW A PROGRAM IS EXECUTED ON THE DATA
How a Program is Executed
The control unit has:
instruction register: holds the current instruction to be executed
program counter: holds the address of the next instruction in the program to be fetched from memory
The program counter tells where the computer is in the program.
Usually the next instruction to execute is the next instruction in memory.
Sometimes we want to JUMP to another instruction (e.g., if or while):
unconditional JUMP: always jump to the address given
conditional JUMP: only jump if a certain condition is true (e.g., some register contains 0)
[Figure: the CPU, containing the ALU, registers, and control unit (with instruction register and program counter), alongside main memory.]
Machine Cycle
1. Fetch the next instruction into the instruction register (IR), as indicated by the program counter (PC), and increment the PC.
2. Decode the bit pattern in the IR to figure out which circuitry needs to be activated to perform the instruction.
3. Execute the instruction by copying data into registers and activating the ALU to do the right thing. A JUMP may cause the PC to be altered.
[Figure: Diagram of Architecture. The CPU (PC, IR, registers R2, R3, R4, ALU, and control unit) is connected by a bus to main memory, which holds data and the program (first instr., second instr., third instr., …) at numbered addresses (0 to 5, …, 95 to 100, …).]
6 HOW A PROGRAM IS TRANSLATED INTO MACHINE LANGUAGE
Evolution of Programming Languages
High-level languages, such as Fortran, C, Java, C++: machine independent; easier for people; a compiler translates high-level language programs into machine language.
Assembly languages: allow symbols for operators and addresses; still machine dependent; slightly less painful; an assembler translates assembly language programs into machine language.
Machine languages: all in binary; machine dependent; painful for people; no translation needed.
Why Compiling is Hard
One high-level instruction can correspond to several machine instructions: x = (y+3)/z;
Fancier data structures:
arrays: must calculate addresses for references to array elements
structs: similar to arrays, but less regular
Fancier control structures than JUMP: while, repeat, if-then-else, case/switch.
Functions/subroutines/methods: generate machine language to copy input parameter values, save the return address, start executing at the beginning of the function code, copy output parameter values at the end, and set the PC to the return address.
Compilation Process: Lexical Analysis, Parsing, Code Generation
Lexical Analysis: break up strings of characters into logical components, called tokens, and discard comments and spaces. Ex: total = sum + 55.32 has 5 tokens.
Parsing: decide how the tokens are related. Ex: sum + 55.32 is an arithmetic expression. Ex: total = sum + 55.32 is an assignment statement.
Code Generation: generate machine instructions for each high-level instruction. The resulting machine language program, called object code, is written to disk.
Linking
The linker combines the results of compiling different pieces of the program separately. If pieces refer to each other, these references cannot be resolved during independent compilation.
[Figure: function p … refers to x; main declares x and invokes p.]
Combined code is written to disk.
Loading
To run the program, the loader copies the object code from disk into main memory (the location in main memory is determined by the operating system, not the programmer).
The loader initializes the PC to the starting location of the program and adjusts JUMP addresses.
The result is an executable.
[Figure: Source code for your program -> Compiler -> Object code for your program; Object code for your program + Object code for libraries -> Linker/Loader -> Executable for your program.]
Credits
Slide 1: “NEC APC” by Niv Singer, licensed under CC BY-SA 2.0
Slide 3: “La tecnología...” by Infocux Technologies, licensed under CC BY-NC 2.0
Slide 4: http://commons.wikimedia.org/wiki/File:ASCII_Code_Chart.svg
Slide 8: “Overflow” by Braveheart, licensed under CC BY-NC-ND 2.0
Slide 9: “pumpkin pi” by Craig Damlo, licensed under CC BY-NC-ND 2.0