1
Wednesday, March 09
Homework #4
- Questions?
- Solutions will be posted
Quiz #4
- Later today
- Questions?
- Verify your scores!
2
Program #5
- Due Friday at midnight
- Questions?
- Scores will be posted ASAP
3
Today's topics
- Parallelism
- Final exam overview
- Course objectives evaluation
4
Parallelism
- Instruction-level
  - Pipelining
  - Superscaling
- Processor-level
  - Theoretically, 1000 1-nsec processors should be as fast as one 0.001-nsec processor
  - Speed depends on
    - Speed of each processor
    - Number of processors
    - Speed / size / number of memory modules
    - Interconnection network of processors and memory modules
5
Applications
- Multi-user systems
  - Networks
  - Internet
- Speed up single processes
  - Chess example
  - Expert systems
  - Other AI applications
6
Pipelining
7
Superscaling (with 5 functional units)
8
Multiprocessor (shared memory)
(a) A multiprocessor with 16 CPUs sharing a common memory
(b) An image partitioned into 16 sections, each being processed by a separate CPU
9
Multicomputer (distributed memory)
(a) A multicomputer with 16 CPUs, each with its own memory
(b) An image partitioned into 16 sections, distributed among the 16 memories
10
Comparisons
- Multiprocessor
  - Difficult to build
  - Relatively easy to program
- Multicomputer
  - Easy to build (given networking technology)
  - Extremely difficult to program
- Hybrid systems
- Scalable architectures
  - Add more processors (nodes) without having to re-invent the system
11
Interconnection network
- Communication among processors
- Multiprocessor system
  - Communication through circuits/memory
- Multicomputer system
  - Communication through networking technologies
    - Packets (data, source/destination information, etc.)
    - Links, switches, interfaces, etc.
12
Parallel computing performance depends on …
- Hardware
  - CPU speed of individual processors
  - I/O speed of individual processors
  - Interconnection network
  - Scalability
- Software
  - Parallelizability of algorithms
  - Application programming languages
  - Operating systems
  - Parallel system libraries
13
Hardware parallelism
- CPU and I/O speed:
  - Same factors as for single-processor machines … plus:
- Interconnection network
  - Latency (wait time):
    - Distance
    - Collisions / collision resolution
  - Bandwidth (bps)
    - Bus limitations
    - CPU and I/O limitations
- Scalability
  - Adding more processors affects latency and bandwidth
14
Software parallelism
- Parallelizability of algorithms
  - Number of processors
  - Trade-offs and efficiency
  - Sequential/parallel parts
- Amdahl's Law: speedup = n / (1 + (n - 1)f)
  - n = number of processors
  - f = fraction of code that is sequential
  - T = time to process the entire algorithm sequentially (one processor)
  - Note: total execution time is fT + (1 - f)T/n (derivation sketched below)
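A sketch of where those expressions come from, using the slide's own symbols (n, f, T): the sequential fraction always costs fT, while the parallelizable remainder (1 - f)T is split across the n processors.

```latex
T_{\text{parallel}} \;=\; fT + \frac{(1-f)\,T}{n},
\qquad
\text{speedup} \;=\; \frac{T}{T_{\text{parallel}}}
  \;=\; \frac{T}{fT + \frac{(1-f)T}{n}}
  \;=\; \frac{n}{1 + (n-1)f}.
```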
15
(a) A program has a sequential part and a parallelizable part
(b) Effect of running the parallelizable part on a multiprocessor architecture
16
Software parallelism
Example: An algorithm takes 10 seconds to execute on a single 2.4 GHz processor. 40% of the algorithm is sequential. Assuming zero latency and perfect parallelism in the remaining code, how long should the algorithm take on a parallel machine with 16 of those 2.4 GHz processors?
- Speedup = 16 / (1 + 15 × 0.4) = 16/7, so the expected time is 10 / (16/7) = 4.375 seconds
- Another way: (0.4 × 10) + (0.6 × 10) / 16 = 4 + 0.375 = 4.375 seconds (sequential part + parallel part)
17
Software parallelism
- Assuming perfect scalability, what are the implications of Amdahl's Law as n → ∞?
  - speedup → 1/f (assuming f ≠ 0)
  - Therefore, if f = 0.4, parallelism can never make the program run more than 2.5 times as fast (limit sketched below).
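A quick check of that limit (a sketch: divide numerator and denominator of the speedup formula by n):

```latex
\lim_{n \to \infty} \frac{n}{1 + (n-1)f}
  \;=\; \lim_{n \to \infty} \frac{1}{\tfrac{1}{n} + \left(1 - \tfrac{1}{n}\right) f}
  \;=\; \frac{1}{f}
  \qquad (f \neq 0),
\quad \text{so } f = 0.4 \;\Rightarrow\; \text{speedup} < 2.5 .
```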
18
Software parallelism
- Parallel system libraries
  - Precompiled functions designed for multiprocessing (e.g., matrix transformations)
  - Functions for control of communication (e.g., background printing)
- Application programming languages
  - Built-in functions for creating child processes, threads, parallel looping, etc.
  - Mostly imperative
- Operating systems
19
Software issues: In order to really take advantage of hardware parallelism …
1. Control models
   - Single instruction thread
   - Multiple instruction threads
   - Single data set
   - Multiple data sets
   - SISD, SIMD, MISD, MIMD
   - Software (including the OS, compilers, etc.) must be designed to use these features
20
Research area!
- UMA: Uniform Memory Access
- NUMA: Non-Uniform Memory Access
- CC-NUMA: Cache-Coherent NUMA
- NC-NUMA: Non-Caching NUMA
- COMA: Cache-Only Memory Access
- MPP: Massively Parallel Processing
- COW: Cluster of Workstations
21
Software issues: In order to really take advantage of hardware parallelism …
2. Granularity of parallelism
   - At what levels is parallelism implemented?
3. Computational paradigms
   - Pipelining
   - Divide and conquer
   - Phased computation
   - Replicated worker
22
(a) Pipelining  (b) Phased computation  (c) Divide and conquer  (d) Replicated worker
Enormous research area!
23
Software issues: In order to really take advantage of hardware parallelism …
4. Communication methods
   - Shared variables
   - Message passing
5. Synchronization
   - Semaphores, locks, etc. (see the spinlock sketch below)
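As a minimal sketch of what a lock looks like at the instruction level in the course's assembly language (not taken from the slides; lockVar, AcquireLock, and ReleaseLock are made-up names for illustration), a busy-wait spinlock built on x86 xchg, which is atomic whenever one operand is in memory:

```asm
.386
.model flat, stdcall

.data
lockVar DWORD 0                 ; 0 = free, 1 = held

.code
AcquireLock PROC
spin:
    mov   eax, 1
    xchg  eax, lockVar          ; atomic swap: xchg with a memory operand locks the bus
    test  eax, eax              ; old value 0 means the lock was free and is now ours
    jnz   spin                  ; otherwise keep spinning until it is released
    ret
AcquireLock ENDP

ReleaseLock PROC
    mov   lockVar, 0            ; a plain aligned store is enough to release the lock
    ret
ReleaseLock ENDP
END
```

A semaphore generalizes this idea by maintaining a counter rather than a single free/held flag.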
24
Innovations
- ExtremeTech
  - Learn about 64-bit machines
  - Compare several 64-bit architectures, including
    - Intel Gulftown, etc.
    - AMD Phenom II X4 955
Questions on parallelism?
25
Final Exam
- Date: Monday, March 14
- Time: 10:00 am
- Place: 19/128
- Weight: 20% of course grade
26
Exam administration
- Format:
  - 50% multiple-choice questions
  - 50% short-answer, coding, code tracing
- Bring:
  - 8.5 x 11 notepage (2-sided)
  - Calculator
  - NO sharing of calculators or notepages
27
Exam content
- Emphasis is on material covered since the midterm
  - Lectures 12 – 19
  - MASM programming (Programs #4 – 5)
  - Homeworks #2 – 4
  - Quizzes #3 – 4
- General coverage of earlier topics
28
"General knowledge" topics Hardware components of CISC/RISC computers ("hypothetical computer") Hardware components of CISC/RISC computers ("hypothetical computer") Assembly to execution phases Assembly to execution phases assembly process assembly process linking / loading linking / loading execution execution Instruction execution cycle Instruction execution cycle
29
Exam Topics
- Architecture
  - Gates, circuits
  - Boolean functions, Boolean identities, truth tables
  - Memory (structure, addressing)
  - Buses
- Parallelism
  - Instruction-level
  - Processor-level
    - Multiprocessor / multicomputer
  - Factors affecting parallelism and speed
  - Amdahl's law
30
Topics
- Interrupts
  - General concepts (not MASM-specific)
- Expression evaluation
  - RPN
    - evaluation
    - conversion: infix → postfix
  - Using 0-address instructions
31
Exam Topics
- MASM
  - Procedures (create, document, call, etc.)
    - Structure
    - Return address, etc.
    - call, ret, system stack
    - stack frame (activation record)
  - Passing parameters (see the example below)
    - in registers (value, address)
    - on the system stack (value, address)
  - NOT covered:
    - USES, local variables
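As an illustration of these topics (a minimal sketch, not from the slides: SumTwo and the argument values are made up, and the Irvine32 library is assumed as in the course programs), a procedure that receives two DWORD arguments on the system stack and returns their sum in EAX:

```asm
INCLUDE Irvine32.inc

.code
main PROC
    push    5                  ; second argument (pushed first)
    push    7                  ; first argument (pushed last, nearest the return address)
    call    SumTwo             ; EAX = 12 on return
    call    WriteDec           ; display the result
    call    Crlf
    exit
main ENDP

; Stack frame after the prologue:
;   [ebp]    saved EBP        [ebp+4]   return address
;   [ebp+8]  first argument   [ebp+12]  second argument
SumTwo PROC
    push    ebp                ; save the caller's EBP
    mov     ebp, esp           ; establish the stack frame
    mov     eax, [ebp+8]       ; first argument (7)
    add     eax, [ebp+12]      ; add the second argument (5)
    pop     ebp
    ret     8                  ; ret n removes the two DWORD arguments from the stack
SumTwo ENDP
END main
```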
32
Topics
- MASM
  - Arrays (see the example below)
    - addressing modes
    - OFFSET, TYPE, PTR, LENGTHOF, SIZEOF
    - address calculations
  - Stack management
  - Macros
  - String processing
  - NOT covered:
    - "low-level" features of MASM
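A small sketch tying the array operators together (again assuming the Irvine32 library; the names list and L1 are made up for illustration): summing a DWORD array using register-indirect addressing, with TYPE supplying the element size and LENGTHOF the element count:

```asm
INCLUDE Irvine32.inc

.data
list    DWORD 10, 20, 30, 40, 50

.code
main PROC
    mov     esi, OFFSET list        ; ESI holds the address of the first element
    mov     ecx, LENGTHOF list      ; number of elements (5)
    mov     eax, 0                  ; running sum
L1:
    add     eax, [esi]              ; add the current element
    add     esi, TYPE list          ; advance 4 bytes to the next DWORD
    loop    L1
    call    WriteDec                ; displays 150
    call    Crlf                    ; (SIZEOF list would be 20: 5 elements x 4 bytes)
    exit
main ENDP
END main
```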
33
Questions?
Course objectives evaluation
34
I have the ability to …
1. Identify the major components of CISC and RISC architectures, and explain their purposes and interactions.
2. Simulate the internal representation of data, and show how data is stored and accessed in memory.
3. Explain the relationships between a hardware architecture and its instruction set, and simulate microprograms.
4. Explain the Instruction Execution Cycle.
5. Explain the differences among high-level, assembly, and machine languages.
6. Write well-modularized computer programs in an assembly language, implementing decision, repetition, and procedures.
7. Use a debugger, and explain register contents.
8. Explain how the system stack is used for procedure calls and parameter passing.
9. Explain how editors, assemblers, linkers, and operating systems enable computer programming.
10. Explain various mechanisms for implementing parallelism in hardware/software.
Rating scale: A - E (Strongly Disagree … Strongly Agree)
Complete the on-line Moodle questionnaire by Monday.