

Presentation on theme: "Wednesday, March 09. Homework #4: Questions? Solutions will be posted. Quiz #4: Later today."— Presentation transcript:

1 Wednesday, March 09. Homework #4: Questions? Solutions will be posted. Quiz #4: Later today. Questions? Verify your scores!

2 Program #5: Due Friday at midnight. Questions? Scores will be posted ASAP.

3 Today's topics: Parallelism; Final exam overview; Course objectives evaluation.

4 Parallelism. Instruction-level: pipelining; superscaling. Processor-level: theoretically, 1000 1-nsec processors should be as fast as one 0.001-nsec processor. Speed depends on: speed of each processor; number of processors; speed / size / number of memory modules; interconnection network of processors and memory modules.

5 Applications: Multi-user systems; networks; Internet; speeding up single processes (chess example); expert systems; other AI applications.

6 Pipelining

7 Superscaling (with 5 functional units)

8 Multiprocessor (shared memory) (a) A multi-processor with 16 CPUs sharing a common memory (b) An image partitioned into 16 sections, each being processed by a separate CPU

9 Multicomputer (distributed memory) (a) A multi-computer with 16 CPUs, each with its own memory (b) An image partitioned into 16 sections, distributed among 16 memories

10 Comparisons. Multiprocessor: difficult to build; relatively easy to program. Multicomputer: easy to build (given networking technology); extremely difficult to program. Hybrid systems. Scalable architectures: add more processors (nodes) without having to re-invent the system.

11 Interconnection network: communication among processors. Multiprocessor system: communication through circuits/memory. Multicomputer system: communication through networking technologies, i.e., packets (data, source/destination information, etc.) and links, switches, interfaces, etc.

12 Parallel computing performance depends on … Hardware: CPU speed of individual processors; I/O speed of individual processors; interconnection network; scalability. Software: parallelizability of algorithms; application programming languages; operating systems; parallel system libraries.

13 Hardware parallelism. CPU and I/O speed: same factors as for single-processor machines, plus the interconnection network. Latency (wait time): distance; collisions / collision resolution. Bandwidth (bps): bus limitations; CPU and I/O limitations. Scalability: adding more processors affects latency and bandwidth.

14 Software parallelism. Parallelizability of algorithms: number of processors; trade-offs and efficiency; sequential/parallel parts. Amdahl's Law: speedup = n / (1 + (n - 1)f), where n = number of processors, f = fraction of code that is sequential, and T = time to process the entire algorithm sequentially (one processor). Note: total execution time is fT + (1 - f)T/n.
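The slide's two formulas can be sketched as a pair of Python helpers (Python for brevity rather than the course's MASM; the function names are illustrative, not from the course materials):

```python
def amdahl_speedup(n, f):
    """Amdahl's Law: speedup = n / (1 + (n - 1)f) for n processors,
    where fraction f of the work is strictly sequential."""
    return n / (1 + (n - 1) * f)

def parallel_time(T, n, f):
    """Total execution time: sequential part fT plus the
    parallelizable part (1 - f)T spread over n processors."""
    return f * T + (1 - f) * T / n

# Numbers from the worked example two slides below: T = 10 s, f = 0.4, n = 16
speedup = amdahl_speedup(16, 0.4)   # 16/7, about 2.29x
time = parallel_time(10, 16, 0.4)   # 4.375 seconds
```

Both helpers agree by construction: T / parallel_time(T, n, f) equals amdahl_speedup(n, f).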

15 (a) A program has a sequential part and a parallelizable part (b) Effect of running the parallelizable part on a multi- processor architecture

16 Software parallelism. Example: An algorithm takes 10 seconds to execute on a single 2.4G processor. 40% of the algorithm is sequential. Assuming zero latency and perfect parallelism in the remaining code, how long should the algorithm take on a 16 x 2.4G processor parallel machine? Speedup = 16 / (1 + 15 x 0.4) = 16/7, so the expected time is 10 / (16/7) = 4.375 seconds. Another way: (0.4 x 10) + (0.6 x 10) / 16 = 4 + 0.375 = 4.375 seconds (sequential part + parallel part).

17 Software parallelism. Assuming perfect scalability, what are the implications of Amdahl's Law when n → ∞? speedup → 1/f (assuming f ≠ 0). Therefore, if f = 0.4, parallelism can never make the program run more than 2.5 times as fast.
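A quick numeric check of this limit (an illustrative Python sketch, not course code): with f = 0.4 the speedup creeps toward 1/f = 2.5 as n grows, but never reaches it.

```python
def amdahl_speedup(n, f):
    # speedup = n / (1 + (n - 1)f)
    return n / (1 + (n - 1) * f)

f = 0.4
# Each successive value is closer to 1/f = 2.5; none exceeds it.
speedups = [amdahl_speedup(n, f) for n in (16, 1000, 1_000_000)]
```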

18 Software parallelism. Parallel system libraries: precompiled functions designed for multiprocessing (e.g., matrix transformations); functions for control of communication (e.g., background printing). Application programming languages: built-in functions for creating child processes, threads, parallel looping, etc.; mostly imperative. Operating systems.

19 Software issues: In order to really take advantage of hardware parallelism … 1. Control models: single vs. multiple instruction threads; single vs. multiple data sets (SISD, SIMD, MISD, MIMD). Software (including OS, compilers, etc.) must be designed to use the features.

20 Research Area !! UMA: Uniform Memory Access. NUMA: Non-Uniform Memory Access. CC-NUMA: Coherent Cache NUMA. NC-NUMA: Non-Cache NUMA. COMA: Cache-Only Memory Access. MPP: Massively Parallel Processing. COW: Cluster of Workstations.

21 Software issues: In order to really take advantage of hardware parallelism … 2. Granularity of parallelism: at what levels is parallelism implemented? 3. Computational paradigms: pipelining; divide and conquer; phased computation; replicated worker.

22 (a) Pipelining (b) Phased Computation (c) Divide and Conquer (d) Replicated Worker. Enormous research area !!
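The replicated-worker paradigm can be sketched in Python with the standard concurrent.futures pool (a toy stand-in for the earlier slides' 16-section image example; the section data and worker function are invented for illustration): identical workers each process one independent chunk.

```python
from concurrent.futures import ThreadPoolExecutor

def process_section(section):
    # Stand-in for real per-section work (e.g., image processing):
    # here we just sum the values in one section.
    return sum(section)

# A toy "image" partitioned into 16 independent sections, one per worker.
image_sections = [[i, i + 1, i + 2] for i in range(0, 48, 3)]

# Replicated workers: the pool applies the same function to every section.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(process_section, image_sections))

total = sum(results)  # combine the partial results
```

The same structure works with ProcessPoolExecutor when the workers are CPU-bound and true processor-level parallelism is wanted.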

23 Software issues: In order to really take advantage of hardware parallelism … 4. Communication methods: shared variables; message passing. 5. Synchronization: semaphores, locks, etc.
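Shared-variable communication with lock-based synchronization can be sketched in Python (an illustrative toy, not course code): without the lock, concurrent read-modify-write updates of the shared counter could be lost.

```python
import threading

counter = 0              # shared variable
lock = threading.Lock()  # mutual exclusion, in the spirit of a semaphore

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:       # only one thread updates the shared variable at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is exactly 40000 with the lock; without it, increments can interleave
# badly and some updates would be overwritten.
```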

24 Innovations. ExtremeTech: learn about 64-bit machines; compare several 64-bit architectures, including Intel Gulftown, AMD Phenom II X4 955, etc. Questions on parallelism?

25 Final Exam. Date: Monday, March 14. Time: 10:00 am. Place: 19/128. Weight: 20% of course grade.

26 Exam administration. Format: 50% multiple-choice questions; 50% short-answer, coding, code tracing. Bring: 8.5 x 11 notepage (2-sided); calculator. NO sharing calculators or notepages.

27 Exam content. Emphasis is on material covered since the Midterm: Lectures 12 – 19; MASM programming (Programs #4 – 5); Homeworks #2 – 4; Quizzes #3 – 4. General coverage of earlier topics.

28 "General knowledge" topics Hardware components of CISC/RISC computers ("hypothetical computer") Hardware components of CISC/RISC computers ("hypothetical computer") Assembly to execution phases Assembly to execution phases assembly process assembly process linking / loading linking / loading execution execution Instruction execution cycle Instruction execution cycle

29 Exam Topics. Architecture: gates, circuits; Boolean functions, Boolean identities, truth tables; memory (structure, addressing); buses. Parallelism: instruction-level; processor-level; multi-processor / multi-computer; factors affecting parallelism and speed; Amdahl's law.

30 Topics. Interrupts: general concepts (not MASM-specific). Expression evaluation: RPN; evaluation; conversion infix → postfix; using 0-address instructions.
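RPN (postfix) evaluation with an operand stack, as listed above, can be sketched in Python (a hypothetical helper mirroring what a 0-address stack machine does in hardware):

```python
def eval_rpn(tokens):
    """Evaluate a postfix (RPN) expression given as a list of tokens."""
    stack = []
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b,
           '/': lambda a, b: a / b}
    for tok in tokens:
        if tok in ops:
            b = stack.pop()   # right operand is on top of the stack
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

# infix (3 + 4) * 5  converts to  postfix 3 4 + 5 *
result = eval_rpn("3 4 + 5 *".split())  # 35.0
```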

31 Exam Topics. MASM: Procedures (create, document, call, etc.): structure; return address, etc.; call, ret, system stack; stack frame (activation record). Passing parameters: in registers (value, address); on the system stack (value, address). NOT covered: USES, local variables.

32 Topics. MASM: Arrays: addressing modes; OFFSET, TYPE, PTR, LENGTHOF, SIZEOF; address calculations. Stack management. Macros. String processing. NOT covered: "low-level" features of MASM.

33 Questions? Course objectives evaluation.

34 I have the ability to: 1. Identify the major components of CISC and RISC architectures, and explain their purposes and interactions. 2. Simulate the internal representation of data, and show how data is stored and accessed in memory. 3. Explain the relationships between a hardware architecture and its instruction set, and simulate micro-programs. 4. Explain the Instruction Execution Cycle. 5. Explain the differences among high-level, assembly, and machine languages. 6. Write well-modularized computer programs in an assembly language, implementing decision, repetition, and procedures. 7. Use a debugger, and explain register contents. 8. Explain how the system stack is used for procedure calls and parameter passing. 9. Explain how editors, assemblers, linkers, and operating systems enable computer programming. 10. Explain various mechanisms for implementing parallelism in hardware/software. Strongly Disagree … Strongly Agree (A B C D E). On-line Moodle questionnaire: complete by Monday.


Download ppt "Wednesday, March 09 Homework #4 Homework #4 Questions? Questions? Solutions will be posted Solutions will be posted Quiz #4 Quiz #4 Later today Later today."

Similar presentations


Ads by Google