Published by Mervyn O’Connor. Modified over 8 years ago.
1
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 May 2, 2006 Session 29
2
Contents: Group work, Exams, Assignments, Project Presentations, Literature Search, Lectures
3
Put-it-all-together: Memory System Design; Pipeline Design Techniques; Multiprocessors; Shared Memory Systems; Message Passing Systems; Multiprocessor Systems-on-Chips; Network Computing
4
Put-it-all-together: Memory System Design
5
Memory Hierarchy: CPU Registers, Cache, Main Memory, Secondary Storage. Attributes compared across the levels: latency, bandwidth, speed, cost per bit.
6
Pentium IV two-level cache: Processor, Cache Level 1 (L1), Cache Level 2 (L2), Main Memory
7
Placement Policies: how to map memory blocks (lines) to cache block frames (line frames). Policies: Direct Mapping, Fully Associative, Set Associative.
8
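Direct Mapping: memory blocks 0 to 4095 map onto cache block frames 0 to 127 (blocks 0, 128, ..., 3968 share frame 0; blocks 127, 255, ..., 4095 share frame 127). Each frame stores a 5-bit tag (0 to 31). Address fields: Tag (5 bits), Block frame (7 bits), Word (4 bits).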
9
Example: Fully Associative. Memory blocks 0 to 4095 can be placed in any of cache block frames 0 to 127. Each frame stores a 12-bit tag. Address fields: Tag (12 bits), Word (4 bits).
10
Example: Set Associative. Cache block frames 0 to 127 are grouped into 32 sets of 4 frames each (Set 0 holds frames 0 to 3, Set 31 holds frames 124 to 127); memory blocks 0 to 4095. Each frame stores a 7-bit tag. Address fields: Tag (7 bits), Set (5 bits), Word (4 bits).
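The three placement policies differ only in how the address bits are split. The following is a minimal sketch, assuming the field widths shown on the slides (4-bit word, 7-bit block frame, 5-bit set) and a 16-bit address; the function names are hypothetical and not from the lecture.

```python
# Sketch only: field widths are taken from the slide examples
# (4096 memory blocks, 128 cache block frames, 32 sets of 4 frames).
WORD_BITS = 4    # word offset within a block
FRAME_BITS = 7   # 128 block frames (direct mapping)
SET_BITS = 5     # 32 sets (set-associative mapping)


def _low_bits(value, width):
    """Return the lowest `width` bits of `value`."""
    return value & ((1 << width) - 1)


def split_direct(addr):
    """Direct mapping: (tag, block frame, word)."""
    word = _low_bits(addr, WORD_BITS)
    frame = _low_bits(addr >> WORD_BITS, FRAME_BITS)
    tag = addr >> (WORD_BITS + FRAME_BITS)
    return tag, frame, word


def split_fully_associative(addr):
    """Fully associative: (tag, word); the tag is the whole block number."""
    return addr >> WORD_BITS, _low_bits(addr, WORD_BITS)


def split_set_associative(addr):
    """Set associative: (tag, set, word)."""
    word = _low_bits(addr, WORD_BITS)
    set_index = _low_bits(addr >> WORD_BITS, SET_BITS)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


# Word 5 of memory block 4095 (the last block in the example).
addr = (4095 << WORD_BITS) | 5
print(split_direct(addr))             # (31, 127, 5)
print(split_fully_associative(addr))  # (4095, 5)
print(split_set_associative(addr))    # (127, 31, 5)
```

In each case the tag is whatever remains of the block number after the frame or set index is removed, which is why the tag grows from 5 to 7 to 12 bits as the mapping becomes more flexible.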
11
Put-it-all-together: Pipeline Design Techniques
12
Pipeline: a task is divided into sub-tasks 1, 2, ..., n; the pipeline has stages 1, 2, ..., n and processes a stream of tasks.
13
5 Tasks on 4 stage pipeline: Task 1 to Task 5 overlap in time; the time axis runs from 1 to 8.
14
Speedup: a stream of m tasks on an n-stage pipeline, each stage taking time t. $T_{seq} = n \cdot m \cdot t$; $T_{pipe} = n \cdot t + (m - 1) \cdot t$; $\text{Speedup} = \frac{n \cdot m}{n + m - 1}$.
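The speedup formula can be checked against the 5-task, 4-stage example from the earlier slide. This is a small sketch under the slide's assumption of equal stage delay t; the function names are illustrative only.

```python
def sequential_time(n, m, t=1):
    """Time to run m tasks of n sub-tasks each, one after another."""
    return n * m * t


def pipelined_time(n, m, t=1):
    """First task takes n*t; each later task completes t after the previous one."""
    return n * t + (m - 1) * t


def speedup(n, m):
    """Ratio of sequential to pipelined time; t cancels out."""
    return (n * m) / (n + m - 1)


print(sequential_time(4, 5))  # 20
print(pipelined_time(4, 5))   # 8
print(speedup(4, 5))          # 2.5
```

As m grows, the speedup approaches n, the number of stages.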
15
Linear Pipeline: processing stages are linearly connected and perform a fixed function. Synchronous pipeline: clocked latches between Stage i and Stage i+1; equal delays in all stages. Asynchronous pipeline: handshaking.
16
Reservation Table: rows are stages S1 to S4, columns are time steps; four X marks, one per stage.
17
Non Linear Pipelines: variable functions; feed-forward and feedback connections.
18
3 stages & 2 functions: stages S1, S2, S3; functions X and Y.
19
Reservation Tables for X & Y: function X uses S1 three times, S2 twice, and S3 three times; function Y uses S1 twice, S2 once, and S3 three times.
20
State Diagram: states 1011010, 1111111, and 1011011; arcs labeled with latencies 3, 6, 8+, 6, 8+, 8+, 3*, 1*.
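The states in the diagram can be derived mechanically from a reservation table. The sketch below is illustrative only: the time-slot positions in `RESERVATION_X` are an assumption (they are not recoverable from the slide text) chosen so that the usage counts match the X table and the initial collision vector comes out as 1011010, as on the slide. All names are hypothetical.

```python
# ASSUMED reservation table for function X: stage -> busy time steps (0-based).
# The positions are hypothetical; only the per-stage counts come from the slide.
RESERVATION_X = {"S1": [0, 5, 7], "S2": [1, 3], "S3": [2, 4, 6]}


def forbidden_latencies(table):
    """A latency is forbidden if two marks in the same row are that far apart."""
    forbidden = set()
    for times in table.values():
        for i, first in enumerate(times):
            for second in times[i + 1:]:
                forbidden.add(abs(second - first))
    return forbidden


def collision_vector(table):
    """Return (vector, m); bit i-1 from the right is 1 when latency i is forbidden."""
    forbidden = forbidden_latencies(table)
    m = max(forbidden)
    return sum(1 << (lat - 1) for lat in forbidden), m


def state_diagram(table):
    """Map each reachable state to its outgoing arcs {latency label: next state}."""
    initial, m = collision_vector(table)
    states, pending = {}, [initial]
    while pending:
        state = pending.pop()
        if state in states:
            continue
        arcs = {}
        for lat in range(1, m + 1):
            if not (state >> (lat - 1)) & 1:           # latency is permitted
                arcs[str(lat)] = (state >> lat) | initial
        arcs[f"{m + 1}+"] = initial                    # long wait resets the state
        states[state] = arcs
        pending.extend(arcs.values())
    return states, m


states, m = state_diagram(RESERVATION_X)
for state, arcs in sorted(states.items()):
    shown = {lat: format(nxt, f"0{m}b") for lat, nxt in arcs.items()}
    print(format(state, f"0{m}b"), shown)
```

Under this assumed table the reachable states are 1011010, 1011011, and 1111111, and the arc labels are 1, 3, 6, and 8+, which agrees with the labels listed on the slide.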
21
Put-it-all-together: Multiprocessors; Shared Memory Systems; Message Passing Systems; Multiprocessor Systems-on-Chips; Network Computing
22
Types of Parallelism (Flynn’s Taxonomy): SISD (single instruction stream, single data stream): uniprocessors. SIMD (single instruction stream, multiple data streams): array processors, vector. MISD (multiple instruction streams, single data stream). MIMD (multiple instruction streams, multiple data streams): multiprocessors, multicomputers.
23
Amdahl’s Law (travel analogy): a trip from A to B includes a stretch that must be walked (20 hours) and a 200-mile stretch that can be covered by walking (4 miles/hour), Bike (10 miles/hour), Car-1 (50 miles/hour), Car-2 (120 miles/hour), or Car-3 (600 miles/hour).
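The arithmetic behind the travel analogy can be worked through directly. This sketch assumes the reading that the 20-hour walk cannot be sped up and that walking the whole way is the baseline; the variable names are illustrative.

```python
MUST_WALK_HOURS = 20   # stretch that cannot be sped up
DISTANCE_MILES = 200   # stretch where a faster vehicle helps
SPEEDS_MPH = {"Walk": 4, "Bike": 10, "Car-1": 50, "Car-2": 120, "Car-3": 600}


def trip_hours(speed_mph):
    """Total trip time: fixed walk plus the 200-mile stretch at the given speed."""
    return MUST_WALK_HOURS + DISTANCE_MILES / speed_mph


baseline = trip_hours(SPEEDS_MPH["Walk"])  # 70 hours when walking the whole way
for vehicle, speed in SPEEDS_MPH.items():
    hours = trip_hours(speed)
    print(f"{vehicle:6s} {hours:6.2f} h  speedup {baseline / hours:.2f}")
```

However fast the vehicle, the total time never drops below 20 hours, so the speedup over walking stays below 70 / 20 = 3.5.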
24
Amdahl’s Law (chart): speedup (axis 0 to 25) versus % serial (10% to 99%), with curves for 4 CPUs, 16 CPUs, and 1000 CPUs.
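The curves follow from the usual statement of Amdahl's law, speedup = 1 / (s + (1 - s) / n) for serial fraction s and n CPUs. The sketch below evaluates that formula for the CPU counts named on the slide; it is not a reproduction of the slide's exact plot.

```python
def amdahl_speedup(serial_fraction, cpus):
    """Fixed-size speedup: the serial part is unaffected by adding CPUs."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


for percent_serial in (10, 50, 90, 99):
    s = percent_serial / 100
    row = ", ".join(f"{n} CPUs: {amdahl_speedup(s, n):.2f}" for n in (4, 16, 1000))
    print(f"{percent_serial}% serial -> {row}")
```

With 10% serial work, even 1000 CPUs give a speedup just under 10, since the bound is 1 / s.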
25
Gustafson-Barsis Law (1988): Gordon Bell Prize. Overcomes the conceptual barrier established by Amdahl’s law: scale the problem to the size of the parallel system; no fixed-size problem.
26
Amdahl vs. Gustafson-Barsis (chart): speedup (axis 0 to 100) versus % serial (10% to 99%), with one curve for Gustafson-Barsis and one for Amdahl.
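The contrast between the two laws can be sketched numerically. The code below uses the standard forms, Amdahl 1 / (s + (1 - s) / n) and Gustafson-Barsis n - s(n - 1), where for the latter s is the serial fraction measured on the parallel system. The choice of n = 100 is an assumption for illustration and is not stated on the slide.

```python
def amdahl_speedup(serial_fraction, cpus):
    """Fixed problem size: speedup is capped at 1 / serial_fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


def gustafson_speedup(serial_fraction, cpus):
    """Scaled problem size: the parallel part grows with the number of CPUs."""
    return cpus - serial_fraction * (cpus - 1)


CPUS = 100  # assumed system size, for illustration only
for percent_serial in (10, 30, 50, 70, 90, 99):
    s = percent_serial / 100
    print(f"{percent_serial:2d}% serial  "
          f"Amdahl {amdahl_speedup(s, CPUS):6.2f}  "
          f"Gustafson-Barsis {gustafson_speedup(s, CPUS):6.2f}")
```

The scaled speedup falls linearly with the serial fraction, while the fixed-size speedup collapses quickly, which is the gap the chart illustrates.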
27
SIMD Systems: a von Neumann computer controls an array of processor-memory (P-M) pairs connected by some interconnection network. One control unit; lockstep execution; all Ps do the same or nothing.
28
MIMD Shared Memory Systems: processors (P), each with a cache (C), reach memory modules (M) or a global memory through interconnection networks. One global memory; cache coherence; all Ps have equal access to memory.
29
Cache Coherent NUMA: nodes, each with a processor (P), cache (C), and memory (M), connected by an interconnection network. Each P has part of the shared memory; non-uniform memory access.
30
MIMD Distributed Memory Systems: processor-memory (P-M) nodes connected by interconnection networks. The figure shows nodes labeled with 4-bit binary addresses (0000 to 1111), a node marked S, and a LAN/WAN. No shared memory; message passing; topology.
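The 4-bit node labels suggest a 4-dimensional hypercube, although the slide text does not name the topology, so that reading is an assumption. Under it, two nodes are directly linked when their labels differ in exactly one bit, and the minimum number of message hops is the number of differing bits. A minimal sketch with hypothetical function names:

```python
DIMENSIONS = 4  # assumed: 16 nodes with 4-bit labels


def neighbors(node, dimensions=DIMENSIONS):
    """Nodes reachable in one hop: flip each address bit in turn."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hops(source, destination):
    """Minimum hop count: number of bit positions in which the labels differ."""
    return bin(source ^ destination).count("1")


print([format(n, "04b") for n in neighbors(0b0000)])  # ['0001', '0010', '0100', '1000']
print(hops(0b0000, 0b1111))                           # 4
print(hops(0b1010, 0b1011))                           # 1
```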
31
Cluster Architecture: each node has its own processor (P), cache (C), memory (M), I/O, and OS; the nodes are connected by an interconnection network, with middleware and a programming environment above them. Home cluster.
32
Grids (Internet): dependable, consistent, pervasive, and inexpensive access to high-end computing. Geographically distributed platforms.
33
Multi-core: gate delay does not reduce much. The frequency and performance of each core are the same as or a little less than in the previous generation. Figure labels: Generation N cores, Technology Generation N, Technology Generation N+1.
34
From HT to Many-Core (chart): increasing HW threads (axis 1, 10, 100) versus year (2003 to 2013). HT, then the Multi-core Era (scalar and parallel applications), then the Many-core Era (massively parallel applications). Intel predicts 100s of cores on a chip in 2015.
35
Four Eras (columns: 1970, 1980, 1990, 2000, Beyond 2000)
Parallelism level: processor level; machine level (in box); LAN level; WAN level; chip level
Architecture: vector; SMP / MPP; cluster; grid; multi-core
Threads: one; multiple
Interconnection network: none; bus, switch, mesh, hypercube; Ethernet, switch; Internet; on chip
System: custom; commodity; combination; SoC
Programming: Vector Fortran; C*, C-Linda, Occam, many others; PVM, MPI, HPF, …; MPI, OpenMP, …; ?
36
Degree of Coupling: SIMD; MIMD (shared memory, distributed memory). Systems along the spectrum: SIMD, SMP, CC-NUMA, DMPC, Cluster, Grid; On Chip! Axes: supported grain sizes (fine, coarse), communication speed (slow, fast), coupling (loose, tight).
37
Good Luck to You!!!