Fundamental of Computer Architecture By Panyayot Chaikan November 01, 2003
Chapter 10 - Introduction to Parallel processing

Contents
- Introduction to parallel processing architectures
- Multiprocessors
- Vector computers
- Clusters
- Types of interconnection networks
- Introduction to parallel programming

High-performance computers
- Large computing capacity
- Required to process large amounts of data in a reasonable amount of time
- Often called supercomputers

Supercomputer applications
- Weather forecasting
- Finite element analysis in structural design
- Fluid flow analysis
- Simulation of large, complex physical systems
- Computer-aided design (CAD)

[Figure: Parallel processing]

Three ways to construct a supercomputer
- Vector processing
- Multiprocessing
- Distributed computer systems

Vector supercomputing
- Uses the fastest possible circuits
- Wide paths for accessing a large main memory
- Extensive I/O capability
- Dissipates considerable power and requires expensive cooling arrangements
- Provides excellent performance, but at a very high price

Vector supercomputing: example machines
- NEC SX5
- CRAY CRAY1, Y-MP
- Fujitsu VP5000
- Hitachi SR8000

[Figure: Cray supercomputer]

Multiprocessor
- Uses a large number of processors designed for the workstation or PC market
- Has an efficient, high-bandwidth medium for communication among the processors, memory, and I/O
- Provides high performance at a lower cost than vector processing

Distributed computer system
- Uses many workstations connected by a local area network
- Provides large computing capability at a reasonable cost

Multiprocessing performance
- Many computations can proceed in parallel
- Difficulty: the application must be broken down into small tasks that can be assigned to individual processors
- Processors must communicate with each other to exchange data
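
As a minimal sketch of breaking a loop into per-processor tasks, the Python function below splits n iterations into contiguous blocks, one per processor. The name `block_range` and the 0-based indexing are assumptions made for illustration; they do not come from the slides.

```python
def block_range(pid: int, nprocs: int, n: int) -> range:
    """Return the 0-based indices that processor `pid` handles.

    Hypothetical helper: splits n items into nprocs contiguous blocks
    whose sizes differ by at most one.
    """
    if not 0 <= pid < nprocs:
        raise ValueError("pid must be in 0..nprocs-1")
    start = pid * n // nprocs
    stop = (pid + 1) * n // nprocs
    return range(start, stop)


if __name__ == "__main__":
    n, nprocs = 10, 4
    for pid in range(nprocs):
        # Each processor gets its own slice; together they cover 0..n-1.
        print(pid, list(block_range(pid, nprocs, n)))
```

Every index is assigned to exactly one processor, so the blocks can be processed independently and only the partial results need to be exchanged.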

Classification of parallel structures
- Proposed by Flynn [1966]
- Four types of computation: SISD, SIMD, MIMD, MISD

SISD
- Single Instruction stream, Single Data stream
- Used in single-processor computer systems

SIMD
- Single Instruction stream, Multiple Data stream
- A single stream of instructions is broadcast to a number of processors
- Each processor operates on its own data
- Each processor has its own memory
- All processors execute the same program but operate on different data

MIMD
- Multiple Instruction stream, Multiple Data stream
- Multiple processors each execute a different program and access their own sequence of data

MISD
- Multiple Instruction stream, Single Data stream
- A common data structure is manipulated by separate processors
- Each processor executes a different program
- This form does not occur often in practice
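
The four classes differ only in whether there is one or more than one instruction stream and data stream. A small Python sketch of that naming rule follows; the function name `flynn_class` is hypothetical.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Name the Flynn class for the given numbers of streams (each >= 1)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"   # Single or Multiple instruction
    d = "S" if data_streams == 1 else "M"          # Single or Multiple data
    return f"{i}I{d}D"


if __name__ == "__main__":
    for streams in [(1, 1), (1, 64), (8, 8), (8, 1)]:
        print(streams, flynn_class(*streams))
```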

Array processing
- The SIMD form of parallel processing
- Instructions are broadcast from a central processor

Two types of array processing
- A small number of powerful processors
  - ILLIAC-IV: 64 processors, each 64-bit
- A large number of very simple processors
  - CM2: 1-bit processors
  - MP-1216: 4-bit processors
  - Gamma II plus: 4096 processors, each 8-bit

Array processing
- Well suited to numerical problems that can be expressed in matrix or vector format
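
The array-processing idea can be sketched in Python as a control unit that broadcasts one instruction to several processing elements, each holding its own operands. This is only a sequential simulation under assumed names (`ProcessingElement`, `broadcast`); real array processors execute the elements in lockstep in hardware.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProcessingElement:
    """One simple processor with its own private memory (two operands, one result)."""
    a: int
    b: int
    result: int = 0


def broadcast(pes: List[ProcessingElement],
              instruction: Callable[[int, int], int]) -> None:
    """Control unit: send the same instruction to every processing element.

    The loop simulates sequentially what SIMD hardware does in lockstep.
    """
    for pe in pes:
        pe.result = instruction(pe.a, pe.b)


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = [10, 20, 30, 40]
    # One processing element per vector position.
    pes = [ProcessingElement(a, b) for a, b in zip(x, y)]
    broadcast(pes, lambda a, b: a + b)   # vector add: one instruction, many data
    print([pe.result for pe in pes])
```

A vector addition maps naturally onto this structure because every element needs the same operation and no element depends on another.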

The structure of general-purpose multiprocessors
- UMA (uniform memory access) multiprocessor
- NUMA (non-uniform memory access) multiprocessor
- Distributed memory system

A UMA multiprocessor

A NUMA multiprocessor

A distributed memory system

Taxonomy of parallel processing

Interconnection networks
- Single bus
- Crossbar networks
- Multistage networks
- Hypercube networks
- Mesh networks
- Tree networks
- Ring networks

Crossbar interconnection network

Multistage shuffle network
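
Assuming the slide refers to the usual shuffle-based multistage arrangement, the wiring between stages follows the perfect-shuffle permutation, which rotates each line's binary address left by one bit. The Python sketch below computes that permutation; the function name and the restriction to a power-of-two number of lines are assumptions made for illustration.

```python
def perfect_shuffle(i: int, n_bits: int) -> int:
    """Rotate the n_bits-bit address i left by one position."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    if not 0 <= i < (1 << n_bits):
        raise ValueError("address out of range")
    msb = (i >> (n_bits - 1)) & 1                 # bit that wraps around
    return ((i << 1) & ((1 << n_bits) - 1)) | msb


if __name__ == "__main__":
    n_bits = 3                                    # 8 lines, addresses 0..7
    for i in range(1 << n_bits):
        print(f"{i:03b} -> {perfect_shuffle(i, n_bits):03b}")
```

Applying the shuffle n_bits times returns every line to its starting position, which is why such a network with 2^n inputs is typically built with n stages.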

A 3-dimensional hypercube network
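
In a hypercube, each node has a binary address and two nodes are linked exactly when their addresses differ in one bit. A short Python sketch of that rule follows; the function names are hypothetical.

```python
from typing import List


def hypercube_neighbors(node: int, dim: int) -> List[int]:
    """Neighbors of `node` in a dim-dimensional hypercube with 2**dim nodes."""
    if not 0 <= node < (1 << dim):
        raise ValueError("node out of range")
    # Flipping bit k gives the neighbor along dimension k.
    return [node ^ (1 << k) for k in range(dim)]


def hypercube_hops(src: int, dst: int) -> int:
    """Minimum number of links between two nodes: the count of differing bits."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    print(hypercube_neighbors(0b000, 3))
    print(hypercube_hops(0b000, 0b111))
```

In the 3-dimensional case every node has three neighbors and no two nodes are more than three links apart.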

A 2-dimensional mesh network
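
A 2-dimensional mesh links each node to the nodes directly above, below, left, and right of it. The Python sketch below lists those neighbors, assuming a mesh without wraparound links at the edges; the function name is hypothetical.

```python
from typing import List, Tuple


def mesh_neighbors(row: int, col: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """Neighbors of node (row, col) in a rows x cols mesh with no wraparound."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("node out of range")
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Drop candidates that fall off the edge of the mesh.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


if __name__ == "__main__":
    print(mesh_neighbors(0, 0, 4, 4))   # corner node
    print(mesh_neighbors(1, 1, 4, 4))   # interior node
```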

Four-way tree network

Flat tree network

Ring network
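
In a ring, each node is connected only to its two adjacent nodes. The Python sketch below computes the neighbors and the shortest hop count, assuming a bidirectional ring with nodes numbered 0 to n-1; both function names are hypothetical.

```python
from typing import Tuple


def ring_neighbors(node: int, n: int) -> Tuple[int, int]:
    """Left and right neighbors of `node` in a ring of n nodes."""
    if not 0 <= node < n:
        raise ValueError("node out of range")
    return ((node - 1) % n, (node + 1) % n)


def ring_hops(src: int, dst: int, n: int) -> int:
    """Fewest links between two nodes, travelling in the shorter direction."""
    d = (dst - src) % n
    return min(d, n - d)


if __name__ == "__main__":
    print(ring_neighbors(0, 8))
    print(ring_hops(1, 6, 8))
```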

[Figure: HP Convex architecture]

[Figure: HP Convex Hypernode]

[Figure: SGI Power Challenge]

[Figure: Clustered supercomputer]

Clusters

Benefits of clustering
- Incremental scalability
- High availability
- Superior price/performance

Parallel programming
- At the program level, the work must be broken down into small tasks that can be assigned to individual processors
- Operating system support is needed
- Different architectures call for different programming methods

A sequential program to compute the dot product

    integer array a[1..N], b[1..N]
    integer dot_product.

    read a[1..N] from vector_a
    read b[1..N] from vector_b
    dot_product := 0
    do_dot (a, b)
    print dot_product.

    do_dot (integer array x[1..N], integer array y[1..N])
        for k := 1 to N
            dot_product := dot_product + x[k] * y[k]
        end for
    end do_dot
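
The sequential program above can be sketched in Python as follows. This version departs from the pseudocode in a few labeled ways: it uses 0-based indexing, returns the sum instead of updating a global variable, and uses small sample vectors in place of reading from vector_a and vector_b.

```python
from typing import Sequence


def do_dot(x: Sequence[int], y: Sequence[int]) -> int:
    """Sequential dot product of two equal-length integer vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot_product = 0
    for k in range(len(x)):
        dot_product += x[k] * y[k]
    return dot_product


if __name__ == "__main__":
    a = [1, 2, 3, 4]   # sample data (assumption), stands in for vector_a
    b = [5, 6, 7, 8]   # sample data (assumption), stands in for vector_b
    print(do_dot(a, b))
```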

First attempt at a 2-processor computation

    shared integer array a[1..N], b[1..N]
    shared integer dot_product
    shared lock dot_product_lock
    shared barrier done.

    read a[1..N] from vector_a
    read b[1..N] from vector_b
    dot_product := 0
    create_thread (do_dot, a, b)
    do_dot (a, b)
    print dot_product.

    do_dot (integer array x[1..N], integer array y[1..N])
        private integer id
        id := mypid()
        for k := (id*N/2)+1 to (id+1)*N/2
            lock (dot_product_lock)
            dot_product := dot_product + x[k] * y[k]
            unlock (dot_product_lock)
        end for
        barrier (done)
    end do_dot
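
A rough Python rendering of this first attempt, using the standard `threading` module, is sketched below. The thread id is passed as an argument instead of being obtained from `mypid()`, indexing is 0-based, and the function name `parallel_dot_naive` is hypothetical.

```python
import threading
from typing import List, Sequence

N_THREADS = 2


def parallel_dot_naive(a: Sequence[int], b: Sequence[int]) -> int:
    """Two-thread dot product that takes the lock on every iteration."""
    n = len(a)
    total: List[int] = [0]                  # shared accumulator (dot_product)
    lock = threading.Lock()                 # dot_product_lock
    done = threading.Barrier(N_THREADS)     # barrier "done"

    def do_dot(pid: int) -> None:
        # Each thread handles one contiguous half of the index range.
        for k in range(pid * n // N_THREADS, (pid + 1) * n // N_THREADS):
            with lock:                      # one lock/unlock per element
                total[0] += a[k] * b[k]
        done.wait()                         # wait until both halves are finished

    worker = threading.Thread(target=do_dot, args=(1,))
    worker.start()
    do_dot(0)                               # the main thread does the first half
    worker.join()
    return total[0]


if __name__ == "__main__":
    print(parallel_dot_naive([1, 2, 3, 4], [5, 6, 7, 8]))
```

The result is correct, but the two threads spend most of their time competing for the same lock, so the shared update effectively serializes the loop.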

An efficient 2-processor computation on a shared-memory machine

    shared integer array a[1..N], b[1..N]
    shared integer dot_product
    shared lock dot_product_lock
    shared barrier done.

    read a[1..N] from vector_a
    read b[1..N] from vector_b
    dot_product := 0
    create_thread (do_dot, a, b)
    do_dot (a, b)
    print dot_product.

    do_dot (integer array x[1..N], integer array y[1..N])
        private integer local_dot_product
        private integer id
        id := mypid()
        local_dot_product := 0
        for k := (id*N/2)+1 to (id+1)*N/2
            local_dot_product := local_dot_product + x[k] * y[k]
        end for
        lock (dot_product_lock)
        dot_product := dot_product + local_dot_product
        unlock (dot_product_lock)
        barrier (done)
    end do_dot
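
The efficient version can be sketched in Python with the standard `threading` module: each thread accumulates a private partial sum and takes the lock only once. As assumptions for illustration, the thread id is passed as an argument rather than read from `mypid()`, indexing is 0-based, and the name `parallel_dot` is hypothetical.

```python
import threading
from typing import List, Sequence

N_THREADS = 2


def parallel_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Two-thread dot product with one lock acquisition per thread."""
    n = len(a)
    total: List[int] = [0]                  # shared accumulator (dot_product)
    lock = threading.Lock()                 # dot_product_lock
    done = threading.Barrier(N_THREADS)     # barrier "done"

    def do_dot(pid: int) -> None:
        local_dot_product = 0               # private to this thread, no lock needed
        for k in range(pid * n // N_THREADS, (pid + 1) * n // N_THREADS):
            local_dot_product += a[k] * b[k]
        with lock:                          # single shared update per thread
            total[0] += local_dot_product
        done.wait()                         # wait until both halves are added

    worker = threading.Thread(target=do_dot, args=(1,))
    worker.start()
    do_dot(0)                               # the main thread does the first half
    worker.join()
    return total[0]


if __name__ == "__main__":
    print(parallel_dot([1, 2, 3, 4], [5, 6, 7, 8]))
```

In standard CPython builds the global interpreter lock keeps these threads from running the arithmetic truly in parallel, so the sketch illustrates the synchronization pattern rather than an actual speedup.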

Performance considerations

End of Chapter 10