Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 28, 2005 Session 29

Course Outline
1. Introduction and Motivation
2. Architecture and Performance
3. Algorithms and Programming
4. Task Scheduling
5. Advanced Topics

Parallel and Distributed Processing
- Uniprocessor
- Multiprocessor
- Speed-up, Quality-up, Sharing
- Physical limitations
- N processors cooperate to solve a single computational task

Motivation
- Uniprocessor systems are not capable of delivering solutions to some problems in reasonable time
- Multiple processors cooperate to jointly execute a single computational task in order to speed up its execution
- Speed-up versus Quality-up

Parallel versus Sequential Computing
- Multiple threads of control vs. single thread of control
- Partitioning for concurrent execution
- Task Scheduling
- Synchronization
- Performance

Modeling
- A model is an interface separating high-level properties from low-level ones
- Applications
- MODEL: provides operations, requires implementation
- Architectures

Leopold’s View of the Field
- Numerous Application Programs
- Pthreads, Java Threads, OpenMP, Skeletons, MPI, PVM
- Threads, Shared Memory, Message Passing, Distributed SM
- Cluster, SMP, CC-NUMA, ATM, Myrinet
- Concrete Architectures
- Hiding Details: High to Low

3 Layers
- Applications
- Parallel Tools
- Architecture
- Smart Compiler

Past Trends in Parallel Architecture (inside the box)
- Completely custom designed components (processors, memory, interconnects, I/O)
- Longer R&D time (2-3 years)
- Expensive systems
- Quickly becoming outdated
- Bankrupt companies!!

New Trends in Parallel Architecture (outside the box)
- Advances in commodity processors and network technology
- Network of PCs and workstations connected via LAN or WAN forms a Parallel System
- Network Computing
- Compete favorably (cost/performance)
- Utilize unused cycles of systems sitting idle

Architecture
Three major components:
- Processors
- Memory Modules
- Interconnection Network

Parallel and Distributed Computers
- MIMD Shared Memory: Bus based, Switch based, CC-NUMA
- MIMD Distributed Memory
- SIMD Computers
- Clusters
- Grid Computing

MIMD Shared Memory Systems
[Figure: processors (P) and memory modules (M) connected through Interconnection Networks]

MIMD Distributed Memory Systems
[Figure: processors (P), each with its own memory (M), connected through Interconnection Networks]

SIMD Computers
[Figure: a von Neumann Computer and an array of Processor/Memory (P, M) pairs linked by Some Interconnection Network]

Clusters
[Figure: nodes, each with M, C, P, I/O, and OS, joined by an Interconnection Network, together with Middleware and Programming Environment layers]

Grids
- Grids are geographically distributed platforms for computation.
- They provide dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.

Interconnection Network Taxonomy
- Static: 1-D, 2-D, HC (hypercube)
- Dynamic
  - Bus-based: Single, Multiple
  - Switch-based: SS (single-stage), MS (multistage), Crossbar
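
As one concrete case of a static network in this taxonomy, the sketch below assumes the usual definition of a d-dimensional hypercube (HC): 2^d nodes labeled with d-bit numbers, with a link between any two nodes whose labels differ in exactly one bit. The function names are illustrative and do not come from the course material.

```python
def hypercube_neighbors(node, dim):
    """Return the labels of the nodes adjacent to `node` in a dim-dimensional hypercube."""
    if not 0 <= node < 2 ** dim:
        raise ValueError("node label out of range for this dimension")
    # Flipping bit k of the label gives the neighbor along dimension k.
    return [node ^ (1 << k) for k in range(dim)]


def hypercube_distance(a, b):
    """Minimum number of hops between nodes a and b: the count of differing bits."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    print(hypercube_neighbors(5, 3))  # node 101 in a 3-cube -> [4, 7, 1]
    print(hypercube_distance(0, 7))   # 000 to 111 -> 3
```

Each node has exactly d links, and the longest shortest path between two nodes is d hops, which is why the hypercube is often used as a reference point when comparing static topologies.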

Performance Evaluation
- Speedup, Efficiency, Utilization
- Amdahl’s Law
- The Gustafson-Barsis Law
- Benchmarks
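
The slide only names these measures, so the sketch below uses their textbook forms under an assumed notation: n processors, serial run time t_serial, parallel run time t_parallel, and f as the fraction of work that cannot be parallelized. For Amdahl's Law, f is measured on the sequential run; for the Gustafson-Barsis Law, f is measured on the parallel run. All function and parameter names are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T(1) / T(n)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n):
    """Efficiency E = S / n, the speedup obtained per processor."""
    return speedup(t_serial, t_parallel) / n


def amdahl_speedup(f, n):
    """Amdahl's Law (fixed problem size): S = 1 / (f + (1 - f) / n)."""
    return 1.0 / (f + (1.0 - f) / n)


def gustafson_barsis_speedup(f, n):
    """Gustafson-Barsis Law (problem size scaled with n): S = n - f * (n - 1)."""
    return n - f * (n - 1)


if __name__ == "__main__":
    f, n = 0.1, 10
    print(round(amdahl_speedup(f, n), 3))            # 5.263
    print(round(gustafson_barsis_speedup(f, n), 3))  # 9.1
```

With the same f and n the two laws give very different numbers because they answer different questions: Amdahl's Law holds the problem size fixed, so the speedup can never exceed 1/f, while the Gustafson-Barsis Law lets the parallel part grow with the number of processors.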

PRAM Model
- Synchronized Read Compute Write Cycle
- EREW, ERCW, CREW, CRCW
- Complexity: T(n), P(n), C(n)
[Figure: a Control unit and processors P1, P2, ..., Pp, each with Private Memory, sharing a Global Memory]

Parallel Algorithms
- Sorting on CRCW PRAM
- Computing sum on EREW PRAM
- Computing all partial sums on EREW PRAM
- Matrix Multiplication on CREW
- Other Algorithms
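
The two EREW items above can be illustrated with a sequential simulation of the synchronous PRAM steps. This is a sketch under assumptions, not necessarily the formulation used in the course: the sum uses a binary-tree reduction, and the partial sums use the doubling scheme in which each processor adds the cell `stride` positions to its left. Each simulated step completes all reads before any write, mirroring the read-compute-write cycle. The function names are illustrative.

```python
def erew_sum(values):
    """Tree reduction. Returns (total, number_of_parallel_steps)."""
    a = list(values)
    n = len(a)
    if n == 0:
        return 0, 0
    steps, stride = 0, 1
    while stride < n:
        # Read phase: the processor at index i (a multiple of 2*stride)
        # reads cell i + stride; no other processor touches that cell.
        reads = {i: a[i + stride] for i in range(0, n - stride, 2 * stride)}
        # Write phase: each active processor updates only its own cell.
        for i, v in reads.items():
            a[i] += v
        stride *= 2
        steps += 1
    return a[0], steps


def erew_partial_sums(values):
    """All prefix sums by doubling. Returns (prefix_sums, number_of_parallel_steps)."""
    a = list(values)
    n = len(a)
    steps, stride = 0, 1
    while stride < n:
        # Read phase: processor i fetches the cell `stride` places to its left.
        # Each cell is fetched by at most one processor in this phase.
        fetched = {i: a[i - stride] for i in range(stride, n)}
        # Write phase: processor i adds the fetched value to its own cell.
        for i, v in fetched.items():
            a[i] += v
        stride *= 2
        steps += 1
    return a, steps


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(erew_sum(data))           # (36, 3)
    print(erew_partial_sums(data))  # ([1, 3, 6, 10, 15, 21, 28, 36], 3)
```

Both simulations take about log2(n) parallel steps. The reduction keeps fewer processors busy at each step, while the doubling scheme keeps almost all of them busy throughout, which is the kind of difference that shows up when T(n), P(n), and C(n) are compared.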

Distributed Algorithms
- Message Passing Model
- Complexity Analysis
- Leader Election
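
The slide does not say which leader election algorithm was covered. As one possible illustration, the sketch below simulates a ring-based election in the message passing model, assuming a unidirectional ring, unique process identifiers, and synchronous rounds. Each process forwards identifiers larger than its own and discards smaller ones, and the process that receives its own identifier back becomes the leader. The function name and the message counter are illustrative.

```python
def ring_leader_election(ids):
    """Simulate a max-identifier election on a unidirectional ring.

    ids[i] is the unique identifier of process i, which sends to
    process (i + 1) % n. Returns (leader_id, messages_sent).
    """
    n = len(ids)
    if n == 0:
        raise ValueError("the ring must contain at least one process")
    if len(set(ids)) != n:
        raise ValueError("identifiers must be unique")

    outgoing = list(ids)  # round 1: every process sends its own identifier
    leader = None
    messages = 0
    while leader is None:
        # Deliver every pending message to the clockwise neighbor.
        incoming = [None] * n
        for i, msg in enumerate(outgoing):
            if msg is not None:
                incoming[(i + 1) % n] = msg
                messages += 1
        # Each process reacts to what it received in this round.
        outgoing = [None] * n
        for i, msg in enumerate(incoming):
            if msg is None:
                continue
            if msg == ids[i]:
                leader = ids[i]    # own identifier travelled the whole ring
            elif msg > ids[i]:
                outgoing[i] = msg  # forward larger identifiers
            # smaller identifiers are discarded
    return leader, messages


if __name__ == "__main__":
    print(ring_leader_election([3, 7, 2, 5]))  # (7, 8)
```

The message count ties the example to complexity analysis: the largest identifier always travels n hops, while the others are discarded sooner or later depending on how the identifiers are arranged around the ring. The sketch stops when the leader learns it has won; announcing the result to the other processes would take one more pass around the ring.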

Parallel Virtual Machine (PVM)
- Environment & Application Structure
- Task Creation
- Task Groups
- Communication
- Synchronization
- Reduction operations
- Work Assignments

Task Scheduling
- Model
- Program tasks
- Machine
- Schedule
- Execution and communication time
- Problem Complexity

Scheduling System
- Consumers
- Resources
- Scheduler
- Policy

Task Graph
[Figure: task graph; node labels and numbers as extracted: A 10, D 15, E 10, F 20, B 15, C 10, G 15, H, I]

Scheduling Algorithms
Optimal Algorithms (w/o communication)
1. Task graph is in-forest or out-forest
2. Task graph is an interval order
3. Two processors
Optimal Algorithms (with communication)
1. Task graph is in-forest or out-forest on 2 procs
2. Task graph is an interval order
Many heuristics
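
As an example of the heuristic family mentioned above, here is a minimal list-scheduling sketch. It assumes no communication cost, identical processors, and a priority equal to each task's bottom level (the longest execution-time path from the task to an exit task). The task graph in the usage example is hypothetical, since the edges of the graph on the Task Graph slide are not available, and all names are illustrative.

```python
def list_schedule(exec_time, succ, num_procs):
    """Return ({task: (processor, start, finish)}, makespan)."""
    tasks = list(exec_time)
    preds = {t: [] for t in tasks}
    for t, children in succ.items():
        for c in children:
            preds[c].append(t)

    # Priority: bottom level = own time + longest path through any successor.
    level = {}

    def bottom_level(t):
        if t not in level:
            below = max((bottom_level(c) for c in succ.get(t, [])), default=0)
            level[t] = exec_time[t] + below
        return level[t]

    for t in tasks:
        bottom_level(t)

    proc_free = [0] * num_procs  # time at which each processor becomes idle
    schedule = {}
    remaining = set(tasks)
    while remaining:
        # A task is ready once all of its predecessors have been placed.
        ready = [t for t in remaining if all(p in schedule for p in preds[t])]
        if not ready:
            raise ValueError("task graph contains a cycle")
        # Highest bottom level first; ties broken by task name.
        task = min(ready, key=lambda t: (-level[t], t))
        data_ready = max((schedule[p][2] for p in preds[task]), default=0)
        # Pick the processor on which the task can start earliest.
        proc = min(range(num_procs), key=lambda p: (max(proc_free[p], data_ready), p))
        start = max(proc_free[proc], data_ready)
        finish = start + exec_time[task]
        schedule[task] = (proc, start, finish)
        proc_free[proc] = finish
        remaining.remove(task)
    return schedule, max(f for _, _, f in schedule.values())


if __name__ == "__main__":
    # Hypothetical task graph: execution times and successor lists.
    exec_time = {"A": 2, "B": 3, "C": 1, "D": 4, "E": 2}
    succ = {"A": ["C", "D"], "B": ["D"], "C": ["E"], "D": ["E"]}
    schedule, makespan = list_schedule(exec_time, succ, num_procs=2)
    print(makespan)  # 9
```

On this small graph the heuristic happens to match the critical path length (B, D, E = 9), so the schedule is optimal; in general a list-scheduling heuristic gives no such guarantee, which is why the optimal algorithms above are restricted to special graph shapes or processor counts.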

Good Luck to You!!