Distributed Programming
CA107 Topics in Computing Series
Martin Crane, Karl Podesta
The Basics
What is a Distributed System (DS)?
How does it differ from a Parallel Computer (MPP)?
  –the differences have become fuzzy; both are now called Supercomputers or High Performance Computers (HPC)
Supercomputers and Supermodels:
  –both expensive
  –both hard to deal with and prone to tantrums
  –both look glamorous, but...
  –both spend lots of time doing tedious tasks for others: mostly matrix-vector products for Supercomputers, being live mannequins for Supermodels
Why High Performance Computing?
Solve larger and larger scientific problems
  –advanced product design
  –economic analysis
  –weather prediction/climate modelling
Store and process huge amounts of data
  –data mining and knowledge discovery
  –image processing, multimedia information
  –internet information storage and search (e.g. Google)
Different Supercomputers (MPPs) in Your Neighbourhood
Single Instruction, Multiple Data (SIMD)
  –as seen on PlayStation 2
  –very useful for processing large arrays, e.g. a(i) = b(i) + c(i)*d(i) {as are found in games}
Multiple Instruction, Multiple Data (MIMD)
  –as seen in Deep Blue
But these are dinosaurs - we want something more flexible
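The array update a(i) = b(i) + c(i)*d(i) on the slide above can be sketched in a few lines. This is only an illustration of the data-parallel pattern: plain Python runs the loop one element at a time, whereas SIMD hardware would apply the same multiply-add to many elements per instruction. The function name `multiply_add` is made up for this example.

```python
def multiply_add(b, c, d):
    """Return a list a where a[i] = b[i] + c[i] * d[i] for every index i.

    The same operation is applied to every element and no element depends
    on any other, which is what makes the loop suitable for SIMD hardware.
    """
    if not (len(b) == len(c) == len(d)):
        raise ValueError("b, c and d must have the same length")
    return [bi + ci * di for bi, ci, di in zip(b, c, d)]


if __name__ == "__main__":
    print(multiply_add([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```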
Problems with Traditional Supercomputers (i.e. MPPs)
Expensive
  –very high starting cost (tens of thousands of dollars per node)
  –expensive software
  –high maintenance cost
  –costly to upgrade
Vendor dependent
  –lots of companies have come and gone (Datacube, Thinking Machines with its Connection Machine, etc.)
So, real/poor people cannot do HPC!
PC Cluster: a poor man's supercomputer!
built from high-end PCs and a high-speed comms network
supports standard parallel programming based on the message-passing model (MPI, the Message Passing Interface library standard, rather than a language in its own right)
cheap (a 16-node cluster can cost less than $10k)
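The message-passing model mentioned above can be sketched without a cluster. The example below is not MPI: it uses threads and queues from the Python standard library as stand-ins for nodes and the network, and the names `worker` and `parallel_sum` are invented for illustration. On a real cluster each worker would be a separate process on its own PC, and the `put`/`get` calls would be MPI send and receive operations.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """One 'node': receive a chunk of numbers, send back its partial sum."""
    chunk = inbox.get()               # blocking receive from the root
    outbox.put((rank, sum(chunk)))    # send the result back to the root


def parallel_sum(data, n_workers=4):
    """Root node: scatter the data, then gather and combine partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()

    # Scatter: worker r gets elements r, r + n_workers, r + 2*n_workers, ...
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])

    # Gather: one message per worker, in whatever order they arrive.
    partials = [results.get() for _ in range(n_workers)]
    for t in threads:
        t.join()
    return sum(partial for _rank, partial in partials)


if __name__ == "__main__":
    print(parallel_sum(list(range(100))))
```

The workers share no variables; all data moves through explicit messages, which is the property that lets the same program structure run across separate machines.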
Cluster Diagram Here
DCU CA Cluster Resources
“John the Baptist” Cluster
  –built by Redbrick using old CA machines
  –24 individual 450 MHz machines
  –connected by a Fast Ethernet switch
  –harbinger of better things...
“The one that is to come”...
  –24 SMP machines
  –each with 2 GHz
  –plus loadsa memory!
  –arrives about Xmas time, appropriately enough
What are the issues in HPC?
Communication vs Computation
  –size/nature of problem
  –interconnect speed/processor speed
Fault tolerance
  –quality of hardware
  –nature of problem
Load balancing
  –nature of problem/quality of programmer
  –even an easy problem can be made difficult and slow by a bad implementation
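The load-balancing point can be made concrete with a small sketch. It compares two ways of handing out independent tasks to nodes and reports the finishing time of the busiest node, since the whole job is only done when that node is. Both strategies and all function names are illustrative assumptions, not something the slides prescribe.

```python
import heapq


def makespan_round_robin(task_times, n_nodes):
    """Deal tasks out in order, ignoring how long each one takes."""
    loads = [0.0] * n_nodes
    for i, t in enumerate(task_times):
        loads[i % n_nodes] += t
    return max(loads)


def makespan_greedy(task_times, n_nodes):
    """Longest task first, always onto the currently least-loaded node."""
    loads = [0.0] * n_nodes
    heapq.heapify(loads)
    for t in sorted(task_times, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + t)
    return max(loads)


if __name__ == "__main__":
    tasks = [8, 1, 1, 1, 8, 1, 1, 1]   # made-up task durations
    print(makespan_round_robin(tasks, 2))
    print(makespan_greedy(tasks, 2))
```

With these made-up durations the naive deal puts both long tasks on the same node, so one node sits idle for most of the run while the other works.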
Influence of Nature of Problem on Speed
What is speed?
  –speed-up is a better measure: (time on 1 node) / (time on n nodes)
Speed-up and Problems
  –very good: embarrassingly parallel problems
  –fair to middling: regular and synchronous problems (a bit of cross-talk between nodes)
  –bad: irregular/asynchronous problems (lots of cross-talk between nodes)
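The speed-up ratio above is simple enough to compute directly. The sketch below also includes two things that are not on the slide and are labelled as assumptions: an `efficiency` helper (speed-up divided by node count) and a toy timing model, `modelled_time`, in which the work divides perfectly but each extra node adds a fixed cross-talk cost. The parameter `comm_per_node` is hypothetical.

```python
def speedup(t_one, t_n):
    """Speed-up as defined on the slide: time on 1 node / time on n nodes."""
    return t_one / t_n


def efficiency(t_one, t_n, n):
    """Assumed extra measure: speed-up per node (1.0 would be ideal)."""
    return speedup(t_one, t_n) / n


def modelled_time(t_one, n, comm_per_node=0.0):
    """Toy model (an assumption): perfect split plus cross-talk overhead."""
    return t_one / n + comm_per_node * (n - 1)


if __name__ == "__main__":
    t1 = 100.0
    for label, comm in [("embarrassingly parallel", 0.0),
                        ("some cross-talk", 1.0),
                        ("lots of cross-talk", 5.0)]:
        tn = modelled_time(t1, 4, comm)
        print(label, speedup(t1, tn), efficiency(t1, tn, 4))
```

In this toy model, zero cross-talk gives a speed-up equal to the node count, and the speed-up falls as the per-node communication cost grows, matching the three categories on the slide.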