Published by Darren Powers. Modified over 9 years ago.
N. Xiong @ GSU, Chapter 05
Slide 1: Clustered Systems for Massive Parallelism
N. Xiong, Georgia State University
Slide 2: Review and Introduction
Slide 3: Chapter 04 Main Contents
- Design Objectives of Clusters and MPPs
- Cluster and MPP System Architectures
- Design Principles of Clustered Systems
- Multiple Job Scheduling and Management
- Virtual Clustering and Resource Provisioning
- Homework Problems
Slide 4: Design Objectives of Clustered Systems
- Scalability
- Packaging
- Control
- Homogeneity
- Security
Slide 5: Design Objectives of Clustered Systems
Slide 6: Fundamental Cluster Design Issues
- Scalable Performance
- Single System Image
- Availability Support
- Cluster Job Management
- Internode Communication
- Fault Tolerance and Recovery
- Growth of Servers in HPC and HTC Systems
Slide 7: Resource-Sharing in Cluster Systems
Slide 8: An Idealized Cluster Architecture
- Conventional databases and OLTP monitors offer users a desktop environment.
- Supports parallel programming based on standard languages and communication libraries.
- A user-interface subsystem combines the advantages of the Web interface and the Windows GUI.
Slide 9: Node Architectures and System Packaging
Two types of cluster nodes:
- Compute nodes
- Service nodes
Slide 10: Compute Node Examples
Slide 11: Modular Packaging of IBM BlueGene/L System
Slide 12: Cluster System Interconnects
Slide 13: High-Bandwidth Interconnects
Slide 14: An InfiniBand Cluster Interconnection Network
Slide 15: High-Bandwidth Interconnects in Top-500 Systems
Slide 16: Hardware, Software, and Middleware Support
Slide 17: Design Principles of Clusters: Single-System-Image (SSI) Features
- Single System
- Single Control
- Symmetry
- Location Transparent
Slide 18: Design Principles of Clusters: Single-System-Image Layers
- Application Software Layer
- Hardware or Kernel Layer
- Middleware Layer
Slide 19: Design Principles of Clusters: Single-System-Image Composition
- Single Entry Point
- Single File Hierarchy
- Single I/O, Networking, and Memory Space
- Other Desired SSI Features
Slide 20: Single Entry Point
Slide 21: Single File Hierarchy
- It is persistent.
- It is fault tolerant to some degree.
- Examples: Network File System (NFS) and Andrew File System (AFS).
Slide 22: Single File Hierarchy
Slide 23: Single I/O, Networking, and Memory Space
- Single Input/Output
- Single Networking
- Single Point of Control
- Single Memory Space
Slide 24: Single I/O, Networking, and Memory Space
Slide 25: An Example
Slide 26: Other Desired SSI Features
- Single Job Management System
- Single User Interface
- Single Process Space
Slide 27: Middleware Support for SSI Clustering
Slide 28: High Availability Through Redundancy
- Reliability
- Availability
- Serviceability
Slide 29: Availability and Failure Rate
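The transcript keeps only the title of the "Availability and Failure Rate" slide. As a rough illustration, the sketch below assumes the commonly used steady-state definition, availability = MTTF / (MTTF + MTTR), and a constant failure rate of 1 / MTTF. The function names and the sample numbers are hypothetical and are not taken from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability, assuming MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def failure_rate(mttf_hours: float) -> float:
    """Failures per hour, assuming a constant failure rate of 1 / MTTF."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return 1.0 / mttf_hours


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical node: fails about once every 999 hours, 1 hour to repair.
    a = availability(999.0, 1.0)
    print(f"availability = {a:.4f}")
    print(f"downtime per year = {annual_downtime_hours(a):.2f} hours")
```

Under these assumptions, shortening the repair time raises availability just as effectively as lengthening the time between failures, which is one motivation for the redundancy and failover techniques on the following slides.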
Slide 30: Availability Values of Several Representative Systems
Slide 31: Redundancy Techniques
Slide 32: Fault-Tolerant Cluster Configurations
- Hot Standby
- Mutual Takeover
- Fault-Tolerance
Slide 33: Recovery Schemes
- Backward recovery
- Forward recovery: in real-time systems
Slide 34: Checkpointing and Recovery Techniques
- Kernel, Library, and Application Levels
- Checkpoint Overheads
- Choosing an Optimal Checkpoint Interval
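The slide names the problem of choosing an optimal checkpoint interval but the transcript does not preserve the formula used in the lecture. As a stand-in, the sketch below uses Young's first-order approximation, interval = sqrt(2 x checkpoint cost x MTBF), which may differ from the lecture's own expression. The function names, the overhead model, and the sample numbers are assumptions for illustration only.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order approximation of the best checkpoint interval.

    Both arguments use the same time unit (seconds here).
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def wasted_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """Approximate fraction of time lost, under a simple first-order model.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : expected rework after a failure
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    cost, mtbf = 50.0, 10_000.0  # hypothetical: 50 s checkpoint, 10,000 s MTBF
    best = young_interval(cost, mtbf)
    for t in (best / 2, best, best * 2):
        print(f"interval {t:7.1f} s -> waste {wasted_fraction(t, cost, mtbf):.3f}")
```

The comparison in the usage block shows the trade-off the slide refers to: checkpointing too often pays the checkpoint overhead repeatedly, while checkpointing too rarely loses more work on each failure.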
Slide 35: Checkpointing Parallel Programs
Slide 36: Cluster Job Scheduling and Management
Cluster Job Management Issues:
- A user server
- A job scheduler
- A resource manager
Slide 37: Cluster Job Types
- Serial jobs
- Parallel jobs
- Interactive jobs
- Batch jobs
- Foreign jobs
Slide 38: Multi-Job Scheduling Schemes
Slide 39: Share Cluster Nodes
- Dedicated Mode
- Space Sharing
- Time Sharing
Slide 40: Migration Scheme Issues
- Node Availability
- Migration Overhead
- Recruitment Threshold: the amount of time a workstation stays unused before the cluster considers it an idle node
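The recruitment threshold defined above can be illustrated with a minimal sketch. The `Workstation` record, the `idle_nodes` helper, and the timestamps are hypothetical names and values chosen for this example; a real job management system would track activity in its own way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workstation:
    name: str
    last_activity: float  # time of last owner activity, in seconds


def idle_nodes(stations: List[Workstation], now: float,
               recruitment_threshold: float) -> List[str]:
    """Return names of workstations unused for at least the threshold."""
    return [
        w.name
        for w in stations
        if now - w.last_activity >= recruitment_threshold
    ]


if __name__ == "__main__":
    stations = [
        Workstation("ws1", last_activity=900.0),  # unused for 100 s
        Workstation("ws2", last_activity=600.0),  # unused for 400 s
    ]
    # With a 300 s threshold, only ws2 is treated as an idle node.
    print(idle_nodes(stations, now=1000.0, recruitment_threshold=300.0))
```

A low threshold recruits workstations sooner but raises the chance that the owner returns and a foreign job must be migrated away, which ties this setting to the migration overhead listed on the same slide.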
Slide 41: Virtual Clustering and Resource Provisioning
Slide 42: Five Virtual Cluster Research Projects
Slide 43: Live VM Migration and Cluster Management
Slide 44: Effect by Live Migration
Slide 45: Dynamic Virtual Resource Provisioning
Slide 46: Autonomic Adaptation of Virtual Environments
Slide 47: Some References and Further Reading
Slide 48: Homework Problems
Slide 49: Homework Problems