1
Parallel Programming (Programming for Parallel Computing)
张少强 (Shaoqiang Zhang)
sqzhang@163.com
http://bioinfo.uncc.edu/szhang
QQ: 249104218
http://renren.com/kindkid
(Lecture 1: September 16, 2011, 博理楼 B204)
2
References
Barry Wilkinson & M. Allen: primary textbook (available on Dangdang at 25% off)
MPI programming examples textbook
3
Parallel Computing
1. The use of multiple computers, or computers with multiple internal processors, to solve a problem at greater computational speed than a single computer.
2. Offers the opportunity to tackle problems that could not otherwise be solved in a reasonable time.
3. Can also tackle problems that need higher precision or more memory.
4
1. Multiple Interconnected Computers
Cluster computing: a form of parallel computing in which the computing platform is a group of interconnected computers (a cluster).
For this course, we will use a small dedicated departmental cluster (59.67.76.156) consisting of 8 nodes:
– 8-core Xeon processors, all interconnected through a local high-speed Ethernet switch
– Programming is normally done using the Message-Passing Interface (MPI)
5
2. A Computer System with Multiple Internal Processors
Shared memory multiprocessor system: multiple processors connected internally to a common main memory.
Multi-core processor: a processor with multiple internal execution units on one chip (a form of shared memory multiprocessor).
For this course, we will use the cluster, as it has both types. Programming uses a shared memory thread model.
6
Prerequisites
Data structures
Basic skills in C
What a computer consists of (processors, memory, and I/O)
7
Course Contents
Parallel computers: architectural types, shared memory, message passing, interconnection networks, potential for increased speed.
Message passing: MPI message-passing APIs; send, receive, and collective operations; running MPI programs on a cluster.
Basic parallel programming techniques:
1. Embarrassingly parallel computations
2. Partitioning and divide-and-conquer
3. Pipelined computations
4. Synchronous computations
5. Load balancing and termination detection
8
Course Contents (Continued)
Shared memory programming:
Shared memory architectures: hyperthreaded, multi-core, many-core.
Programming with shared memory: specifying parallelism, sharing data, critical sections, threads, OpenMP. Running threaded/OpenMP programs on a multi-core system.
CPU-GPU systems: architecture, programming in CUDA, issues in achieving high performance.
9
Course Contents (Continued)
Algorithms and applications, a selection from:
Sorting algorithms
Searching algorithms
Numerical algorithms
Image processing algorithms
10
Types of Parallel Computers
Two principal approaches:
Shared memory multiprocessor
Distributed memory multicomputer
11
Conventional Computer
Consists of a processor executing a program stored in a (main) memory:
Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address. For example, a 32-bit address can identify 2^32 (about 4 x 10^9) distinct locations.
[Figure: processor connected to main memory; instructions flow to the processor, data flows to or from the processor.]
12
Shared Memory Multiprocessor System
A natural way to extend the single-processor model: have multiple processors connected to multiple memory modules, such that each processor can access any memory module:
[Figure: processors connected to memory modules through processor-memory interconnections, presenting one address space.]
13
Simplistic view of a small shared memory multiprocessor
Examples: dual Pentiums, quad Pentiums.
[Figure: processors connected to shared memory over a bus.]
14
Example: Quad Shared Memory Multiprocessor
Real computer systems have cache memory between the main memory and the processors: Level 1 (L1) cache and Level 2 (L2) cache.
[Figure: four processors, each with an L1 cache, L2 cache, and bus interface, connected over a processor/memory bus to a memory controller and shared memory.]
Since the L1 cache is usually inside the processor package and the L2 cache outside it, dual-/multi-core processors usually share the L2 cache.
15
Single Quad-Core Shared Memory Multiprocessor
[Figure: one chip with four processor cores, each with its own L1 cache, sharing an L2 cache and a memory controller connected to shared memory; e.g., Intel Core i7.]
16
Multiple Quad-Core Multiprocessors
[Figure: several quad-core chips, each core with its own L1 cache and a per-chip L2 cache (possibly an L3 cache), connected through a memory controller to shared memory.]
17
Programming Shared Memory Multiprocessors
1. Pthreads libraries: the programmer decomposes the program into individual parallel sequences (threads), each able to access shared variables declared outside the threads.
pthread_create()
pthread_join()
pthread_exit()
2. OpenMP: higher-level library functions and preprocessor compiler directives to declare shared variables and specify parallelism. OpenMP consists of a small set of compiler directives, a small extended library of functions, and the base C/C++ and Fortran language environments.
#pragma omp directive_name ...
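As a minimal sketch of approach 1 (illustrative only; the thread count, variable names, and compile command are assumptions, not from the slides), the following C program creates four threads with pthread_create(), lets each update a shared variable inside a critical section, and waits for them with pthread_join():

/* Compile with: gcc threads.c -o threads -lpthread (filename assumed) */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

int shared_counter = 0;                        /* shared variable declared outside the threads */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    long id = (long) arg;
    pthread_mutex_lock(&lock);                 /* critical section protecting shared data */
    shared_counter++;
    pthread_mutex_unlock(&lock);
    printf("thread %ld done\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);        /* wait for each thread to finish */
    printf("counter = %d\n", shared_counter);
    return 0;
}

The same computation as an OpenMP sketch of approach 2 (again an illustration under the same assumptions, compiled with gcc -fopenmp): the directives declare the parallel region and the critical section, so no explicit thread bookkeeping is needed:

#include <omp.h>
#include <stdio.h>

int main(void) {
    int counter = 0;
    #pragma omp parallel num_threads(4)        /* fork a team of 4 threads */
    {
        #pragma omp critical                   /* one thread at a time updates counter */
        counter++;
        printf("thread %d done\n", omp_get_thread_num());
    }
    printf("counter = %d\n", counter);         /* prints counter = 4 */
    return 0;
}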
18
Programming Shared Memory Multiprocessors (Continued)
3. Use a modified sequential programming language, with added syntax to declare shared variables and specify parallelism. Example: UPC (Unified Parallel C), which needs a UPC compiler.
4. Use a specially designed parallel programming language, with syntax to express parallelism; the compiler automatically creates executable code for each processor (not now common).
5. Use a regular sequential programming language such as C and ask a parallelizing compiler to convert it into parallel executable code (also not now common).
19
Message-Passing Multicomputer
Complete computers connected through an interconnection network:
[Figure: computers, each with a processor and local memory, exchanging messages across an interconnection network.]
20
Networked Computers as a Computing Platform
In the early 1990s, a network of computers became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing.
Notable early projects:
– Berkeley NOW (network of workstations) project
– NASA Beowulf project
21
Key advantages:
Very high-performance workstations and PCs are readily available at low cost.
The latest processors can easily be incorporated into the system as they become available.
Existing software can be used or modified.
22
Beowulf Clusters
A group of interconnected "commodity" computers achieving high performance at low cost.
Typically built with commodity interconnects, such as high-speed (Gigabit) Ethernet, and the Linux OS.
23
Dedicated Cluster with a Master Node and Compute Nodes
[Figure: the user reaches the master node from an external network; the master node connects through an Ethernet interface and switch to the compute nodes on a dedicated local network.]
24
Software Tools for Clusters
Each node has its own copy of the OS (Linux).
Applications are stored on the master node, which can be set up as a file server managing a network file system.
MPI is installed on the master node.
MPI is based on the message-passing programming model: user-level libraries are provided for explicitly specifying messages to be sent between processes executing on each computer, used with regular programming languages (C, C++, ...).
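To make the model concrete, here is a minimal SPMD sketch in C (illustrative; the filename, process count, and launch commands are assumptions, not from the slides). Every node runs the same executable, and MPI tells each process its rank:

/* Typical build and launch from the master node (commands assumed):
     mpicc hello.c -o hello
     mpirun -np 8 ./hello                       */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);                    /* start the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total number of processes */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                            /* shut down MPI */
    return 0;
}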
25
MPI (Message-Passing Interface)
Next step: learn the message-passing programming model and some MPI routines, write a message-passing program, and test it on the cluster.
To be continued... ^_^
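As a preview of the point-to-point routines (a hedged sketch, not the course's actual example; the message value is arbitrary): rank 0 sends an integer to every other rank with MPI_Send(), and each worker receives it with MPI_Recv():

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, msg;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                            /* master node */
        msg = 42;                               /* arbitrary example value */
        for (int dest = 1; dest < size; dest++)
            MPI_Send(&msg, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        printf("master sent %d to %d workers\n", msg, size - 1);
    } else {                                    /* compute nodes */
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("worker %d received %d\n", rank, msg);
    }
    MPI_Finalize();
    return 0;
}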