Download presentation
Presentation is loading. Please wait.
Published byHelena Wilkerson Modified over 8 years ago
1
COMP7330/7336 Advanced Parallel and Distributed Computing Data Parallelism Dr. Xiao Qin Auburn University http://www.eng.auburn.edu/~xqin xqin@auburn.edu Some slides are adopted from Dr. David E. Culler, U.C. Berkeley 1-1
2
Message-Passing Abstraction – Send specifies buffer to be transmitted and receiving process – Recv specifies sending process and application storage to receive into – Memory to memory copy, but need to name processes – Optional tag on send and matching rule on receive – User process names local data and entities in process/tag space too – In simplest form, the send/recv match achieves pairwise synch event Other variants too – Many overheads: copying, buffer management, protection Process PProcessQ Address Y AddressX SendX, Q, t ReceiveY, P, t Match Local process address space Local process address space 1-2
3
Evolution of Message-Passing Machines Early machines: FIFO on each link – HW close to prog. Model; – synchronous ops – topology central (hypercube algorithms) CalTech Cosmic Cube (Seitz, CACM Jan 95) 1-3
4
Diminishing Role of Topology Shift to general links –DMA, enabling non-blocking ops Buffered by system at destination until recv –Store&forward routing Diminishing role of topology –Any-to-any pipelined routing –node-network interface dominates communication time –Simplifies programming –Allows richer design space grids vs hypercubes Intel iPSC/1 -> iPSC/2 -> iPSC/860 1-4
5
Example Intel Paragon 1-5
6
Example Intel Paragon 1-6
7
Example Intel Paragon Compute Node 1-7
8
Example Intel Paragon Compute Nodes Inside a Rack 1-8
9
Example Intel Paragon Compute (and some I/O) nodes in XP-E rack 1-9
10
Example Intel Paragon I/O Node 1-10
11
Example Intel Paragon I/O Node (cont.) 1-11
12
Example Intel Paragon I/O Node (cont.) 1-12
13
Example Intel Paragon 1-13
14
Building on the mainstream: IBM SP-2 Made out of essentially complete RS6000 workstations Network interface integrated in I/O bus (bw limited by I/O bus) 1-14
15
IBM RS/6000 Enterprise Server model H70 1-15
16
Dual 375 MHz IBM POWER3-II processors on the CPU module 1-16
17
Data Parallel Systems Programming model – Operations performed in parallel on each element of data structure – Logically single thread of control, performs sequential or parallel steps – Conceptually, a processor associated with each data element Architectural model – Array of many simple, cheap processors with little memory each Processors don’t sequence through instructions – Attached to a control processor that issues instructions – Specialized and general communication, cheap global synchronization 1-17
18
Application of Data Parallelism – Each PE contains an employee record with his/her salary If salary > 100K then salary = salary *1.05 else salary = salary *1.10 – Logically, the whole operation is a single step – Some processors enabled for arithmetic operation, others disabled Can you give me another example? Other examples: – Finite differences, linear algebra,... – Document searching, graphics, image processing,... Some recent machines: – Thinking Machines CM-1, CM-2 (and CM-5) – Maspar MP-1 and MP-2, 1-18
19
Data Parallel Evolution Rigid control structure Popular when cost savings of centralized sequencer high – 60s when CPU was a cabinet – Replaced by vectors in mid-70s More flexible w.r.t. memory layout and easier to manage – Revived in mid-80s when 32-bit datapath slices just fit on chip – Revived again with Graphic Processing Units Reasons for demise / renaissance? – Simple, regular applications have good locality, can do well anyway – Loss of applicability due to hardwiring data parallelism MIMD machines as effective for data parallelism and more general 1-19
20
Toward Architectural Convergence Evolution and role of software have blurred boundary – Send/recv supported on SAS machines via buffers – Can construct global address space on MP using hashing – Page-based (or finer-grained) shared virtual memory Hardware organization converging – Tighter NI integration even for MP (low-latency, high-bandwidth) – At lower level, even hardware SAS passes hardware messages Organization converging – Nodes connected by general network and communication assists – Implementations converging 1-20
21
Convergence: Generic Parallel Architecture A generic modern multiprocessor Node: processor(s), memory system, plus communication assist Network interface and communication controller Scalable network Convergence allows lots of innovation, now within framework Integration of assist with node, what operations, how efficiently... 1-21
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.