Presentation is loading. Please wait.

Presentation is loading. Please wait.

Computer architecture II

Similar presentations


Presentation on theme: "Computer architecture II"— Presentation transcript:

1 Computer architecture II
Introduction Computer Architecture II

2 Computer Architecture II
Recap Importance of parallelism Architecture classification Flynn (SISD, SIMD, MISD, MIMD) Memory access (SM, MP) Clusters Grids Top500 Computer Architecture II

3 Computer Architecture II
Today's plan Parallel Architecture convergence (Culler’s classification) Shared Memory (Single Address Space) Message Passing Data parallel (SIMD) Data flow Systolic Computer Architecture II

4 Convergence of Architectural Models
Culler`s classification of parallel architectures Shared Address Space Message Passing Data Parallel Others: Dataflow Systolic Arrays Examine programming model, motivation, intended applications, and contributions to convergence Computer Architecture II

5 Where is Parallel Arch Going?
Old view: Divergent architectures, no predictable pattern of growth. Application Software System Software Systolic Arrays SIMD Architecture Message Passing Dataflow Shared Memory Uncertainty of direction paralyzed parallel software development! Computer Architecture II

6 NEW VIEW: Convergence of parallel architectures
Systolic Arrays SIMD Generic Architecture Message Passing Dataflow Shared Memory Computer Architecture II

7 Computer Architecture II
Parallel computer Last class definition: “A parallel computer is a collection of processing elements that cooperate to solve large problems fast” Extend the sequential computer architecture with a communication architecture Computer architecture has 2 important aspects Abstractions: hardware/software, user/system Implementation of these abstractions Communication architecture as well Abstractions: communication and synchronization operations Implementations of these abstractions Programming model Abstractions Computer Architecture II

8 Modern Layered Framework
CAD Multipr ogramming Shar ed addr ess Message passing Data parallel Database Scientific modeling Parallel applications Pr ogramming models Communication abstraction User/system boundary Compilation or library Operating systems support Communication har dwar e Physical communication medium Har e/softwar e boundary Layers of architectural abstraction Computer Architecture II

9 Computer Architecture II
Programming Model What programmer uses in coding applications Specifies communication and synchronization Examples: Multiprogramming: no communication or synch. at program level Shared address space: like bulletin board Message passing: like letters or phone calls, explicit point to point Data parallel: global simultaneous actions on data Implemented with shared address space or message passing Computer Architecture II

10 Modern Layered Framework
CAD Multipr ogramming Shar ed addr ess Message passing Data parallel Database Scientific modeling Parallel applications Pr ogramming models Communication abstraction User/system boundary Compilation or library Operating systems support Communication har dwar e Physical communication medium Har e/softwar e boundary Layers of architectural abstraction Computer Architecture II

11 Communication Abstraction
Programming model is built on communication abstraction Possibilities Supported directly by hardware OS (sockets) user software Combination OS/hardware: page fault + OS handler Earlier Communication abstraction oriented toward programming model Today: Compilers and software play important roles as bridges today (MPI/OpenMP) Computer Architecture II

12 Shared Address Space (SAS) Architectures
Any processor can directly reference any memory location Communication occurs implicitly as result of loads and stores Convenient: Location transparency Similar programming model to time-sharing on uniprocessors Except processes run on different processors Naturally provided on wide range of platforms History dates at least to precursors of mainframes in early 60s Wide range of scale: few to hundreds of processors Popularly known as shared memory machines or model memory may be physically distributed among processors UMA NUMA Computer Architecture II

13 SAS-UMA (Uniform Memory Access)
Any processor can directly reference any memory location Theoretical same access time for all accesses M1 M2 Mk Interconnect P1 P2 Pn Computer Architecture II

14 SAS-NUMA (Non Uniform Memory Access)
Any processor can directly reference any memory location (including the memory of remote processors) NI integrated into the memory system IMPORTANT DIFFERENCE! For message passing machines P1 can not access M2 (NI integrated into the I/O system) Different access times for local and remote memory P1 P2 Pn M1 M2 Mn PE1 PEn PE2 Interconnect Computer Architecture II

15 Computer Architecture II
SAS Memory Model Process: virtual address space plus one or more threads of control Portions of address spaces of processes are shared S t o r e P 1 2 n L a d p i v Virtual address spaces for a collection of processes communicating via shared addresses Machine physical address space Shared portion of address space Private portion Common physical addresses Writes to shared address visible to other threads (in other processes too) Natural extension of uniprocessors model: Communication: R/W the memory Synchronization: special atomic operations (we come back later) ONE OS uses shared memory to coordinate processes Computer Architecture II

16 Communication Hardware
Natural extension of uniprocessor Already have processor, one or more memory modules and I/O controllers connected by hardware interconnect of some sort Memory capacity increases by adding modules I/O by adding Controllers Add processors for processing! Computer Architecture II

17 Computer Architecture II
History “Mainframe” approach Motivated by multiprogramming Extends crossbar used for more memory and I/O bandwidth Bandwidth scales with nr of processors Originally processor cost high later, cost of crossbar: use multistage Multistage Reduces the incremental cost Increased latency “Minicomputer” approach Almost all microprocessor systems have bus Used heavily for parallel computing Called symmetric multiprocessor (SMP) Latency larger than for uniprocessor Bus is bandwidth bottleneck caching is key: coherence problem Low incremental cost M M M M P P I/O I/O $ $ Computer Architecture II

18 SAS UMA Example: Intel Pentium Pro Quad
All coherence and multiprocessing in the processor module Highly integrated, targeted at high volume Low latency and bandwidth Computer Architecture II

19 SAS UMA Example: SUN UltraSPARC-based Enterprise
16 cards of either type: processors + memory, or I/O All memory accessed over bus, so symmetric Higher bandwidth, higher latency bus Computer Architecture II

20 Computer Architecture II
NUMA PE1 PE2 PEn P1 P2 Pn C1 C2 Cn M1 M2 Mn Interconnect Computer Architecture II

21 SAS-NUMA Example: Cray T3E
Scale up to 1024 processors, 480MB/s links Memory controller generates comm. request for non- local references (no local caching, SGI Origin has) No hardware mechanism for coherence (SGI Origin has) Computer Architecture II

22 Message Passing Architectures
High-level block diagram similar to distributed-memory SAS NIC integrated into I/O system, needn’t be into memory system Clusters, but tighter integration Easier to build than scalable SAS P1 P2 Pn M1 M2 Mn PE1 PEn PE2 Interconnect Computer Architecture II

23 Message Passing Architectures
Communication via explicit I/O operations In SAS case through memory accesses Programming model: directly access only private address space (local memory) comm. via explicit messages (send/receive) farther from hardware operations Library (MPI) OS intervention ( ex: page fault in page DSM) Computer Architecture II

24 Message-Passing Abstraction
Pr ocess P Q Addr ess Y X Send X, Q, t Receive , t Match Local pr addr ess space Send specifies buffer to be transmitted and receiving process Recv specifies sending process and a buffer to receive into Optional tag on send and matching rule on receive Many overheads: copying, buffer management, protection Computer Architecture II

25 Evolution of Message- Passing Machines
Early machines: Store and forward FIFO on each link Hardware close to programming model synchronous ops Only neighboring nodes named! Replaced by DMA, enabling non-blocking ops Buffered by system at destination until recv Diminishing role of topology topology less important all nodes named pipelined wormhole routing (asynchronous MP) Cost is in node-network interface Simplifies programming (earlier you had to map your program on the topology) Computer Architecture II

26 Computer Architecture II
Example: IBM SP-2 Made out of essentially complete RS6000 workstations Network interface integrated in I/O bus (bw limited by I/O bus) 8X8 Crossbar switch Computer Architecture II

27 Computer Architecture II
Example Intel Paragon Computer Architecture II

28 SAS & MP Architectural Convergence
SAS machines SOFTWARE: MP Send/recv supported via buffers HARDWARE: At lower level, even hardware SAS passes hardware messages MP machines SOFTWARE: Constructed SAS global address space on MP (software DSM) Page-based (or finer-grained) shared virtual memory HARDWARE: Tighter NI integration even for MP (low-latency, high-bandwidth) Due to the mergence of fast system area networks (SAN) Traditional NI integrated into the memory system for SAS NUMA systems Clusters of SMP workstations Computer Architecture II

29 Data Parallel Systems (SIMD)
Architectural model SIMD: Array of many simple, cheap processors with little memory each Processors don’t sequence through instructions Attached to a control processor that issues instructions Specialized and general communication, cheap global synchronization Original motivations Matches simple differential equation solvers We’ll see Ocean Current simulation Centralize high cost of instruction fetch/sequencing Computer Architecture II

30 Computer Architecture II
Data Parallel Systems Programming model Operations performed in parallel on each element of data structure Logically single thread of control, performs sequential or parallel steps Conceptually, a processor associated with each data element After a phase of computation all the processors synchronize Computer Architecture II

31 Application of Data Parallelism
Each PE contains an employee record with his/her salary Each PE has a condition flag: execute or not the instruction Ex: work in parallel on several employer records If salary > 100K then salary = salary *1.05 else salary = salary *1.10 Logically, the whole operation is a single step Some processors enabled for arithmetic operation, others disabled Other examples: Differential equations, linear algebra, ... Document searching, graphics, image processing, ... Last machines: Thinking Machines CM-1, CM-2 (and CM-5) Maspar MP-1 and MP-2 Computer Architecture II

32 Data parallel machines evolution
Architecture disappeared today, but programming model still popular Popular when cost savings of centralized sequencer high 60s when CPU was a cabinet Replaced by vectors in mid-70s More flexible memory layout and easier to manage No need to map the problem on the infrastructure Revived in mid-80s when 32-bit datapath fit on chip Modern microprocessors more attractive today Other reasons for demise Simple, regular applications have good locality, can do well anyway Loss of applicability due to hardwiring data parallelism MIMD machines as effective for data parallelism and more general Computer Architecture II

33 Computer Architecture II
Convergence Programming model Still exists separated from hardware converges to SPMD (single program multiple data) Map local data structure on the HW machine model HPF, OpenMP Needed fast global synchronization Global address space, implemented with either SAS or MP Computer Architecture II

34 Dataflow Architectures
Represent computation (program) as a graph of essential dependences Logical processor at each node, activated by availability of operands Message (tokens) carrying tag of next instruction sent to next processor Tag compared with others in matching store; match fires execution Computer Architecture II

35 Data-flow architectures
Key characteristics Name operations anywhere in the machine Support synchronization for independent ops Dynamic scheduling at machine level The architectures demised Problems Operations have locality across them, useful to group together Handling complex data structures like arrays Complexity of matching store and memory units Expose too much parallelism Too fine-grained Hurts locality Computer Architecture II

36 Data-flow architectures convergence
Converged to use conventional processors and memory Support for large, dynamic set of threads to map to processors Typically shared address space as well Separation of programming model from hardware (like data-parallel) Lasting contributions: Integration of communication with thread (handler) generation Tightly integrated communication and fine-grained synchronization Data-flow useful concept for software (compilers etc.) Computer Architecture II

37 Systolic Architectures
Replace single processor with array of regular processing elements Orchestrate data flow for high throughput with less memory access Different from pipelining Nonlinear array structure, multidirection data flow, each PE may have (small) local instruction and data memory Different from SIMD: each PE may do something different Initial motivation: VLSI enables inexpensive special-purpose chips Represent algorithms directly by chips connected in regular pattern Computer Architecture II

38 Systolic Arrays (contd.)
Example: Systolic array for 1-D convolution y ( i ) = w 1 x ) + 2 + 1) + 3 + 2) + 4 + 3) 8 7 6 5 in out out = in + = Practical realizations (e.g. iWARP CMU-Intel) use quite general processors Enable variety of algorithms on same hardware Dedicated interconnect channels Data transfer directly from register to register across channel Specialized, and same problems as SIMD General purpose systems work well for same algorithms (locality etc.) Computer Architecture II

39 Recap: Generic Multiprocessor Architecture
Computer Architecture II


Download ppt "Computer architecture II"

Similar presentations


Ads by Google