1 Programming the Cell Multiprocessor
Işıl ÖZ

2 Outline
Cell processor
  – Objectives
  – Design and architecture
Programming the Cell
  – Programming models
CellSs

3 Cell Processor
Cell Broadband Engine Architecture (Cell BE)
Developed by the STI (SCEI-Toshiba-IBM) design center
  – STI formed in 2000
  – STI design center opened in 2001
  – Introduced in 2005
  – 65 nm version in 2007, 45 nm version in 2008

4 Cell Processor Objectives
Outstanding performance, especially on game and multimedia applications, by addressing three limits:
  – memory latency
  – power efficiency
  – processor frequency and pipeline depth
Real-time response to the user and the network
Applicable to a wide range of platforms
Ready for introduction in 2005

5 Cell Architecture
One 64-bit Power processor element (PPE)
Eight synergistic processor elements (SPEs)
Memory interface controller (MIC)
Bus interface controller (BIC)
Element interconnect bus (EIB)

6 Power Processor Element
PPE
  – Power core
  – first-level (L1) cache
  – second-level (L2) cache

7 PPE Major Units

8 Synergistic Processor Elements
Each SPE contains
  – a DMA (direct memory access) unit
  – an LS (local store memory)
  – an SXU (synergistic execution unit)
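SPE code can load and store only within its local store. Data therefore moves between main memory and the LS through DMA commands issued by the SPE.

The sketch below shows how a large transfer might be split into legal commands. It assumes the limits as we understand them from the Cell BE architecture:
  – at most 16 KB per command
  – sizes of 1, 2, 4 or 8 bytes, or multiples of 16 bytes
  – 16-byte-aligned addresses for the larger sizes

These limits should be checked against the architecture documentation. The DMA itself is simulated with memcpy, and dma_get_sim and dma_get_large are hypothetical names. On real hardware each command would be queued to the SPE's memory flow controller (for example with mfc_get from the SDK) and would complete asynchronously.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMA_MAX_BYTES 16384u   /* largest transfer one DMA command can move (assumed) */

/* Legal sizes (assumed): 1, 2, 4 or 8 bytes, or a multiple of 16 up to 16 KB. */
static int dma_size_ok(size_t n)
{
    if (n == 1 || n == 2 || n == 4 || n == 8)
        return 1;
    return n > 0 && n % 16 == 0 && n <= DMA_MAX_BYTES;
}

/* Hypothetical stand-in for one DMA "get" into the local store. On an SPE
 * this would be an asynchronous MFC command; here it is a synchronous memcpy. */
static void dma_get_sim(void *ls, const void *ea, size_t n)
{
    assert(dma_size_ok(n));
    memcpy(ls, ea, n);
}

/* Move a large, 16-byte-aligned buffer (size a multiple of 16) as a series
 * of legal commands; returns how many commands were issued. */
static unsigned dma_get_large(void *ls, const void *ea, size_t n)
{
    unsigned cmds = 0;
    assert(n % 16 == 0);
    assert((uintptr_t)ls % 16 == 0 && (uintptr_t)ea % 16 == 0);
    while (n > 0) {
        size_t chunk = n < DMA_MAX_BYTES ? n : DMA_MAX_BYTES;
        dma_get_sim(ls, ea, chunk);
        ls = (char *)ls + chunk;
        ea = (const char *)ea + chunk;
        n -= chunk;
        cmds++;
    }
    return cmds;
}
```

Issuing the chunks asynchronously lets an SPE overlap transfers with computation, which this synchronous simulation does not model.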

9 SPE Organization

10 Controllers
Memory interface controller (MIC)
  – interfaces to the Rambus XDR I/O unit, which communicates directly with the DRAM modules
Bus interface controller (BIC)
  – interfaces to the Rambus FlexIO, which connects the chip to other system components

11 Element Interconnect Bus
EIB
  – coherent on-chip bus
  – connects the processing elements, memory and I/O devices

12 Programming the Cell
Local store memory in each SPE (256 KB)
SIMD nature of the dataflow
Large SPE register file (128 registers, each 128 bits wide)
Single program context
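Each SPE register is 128 bits wide, so one SIMD instruction operates on four single-precision floats at once. SPE code therefore runs best when its data is laid out as such 4-wide vectors.

Below is a portable sketch of that style. It uses a plain struct (vec4f, a hypothetical name) as a stand-in for the SPU's 128-bit vector type. On an SPE the loop body would be a single vector multiply-add intrinsic such as spu_madd, not a scalar loop over the lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in for one 128-bit SPU register holding 4 floats. */
typedef struct { float f[4]; } vec4f;

/* Broadcast one scalar into all four lanes. */
static vec4f splat(float x)
{
    vec4f v = {{x, x, x, x}};
    return v;
}

/* Lane-wise a*b + c: what one vector multiply-add does on all four lanes. */
static vec4f madd4(vec4f a, vec4f b, vec4f c)
{
    vec4f r;
    for (int i = 0; i < 4; i++)
        r.f[i] = a.f[i] * b.f[i] + c.f[i];
    return r;
}

/* y = alpha*x + y over n floats, four per step; n must be a multiple of 4
 * (real SPU code would also need 16-byte-aligned arrays). */
static void saxpy4(size_t n, float alpha, const vec4f *x, vec4f *y)
{
    assert(n % 4 == 0);
    vec4f a = splat(alpha);
    for (size_t i = 0; i < n / 4; i++)
        y[i] = madd4(a, x[i], y[i]);
}
```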

13 Programming Models
Function offload model
Device extension model
Computational acceleration model
Streaming models
Shared-memory multiprocessor model
Asymmetric thread runtime model

14 A Programming Model: CellSs
Cell Superscalar (CellSs)
  – simple and flexible
  – automatic parallelization of sequential programs
  – task scheduling and data handling

15 CellSs Structure
Based on
  – code annotations
  – the C language
Composed of
  – a source-to-source compiler
  – a runtime library

16 CellSs Compilation Environment

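17 CellSs Compiler
Source-to-source compiler; the annotations identify
  – the functions (tasks) to be executed on the SPEs
  – the direction of each function parameter
  – parameters that are arrays, and their lengths
No pointers! Task parameters must be arrays of known size, so that the runtime knows how much data to transfer.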

18 Parallelism on CellSs
Annotated code
Generated code for the PPE
Generated code for the SPE
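The PPE/SPE split can be sketched in plain C:
  – a call to an annotated function in the main program is replaced by a stub that records the task instead of running it (which function, plus each parameter's address, size and direction)
  – code on the SPE side selects the task body by its identifier

Everything here is a hypothetical single-process simulation. The names struct task, vadd_stub, spe_dispatch and run_all are ours, not CellSs's generated code, and parameters are passed by address rather than copied into a local store.

The parameter sizes come from the declared array dimension BS. Inside the function, sizeof on a parameter would only give the size of a pointer, because array parameters decay to pointers in C. This is one way to see why CellSs task parameters must be arrays with known lengths.

```c
#include <assert.h>
#include <stddef.h>

#define BS 4             /* hypothetical array length */
#define MAX_TASKS 64
#define MAX_PARAMS 3

enum dir { DIR_IN, DIR_OUT, DIR_INOUT };

/* What the runtime needs to know about one task parameter. */
struct param { void *addr; size_t bytes; enum dir dir; };
struct task  { int func_id; int nparams; struct param p[MAX_PARAMS]; };

static struct task queue[MAX_TASKS];
static int ntasks;

/* Task body as written by the programmer (func_id 0). */
static void vadd(float a[BS], float b[BS], float c[BS])
{
    for (int i = 0; i < BS; i++)
        c[i] = a[i] + b[i];
}

/* PPE side: a call vadd(a, b, c) in the main program becomes this stub,
 * which records the task instead of running it. Sizes come from the
 * declared dimension BS, not from sizeof on the (pointer) parameters. */
static void vadd_stub(float a[BS], float b[BS], float c[BS])
{
    assert(ntasks < MAX_TASKS);
    struct task *t = &queue[ntasks++];
    t->func_id = 0;
    t->nparams = 3;
    t->p[0] = (struct param){ a, BS * sizeof(float), DIR_IN };
    t->p[1] = (struct param){ b, BS * sizeof(float), DIR_IN };
    t->p[2] = (struct param){ c, BS * sizeof(float), DIR_OUT };
}

/* SPE side: select the task body by its id and call it. */
static void spe_dispatch(const struct task *t)
{
    switch (t->func_id) {
    case 0:
        vadd(t->p[0].addr, t->p[1].addr, t->p[2].addr);
        break;
    default:
        assert(!"unknown task id");
    }
}

/* Run recorded tasks in submission order (a real runtime would follow the
 * task graph and spread the tasks over the SPEs). */
static void run_all(void)
{
    for (int i = 0; i < ntasks; i++)
        spe_dispatch(&queue[i]);
    ntasks = 0;
}
```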

19 CellSs Syntax
Three types of pragmas
  – initialization and finalization: #pragma css start and #pragma css finish
  – task: #pragma css task, with input, inout and output clauses
  – synchronization: #pragma css wait

20 Example CellSs Source Code
(Figure: example code with callouts marking start/finish, a task, and a wait for a task)
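Here is a minimal program in the style the slide illustrates: a blocked matrix multiplication whose block kernel is a CellSs task. N, BS and block_mult are illustrative.

The pragma names follow the syntax slide (css start, css task with direction clauses, css wait, css finish). The exact clause spelling used here, input(A, B) inout(C) and wait on(C), is our assumption and should be checked against the CellSs documentation.

The C standard requires unrecognized pragmas to be ignored. Without the CellSs compiler, the same file therefore builds and runs as an ordinary sequential program, though some compilers warn about the unknown pragmas.

```c
#include <assert.h>

#define N  8             /* matrix order (illustrative) */
#define BS 4             /* block size; N must be a multiple of BS */
#define NB (N / BS)      /* blocks per dimension */

/* Matrices are stored block by block, so every task parameter is one
 * contiguous BS x BS array of known size. */
typedef float block_t[BS][BS];

/* Task: C += A * B on single blocks; under CellSs this would run on an SPE. */
#pragma css task input(A, B) inout(C)
static void block_mult(float A[BS][BS], float B[BS][BS], float C[BS][BS])
{
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++)
            for (int k = 0; k < BS; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Main computation: each block_mult call would become a task; calls that
 * share the same inout block C[i][j] must run in order. */
static void blocked_matmul(block_t A[NB][NB], block_t B[NB][NB],
                           block_t C[NB][NB])
{
#pragma css start
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++)
                block_mult(A[i][k], B[k][j], C[i][j]);
#pragma css wait on(C)
#pragma css finish
}
```

Storing each matrix block by block keeps every task parameter a contiguous array of known size. That is what the runtime needs in order to transfer the parameter to an SPE local store.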

21 CellSs Runtime
Execute function, called for each task invocation
  – adds a node to the task graph
  – analyzes data dependencies (RaW, WaR, WaW)
  – renames parameters to remove WaR and WaW dependencies
  – submits the task
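The following is a small model of the dependency analysis and renaming steps. For each data object, the tracker remembers its last writer and the readers since that write:
  – a read adds a RaW edge from the last writer
  – a write adds WaW and WaR edges, unless renaming is on

With renaming, the write is treated as creating a fresh instance of the data, so those false dependencies disappear. Data objects are small integer ids here, and tracker, add_task and count_edges are hypothetical names. A real runtime would key on parameter addresses and would actually allocate the renamed copies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TASKS 32
#define MAX_DATA  8

enum { RAW = 1, WAR = 2, WAW = 4 };          /* dependency kinds (bit flags) */

struct tracker {
    int ntasks;
    bool renaming;                     /* rename outputs to drop WaR/WaW */
    int last_writer[MAX_DATA];         /* -1 = not written yet */
    bool reader[MAX_DATA][MAX_TASKS];  /* readers since the last write */
    int edge[MAX_TASKS][MAX_TASKS];    /* edge[pred][succ] = kinds */
};

static void tracker_init(struct tracker *t, bool renaming)
{
    memset(t, 0, sizeof *t);
    t->renaming = renaming;
    for (int d = 0; d < MAX_DATA; d++)
        t->last_writer[d] = -1;
}

/* Add a task reading ins[] and writing outs[]; an inout appears in both. */
static int add_task(struct tracker *t, const int *ins, int nin,
                    const int *outs, int nout)
{
    assert(t->ntasks < MAX_TASKS);
    int id = t->ntasks++;
    for (int i = 0; i < nin; i++) {            /* RaW: true dependency */
        int w = t->last_writer[ins[i]];
        if (w >= 0)
            t->edge[w][id] |= RAW;
        t->reader[ins[i]][id] = true;
    }
    for (int i = 0; i < nout; i++) {
        int d = outs[i];
        if (!t->renaming) {                    /* WaW and WaR: false deps */
            if (t->last_writer[d] >= 0)
                t->edge[t->last_writer[d]][id] |= WAW;
            for (int r = 0; r < id; r++)
                if (t->reader[d][r])
                    t->edge[r][id] |= WAR;
        }
        /* With renaming the write goes to a fresh instance of d, so earlier
         * readers keep the old one and no edge is needed. */
        t->last_writer[d] = id;
        memset(t->reader[d], 0, sizeof t->reader[d]);
    }
    return id;
}

/* Count edges that carry the given dependency kind. */
static int count_edges(const struct tracker *t, int kind)
{
    int n = 0;
    for (int a = 0; a < t->ntasks; a++)
        for (int b = 0; b < t->ntasks; b++)
            if (t->edge[a][b] & kind)
                n++;
    return n;
}
```

For example, suppose T0 writes X, T1 reads X and writes Y, T2 writes X, and T3 reads X:
  – without renaming, T2 must wait for T0 (WaW) and for T1 (WaR)
  – with renaming, only the RaW edges T0→T1 and T2→T3 remain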

22 CellSs Runtime Behavior

23 Middleware for the Cell
Task scheduling
  – task control buffer
  – task grouping
  – dynamic scheduling
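This is a rough model of task grouping and dynamic scheduling, under a cost model that is entirely our assumption:
  – tasks are independent and already ready
  – every submission to an SPE costs a fixed overhead, standing in for the PPE-SPE handshake
  – dynamic scheduling means the next group of up to "group" tasks goes to whichever SPE becomes idle first

schedule and its parameters are hypothetical, and the task control buffer is not modeled.

```c
#include <assert.h>

#define NSPE 8

/* Simulate dynamic scheduling of independent, ready tasks: whenever an SPE
 * becomes idle it receives the next group of up to 'group' tasks, paying
 * 'overhead' time units once per submission. Returns the makespan and
 * stores the number of submissions in *nsubmit. */
static long schedule(const long *cost, int ntasks, int nspe, int group,
                     long overhead, int *nsubmit)
{
    long busy_until[NSPE] = {0};
    int next = 0, subs = 0;
    assert(nspe > 0 && nspe <= NSPE && group > 0);
    while (next < ntasks) {
        int s = 0;                              /* earliest-idle SPE */
        for (int i = 1; i < nspe; i++)
            if (busy_until[i] < busy_until[s])
                s = i;
        long t = busy_until[s] + overhead;      /* one handshake per group */
        for (int g = 0; g < group && next < ntasks; g++)
            t += cost[next++];
        busy_until[s] = t;
        subs++;
    }
    long makespan = 0;
    for (int i = 0; i < nspe; i++)
        if (busy_until[i] > makespan)
            makespan = busy_until[i];
    *nsubmit = subs;
    return makespan;
}
```

Take 16 tasks of cost 10, four SPEs and an overhead of 5:
  – groups of one give a makespan of 60 over 16 submissions
  – groups of four give 45 over 4 submissions
  – a single group of 16 puts all the work on one SPE, for a makespan of 165

Grouping amortizes the overhead only as long as it does not destroy load balance.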

24 Locality Aware Task Scheduling
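The general idea of locality-aware scheduling is to run a task on the SPE that already holds its data in its local store, which saves a DMA transfer. The sketch below shows a minimal version of that selection rule.

It makes two simplifications of our own: each ready task has a single input block identified by an integer, and each SPE remembers only the block it last produced. ready_task and pick_task are hypothetical names, not the CellSs scheduler.

```c
#include <assert.h>

/* A ready task, reduced to the one input block it needs (a simplification). */
struct ready_task {
    int id;
    int input_block;
};

/* Choose which ready task SPE 'spe' should run next. resident[spe] is the
 * block that SPE last produced and still holds in its local store (-1 if
 * none). Prefer a task that reuses that block, avoiding a DMA transfer;
 * otherwise fall back to the oldest ready task (index 0). Returns an index
 * into ready[]. */
static int pick_task(const struct ready_task *ready, int nready,
                     const int *resident, int spe)
{
    assert(nready > 0);
    for (int i = 0; i < nready; i++)
        if (resident[spe] >= 0 && ready[i].input_block == resident[spe])
            return i;
    return 0;
}
```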

25 Tracing
A tracing component embedded in the CellSs runtime generates Paraver trace files, recording an event
  – when the main program enters or exits
  – when an annotated function is called in the main program
  – when a task starts or finishes
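The recording side of such a tracer can be sketched as an in-memory event buffer that is written out at the end, so that tracing adds little work while tasks run. The event kinds mirror the list above.

The following details are hypothetical:
  – the thread numbering (0 for the PPE main thread, 1 and up for SPE threads)
  – the struct layout
  – the names trace and trace_dump
  – the plain-text output

The sketch does not produce the Paraver file format.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

enum ev_type { EV_MAIN_ENTER, EV_MAIN_EXIT, EV_TASK_CALL, EV_TASK_START, EV_TASK_END };

struct event { double t; enum ev_type type; int thread; int task; };

#define MAX_EVENTS 1024
static struct event trace_buf[MAX_EVENTS];
static int trace_len;

/* Append one event, timestamped with processor time in seconds;
 * events are silently dropped once the buffer is full. */
static void trace(enum ev_type type, int thread, int task)
{
    if (trace_len < MAX_EVENTS)
        trace_buf[trace_len++] = (struct event){
            (double)clock() / CLOCKS_PER_SEC, type, thread, task };
}

/* Write the buffer as "time type thread task" lines (hypothetical format). */
static void trace_dump(FILE *out)
{
    for (int i = 0; i < trace_len; i++)
        fprintf(out, "%.6f %d %d %d\n", trace_buf[i].t, (int)trace_buf[i].type,
                trace_buf[i].thread, trace_buf[i].task);
}
```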

26 Performance Analysis
Matmul
  – block matrix multiplication
TSP
  – recursive implementation of the Traveling Salesman Problem
Cholesky
  – block matrix Cholesky factorization
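The structure of a block Cholesky factorization can be sketched with block kernels named after the LAPACK/BLAS routines they mirror: potrf, trsm and gemm, with the syrk update expressed as gemm of a block with itself. Each kernel takes whole BS x BS blocks, so each call is a natural CellSs task.

Several parts are assumptions: the pragma clause spelling, the sizes, and the Newton square root, which is used only to avoid a libm dependency. This is an illustrative sequential version, not the benchmark's actual source.

```c
#include <assert.h>

#define BS 2            /* block size (hypothetical) */
#define NB 3            /* blocks per dimension, so the matrix is 6 x 6 */

/* Square root by Newton's method, only to keep this sketch free of libm;
 * real code would call sqrt() from <math.h>. Starts at or above sqrt(x). */
static double newton_sqrt(double x)
{
    double g = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++)
        g = 0.5 * (g + x / g);
    return g;
}

/* a = chol(a): lower factor of a diagonal block, upper part zeroed. */
#pragma css task inout(a)
static void potrf(double a[BS][BS])
{
    for (int j = 0; j < BS; j++) {
        double d = a[j][j];
        for (int k = 0; k < j; k++)
            d -= a[j][k] * a[j][k];
        assert(d > 0.0);               /* input must be positive definite */
        a[j][j] = newton_sqrt(d);
        for (int i = j + 1; i < BS; i++) {
            double s = a[i][j];
            for (int k = 0; k < j; k++)
                s -= a[i][k] * a[j][k];
            a[i][j] = s / a[j][j];
        }
        for (int i = 0; i < j; i++)
            a[i][j] = 0.0;
    }
}

/* b = b * l^-T: triangular solve against a factored diagonal block. */
#pragma css task input(l) inout(b)
static void trsm(double l[BS][BS], double b[BS][BS])
{
    for (int r = 0; r < BS; r++)
        for (int j = 0; j < BS; j++) {
            double s = b[r][j];
            for (int m = 0; m < j; m++)
                s -= l[j][m] * b[r][m];
            b[r][j] = s / l[j][j];
        }
}

/* c -= a * b^T; with a == b this is the symmetric (syrk) update. */
#pragma css task input(a, b) inout(c)
static void gemm(double a[BS][BS], double b[BS][BS], double c[BS][BS])
{
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++) {
            double s = 0.0;
            for (int k = 0; k < BS; k++)
                s += a[i][k] * b[j][k];
            c[i][j] -= s;
        }
}

/* Right-looking blocked Cholesky, A = L * L^T. The lower block triangle is
 * overwritten with L; blocks above the diagonal are not touched. */
static void blocked_cholesky(double A[NB][NB][BS][BS])
{
    for (int k = 0; k < NB; k++) {
        potrf(A[k][k]);
        for (int i = k + 1; i < NB; i++)
            trsm(A[k][k], A[i][k]);
        for (int i = k + 1; i < NB; i++) {
            gemm(A[i][k], A[i][k], A[i][i]);
            for (int j = k + 1; j < i; j++)
                gemm(A[i][k], A[j][k], A[i][j]);
        }
    }
}
```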

27 Performance Analysis
TSP
  – no data dependencies between tasks
Cholesky
  – highly connected data-dependency graph
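Counting the tasks of a right-looking block Cholesky factorization shows where the dense graph comes from. The count assumes a decomposition into potrf, trsm, syrk and gemm kernels. With N_b blocks per dimension, step k does three things:
  – factors one diagonal block
  – solves the N_b-1-k blocks below it
  – updates every block of the trailing lower triangle

```latex
\begin{aligned}
\#\mathrm{potrf} &= N_b,\\
\#\mathrm{trsm} = \#\mathrm{syrk} &= \sum_{k=0}^{N_b-1} (N_b-1-k) = \frac{N_b(N_b-1)}{2},\\
\#\mathrm{gemm} &= \sum_{k=0}^{N_b-1} \binom{N_b-1-k}{2} = \binom{N_b}{3} = \frac{N_b(N_b-1)(N_b-2)}{6},\\
\text{total} &= N_b + N_b(N_b-1) + \frac{N_b(N_b-1)(N_b-2)}{6}.
\end{aligned}
```

For N_b = 4 that is 4 + 12 + 4 = 20 tasks. Every update at step k reads trsm results produced in that same step. Each block (i, j) is also updated at every step k < j before it is itself factored or solved, so the graph is densely connected. TSP tasks, by contrast, have no edges between them.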

28 Performance Analysis
  – x-axis: timeline
  – y-axis: one row per application thread
  – green: events
  – yellow: communications

29 Performance Analysis
  – yellow: DMA transfer by an SPE thread
  – brown: SPE executing the task

30 Pros and Cons
Annotations
  – simple
  – but limited
Data transfer is transparent to the user code
Automatic task dependency analysis
Relies on other compilers for
  – code vectorization (SPE performance)
  – lower-level code optimization

31 Related Work
OpenMP
Accelerated Library Framework (ALF)
Thread-level synchronization
Sequoia
RapidMind
Ohara
Graphics processing units (GPUs)

32 References
J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy, “Introduction to the Cell multiprocessor”, IBM J. Res. & Dev., Vol. 49, No. 4/5, July/September 2005.
P. Bellens, J. M. Perez, R. M. Badia, J. Labarta, “CellSs: a Programming Model for the Cell BE Architecture”, Supercomputing Conference, 2006.
M. W. Riley, J. D. Warnock, D. F. Wendel, “Cell Broadband Engine processor: Design and implementation”, IBM J. Res. & Dev., Vol. 51, No. 5, September 2007.
J. M. Perez, P. Bellens, R. M. Badia, J. Labarta, “CellSs: Making it easier to program the Cell Broadband Engine processor”, IBM J. Res. & Dev., Vol. 51, No. 5, September 2007.
http://www.ibm.com/developerworks/power/cell/
www.bsc.es/cellsuperscalar

