Membrane Computing in Connex Environment, WMC8, June 2007
Membrane Computing in the Connex Environment
Gheorghe Stefan
BrightScale Inc., Sunnyvale, CA & Politehnica University of Bucharest
Outline
Integral Parallel Architecture
The Connex Chip
The Connex Architecture
How to Use the Connex Environment
Concluding Remarks
Integral Parallel Architecture
The Ubiquity of Parallelism Calls for Integral Parallel Architectures
Partial Recursive Functions & Parallel Computation
A Functional Taxonomy of Parallel Computation
Parallelism can no longer be avoided
Intel's approach: multi-processors
    the best approach for multi-threading on a MIMD architecture
    inefficient on SIMD architectures
    ignores the MISD architecture
Many-processors call for another taxonomy:
    they work as accelerators
    they perform critical functions
    Berkeley's 13 dwarfs are a functional approach to many-processors
Real applications require all kinds of parallelism to solve corner cases, the places where the devil hides
Partial Recursive Functions & Parallel Computation
Composition Rule & the Basic Parallel Structures
Primitive Recursive Rule
Minimalization Rule
Composition & the Associated Structure
$f(x_0, \ldots, x_{n-1}) = g(h_0(x_0, \ldots, x_{n-1}), h_1(x_0, \ldots, x_{n-1}), \ldots, h_{m-1}(x_0, \ldots, x_{n-1}))$
[Figure: the inputs x_0, …, x_{n-1} feed the blocks h_0, h_1, …, h_{m-1} in parallel; their outputs feed block g, which produces f(x_0, …, x_{n-1})]
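To make the composition rule concrete, here is a small C++ sketch (C++ being the base of VectorC). The helper name `compose` and the use of `std::function` are illustrative assumptions, not part of the Connex tools. The m inner calls run in an ordinary loop, but they could be issued in parallel because no h_j reads another h_j's result.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Vec = std::vector<long>;
using Fn = std::function<long(const Vec&)>;  // both h_j and g map a vector to a scalar

// Hypothetical helper for the composition rule
// f(x) = g(h_0(x), h_1(x), ..., h_{m-1}(x)), with x = (x_0, ..., x_{n-1}).
long compose(const Fn& g, const std::vector<Fn>& hs, const Vec& x) {
    assert(!hs.empty());
    Vec partial;
    partial.reserve(hs.size());
    for (const Fn& h : hs) {
        partial.push_back(h(x));  // independent calls: no h_j uses another h_j's result
    }
    return g(partial);            // g combines the m intermediate results
}
```

For example, with x = {2, 3, 4}, h_0 = sum, h_1 = product and g(r) = r_0 - r_1, the result is 9 - 24 = -15.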
Data Parallel Composition
$X = \{x_0, \ldots, x_{n-1}\} \mapsto \{h(x_0), h(x_1), \ldots, h(x_{n-1})\}$
[Figure: n copies of h work side by side; copy i maps x_i to h(x_i)]
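The following is a minimal sketch of data parallel composition. It assumes that a sequential host loop stands in for the array of cells, and `map_all` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: data parallel composition {h(x_0), h(x_1), ..., h(x_{n-1})}.
// The same h is applied to every element independently. std::transform runs the
// applications one after another; on the array each element would live in its
// own cell and all of them would be computed in the same step.
template <typename T, typename F>
std::vector<T> map_all(const std::vector<T>& xs, F h) {
    std::vector<T> out(xs.size());
    std::transform(xs.begin(), xs.end(), out.begin(), h);
    return out;
}
```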
Speculative Composition
function vector $H = [h_0, h_1, \ldots, h_{n-1}]$, scalar $x$: $H(x) = \{h_0(x), h_1(x), \ldots, h_{n-1}(x)\}$
[Figure: the scalar x is fed to h_0, h_1, …, h_{n-1} in parallel, producing h_0(x), h_1(x), …, h_{n-1}(x)]
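Speculative composition swaps the roles of data parallel composition: it applies many functions to one input. Below is a sketch of a hypothetical `apply_all` helper, under the same sequential-emulation assumption.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: speculative composition H(x) = {h_0(x), ..., h_{n-1}(x)}.
// Different functions share one scalar input; every h_i is independent of the others.
std::vector<int> apply_all(const std::vector<std::function<int(int)>>& hs, int x) {
    std::vector<int> out;
    out.reserve(hs.size());
    for (const auto& h : hs) out.push_back(h(x));
    return out;
}
```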
Serial Composition
$f(x) = g(h(x))$: time parallelism
The general case: $f(x) = g_1(g_2(g_3(\ldots g_p(x) \ldots)))$
[Figure: x enters stage h; its output h(x) feeds stage g, which produces f(x) = g(h(x))]
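Time parallelism pays off when a stream of inputs flows through the stages. The following simulation is a sketch under simplifying assumptions: one stage per clock cycle and no stalls. `run_pipeline` and `PipeResult` are hypothetical names. The simulation counts cycles to show that n inputs through p stages finish after p + n - 1 cycles rather than p · n.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Serial composition f(x) = g_1(g_2(...g_p(x)...)) run as a p-stage pipeline.
// stages[0] is g_p (applied first) and stages.back() is g_1.
struct PipeResult {
    std::vector<int> outputs;
    std::size_t cycles;
};

PipeResult run_pipeline(const std::vector<std::function<int(int)>>& stages,
                        const std::vector<int>& inputs) {
    assert(!stages.empty());
    const std::size_t p = stages.size();
    std::vector<std::optional<int>> reg(p);  // output register of each stage
    PipeResult r{{}, 0};
    std::size_t next = 0;
    while (r.outputs.size() < inputs.size()) {
        // Update from the last stage backwards so each value advances one stage per cycle.
        for (std::size_t s = p; s-- > 1;) {
            reg[s] = reg[s - 1] ? std::optional<int>(stages[s](*reg[s - 1])) : std::nullopt;
        }
        reg[0] = next < inputs.size() ? std::optional<int>(stages[0](inputs[next++])) : std::nullopt;
        ++r.cycles;
        if (reg[p - 1]) r.outputs.push_back(*reg[p - 1]);  // a result leaves the last stage
    }
    return r;
}
```

With three stages (+1, ×2, -3) and four inputs, the first result appears after 3 cycles and the last after 6.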
Reduction Composition
$f(x_0, \ldots, x_{m-1}) = g(x_0, \ldots, x_{m-1})$
[Figure: the components x_0, x_1, …, x_{m-1} all feed a single block g, which outputs the scalar g(x_0, x_1, …, x_{m-1})]
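A reduction collapses a vector to one scalar. The slide does not say how g is evaluated in hardware, so the following is an illustration only: a tree reduction for an associative binary operator, written as the hypothetical helper `tree_reduce`. It also reports how many levels it needed, which is about log2(m) instead of m - 1 sequential steps.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct ReduceResult { long value; int levels; };

// Pairs neighbours level by level; op must be associative for the result
// to equal the sequential left-to-right reduction.
ReduceResult tree_reduce(std::vector<long> xs, const std::function<long(long, long)>& op) {
    assert(!xs.empty());
    int levels = 0;
    while (xs.size() > 1) {
        std::vector<long> next;
        for (std::size_t i = 0; i + 1 < xs.size(); i += 2) next.push_back(op(xs[i], xs[i + 1]));
        if (xs.size() % 2 == 1) next.push_back(xs.back());  // an odd element passes through
        xs = std::move(next);
        ++levels;
    }
    return {xs[0], levels};
}
```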
Primitive recursive rule
$f(x, y) = h(x, f(x, y-1))$, where $f(x, 0) = g(x)$
$f(x, y) = h(x, h(x, h(x, \ldots h(x, g(x)) \ldots)))$
A parallel solution makes sense only if the function must be computed many times. Implementations:
1. Data parallel composition
2. Loop in a serial composition
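Written as a plain loop, the primitive recursive rule is a chain of y dependent applications of h, so a single f(x, y) is inherently sequential. Below is a sketch with the hypothetical helper `prim_rec`.

```cpp
#include <cassert>
#include <functional>

// Primitive recursive rule: f(x, 0) = g(x), f(x, y) = h(x, f(x, y - 1)).
long prim_rec(const std::function<long(long)>& g,
              const std::function<long(long, long)>& h, long x, long y) {
    assert(y >= 0);
    long acc = g(x);                              // f(x, 0)
    for (long i = 0; i < y; ++i) acc = h(x, acc); // each step needs the previous one
    return acc;
}
```

With g(x) = 0 and h(x, a) = x + a, the rule yields multiplication, so f(7, 5) = 35. With g(x) = 1 and h(x, a) = x · a, it yields powers, so f(2, 10) = 1024.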
Data Parallel Composition for the Primitive Recursive Rule
$x$, $Y = \{y_0, \ldots, y_{n-1}\} \mapsto \{f(x, y_0), f(x, y_1), \ldots, f(x, y_{n-1})\}$
[Figure: n copies of the recursive computation work side by side; copy i receives (x, y_i) and produces f(x, y_i)]
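One way to emulate this on a SIMD array is a lockstep loop with a selection. Each cell holds one y_i, and a cell stops updating once it has applied h y_i times. This lockstep scheme and the name `prim_rec_all` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Data parallel primitive recursion: {f(x, y_0), ..., f(x, y_{n-1})}.
// Every cell starts from g(x); at step s only cells with y_i > s stay active.
std::vector<long> prim_rec_all(const std::function<long(long)>& g,
                               const std::function<long(long, long)>& h,
                               long x, const std::vector<long>& ys) {
    std::vector<long> acc(ys.size(), g(x));
    long steps = ys.empty() ? 0 : *std::max_element(ys.begin(), ys.end());
    for (long s = 0; s < steps; ++s) {
        for (std::size_t i = 0; i < ys.size(); ++i) {
            if (ys[i] > s) acc[i] = h(x, acc[i]);  // selection: only active cells update
        }
    }
    return acc;
}
```

All n results are ready after max(y_i) lockstep steps.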
Serial Composition for the Primitive Recursive Rule
[Figure: the input x passes through a pipeline of h stages; a selector (sel) picks the result]
Minimalization rule
$f(x) = \min(y)[m(x, y) = 0]$
Implementations:
1. Speculative composition & reduction composition
2. Serial composition & reduction composition
Speculative Composition & Reduction Composition for Minimalization
[Figure: x is broadcast to n blocks; block i computes m(x, i) and emits the pair {m(x, i), i} for i = 0, …, n-1; the reduction first{0, i} returns the first index i with m(x, i) = 0, so f(x) = i]
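The following sketches the speculative-plus-reduction scheme for a search range bounded to 0, …, n-1. The bound is an assumption, since an unbounded search cannot be laid out speculatively. `minimalize` is a hypothetical name. The first loop plays the role of the n parallel blocks and the second loop plays the role of the `first` reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Bounded minimalization: the smallest y in [0, n) with m(x, y) = 0, if any.
std::optional<long> minimalize(const std::function<long(long, long)>& m, long x, std::size_t n) {
    std::vector<long> vals(n);
    for (std::size_t i = 0; i < n; ++i) vals[i] = m(x, static_cast<long>(i));  // speculative step
    for (std::size_t i = 0; i < n; ++i) {                                       // reduction: first zero
        if (vals[i] == 0) return static_cast<long>(i);
    }
    return std::nullopt;  // no zero in the searched range
}
```

With m(x, y) = 0 exactly when (y + 1)² > x, the search returns ⌊√x⌋. For x = 10 it returns 3.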
Serial Composition & Reduction Composition for Minimalization
[Figure: example of dynamic reconfiguration; pipe stages P_{i-5}, …, P_{i-1}, P_i (P_i: the i-th pipe stage) each compute a function f_i, and a selector driven by a selection code combines the previous outputs y_{i-1}, y_{i-2}, …, y_{i-s} to produce y_i]
Functional Taxonomy of Parallel Computation
Data Parallel Computation: uses SIMD-like machines
Time Parallel Computation: a very special sort of MIMD, used to compute only one function
Speculative Computation: an MISD machine, completely ignored by current implementations
Integral Parallel Architecture
An Integral Parallel Architecture (IPA) uses all kinds of parallelism to build a real machine. It comes in two versions:
complex IPA: all types of parallel mechanisms tightly interleaved on the same physical structure (pipelined, superscalar, speculative general-purpose processors)
intensive IPA: all types of parallel mechanisms kept strictly separate, each implemented on its own specific physical structure (accelerators for embedded computation in a SoC approach)
Intensive IPA
Intensive IPAs are used as accelerators for complex IPAs:
1. Monolithic intensive IPA: the same machine works in two modes, data parallel and time parallel
2. Segregated intensive IPA: two distinct machines are used, one for data parallel computation and the other for time parallel (i.e. speculative) computation
The Connex Chip
The organization of BA1024:
multi-core area of 4 MIPS cores
many-core data parallel area of 1024 simple PEs
speculative time parallel pipe of 8 PEs
interfaces (DDR, PCI, and video & audio interfaces for 2 HDTV channels)
The Connex System
Connex Array: 1,024 linearly connected 16-bit Processing Cells
Sequencer: a 32-bit stack machine with program memory and data memory; in each cycle (on a 2-stage pipe) it issues one 64-bit instruction for the Connex Array and one 24-bit instruction for itself
IO Controller: a 32-bit stack machine that controls a 3.2 GB/s IO channel
Processing Cell: integer unit, data memory, and Boolean unit
The I/O channel works in parallel with the code running on the Connex Array
[Figure: block diagram of the Sequencer (4KB data & 32Kb program memory), the I/O Controller (4KB data & 4KB program memory), the Connex Array, and Connex I/O; detail of one Processing Cell: 16-bit ALU, registers R0-R7, Boolean unit, index, AUX, and a 16-bit data RAM with addresses up to 255]
Connex Array Structure
Processing Cells are linearly connected using only register R0
The IO Plane consists of all the R1 registers and is supervised mainly by the IO Controller
Execution is conditional, based on the state of each cell's Boolean unit
In each cycle, the integer unit, the Boolean unit, and the data memory execute the command fields of a 64-bit instruction issued by the Sequencer
Vector reduction operations deliver their scalar results to the TOS of the Sequencer, which receives data from the array of cells through a 3-stage pipe
[Figure: cells 0, 1, …, 1023 side by side, each with a 16-bit ALU, registers R0-R7, a data memory with addresses up to 255, and an on/off Boolean state]
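Two array-level mechanisms on this slide can be modelled in a few lines: neighbour communication through R0, and reduction of cell values to a scalar. This is a hypothetical host-side sketch, not the Connex instruction set. The names `shift_right` and `reduce_active`, the zero fill at cell 0, and the 32-bit accumulator are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Cells exchange data only through R0: one shift moves each cell's R0 value
// to the next cell; cell 0 receives `fill` (boundary behaviour assumed).
std::vector<std::uint16_t> shift_right(const std::vector<std::uint16_t>& r0, std::uint16_t fill = 0) {
    std::vector<std::uint16_t> out(r0.size(), fill);
    for (std::size_t i = 1; i < r0.size(); ++i) out[i] = r0[i - 1];
    return out;
}

// Vector-to-scalar reduction over the cells whose Boolean state is "on",
// the kind of result the slide says lands in the Sequencer's TOS.
std::uint32_t reduce_active(const std::vector<std::uint16_t>& r0, const std::vector<bool>& on) {
    assert(r0.size() == on.size());
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < r0.size(); ++i)
        if (on[i]) sum += r0[i];
    return sum;
}
```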
I/O System
[Figure: I/O system block diagram with the I/O Plane of the Connex Array, the IOC, a switch fabric (128-bit word), the IS, interrupts, and a DDR-DRAM controller with DRAM]
Configurable Switch Fabric
[Figure: BA1024 block diagram. A configurable switch fabric connects the Host CPU, Audio CPU, TS/Sec CPU, Video CPU, Stream Accelerator, Instruction Sequencer, I/O Sequencer, and the ConnexArray™ programmable media processor (multi-codec processing, pre-analysis, 3D filter, scaling, video merge/blend, motion adaptive de-interlacing) with the external interfaces: host I/F (PCI v2.2 or generic), 64-bit wide DRAM through a DDR-DRAM controller (400 MHz data rate), audio in/out (1x-I2S, 4xI2S, S/PDIF), video in/out (BT.656/1120), external bus and flash, test/ICE (EJTAG), GPIO, and I2C]
The Connex Architecture
Vectors & Selections
Programming Connex
Performance
Vectors & Selections
Linear array of processing elements: vectors
Local data memory in each processing element: array of vectors
Data-dependent operations at the level of each processing element: selections
Full Line Operations
Line k = Line i OP Line j, where OP is +, -, *, XOR, etc.
Line k = Line i OP scalar value (the scalar is repeated for all elements)
Operands are 16-bit data
[Figure: lines i and j combined element by element into line k]
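The following is a host-side model of full-line operations. It assumes that each line is a vector of 16-bit words (one per cell) and that arithmetic wraps modulo 2^16. `line_op` and `line_op_scalar` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Line = std::vector<std::uint16_t>;
using Op = std::function<std::uint16_t(std::uint16_t, std::uint16_t)>;

// Line k = Line i OP Line j, applied in every column (one cell per column).
Line line_op(const Line& a, const Line& b, const Op& op) {
    assert(a.size() == b.size());
    Line k(a.size());
    for (std::size_t c = 0; c < a.size(); ++c) k[c] = op(a[c], b[c]);
    return k;
}

// Line k = Line i OP scalar: the scalar is broadcast to every column.
Line line_op_scalar(const Line& a, std::uint16_t s, const Op& op) {
    return line_op(a, Line(a.size(), s), op);
}
```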
Columns Active Based On Repeating Patterns
Line k = Line i OP Line j (+, -, *, XOR, etc.) is computed only in the active columns.
Mark all odd columns active. Or mark every third column active. Or mark every third and fourth column active, etc.
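Pattern-based selections can be modelled as a Boolean mask computed from the column index. In this sketch, `pattern_select` and `masked_add` are hypothetical helpers, and inactive columns simply keep their previous value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Line = std::vector<std::uint16_t>;
using Selection = std::vector<bool>;

// Column c is active when c % period is one of `phases`:
// period 2, phases {1} marks the odd columns; period 3, phases {2} every third column.
Selection pattern_select(std::size_t columns, std::size_t period, const std::vector<std::size_t>& phases) {
    assert(period > 0);
    Selection sel(columns, false);
    for (std::size_t c = 0; c < columns; ++c)
        for (std::size_t p : phases)
            if (c % period == p) sel[c] = true;
    return sel;
}

// Only active columns of k receive i + j; inactive columns are left unchanged.
void masked_add(Line& k, const Line& i, const Line& j, const Selection& sel) {
    assert(k.size() == i.size() && i.size() == j.size() && j.size() == sel.size());
    for (std::size_t c = 0; c < k.size(); ++c)
        if (sel[c]) k[c] = static_cast<std::uint16_t>(i[c] + j[c]);
}
```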
Columns Active Based On Data Content
Line k = Line i OP Line j (+, -, *, XOR, etc.) is computed only in the active columns.
Apparently random columns are active (marked), based on the data-dependent results of previous operations.
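With data-dependent selections, the mask comes from the data itself. The following is a minimal sketch, assuming that the selection is the result of a column-wise comparison. `select_greater` and `masked_sub` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Line = std::vector<std::uint16_t>;
using Selection = std::vector<bool>;

// The active set is produced by a previous operation: here "line i > line j".
Selection select_greater(const Line& i, const Line& j) {
    assert(i.size() == j.size());
    Selection sel(i.size());
    for (std::size_t c = 0; c < i.size(); ++c) sel[c] = i[c] > j[c];
    return sel;
}

// Under that selection line k = line i - line j; other columns of k are untouched.
void masked_sub(Line& k, const Line& i, const Line& j, const Selection& sel) {
    assert(k.size() == i.size() && i.size() == j.size() && j.size() == sel.size());
    for (std::size_t c = 0; c < k.size(); ++c)
        if (sel[c]) k[c] = static_cast<std::uint16_t>(i[c] - j[c]);
}
```

Starting from k = 0, the two steps together give max(i - j, 0) in every column.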
Outer-Loop Parallelism
Example: 128 sets of 8x8 run in parallel in a 1024-cell array
[Figure: lines i and j each hold 128 side-by-side 8x8 blocks, with indices 0 to 7 within each block]
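One plausible layout for this example is sketched below. It is an assumption, since the slide only gives the numbers: block b occupies columns 8b to 8b + 7, and row r of every block is stored in line r. Under that layout, summing the 8 rows takes 8 full-line additions no matter how many blocks there are, so the outer loop over the 128 blocks disappears. `block_column_sums` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kCells = 1024, kBlock = 8, kBlocks = kCells / kBlock;  // 128 blocks
using Line = std::vector<std::uint16_t>;

// Column sums of every 8x8 block at once: acc[8b + c] = sum over r of block_b[r][c].
Line block_column_sums(const std::array<Line, kBlock>& rows) {
    Line acc(kCells, 0);
    for (std::size_t r = 0; r < kBlock; ++r) {          // sequential loop: 8 steps
        assert(rows[r].size() == kCells);
        for (std::size_t col = 0; col < kCells; ++col)  // a single full-line add on the array
            acc[col] = static_cast<std::uint16_t>(acc[col] + rows[r][col]);
    }
    return acc;
}
```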
Programming Connex
VectorC is an extension/restriction of C++:
Code that operates on scalar data is written in regular C notation.
Connex-specific operators are defined as functions for features not available in C++, e.g. operations on vectors and selections (Boolean vectors).
VectorC uses sequential operators and control structures on the vector and select data types.
Using VectorC, the Connex Machine is programmed the same way as conventional sequential machines.
```
vector mm_absdiff(vector V1, vector V2);  // declared before use; defined after main

int main() {
    vector V1 = 2;                  // V1 = {2, 2, … 2}
    vector V2 = 3;                  // V2 = {3, 3, … 3}
    vector V;                       // V = {0, 0, … 0}
    vector Index = indexvector();   // Index = {0, 1, … }
    V = mm_absdiff(V1, V2);         // V = {1, 1, … 1}
    return 0;
}

// Find the absolute difference between two vectors
vector mm_absdiff(vector V1, vector V2) {
    vector V;
    V = V1 - V2;
    WHERE (V < 0) {                 // only the components where V is negative are active
        V = -V;                     // negating them leaves V = abs(V1 - V2)
    } ENDW
    return V;
}
```
Vectors are arrays of scalar components. Selections are arrays of Boolean values that dictate which vector components are active.
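For readers without a VectorC toolchain, the same computation can be emulated in standard C++. This sketch assumes signed 16-bit components and models WHERE as an explicit Boolean selection. `absdiff` and the `Vector` alias are hypothetical names, not part of VectorC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vector = std::vector<std::int16_t>;  // signed 16-bit components (assumed)

Vector absdiff(const Vector& v1, const Vector& v2) {
    assert(v1.size() == v2.size());
    Vector v(v1.size());
    for (std::size_t c = 0; c < v.size(); ++c)
        v[c] = static_cast<std::int16_t>(v1[c] - v2[c]);      // V = V1 - V2
    std::vector<bool> where(v.size());
    for (std::size_t c = 0; c < v.size(); ++c) where[c] = v[c] < 0;  // WHERE (V < 0)
    for (std::size_t c = 0; c < v.size(); ++c)
        if (where[c]) v[c] = static_cast<std::int16_t>(-v[c]);  // V = -V on active components only
    return v;
}
```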
Overall performance of BA
… GOP/sec
3.2 GB/sec: external bandwidth
400 GB/sec: internal bandwidth
> 60 GOP/Watt
> 2 GOP/mm²
Note: 1 OP = one simple 16-bit integer operation (excluding multiplication)
How to Use the Connex Environment for Membrane Computation
Example (G. Paun), with the initial configuration [1 [2 [3 a f c ]3 ]2 ]1 and the rules:
R1: e → (e, out), f → f
R2: b → d, d → de, ff → f, cf → cdδ
R3: a → ab, a → bδ, f → ff
The first example of processing
Initial vector: (1,[) (2,[) (3,[) (0,a) (0,f) (0,c) (3,]) (2,]) (1,]) …
[[[a f c] ] ] …
a → ab, f → ff: [[[a b f f c] ] ] … // 11 clock cycles
a → ab, f → ff: [[[a b b f f f f c] ] ] … // 15 clock cycles
a → bδ, f → ff: [[b b b f f f f f f f f c] ] … // 27 clock cycles
b → d, ff → f: [[d d d f f f f c] ] … // 10 clock cycles
d → de, ff → f: [[d e d e d e f f c] ] … // 10 clock cycles
d → de, cf → cdδ: [d e e d e e d e e d f c] … // 10 clock cycles
e → (e, out), f → f: [d d d d f c] e e e e e e … // 15 clock cycles
total: 98 clock cycles
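The first example stores one symbol per cell, so applying a → ab and f → ff means inserting new cells into the vector. The following string-rewriting sketch performs one such parallel step. It assumes that the rules may be applied to the whole configuration, which holds because a and f occur only inside membrane 3 at this point. `step_r3_grow` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// One maximally parallel step of a -> ab and f -> ff on a one-symbol-per-cell
// configuration; brackets and all other symbols are copied unchanged.
std::string step_r3_grow(const std::string& config) {
    std::string out;
    for (char s : config) {
        if (s == 'a') out += "ab";
        else if (s == 'f') out += "ff";
        else out += s;
    }
    return out;
}
```

Two applications reproduce the first two configurations of the trace.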
The second example of processing
Initial vector: (1,[) (2,[) (3,[) (1,a) (1,f) (1,c) (3,]) (2,]) (1,]) …
[[[1a 1f 1c] ] ] …
[[[1a 1b 2f 1c] ] ] … // in 5 clock cycles
[[[1a 2b 4f 1c] ] ] … // in 5 clock cycles
[[3b 8f 1c] ] … // in 10 clock cycles
[[3d 4f 1c] ] … // in 7 clock cycles
[[3d 3e 2f 1c] ] … // in 8 clock cycles
[4d 3e 1f 1c] … // in 8 clock cycles
[4d 1f 1c] 3e … // in 5 clock cycles
total: 48 clock cycles
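The second example appears to store (multiplicity, symbol) pairs, so the same rules reduce to arithmetic on counts instead of cell insertion. The following is a sketch with a `std::map` standing in for the cells. Both the map and the name `step_r3_grow` are assumptions.

```cpp
#include <cassert>
#include <map>

using Multiset = std::map<char, long>;  // symbol -> multiplicity

// One step of a -> ab and f -> ff on counts: a keeps its count and adds that
// many b's, f doubles, and every other symbol (here c) is unchanged.
Multiset step_r3_grow(const Multiset& ms) {
    Multiset out = ms;
    auto a = ms.find('a');
    if (a != ms.end()) out['b'] += a->second;
    auto f = ms.find('f');
    if (f != ms.end()) out['f'] = 2 * f->second;
    return out;
}
```

Two steps reproduce [[[1a 1b 2f 1c] ] ] and [[[1a 2b 4f 1c] ] ] from the trace.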
The third example of processing
The third membrane is duplicated (multiplied), but the contents of the copies can differ:
[[[1a 1f 1c] [2a 1f 1c] ] ] …
[[[1a 1b 2f 1c] [2a 2b 2f 1c] ] ] … // in 5 clock cycles
[[[1a 2b 4f 1c] [2a 4b 4f 1c] ] ] … // in 5 clock cycles
[[3b 8f 1c 6b 8f 1c] ] … // in 10 clock cycles
[[3d 4f 1c 6d 4f 1c] ] … // in 7 clock cycles
[[3d 3e 2f 1c 6d 6e 2f 1c] ] … // in 8 clock cycles
[4d 3e 1f 1c 7d 6e 1f 1c] … // in 8 clock cycles
[4d 1f 1c 7d 1f 1c] 9e … // in 10 clock cycles
total: 53 clock cycles
For up to 200 level-3 membranes, the number of clock cycles remains 53.
Concluding Remarks
1. Functional taxonomy vs. Flynn taxonomy
2. The Connex architecture accelerates membrane computation
3. An efficient P-architecture needs only a few features added to the Connex architecture
4. Why not a P-language?
Main technical contributors to the Connex project:
Emanuele Altieri, BrightScale Inc., CA
Lazar Bivolarski, BrightScale Inc., CA
Frank Ho, BrightScale Inc., CA
Mihaela Malita, St. Anselm College, NH
Bogdan Mitu, BrightScale Inc., CA
Dominique Thiebaut, Smith College, MA
Tom Thomson, BrightScale Inc., CA
Dan Tomescu, BrightScale Inc., CA
Thank You
Mihaela's webpage on VectorC
Q&A