Martin Kruliš, 6. 1. 2015 (v1.0)


 Adapteva Company
◦ Small fabless semiconductor company
◦ Founded in 2008
◦ Main objective is to design massively parallel chips with an emphasis on power efficiency
   First company to design a chip that is expected to scale beyond 1000 cores
◦ Current products
   Epiphany processor (16-core and 64-core versions)
   Parallella board
◦ Parallella University Program started this year

 Parallella Board (labels recovered from the board diagram)
◦ 16-core Epiphany coprocessor
◦ Zynq dual-core ARM Cortex-A9 (with integrated FPGA)
◦ 1 GB SDRAM
◦ 1 Gb Ethernet
◦ 2x μUSB, μSD, μHDMI
◦ Expansion slots


 Coprocessor
◦ 32-bit RISC cores with superscalar architecture
◦ 32 KB local memory per core (1-cycle latency)
   Divided into four independent banks
◦ IEEE 754 compliant floating-point instruction set
◦ Two DMA channels
 eMesh (Network-on-Chip)
◦ Handles both on-chip and off-chip communication
◦ No specific API; it works through ordinary memory transactions
 eLink (Chip-to-Chip Links)
◦ 4 I/O ports for external communication

 Coprocessor Cores
◦ Simple in-order RISC architecture
   Most instructions take 1 cycle
   8-stage dual-issue pipeline
   Instruction set optimized for signal processing
◦ Separate integer ALU (IALU) and floating-point unit (FPU)
◦ 64 32-bit registers (shared by the IALU and the FPU)
   Load/store architecture
   Register file accesses per cycle: 3 reads/1 write for the FPU, 2 reads/1 write for the IALU, and 1 load/store
◦ Performance
   16-core chip: ~2 GFLOPS per core; 64-core chip: ~1.6 GFLOPS per core
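The per-core figures above imply an aggregate peak for each chip. A minimal sketch of that arithmetic, assuming the peak simply scales with the core count (the helper name `peak_gflops` is made up for illustration and is not part of any SDK):

```c
#include <assert.h>

/* Aggregate peak throughput in GFLOPS, assuming perfect scaling:
   number of cores times the per-core peak quoted on the slide. */
double peak_gflops(int cores, double gflops_per_core)
{
    assert(cores > 0 && gflops_per_core > 0.0);
    return cores * gflops_per_core;
}
```

With the slide's numbers, `peak_gflops(16, 2.0)` gives 32 GFLOPS for the 16-core chip and `peak_gflops(64, 1.6)` roughly 102 GFLOPS for the 64-core chip. These are upper bounds; real workloads are further limited by memory and mesh traffic.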

 Memory Model
◦ The internal memory of each node is mapped into the global (shared) address space

 Local Memory
◦ Divided into four banks with independent controllers
◦ Each clock cycle, each bank may perform:
   Send a 64-bit word to the program sequencer
   Transfer a 64-bit word between memory and registers
   Receive a 64-bit word from the eMesh interface
   Send a 64-bit word from the local DMA to the eMesh interface
◦ Memory order model
   Local reads and writes follow a strong memory model
   Non-local transactions follow a weak memory model
   Operations may not propagate in the order in which they were issued
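Since each bank has its own controller, it can matter which bank a buffer lands in. The sketch below maps a local offset to a bank index. It assumes the 32 KB are split into four contiguous, equally sized 8 KB banks, which the slides do not state explicitly, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    LOCAL_MEM_BYTES = 32 * 1024,                  /* per-core local memory (from the slides) */
    NUM_BANKS       = 4,                          /* four banks (from the slides) */
    BANK_BYTES      = LOCAL_MEM_BYTES / NUM_BANKS /* 8 KB each; contiguous layout is an assumption */
};

/* Bank index (0..3) that holds the given local offset. */
unsigned bank_of(uint32_t local_offset)
{
    assert(local_offset < LOCAL_MEM_BYTES);
    return local_offset / BANK_BYTES;
}

/* Nonzero when the two offsets fall into different banks and can
   therefore be served by different bank controllers. */
int in_different_banks(uint32_t a, uint32_t b)
{
    return bank_of(a) != bank_of(b);
}
```

Under this assumption, code at offset 0x0000 sits in bank 0, while a buffer at offset 0x6000 sits in bank 3, so instruction fetches and data accesses to that buffer would not compete for the same bank.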

 eMesh
◦ 2D topology with nearest-neighbor connections
◦ 3 orthogonal (independent) meshes
   cMesh: on-chip write transactions (8 B/cycle)
   xMesh: off-chip write transactions (1 B/cycle)
   rMesh: read requests (1 request per 8 cycles)
◦ Edge connections may be interfaced with other Epiphany chips
   Or with other types of buses (off-core memory, I/O ports, …)
◦ Significantly favors writes over reads
   Write transactions are 16x faster
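The quoted rates allow a rough cost estimate for moving a block of data. The following is a back-of-the-envelope sketch only: it uses the peak rates from the slide, ignores hop latency and contention, and applies the slide's 16x figure as a flat penalty for reads. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CMESH_BYTES_PER_CYCLE = 8,  /* on-chip writes (from the slide) */
    XMESH_BYTES_PER_CYCLE = 1,  /* off-chip writes (from the slide) */
    READ_SLOWDOWN         = 16  /* "writes are 16x faster" (from the slide) */
};

/* Integer division rounded up. */
static uint64_t ceil_div(uint64_t a, uint64_t b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Best-case cycles to write `bytes` to another core on the same chip. */
uint64_t onchip_write_cycles(uint64_t bytes)
{
    return ceil_div(bytes, CMESH_BYTES_PER_CYCLE);
}

/* Best-case cycles to write `bytes` off chip. */
uint64_t offchip_write_cycles(uint64_t bytes)
{
    return ceil_div(bytes, XMESH_BYTES_PER_CYCLE);
}

/* Rough estimate for reading `bytes` from another core on the same chip,
   modeled as the write cost multiplied by the slide's 16x factor. */
uint64_t onchip_read_cycles_estimate(uint64_t bytes)
{
    return READ_SLOWDOWN * onchip_write_cycles(bytes);
}
```

For a 4 KB tile this gives about 512 cycles when the producer writes it to the consumer, versus about 8192 cycles when the consumer reads it remotely, which is why data is usually pushed rather than pulled on this architecture.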


 eMesh Routing
◦ The upper 12 bits of an address identify the target core
   6 bits for the row index, 6 bits for the column index
◦ Each node uses a simple routing algorithm
◦ Nodes use round-robin arbitration to avoid deadlock
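The address layout can be sketched with a few bit operations. This assumes 32-bit addresses with the row index in the top 6 bits, the column index in the next 6 bits, and a 20-bit offset below them; the slides give only the 12-bit split, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 12-bit core ID: row in the upper 6 bits, column in the lower 6 bits. */
uint32_t core_id(unsigned row, unsigned col)
{
    assert(row < 64 && col < 64);
    return ((uint32_t)row << 6) | col;
}

/* Global address of `local_offset` inside the memory of core (row, col). */
uint32_t global_address(unsigned row, unsigned col, uint32_t local_offset)
{
    assert(local_offset < (1u << 20));
    return (core_id(row, col) << 20) | local_offset;
}

/* Decoding helpers: split a global address back into its parts. */
unsigned row_of(uint32_t addr)    { return (addr >> 26) & 0x3Fu; }
unsigned col_of(uint32_t addr)    { return (addr >> 20) & 0x3Fu; }
uint32_t offset_of(uint32_t addr) { return addr & 0xFFFFFu; }
```

For example, a core at row 32, column 8 has ID 0x808, and offset 0x6000 in its local memory appears at global address 0x80806000. A router only needs `row_of` and `col_of` of the destination to decide in which direction to forward a transaction.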

 DMA
◦ Two DMA channels per node
◦ 2D-aware addressing with flexible strides
◦ Local-to-external and external-to-external memory transfers
◦ Completion is signaled by a HW interrupt
◦ Master and slave modes
   Slave DMA is controlled by external I/O or by another DMA
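To illustrate what 2D addressing with strides means, here is a plain software model of such a transfer. The descriptor below is a simplified, hypothetical structure and not the SDK's DMA descriptor; in particular, the outer strides are defined here as the distance between row starts, which may differ from how the hardware counts them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2D transfer descriptor; all strides are in elements. */
typedef struct {
    size_t    inner_count;       /* elements copied per row */
    size_t    outer_count;       /* number of rows */
    ptrdiff_t src_inner_stride;  /* step between source elements in a row */
    ptrdiff_t dst_inner_stride;  /* step between destination elements in a row */
    ptrdiff_t src_outer_stride;  /* step between source row starts */
    ptrdiff_t dst_outer_stride;  /* step between destination row starts */
} copy2d_desc;

/* Software model of a strided 2D copy, as a DMA engine might perform it. */
void copy2d(const copy2d_desc *d, const int32_t *src, int32_t *dst)
{
    assert(d != NULL && src != NULL && dst != NULL);
    for (size_t r = 0; r < d->outer_count; ++r) {
        const int32_t *s = src + (ptrdiff_t)r * d->src_outer_stride;
        int32_t *t = dst + (ptrdiff_t)r * d->dst_outer_stride;
        for (size_t c = 0; c < d->inner_count; ++c)
            t[(ptrdiff_t)c * d->dst_inner_stride] =
                s[(ptrdiff_t)c * d->src_inner_stride];
    }
}
```

A typical use is extracting a tile from a larger row-major matrix: with a source outer stride equal to the matrix width and a destination outer stride equal to the tile width, the tile arrives densely packed in the destination buffer.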

 Epiphany SDK
◦ Separate compilation for host and coprocessor code
   Epiphany code is built with e-gcc and e-objcopy
◦ The host runtime provides ways to
   Detect the coprocessor
   Allocate memory and transfer data
   Execute precompiled binaries on the coprocessor
 OpenCL
◦ The coprocessor is perceived as an OpenCL accelerator
◦ Each core is a compute unit, on-chip memory is local memory, …

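 Host Code Example
    #include <unistd.h>   /* usleep */
    #include <e-hal.h>    /* Epiphany host library */

    e_platform_t platform;
    e_epiphany_t dev;

    /* Initialize the host library, reset the system, query the platform */
    e_init(NULL);
    e_reset_system();
    e_get_platform_info(&platform);

    /* Open a workgroup covering the whole chip and load the program on all cores;
       the final E_TRUE starts the cores right after loading */
    e_open(&dev, 0, 0, platform.rows, platform.cols);
    e_load_group("coproccode.srec", &dev, 0, 0, platform.rows, platform.cols, E_TRUE);

    for (i = 0; i < platform.rows; ++i)
        for (j = 0; j < platform.cols; ++j) {
            /* Core ID: 6-bit row index followed by 6-bit column index */
            coreid = (i + platform.row) * 64 + j + platform.col;
            usleep(100000);  /* give the core time to produce its output */
            /* emem is an external shared-memory buffer; its allocation is not shown on the slide */
            e_read(&emem, 0, 0, 0x0, emsg, _BufSize);
            /* Read a flag from offset 0x6000 in the local memory of core (i, j) */
            e_read(&dev, i, j, 0x6000, &flag, sizeof(flag));
            ...
        }

    e_close(&dev);
    e_finalize();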

 Matrix Multiplication
◦ A tiles are rotated vertically in each column
◦ B tiles are rotated horizontally in each row
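The tile rotation resembles Cannon's algorithm. Below is a minimal single-threaded sketch of that scheme on a simulated p x p grid, where each "core" holds a single scalar instead of a tile. It uses the textbook formulation, in which the left operand rotates within rows and the right operand within columns; which operand the slide's figure labels A or B may differ from this. The function name and `MAX_P` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_P 8  /* largest simulated grid edge (arbitrary limit for this sketch) */

/* Computes c = l * r on a simulated p x p grid of cores,
   with one scalar "tile" of each operand per core. */
void cannon_multiply(int p, int l[MAX_P][MAX_P], int r[MAX_P][MAX_P],
                     int c[MAX_P][MAX_P])
{
    int lt[MAX_P][MAX_P] = {{0}}, rt[MAX_P][MAX_P] = {{0}}, tmp[MAX_P][MAX_P] = {{0}};
    assert(p >= 1 && p <= MAX_P);

    /* Initial skew: core (i, j) starts with l[i][k] and r[k][j], k = (i + j) mod p. */
    for (int i = 0; i < p; ++i)
        for (int j = 0; j < p; ++j) {
            lt[i][j] = l[i][(i + j) % p];
            rt[i][j] = r[(i + j) % p][j];
            c[i][j] = 0;
        }

    for (int step = 0; step < p; ++step) {
        /* Every core multiplies the two tiles it currently holds. */
        for (int i = 0; i < p; ++i)
            for (int j = 0; j < p; ++j)
                c[i][j] += lt[i][j] * rt[i][j];

        /* Rotate the left-operand tiles by one position within each row. */
        for (int i = 0; i < p; ++i)
            for (int j = 0; j < p; ++j)
                tmp[i][j] = lt[i][(j + 1) % p];
        memcpy(lt, tmp, sizeof lt);

        /* Rotate the right-operand tiles by one position within each column. */
        for (int i = 0; i < p; ++i)
            for (int j = 0; j < p; ++j)
                tmp[i][j] = rt[(i + 1) % p][j];
        memcpy(rt, tmp, sizeof rt);
    }
}
```

After p steps every core has seen all p matching pairs of tiles, so each core only ever exchanges data with its nearest neighbors. On the real chip the rotations would be neighbor-to-neighbor writes over the mesh, which fits the architecture's preference for writes over reads.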
