Published by Carol Bridges. Modified over 9 years ago.
1
Martin Kruliš
6. 1. 2015 by Martin Kruliš (v1.0)
2
Adapteva Company
◦ Small fabless semiconductor company
◦ Founded in 2008
◦ Main objective is to design massively parallel chips with an emphasis on power efficiency
    ▪ First company to design a chip that is expected to scale beyond 1000 cores
◦ Current products
    ▪ Epiphany processor (16-core and 64-core versions)
    ▪ Parallella board
◦ Parallella University Program started this year
3
Parallella board (labels from the board figure)
◦ 16-core Epiphany coprocessor
◦ Zynq dual-core ARM Cortex-A9 (with integrated FPGA)
◦ 1 GB SDRAM
◦ 1 Gb Ethernet
◦ μSD
◦ μHDMI
◦ 2× μUSB
◦ Expansion slots
6
Coprocessor
◦ 32-bit RISC cores with superscalar architecture
◦ 32 KB local memory per core (1-cycle latency)
    ▪ Divided into four independent banks
◦ IEEE 754 compliant floating-point instruction set
◦ Two DMA channels
eMesh (Network-on-Chip)
◦ Both on-chip and off-chip communication
◦ No specific API, works with memory transactions
eLink (Chip-to-Chip Links)
◦ 4 I/O ports for external communication
7
Coprocessor Cores
◦ Simple in-order RISC architecture
    ▪ Most instructions take 1 cycle
    ▪ 8-stage dual-issue pipeline
    ▪ Instruction set optimized for signal processing
◦ Separate integer and floating-point ALUs
◦ 64 32-bit registers (shared by the IALU and the FPU)
    ▪ Load/store architecture
    ▪ Register accesses per cycle: 3 reads/1 write by the FPU, 2 reads/1 write by the IALU, 1 load/store
◦ Performance
    ▪ 16 cores: ~2 GFLOPS each; 64 cores: ~1.6 GFLOPS each
8
Memory Model
◦ Internal memory of each node is mapped into global memory
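The mapping can be illustrated with a small address calculation. This is only a sketch: it assumes the layout given on the eMesh Routing slide (the upper 12 bits of a 32-bit address name the core, row in the upper 6 bits), which leaves 20 bits for the offset inside a core. The names `core_id` and `global_address` are hypothetical helpers, not part of the Epiphany SDK.

```c
#include <assert.h>
#include <stdint.h>

#define CORE_ID_SHIFT 20u                        /* 32 address bits - 12 core-ID bits */
#define LOCAL_MASK    ((1u << CORE_ID_SHIFT) - 1u)

/* Build the 12-bit core ID from mesh coordinates
 * (6 bits each, row in the upper half; assumed layout). */
uint32_t core_id(uint32_t row, uint32_t col)
{
    assert(row < 64u && col < 64u);
    return row * 64u + col;
}

/* Translate a core-local offset into the address under which
 * the same location is visible to every other node. */
uint32_t global_address(uint32_t row, uint32_t col, uint32_t local_offset)
{
    assert(local_offset <= LOCAL_MASK);
    return (core_id(row, col) << CORE_ID_SHIFT) | local_offset;
}
```

With these definitions, `global_address(32, 8, 0x6000)` evaluates to `0x80806000`; the coordinates are arbitrary illustration values.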
9
Local Memory
◦ Divided into four banks with independent controllers
◦ Each clock cycle, each bank may perform:
    ▪ Send a 64-bit word to the program sequencer
    ▪ Transfer a 64-bit word between memory and registers
    ▪ Receive a 64-bit word from the eMesh interface
    ▪ Send a 64-bit word to the eMesh interface (via the local DMA)
◦ Memory order model
    ▪ Local reads and writes follow a strong memory model
    ▪ Non-local transactions follow a weak memory model
    ▪ Non-local operations may not propagate in the order in which they were issued
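Because the banks have independent controllers, it can help to know which bank a given local offset falls into, for example to keep code and data apart. The sketch below assumes that the 32 KB of local memory is split into four equal, contiguous 8 KB banks; that split and the helper names `bank_of` and `same_bank` are assumptions for illustration, not SDK calls.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_MEM_SIZE 0x8000u                        /* 32 KB per core */
#define NUM_BANKS      4u
#define BANK_SIZE      (LOCAL_MEM_SIZE / NUM_BANKS)   /* 8 KB, assuming equal contiguous banks */

/* Index (0..3) of the bank that holds the given core-local offset. */
unsigned bank_of(uint32_t local_offset)
{
    assert(local_offset < LOCAL_MEM_SIZE);
    return (unsigned)(local_offset / BANK_SIZE);
}

/* Non-zero when both offsets fall into the same bank, i.e. when two
 * accesses would be served by the same bank controller. */
int same_bank(uint32_t a, uint32_t b)
{
    return bank_of(a) == bank_of(b);
}
```

Under this assumption, offset `0x6000` lies in bank 3 and offset `0x0` in bank 0, so the two do not share a controller.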
10
eMesh
◦ 2D topology with nearest-neighbor connections
◦ 3 orthogonal (independent) meshes
    ▪ cMesh – on-chip write transactions (8 B/cycle)
    ▪ xMesh – off-chip write transactions (1 B/cycle)
    ▪ rMesh – read requests (1 request per 8 cycles)
◦ Edge connections may be interfaced with other Epiphany chips
    ▪ Or with other types of buses (off-core memory, I/O ports, …)
◦ Significantly favors write operations over read operations
    ▪ Write transactions are 16x faster
11
eMesh
12
eMesh Routing
◦ The upper 12 bits of an address identify the destination core
    ▪ 6 bits – row index, 6 bits – column index
◦ Each node uses a simple routing algorithm
◦ Nodes use round-robin arbitration to avoid deadlock
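A per-node routing decision of this kind can be sketched as a comparison of the destination coordinates with the node's own coordinates. The slide does not spell out the algorithm, so the details here are assumptions: the column is corrected first and the row second, the row index grows southward, and the column index grows eastward. `hop_t`, `addr_row`, `addr_col` and `next_hop` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { HOP_LOCAL, HOP_EAST, HOP_WEST, HOP_NORTH, HOP_SOUTH } hop_t;

/* Destination coordinates taken from the upper 12 address bits. */
uint32_t addr_row(uint32_t addr) { return (addr >> 26) & 0x3Fu; }
uint32_t addr_col(uint32_t addr) { return (addr >> 20) & 0x3Fu; }

/* Decide where the node at (my_row, my_col) forwards a transaction.
 * Assumed policy: fix the column first, then the row. */
hop_t next_hop(uint32_t my_row, uint32_t my_col, uint32_t dest_addr)
{
    assert(my_row < 64u && my_col < 64u);
    uint32_t row = addr_row(dest_addr);
    uint32_t col = addr_col(dest_addr);

    if (col > my_col) return HOP_EAST;
    if (col < my_col) return HOP_WEST;
    if (row > my_row) return HOP_SOUTH;   /* assumes row index grows southward */
    if (row < my_row) return HOP_NORTH;
    return HOP_LOCAL;                     /* the transaction has arrived */
}
```

Each node needs only its own coordinates and the address of the transaction, which fits the point that the mesh works with plain memory transactions and has no separate API.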
13
DMA
◦ Two DMA channels per node
◦ 2D addressing awareness, flexible strides
◦ Local-external memory and external-external memory transfers
◦ Completion signaling by HW interrupt
◦ Master and slave modes
    ▪ Slave DMA is controlled by external I/O or another DMA
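What 2D addressing with strides means can be shown with a software emulation of such a transfer. This is not the hardware descriptor format or an SDK call: `dma2d_desc_t` and `dma2d_copy` are hypothetical, and the outer strides are defined here simply as the byte distance between the starts of consecutive rows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor of a 2D strided transfer (all strides in bytes). */
typedef struct {
    size_t    elem_size;         /* bytes moved per element */
    size_t    inner_count;       /* elements per row */
    size_t    outer_count;       /* number of rows */
    ptrdiff_t src_inner_stride;  /* step between elements in the source */
    ptrdiff_t src_outer_stride;  /* step between row starts in the source */
    ptrdiff_t dst_inner_stride;  /* step between elements in the destination */
    ptrdiff_t dst_outer_stride;  /* step between row starts in the destination */
} dma2d_desc_t;

/* Emulate the transfer in software: two nested loops over rows and elements. */
void dma2d_copy(const dma2d_desc_t *d, const void *src, void *dst)
{
    assert(d != NULL && src != NULL && dst != NULL);
    const unsigned char *src_row = src;
    unsigned char *dst_row = dst;

    for (size_t r = 0; r < d->outer_count; ++r) {
        const unsigned char *s = src_row;
        unsigned char *t = dst_row;
        for (size_t e = 0; e < d->inner_count; ++e) {
            memcpy(t, s, d->elem_size);
            s += d->src_inner_stride;
            t += d->dst_inner_stride;
        }
        src_row += d->src_outer_stride;
        dst_row += d->dst_outer_stride;
    }
}
```

A typical use is cutting a tile out of a larger row-major matrix: the source outer stride is the width of the full matrix, while the destination outer stride is the width of the tile.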
14
Epiphany SDK
◦ Separate compilation for host and coprocessor code
    ▪ Epiphany uses e-gcc and e-objcopy
◦ The host runtime provides a way to
    ▪ Detect the coprocessor
    ▪ Allocate memory, transfer data
    ▪ Execute precompiled binaries on the coprocessor
OpenCL
◦ The coprocessor is perceived as an OpenCL accelerator
◦ Each core is a compute unit, on-chip memory is local memory, …
15
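Host Code Example
    e_platform_t platform;
    e_epiphany_t dev;

    /* Initialize the host library, reset the platform, query its size */
    e_init(NULL);
    e_reset_system();
    e_get_platform_info(&platform);

    /* Open a workgroup covering all cores and load the coprocessor binary */
    e_open(&dev, 0, 0, platform.rows, platform.cols);
    e_load_group("coproccode.srec", &dev, 0, 0, platform.rows, platform.cols);

    for (i = 0; i < platform.rows; ++i)
        for (j = 0; j < platform.cols; ++j) {
            /* Core ID: row index in the upper 6 bits, column index in the lower 6 bits */
            coreid = (i + platform.row) * 64 + j + platform.col;
            usleep(100000);
            /* Read the message buffer from emem and the flag at local offset 0x6000 of core (i, j) */
            e_read(&emem, 0, 0, 0x0, emsg, _BufSize);
            e_read(&dev, i, j, 0x6000, &flag, sizeof(flag));
            ...
        }

    e_close(&dev);
    e_finalize();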
17
Matrix Multiplication
◦ A tiles are rotated vertically in each column
◦ B tiles are rotated horizontally in each row
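The rotation scheme can be tried out in a single-threaded simulation of a P x P core grid. This is a sketch in the style of Cannon's algorithm, not the code behind the slide: each tile is reduced to one scalar, and both the initial skewed placement and the choice that core (r, c) accumulates C[c][r] are assumptions made so that the rotation directions match the ones stated above. `P` and `tile_rotation_matmul` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define P 4   /* hypothetical grid size: P x P cores, one scalar "tile" per core */

/* Simulate C = A * B on a P x P grid. Core (r, c) accumulates C[c][r]
 * (assumed placement). A values rotate vertically within each column,
 * B values rotate horizontally within each row. */
void tile_rotation_matmul(float A[P][P], float B[P][P], float C[P][P])
{
    assert(A != NULL && B != NULL && C != NULL);
    float a[P][P], b[P][P], acc[P][P];
    float next_a[P][P], next_b[P][P];

    /* Skewed initial placement: every core starts with a matching pair
     * A[c][k], B[k][r] for k = (r + c) mod P. */
    for (int r = 0; r < P; ++r)
        for (int c = 0; c < P; ++c) {
            int k = (r + c) % P;
            a[r][c] = A[c][k];
            b[r][c] = B[k][r];
            acc[r][c] = 0.0f;
        }

    for (int step = 0; step < P; ++step) {
        /* Local multiply-accumulate on every core. */
        for (int r = 0; r < P; ++r)
            for (int c = 0; c < P; ++c)
                acc[r][c] += a[r][c] * b[r][c];

        /* Each core takes A from the core below it (same column)
         * and B from the core to its right (same row), with wrap-around. */
        for (int r = 0; r < P; ++r)
            for (int c = 0; c < P; ++c) {
                next_a[r][c] = a[(r + 1) % P][c];
                next_b[r][c] = b[r][(c + 1) % P];
            }
        memcpy(a, next_a, sizeof a);
        memcpy(b, next_b, sizeof b);
    }

    /* Collect the results from the grid. */
    for (int r = 0; r < P; ++r)
        for (int c = 0; c < P; ++c)
            C[c][r] = acc[r][c];
}
```

After P steps every core has seen all P matching pairs, and each transfer goes only to a nearest neighbor (with wrap-around), which suits the mesh topology. On real hardware the rotation would more likely be done by writing into the neighbor's memory than by reading from it, since the eMesh favors writes.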