Presentation is loading. Please wait.

Presentation is loading. Please wait.

Xiaocheng Zhou Intel Labs China “Single-chip Cloud Computer” An experimental many-core processor from Intel Labs.

Similar presentations


Presentation on theme: "Xiaocheng Zhou Intel Labs China “Single-chip Cloud Computer” An experimental many-core processor from Intel Labs."— Presentation transcript:

1 Xiaocheng Zhou Intel Labs China “Single-chip Cloud Computer” An experimental many-core processor from Intel Labs

2 What is Tera-scale? TIPs of compute power operating on Tera-bytes of data TIPs of compute power operating on Tera-bytes of data Terabytes TIPS Gigabytes MIPS Megabytes GIPS Performance Dataset Size Kilobytes KIPS Tera-scale Multi-core Single-core Mult- Media 3D & Video Text RMS Personal Media Creation and Management Learning & Travel Entertainment Health Source: electronic visualization lab University of Illinois http://techresearch.intel.com/articles/Tera-Scale/1421.htm

3 Performance Scaling Challenges EnergyEfficiencyEmergingApplicationsProgrammingStrategyDesignComplexity

4 Cloud Computing Today Cloud datacenters: –1000s of networked computers –Millions of threads & petabytes of data Opportunity: –Lower power, higher density via integration –Greater efficiency and better programmability Example: Intel’s Open Cirrus testbed Intel Labs Pittsburgh

5 Single-chip Cloud Computer (SCC) Experimental many-core CPU on 45 nm Hi-K metal-gate siliconExperimental many-core CPU on 45 nm Hi-K metal-gate silicon 48 IA-compatible cores48 IA-compatible cores Network of 2-core nodes mimics cloud computing at chip levelNetwork of 2-core nodes mimics cloud computing at chip level Fine-grained power management scales from 25- 125WFine-grained power management scales from 25- 125W Supports proven, highly parallel “scale-out” programming modelsSupports proven, highly parallel “scale-out” programming models

6 R MC 24 Tiles 24 Routers 48 IA cores Inside the SCC Inside the SCC ROUTE R MEMORY CONTROLLER 2D mesh network 4 Integrated DDR3 memory controllers (64GB addressable) RR RRR 1TILE Dual-core SCDC Tile R

7  Architecture –6x4 2D Mesh NOC –16B wide data links + 2B sideband –8 Virtual Channels in 2 classes –Fixed (X-Y) routing  Performance –Target freq: 2GHz @ 1.1V –Link Bandwidth 64GB/s –4 cycle latency  Power Management –Independent Frequency & Voltage control –Sleep mode, clock gating, low power RF On-die Interconnect

8  Memory –Up to 64GB DDR3 via 4 memory controllers @ 21.3GB/s –16KB SRAM in each tile as Message Passing Buffer (MPB)  Caching –32KB L1 per core (16KB I,D), 12MB L2 cache (256KB/core) –No HW cache-coherent shared memory  Addressing –Core physical to system physical addresses in 16MB sections –Memory mapped configuration & control registers Memory Architecture

9 Address Translation: From Core Address to System Address Core Physical Address Space Core Physical Address Space System Physical Address Space Physical-Physical Mapping Look Up Table (LUT)

10

11 11

12 Message Passing on SCC  Regions of memory mapped to multiple cores –Message Passing Buffer (MPB) for small fast messages –Larger buffers in off-die memory  Message Passing Data Type (MPDT) –R/W bypass L2 cache – tagged in L1 as MPDT –New instruction to selectively invalidate MPDT lines  Read/Write to other core’s MPB on-die –Synchronize through special atomic register bits –Core-core asynchronous interrupts  High-level API for applications – “RCCE” –One-sided communication (Get, Put, Send, Recv) –MPB allocation, synchronization

13 Improving Energy Efficiency Fine-grain, software-controlled power management 8 voltage and 28 frequency islands –Each tile can run at a different frequency –6 banks of four tiles can run at different voltages –Also independent V&F control for I/O network & MCs Memory Controller Tile R R R R R R R R R R R R R R R R R R R R R R R R Memory Controller V1V1 V2V2 FnFn FnFn FnFn FnFn V3V3 V4V4 V5V5 V6V6

14 Package and Test Board Technology45nm Process Package1567 pin LGA package 14 layers (5-4-5) Signals970 pins

15 SCC Platform Board Overview SCC Platform Board Overview

16 SCC “Chipset”  System Interface FPGA –Connects to SCC Mesh interconnect –IO capabilities like PCIe, Ethernet & SATA –Bitstream loaded by BMC  Board Management Controller (BMC) –JTAG interface for Clocking, Power etc. –USB Stick to hold FPGA bitstream –Network interface for User intercation via Telnet –Status monitoring

17 Software Environment  SCC Software –Customized Linux –Bare Metal –RCCE communication & power management –Tools –Selected Intel tools (e.g., icc, ifort,...) –Microsoft research release of SCC extensions to Visual Studio  Management Console PC Software –PCIe driver with integrated TCP/IP driver –Programming API for communication with SCC platform –GUI for interaction with SCC platform –Command line tools for interaction with SCC platform

18 RCCE Communication API  A compact, lightweight communication environment. –SCC and RCCE were designed together side by side: –… a true HW/SW co-design project.  A research vehicle to understand how message passing APIs map onto many core chips.  For experienced parallel programmers willing to work close to the hardware.  Static SPMD Execution Model: –identical UEs created together when a program starts (this is a standard approach familiar to message passing programmers)

19 RCCE Power Management API  RCCE power management emphasizes safe control: V/GHz changed together within each 4-tile (8-core) power domain. –A Master core sets V + GHz for all cores in domain. –RCCE_istep_power(): –steps up or down V + GHz, where GHz is max for selected voltage. –RCCE_wait_power(): –returns when power change is done –RCCE_step_frequency(): –steps up or down only GHz  Power management latencies –V changes: Very high latency, O(Million) cycles. –GHz changes: Low latency, O(few) cycles.

20 sccGui for debugging Modify config registers Read system memory Read system memory 20

21 sccBoot & sccReset  sccBoot: A command-line tool that allows to boot Linux on selected cores and to check the status (“which cores are currently booted”).  sccReset: A command-line tool that allows to reset selected SCC cores.

22 sccKonsole  Regular konsole, with automatic login to selected cores.  Enables broadcasting amongst shells. 22

23 MARC - Many-core Application Research Community  Worldwide research partnership program with academia & industry  Providing access to SCC for many-core programming research  Overwhelming interest - ~200 research proposals received  SCC datacenter is online - Community website up and running http://communities.intel.com/community/marc

24


Download ppt "Xiaocheng Zhou Intel Labs China “Single-chip Cloud Computer” An experimental many-core processor from Intel Labs."

Similar presentations


Ads by Google