General Purpose Processors as Processor Arrays Peter Cappello UC, Santa Barbara.

1 General Purpose Processors as Processor Arrays Peter Cappello UC, Santa Barbara

2 VLSI Design Forces in 1986 “Nature, to be commanded, must be obeyed.” –Sir Francis Bacon High performance ⇒ parallelism

3 VLSI Design Forces in 1986 High performance ⇒ parallelism

4 VLSI Design Forces in 1986 Power is scarce ⇒ limit resistive delay

5 VLSI Design Forces in 1986 Power is scarce ⇒ limit resistive delay ⇒ limit long communication

6 VLSI Design Forces in 1986 Power is scarce ⇒ limit resistive delay ⇒ limit long communication Area is scarce ⇒ limit wire crossing

9 VLSI Design Forces in 1986 $$ are scarce ⇒ design is expensive

10 VLSI Design Forces in 1986 $$ are scarce ⇒ design is expensive ⇒ reuse components

15 VLSI Design Forces in 1986 In 2D systolic arrays, clock skew is an issue ⇒ wavefront arrays: islands of synchrony in an ocean of asynchrony

16 Processor Array Properties 1. Have multiple processors

17 Processor Array Properties 1. Have multiple processors 2. Neighbors abut (no long wires)

18 Processor Array Properties 1. Have multiple processors 2. Neighbors abut 3. Only neighbors communicate directly

19 Processor Array Properties 1. Have multiple processors 2. Neighbors abut 3. Only neighbors communicate directly 4. Have a constant # of processor types

20 Processor Array Properties 1. Have multiple processors 2. Neighbors abut 3. Only neighbors communicate directly 4. Have a constant # of processor types 5. Scale: larger problems ⇒ larger arrays
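
Properties 2, 3, and 5 can be illustrated with a small sketch. It assumes a rectangular 2D mesh, which the slides do not specify; the function names `mesh_neighbors` and `max_degree` are hypothetical. The point it shows is that the number of direct links per processor stays constant as the array grows.

```python
def mesh_neighbors(row, col, rows, cols):
    """Return the processors that directly abut (row, col) in a rows x cols mesh.

    Assumption: a 4-connected rectangular mesh (north, south, west, east).
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Keep only positions that lie inside the array boundary.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def max_degree(rows, cols):
    """Largest number of direct neighbors of any processor in the mesh."""
    return max(
        len(mesh_neighbors(r, c, rows, cols))
        for r in range(rows)
        for c in range(cols)
    )


if __name__ == "__main__":
    # Scaling the array (property 5) does not add links per processor.
    for n in (4, 16, 64):
        print(n * n, "processors, max neighbors per processor:", max_degree(n, n))
```

Because every processor talks only to abutting neighbors, wire length per link is independent of array size under this assumed layout.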

21 No 3D PA Has Properties 1 - 5 Enclose 3D PA in minimal sphere of radius r.

22 No 3D PA Has Properties 1 - 5 Scale PA in all 3 dimensions.

23 No 3D PA Has Properties 1 - 5 1. Power consumption = Θ(r³).

24 No 3D PA Has Properties 1 - 5 1. Power consumption = Θ(r³). 2. Heat dissipation via surface = Θ(r²).
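
The two bounds above combine into a short argument, sketched here as one reading of the slides rather than the author's own derivation. The symbols P(r) for power consumption and S(r) for heat-dissipating surface are notation introduced only for this sketch.

```latex
% Assumed notation: P(r) = power consumed, S(r) = surface available for heat removal.
\[
  P(r) = \Theta(r^{3}), \qquad S(r) = \Theta(r^{2})
  \quad\Longrightarrow\quad
  \frac{P(r)}{S(r)} = \Theta(r) \;\to\; \infty \quad \text{as } r \to \infty .
\]
```

The heat that each unit of surface must remove grows linearly with r, so an array scaled in all three dimensions eventually exceeds any fixed cooling capacity, which conflicts with property 5.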

25 VLSI Design Forces in 2006 “Nature, to be commanded, must be obeyed.” –Sir Francis Bacon Power is scarce ⇒ limit clock frequency ⇒ parallelism Power is scarce ⇒ limit resistive delay ⇒ limit long communication

26 Trends in GPP in 2006 Chip multiprocessors (CMP) Vector IRAM Cell TRIPS RAW

27 Trends in GPP in 2006 Chip Multiprocessors (CMP) –Parallel processors –Crossbar

28 Trends in GPP in 2006 Vector IRAM – Vector Intelligent RAM For mobile multimedia devices Stream data processing Combine GPP and DSP –Parallel – linear array –Crossbar

30 Trends in GPP in 2006 Cell processor “The Department of Energy said Wednesday that it had awarded I.B.M. a contract to build a supercomputer capable of 1,000 trillion calculations a second, using an array of 16,000 Cell processor chips that I.B.M. designed for the coming PlayStation 3 video game machine.” Sept. 7, 2006, NY Times. Parallel processors –BIU – Bus interface unit –RMT – Replacement management table –SL1 – 1st-level cache –PPE – PowerPC Element –SPE – Synergistic Processor Element –Element interconnect bus

32 Trends in GPP in 2006 TRIPS Tera-op, Reliable, Intelligently adaptive Processing System The following slides are taken from a talk: "The Design and Implementation of the TRIPS Prototype Chip," HotChips 17, Palo Alto, CA, August, 2005.

33 E – execution tile R – register bank D – 8KB data cache I – instruction cache G – global control

34 Instructions execute as a data flow graph –An instruction’s output is another instruction’s input. –Minimize use of register/cache for intermediate values Register reads/writes access the register banks Loads/stores access the data cache banks
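
The dataflow execution described above can be sketched as follows. This is a minimal software model, not the TRIPS hardware or ISA: the function `run_dataflow` and the graph encoding are hypothetical. An instruction fires once all of its operands exist, and its result is handed straight to its consumers instead of passing through a register file.

```python
import operator


def run_dataflow(instructions, inputs):
    """Execute a dataflow graph.

    instructions: name -> (binary operation, [names of the two operands])
    inputs:       name -> initial value (stands in for register reads)
    Returns the final value table and the order in which instructions fired.
    """
    values = dict(inputs)
    pending = dict(instructions)
    order = []
    while pending:
        # An instruction is ready when every operand has been produced.
        ready = [
            name
            for name, (_, sources) in pending.items()
            if all(src in values for src in sources)
        ]
        if not ready:
            raise ValueError("graph has a cycle or a missing operand")
        for name in ready:
            op, sources = pending.pop(name)
            # The producer's output becomes the consumer's input directly.
            values[name] = op(*(values[src] for src in sources))
            order.append(name)
    return values, order


if __name__ == "__main__":
    graph = {
        "t1": (operator.add, ["a", "b"]),
        "t2": (operator.mul, ["t1", "c"]),
        "t3": (operator.sub, ["t2", "t1"]),
    }
    values, order = run_dataflow(graph, {"a": 2, "b": 3, "c": 4})
    print(order, values["t3"])
```

In this model only `a`, `b`, and `c` come from outside the graph; the intermediate values `t1` and `t2` never need a named register.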

39 Trends in GPP in 2006 RAW (MIT) The following slides are taken from a RAW talk: Evaluating The Raw Microprocessor: Scalability and Versatility Presented at the International Symposium on Computer Architecture, June 21, 2004.

40 Replace the crossbar with a point-to-point, pipelined, routed network. [Figure labels: ALU, RF]
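
A point-to-point routed network can be sketched with dimension-ordered routing on a 2D mesh. The routing policy, the mesh shape, and the function name `xy_route` are assumptions made for illustration; the slide does not state which routing scheme the network uses. Each hop crosses one short link between abutting tiles, so hops can be pipelined.

```python
def xy_route(src, dst):
    """Return the tiles visited from src to dst, both given as (row, col).

    Assumption: dimension-ordered routing that first moves along the row
    (changing the column) and then along the column (changing the row).
    """
    row, col = src
    dst_row, dst_col = dst
    path = [(row, col)]
    while col != dst_col:
        col += 1 if dst_col > col else -1
        path.append((row, col))
    while row != dst_row:
        row += 1 if dst_row > row else -1
        path.append((row, col))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 3))
    # The hop count equals the Manhattan distance between the two tiles.
    print(route, "hops:", len(route) - 1)
```

Unlike a crossbar, the latency in this sketch depends on how far apart the two tiles are, which is why placement of communicating operations matters.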

41 Distribute the Register File [Figure labels: ALU, RF]

42 Distribute the rest. [Figure labels: ALU, RF, Control, Wide Fetch (16 inst), Unified Load/Store Queue; repeated I$, PC, D$ blocks] [ISCA99]

43 Tiles! [Figure labels: ALU, RF; repeated I$, PC, D$ blocks]

44 Conclusions VLSI Scalable microprocessors are possible. Constant factors are beginning to give way to asymptotics: - 16 ALU Raw – Oct 2002 - 64 ALU Raw – Now - 1,024 ALU Raw – 2010 - 32,768 ALU Raw – If Moore’s Law makes it to 2 nm There is an opportunity to make processors more “versatile”, i.e., steal applications from custom chips. Tiled Processor Architectures are a promising approach and merit further research.

45 GPP Predictions: In 10 Years Encapsulate registers/cache/processors into an array. Partition off-chip memory: Encapsulate memory & processor. Safely increase parallel access (concurrent programming) For non-recursive applications GPP (mobile multimedia): –no bus; quasi-nearest neighbor networks. For recursive applications GPP (gaming, control) –replace bus w/ lean on-chip short-diameter communication network. –1 network-on-chip routes register/cache/instruction/control. –Need >= 1K processors/chip to justify network-on-chip.

46 Predictions Increasing complexity of: –Applications –Technology ⇒ Increasing specialization of labor

47 Predictions Increasing complexity of: –Applications –Technology ⇒ Increasing specialization of labor Rate of change of increase in complexity is increasing over time ⇒ Increasing adaptability is important!

48 Yet another taxonomy! RECONFIGURABILITY ARCHITECTURAL SPECIFICITY ASIC PROTOTYPE ASIC GPP CCM STATIC DYNAMIC SPECIFIC GENERAL

49 Yet another taxonomy! ASIC PROTOTYPE ASIC GPP CCM STATIC DYNAMIC SPECIFIC GENERAL ARCHITECTURAL SPECIFICITY RECONFIGURABILITY

50 STATIC DYNAMIC COMMUNICATION LATENCY TP DP ASIC PROTOTYPE ASIC GPP CCM ARCHITECTURAL SPECIFICITY SPECIFIC GENERAL RECONFIGURABILITY

51 DP Communication Topology FPGA EDGE ISA (2D VLIW) With Cores FFT, RISC High Throughput (iterative) Communication topology

52 TP Communication Topology FPGA EDGE ISA (2D VLIW) With Cores RAM, RISC Low Latency (recursive) Communication topology

53 General Purpose Language Domain Specific Language Computational Model Compute Substrate Communicate Substrate Configurable Hardware Static Hardware Fabrication Technology DISCIPLINE PROCESS CS, DE CS, CE CE, EE EE EE, ME Circuit layout Processor architecting Compiling CS CS, DE Application program DE CE, EE Processor layout FPGA/Circuit design Language design Fabrication process Compute model design

54 Conclusion Last 20 years witnessed dramatic advances

55 Conclusion Last 20 years witnessed dramatic advances Next 20 years will witness even more dramatic advances.

56 Spare slides follow

57 Recursive Computation via a Tree of Meshes Network?

58 Quasi-Scalable

60 RF D$ GLOBAL LOCAL ADDRESS

61 Interleave Memory & Processor Tiles Slightly more chips Compiler localizes memory accesses EDGE ISA deals with variable access times (TRIPS).

63 Cell architecture

64 Specialization of Labor High Level / Domain-Specific Language Computational Model Exposes Comm. Topology ISA Network FPGA Fabrication APPLICATION PROGRAMMER COMPILER COMPUTER ARCHITECT COMPUTER ENGINEER ELECTRICAL & COMPUTER ENGINEER

