Published by Chloe Perez. Modified over 9 years ago.
1
THE RAW MICROPROCESSOR: A COMPUTATIONAL FABRIC FOR SOFTWARE CIRCUITS AND GENERAL-PURPOSE PROGRAMS
Taylor, M.B.; Kim, J.; Miller, J.; Wentzlaff, D.; Ghodrat, F.; Greenwald, B.; Hoffman, H.; Johnson, P.; Jae-Wook Lee; Lee, W.; Ma, A.; Saraf, A.; Seneski, M.; Shnidman, N.; Strumpen, V.; Frank, M.; Amarasinghe, S.; Agarwal, A.
IEEE Micro, Volume 22, Issue 2, March-April 2002, pp. 25-35
2
The Raw Microprocessor
Wire delay is emerging as the natural limiter to microprocessor scalability. A new architectural approach could solve this problem, as well as deliver unprecedented performance, energy efficiency, and cost effectiveness.
3
The Raw Microprocessor
Problem: How to leverage growing quantities of chip resources even as wire delays become substantial?
Scalable ISA:
Provide a parallel software interface to the gate, wire, and pin resources of the chip.
Allow programmers more control of physical resources to achieve maximum performance and energy efficiency.
4
The Raw Microprocessor: Technology Trends
Until recently, the abstraction of a wire as an instantaneous connection between transistors has shaped assumptions and architectural designs.
Today, however, it takes on the order of two clock cycles for a signal to travel from edge to edge of a 2-GHz processor die.
Processor manufacturers have striven to maintain high clock rates in spite of the increased impact of wire delay, but materials and process changes have not been sufficient to solve the problem.
5
The Raw Microprocessor: The Response of Existing Architectures
6
The Raw Microprocessor
Raw attempts to minimize the ISA gap by exposing underlying physical resources as architectural entities.
It uses an array of identical, programmable tiles.
7
Each tile contains:
One static communication router
Two dynamic communication routers
An eight-stage, in-order, single-issue, MIPS-style processor
A four-stage, pipelined floating-point unit
A 32-Kbyte data cache
96 Kbytes of software-managed instruction cache
8
The Raw Microprocessor
The tiles interconnect using four 32-bit full-duplex on-chip networks, consisting of over 12,500 wires.
Each tile connects only to its four neighbors.
The longest wire in the system is no longer than the length or width of a tile.
This property ensures high clock speeds and the continued scalability of the architecture.
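The nearest-neighbor wiring described on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the paper: the function name `neighbors`, the (row, column) coordinates, and the default 4x4 grid (chosen to match a 16-tile chip) are assumptions.

```python
# Hypothetical model of a Raw-style tile mesh. The function name, the
# (row, col) coordinate scheme, and the default 4x4 size are illustrative
# assumptions, not details taken from the paper.
def neighbors(row, col, rows=4, cols=4):
    """Return the coordinates of the tiles wired directly to tile (row, col)."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Tiles on the edge of the array have fewer on-chip neighbors.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


print(neighbors(0, 0))  # a corner tile
print(neighbors(1, 2))  # an interior tile
```

Because every link reaches only an adjacent tile, adding rows or columns to the array leaves the longest wire unchanged, which is the scalability argument the slide makes.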
9
The Raw Microprocessor: Pin Multiplexing
On the edges of the network, the network buses are multiplexed onto pins.
The prototype uses 1,657 pins and provides 14 full-duplex, 32-bit, 7.5-Gbps I/O ports at 225 MHz.
11
The Raw Microprocessor: Architectural Entities
12
The Raw Microprocessor: Architectural Entities
Raw processors will have:
More functional units, as well as more flexible and efficient pin utilization
A higher pin count, made worthwhile by this efficiency
More predictability and higher clock frequencies, due to the explicit exposure of wire delay
13
The Raw Microprocessor: Application Mapping
Applications can leverage the Raw static network's ASIC-like place-and-route facility. Applications that do so are called software circuits.
The Raw operating system allows both space and time multiplexing of processes. It allocates a rectangular region of tiles to each process.
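The slide does not say how the operating system chooses which rectangle to hand out. As a rough sketch of space multiplexing, the Python below uses a first-fit scan over a 4x4 grid; the first-fit policy, the `allocate` function, and the grid representation are all assumptions made for illustration.

```python
# Hypothetical first-fit allocator for rectangular tile regions. The policy,
# the function name, and the list-of-lists grid are illustrative assumptions;
# the source only states that each process receives a rectangle of tiles.
def allocate(grid, height, width, pid):
    """Mark the first free height x width rectangle with pid.

    Returns the (top, left) corner of the region, or None if nothing fits.
    """
    rows, cols = len(grid), len(grid[0])
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            cells = [(r, c)
                     for r in range(top, top + height)
                     for c in range(left, left + width)]
            if all(grid[r][c] is None for r, c in cells):
                for r, c in cells:
                    grid[r][c] = pid
                return (top, left)
    return None


grid = [[None] * 4 for _ in range(4)]
a = allocate(grid, 2, 2, "A")  # a 2x2 region for process A
b = allocate(grid, 2, 4, "B")  # a full-width, two-row region for process B
c = allocate(grid, 3, 3, "C")  # no 3x3 region remains free
print(a, b, c)
```

A request that cannot be placed, like the last one here, is where time multiplexing would come in: the process waits until tiles are released.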
14
The Raw Microprocessor: Application Mapping
15
The Raw Microprocessor: Design Decisions
Compute Processor:
Focus: tight integration of coupled network interfaces and processor pipeline.
Networks are register mapped and integrated directly into the bypass paths of the pipeline.
Intertile networking extends the bypass concept into 2-D.
16
The Raw Microprocessor: Design Decisions
17
The Raw Microprocessor: Design Decisions
Static Router:
Routing instructions determine the routing path.
The static routers collectively reconfigure the entire communication pattern of the network on a cycle-by-cycle basis.
Latency between tiles is one cycle per hop.
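To make the one-cycle-per-hop figure concrete, the sketch below counts the cycles a word spends hopping between two tiles. It assumes the routing instructions pick a shortest (Manhattan) path and it ignores any cost at the sending and receiving tiles; `hop_cycles` and the coordinate scheme are hypothetical names introduced here.

```python
# Hypothetical estimate of static-network hop latency. Assumes the routing
# instructions choose a shortest path through the mesh and counts only the
# hops themselves, not any overhead at the endpoints.
def hop_cycles(src, dst, cycles_per_hop=1):
    """Return the cycles spent traversing hops from tile src to tile dst."""
    hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    return hops * cycles_per_hop


print(hop_cycles((0, 0), (0, 1)))  # adjacent tiles
print(hop_cycles((0, 0), (3, 3)))  # opposite corners of a 4x4 array
```

Under these assumptions the cost grows with physical distance, which is how the design exposes wire delay to software instead of hiding it.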
18
The Raw Microprocessor: Design Decisions
Static Router:
A 5-stage pipeline that exploits parallelism in routing.
19
The Raw Microprocessor: Design Decisions
Dynamic Networks:
Support the need for dynamic events and message passing.
Are better suited to long data streams, because their large overhead is amortized over more data.
20
The Raw Microprocessor: Implementation
Built in IBM's SA-27E, a 0.15-micron, six-level copper ASIC process.
25 W power consumption.
Wire delay within tiles was large enough that placement could not be ignored.
22
The Raw Microprocessor: Implementation
Applications with very little ILP generally do not benefit from running on Raw.
Applications with moderate to significant ILP show performance increases.
On SPECfp applications, the authors attain speedups over a single tile of 6x to 11x for a 16-tile Raw processor and 9x to 19x for 32 tiles.
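One way to read these speedups is as parallel efficiency, that is, speedup divided by tile count. The arithmetic below uses only the ranges quoted on this slide; the `efficiency` helper and the dictionary layout are illustrative assumptions, and the metric itself is a standard one rather than something the slide reports.

```python
# Parallel efficiency = speedup / number of tiles. The speedup ranges are the
# ones quoted on the slide; the helper and data layout are illustrative.
def efficiency(speedup, tiles):
    """Return the fraction of ideal linear speedup actually achieved."""
    return speedup / tiles


reported = {16: (6, 11), 32: (9, 19)}  # tiles -> (lowest, highest) speedup

for tiles, (low, high) in reported.items():
    print(f"{tiles} tiles: efficiency between "
          f"{efficiency(low, tiles):.2f} and {efficiency(high, tiles):.2f}")
```

Efficiency drops from 16 to 32 tiles at both ends of the range, which fits the slide's point that the benefit depends on how much ILP the application offers.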
23
The Raw Microprocessor: Conclusion
The replicated tile design saved time in design, RTL Verilog coding, resynthesis, verification, placement, and back-end flow.
Virtual Raw systems can be created from the glueless connection of up to 64 chips.
The authors believe that reaching the point at which a Raw tile is a relatively small portion of total computation could change the way we compute.
24
The Raw Microprocessor: Discussion
25
The Raw Microprocessor: Discussion Questions
Does this paper discuss enough real program and benchmark results?
Is 25 W power consumption "energy efficient" for the performance they have indicated?
Are there negative consequences of exposing so much complexity to the software/programmer?
How can the functionality of this processor be likened to a 2-D pipeline?
Does cost need to be addressed?
How advantageous is the design time reduction achieved through redundancy?