Published by Verity Sutton. Modified over 9 years ago.
1
Baring It All to Software: Raw Machines
E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, A. Agarwal
(Presented by Linda Deng)
2
Hitting a wall
Already in 1997?
As # of transistors increases, so does wire delay
New complex hardware verification costs
Emerging stream-based multimedia
3
The radical Raw idea
Lots of simple interconnected tiles
Each tile contains:
– Instruction/data memories
– ALU
– Registers
– Configurable logic
– Programmable switch for routing
Complex operations synthesized into HW
4
A Raw processor
5
The programmer’s job
Software deals with wire delay
Wire delay = hops in mesh network
One cycle to move from a tile to its neighbor
Compiler knows # of cycles needed to move
– Statically schedules operations
Register renaming, instruction scheduling, dependency checking…
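The "wire delay = hops" rule can be sketched in a few lines. This is only an illustration, assuming tiles are addressed by (row, column) coordinates and values travel along rows and columns with no contention; the function names `hop_latency` and `earliest_use_cycle` are hypothetical, not taken from the Raw toolchain.

```python
def hop_latency(src, dst):
    """Cycles to move a value between two tiles on a 2D mesh.

    Assumes one cycle per hop between neighboring tiles and
    row/column (Manhattan) routing, as the slide describes.
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def earliest_use_cycle(produce_cycle, src, dst):
    """Earliest cycle at which a consumer on `dst` can use a value
    produced on `src` at `produce_cycle`, under the same assumptions."""
    return produce_cycle + hop_latency(src, dst)


if __name__ == "__main__":
    # A value produced at cycle 10 on tile (0, 0), consumed on tile (2, 3).
    print(hop_latency((0, 0), (2, 3)))
    print(earliest_use_cycle(10, (0, 0), (2, 3)))
```

Because the latency is a fixed function of the two tile positions, a compiler can compute it ahead of time and schedule the consuming operation accordingly.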
6
What’s the big deal?
Distributed registers
– Bigger register namespace → higher ILP
Distributed static RAM
– Shorter memory latency
No specialized logic structures in HW
– Smaller tiles → more tiles → greater parallelism
– More chip area for memory/logic
– Faster clock
– Less complexity → easier verification
7
The hard-working compiler
Parallelism vs. communication/synchronization?
– But the latter’s overhead is low
– So partitioning can be fine-grained
Tile placement to minimize latency/bandwidth
Programs for tiles/switches (scheduling/routing)
Logic synthesis tool for configurable logic
– Pattern-matching algorithms to find candidate instructions
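To make the placement step concrete, here is a toy sketch. It is not the Raw compiler's algorithm: it swaps in an exhaustive search that is only feasible for a handful of tiles, and the names `hops`, `place`, and the `traffic` table are assumptions made for illustration. The cost being minimized is the number of words exchanged between two partitions multiplied by the mesh distance between the tiles they land on.

```python
from itertools import permutations


def hops(a, b):
    """Mesh distance between two tiles given as (row, col)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def place(num_parts, tiles, traffic):
    """Try every assignment of partitions to distinct tiles and keep
    the one with the lowest total (traffic * hops) cost.

    traffic maps a pair of partition indices (i, j) to the number of
    words they exchange. Exhaustive search: toy sizes only.
    """
    best_cost, best_perm = None, None
    for perm in permutations(tiles, num_parts):
        cost = sum(w * hops(perm[i], perm[j])
                   for (i, j), w in traffic.items())
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, dict(enumerate(best_perm))


if __name__ == "__main__":
    tiles = [(0, 0), (0, 1), (1, 0), (1, 1)]           # 2x2 mesh
    traffic = {(0, 1): 10, (1, 2): 10, (2, 3): 10, (0, 3): 1}
    cost, placement = place(4, tiles, traffic)
    print(cost, placement)
```

A real compiler would need a heuristic rather than enumeration, but the objective has the same shape: keep heavily communicating partitions on neighboring tiles.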
8
Some remaining dynamic events…
What happens when compiler can’t resolve?
Reserve bandwidth between potential communicators
Conservative estimates for dynamic routing
Assign dependency checking to tiles
Predict tile for offset, even though base is unknown
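One way the last point could work, sketched under explicit assumptions that the slide does not state: memory is interleaved across tiles word by word (word address modulo the tile count), and every array base is aligned to a multiple of the tile count. Under those assumptions the owning tile depends only on the offset, so it can be predicted without knowing the base. The names `tile_of` and `predicted_tile` are hypothetical.

```python
def tile_of(address, num_tiles):
    """Tile that owns a word address, assuming low-order interleaving."""
    return address % num_tiles


def predicted_tile(offset, num_tiles):
    """Compile-time guess for the tile owning base + offset.

    Valid only if base is a multiple of num_tiles (an assumption),
    since then (base + offset) % num_tiles == offset % num_tiles.
    """
    return offset % num_tiles


if __name__ == "__main__":
    num_tiles = 4
    for base in (0, 64, 4096):          # aligned bases, unknown statically
        for offset in range(8):
            assert tile_of(base + offset, num_tiles) == predicted_tile(offset, num_tiles)
    print("prediction matches for all aligned bases tried")
```

If the alignment assumption does not hold, the prediction can be wrong, which is why such accesses would fall back to a dynamic mechanism.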
9
Prototype time: RawLogic
Implemented with FPGAs
Limited feature support
– Static sequences converted into state machines
– Hardwired into RawLogic
– Inflexible, with amazingly long compilation times
Framework in C/Verilog for compilation
– Produced binary code for state machines
But larger benchmarks were emulated
And Raw machine has faster clock than FPGA
10
The numbers
11
Looking ahead
“In 10 to 15 years, we believe that billion-transistor chip densities, faster switching speeds, and growing compiler sophistication will allow a Raw machine’s performance-to-cost ratio to surpass that of traditional architectures for future, general-purpose workloads.”
Agarwal’s Tilera started shipping 64-core TILE64 in 2007, working on 36- and 120-core?