Augmented von Neumann Processors
Binu K. Mathew, Al Davis
School of Computing, University of Utah
Guide to the future!
"What is it?" asked Arthur. "The Hitchhiker's Guide to the Galaxy. It's a sort of electronic book… a million "pages" could be summoned at a moment's notice... A screen, about three inches by four, lit up and characters began to flicker across the surface. The words Vogon Constructor Fleets flared in green across the screen. At the same time, the book began to speak the entry as well in a still quiet measured voice. This is what the book said.
- Douglas Adams, The Hitch Hiker's Guide to the Galaxy
Future Applications
- Projected performance requirement: 10 GOPS
  - Continuous speech recognition
  - Handwriting and gesture recognition
  - Computer vision
  - Heuristic searches in multimedia databases
  - Video conferencing
- Power consumption of typical processors
  - Intel StrongARM SA-110 @ 233 MHz: 1 W (max)
  - Intel embedded Pentium @ 233 MHz: 7.9 W to 17 W
  - AMD Athlon @ 800 MHz: 45.5 W (max core power)
- Can conventional architectures provide the required performance? At a low enough power budget?
Primitives for Future Applications
- Hidden Markov Model solvers: speech recognition, handwriting and gesture recognition
- FFT: audio processing
- DCT: image compression
- Block data difference: compression, motion detection
- Pattern matching: database searches, feature recognition
- Generalized filters: image and audio processing, array transformations
- Encryption/decryption, block data transfer, heuristic processing of bulk data …
- Reduction operators, block math units: image statistics, finite element analysis, logic simulation, neural nets
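As a small illustration of one primitive from the list above, the block data difference can be sketched in software as a sum of absolute differences between two equally sized blocks. This is only a minimal sketch: the function names `block_difference` and `motion_detected`, the list-of-lists block layout, and the threshold test are assumptions made for illustration, not the design of the coprocessor described in these slides.

```python
def block_difference(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks.

    Blocks are lists of rows of numbers (hypothetical layout chosen for
    this sketch; the slides do not specify a data format).
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


def motion_detected(prev_block, cur_block, threshold):
    """Flag motion when the block difference exceeds a chosen threshold."""
    return block_difference(prev_block, cur_block) > threshold


if __name__ == "__main__":
    previous = [[10, 20], [30, 40]]
    current = [[12, 18], [30, 45]]
    print(block_difference(previous, current))
    print(motion_detected(previous, current, threshold=5))
```

The inner loop is the same operation repeated over every element with a single reduction at the end, which is the kind of regular, bulk work that a dedicated unit could plausibly handle more cheaply than a general-purpose core.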
Augmented von Neumann Processors
- Multiple threads of execution, task-level parallelism
- Domain-specific coprocessors provide high performance at low power
[Figure: block diagram. Labeled blocks: CPU Core, Scratch SRAM, Block Transfer, Enc/Decrypt, FFT/DCT, Audio, HMM, Pat Match, GFU-1, GFU-2, Bulk Math, Bulk Diff. Annotation: language model from memory.]
Conclusion
- Power
  - Pihl et al.'s HMM coprocessor consumes 853 mW @ 154 MHz in a 5 V, 0.8 µm technology
  - Estimated power: 41 mW @ 1 GHz in a 1.2 V, 0.1 µm process
  - An indication that domain-specific coprocessors win!
- Area
  - AMD K-7 die area is 184 mm² in a 0.25 µm process
  - The same K-7, in a 0.1 µm process, is estimated to occupy 4.7-12.5% of the die area of a 2005 microprocessor
  - Total area of all our coprocessors is less than 184 mm²
  - Domain-specific coprocessors win again!
- Challenges
  - Identify core primitives, generalize
  - Power-efficient implementation
  - Provide plumbing between units and an overall framework
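The power and area figures on the Conclusion slide can be roughly cross-checked with first-order scaling rules. The slides do not say how the estimates were derived, so the sketch below rests on assumptions: dynamic power proportional to C·V²·f, switched capacitance proportional to feature size, die area proportional to feature size squared, and process dimensions given in micrometres. The function names are hypothetical.

```python
def scale_dynamic_power(power_mw, f_old_mhz, f_new_mhz,
                        v_old, v_new, feat_old_um, feat_new_um):
    """Scale dynamic power with the first-order model P ~ C * V^2 * f.

    Assumption: switched capacitance shrinks linearly with feature size.
    """
    cap_ratio = feat_new_um / feat_old_um
    volt_ratio = (v_new / v_old) ** 2
    freq_ratio = f_new_mhz / f_old_mhz
    return power_mw * cap_ratio * volt_ratio * freq_ratio


def scale_area(area_mm2, feat_old_um, feat_new_um):
    """Scale die area assuming it shrinks with the square of feature size."""
    return area_mm2 * (feat_new_um / feat_old_um) ** 2


if __name__ == "__main__":
    # HMM coprocessor: 853 mW @ 154 MHz, 5 V, 0.8 um -> 1 GHz, 1.2 V, 0.1 um
    hmm_mw = scale_dynamic_power(853, 154, 1000, 5.0, 1.2, 0.8, 0.1)
    print(f"Scaled HMM coprocessor power: {hmm_mw:.1f} mW")

    # K-7 die: 184 mm^2 @ 0.25 um -> 0.1 um
    k7_mm2 = scale_area(184, 0.25, 0.1)
    print(f"Scaled K-7 die area: {k7_mm2:.2f} mm^2")
```

Under these assumptions the scaled power comes out near 39.9 mW, in the neighborhood of the 41 mW quoted on the slide, and the K-7 shrinks to about 29.4 mm². The small gap to 41 mW suggests the original estimate used a slightly different model.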