Published by Claud Thornton. Modified over 9 years ago.
1
ISCA Panel, June 7, 2005. Wen-mei W. Hwu, University of Illinois at Urbana-Champaign

Future mass apps reflect a concurrent world
- Exciting applications in the future mass computing market represent and model the physical world.
- These were traditionally considered "supercomputing apps" or super-apps.
  - Physiological simulation, molecular dynamics simulation, video and audio manipulation, medical imaging, consumer game and virtual reality products
- Attempts to grow current architectures "out" or domain-specific architectures "in" have lacked success; a broader approach that covers more domains is promising.
2
MPEG Encoding Parallelism
- Independent IPPP sequences
- Frames: independent 16x16 pel macroblocks
- Localized dependence of P-frame macroblocks on the previous frame
- Steps of macroblock processing exhibit finer-grained parallelism; each block spans function boundaries
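To make the macroblock-level independence concrete, here is a minimal C sketch. It is an illustration under stated assumptions, not code from the talk: `mb_sad` and `frame_sad` are hypothetical names, and comparing each macroblock only against the co-located block of the previous frame is a simplification of real motion estimation, which searches a window.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MB 16  /* macroblock edge in pels, as on the slide */

/* Sum of absolute differences between one macroblock of the current frame
 * and the co-located macroblock of the previous frame. It only reads the
 * two frames and shares no state with other macroblocks. */
static uint32_t mb_sad(const uint8_t *cur, const uint8_t *prev,
                       int stride, int mbx, int mby)
{
    uint32_t sad = 0;
    for (int y = 0; y < MB; y++) {
        size_t off = (size_t)(mby * MB + y) * (size_t)stride + (size_t)(mbx * MB);
        const uint8_t *c = cur + off;
        const uint8_t *p = prev + off;
        for (int x = 0; x < MB; x++)
            sad += (uint32_t)abs(c[x] - p[x]);
    }
    return sad;
}

/* Hypothetical driver: each macroblock writes its own slot of `out`, so the
 * iterations of the two outer loops are independent of one another. */
void frame_sad(const uint8_t *cur, const uint8_t *prev,
               int width, int height, uint32_t *out)
{
    assert(width % MB == 0 && height % MB == 0);
    int mbw = width / MB, mbh = height / MB;
    for (int mby = 0; mby < mbh; mby++)
        for (int mbx = 0; mbx < mbw; mbx++)
            out[mby * mbw + mbx] = mb_sad(cur, prev, width, mbx, mby);
}
```

Because every iteration depends only on read-only frame data and its own output slot, the outer loops are candidates for thread-level or accelerator-level parallelism.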
3
Alternative Forms of MPEG-4 Threading
4
Building on HPF Compilation: what's new?
- Applicability to the mass software base: requires pointer analysis, control flow analysis, data structure and object analysis, beyond traditional dependence analysis
- Domain-specific, application model languages
  - More intuitive than C for inherently parallel problems
    - Increased productivity, increased portability
    - Will still likely have C as implementation language
  - There is room for a new app language or a family of languages
- Role for the compiler in model language environments
  - Model can provide structured semantics for the compiler, beyond what can be derived from analysis of low-level code
  - Compiler can magnify the usefulness of model information with its low-level analysis
5
Pointer analysis: sensitivity, stability and safety
- Improved efficiency increases the scope over which unique, heap-allocated objects can be discovered
- Improved analysis algorithms provide more accurate call graphs (below) instead of a blurred view (above) for use by program transformation tools
- Fulcra in OpenIMPACT [SAS2004, PASTE2004] and others
6
Thoughts from the VLIW/EPIC Experience
- Any significant compiler work for a new computing platform takes 10-15 years to mature
  - 1989-1998: initial academic results from IMPACT
  - 1995-2005: technology collaboration with Intel/HP
  - 2000-2005: SPEC 2000, Itanium 1 and 2, open source apps
  - This was built on significant work from the Multiflow, Cydrome, RISC, and HPC teams
- Real work in compiler development begins when hardware arrives
  - IMPACT output code performance has improved by more than 20% since the arrival of Itanium hardware, and the compiler is much more stable
  - Most apps were brought up with IMPACT after Itanium systems arrived: debugging!
  - Real performance effects can only be measured on hardware
  - Early access to hardware for academic compiler teams is crucial and must be a priority for the industry development team
- Quantitative methodology driven by large apps is key
  - Innovations evaluated in whole-system context
7
How the next-generation compiler will do it (1)
To-do list:
[ ] Identify acceleration opportunities
[ ] Localize memory
[ ] Stream data and overlap computation
Acceleration opportunities:
- Heavyweight loops identified for acceleration
- However, they are isolated in separate functions called through pointers
Figure callout: Heavyweight loops
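The situation described here, a hot loop hidden behind an indirect call, can be sketched in a few lines of C. All names (`stage_fn`, `invert`, `halve`, `struct pipeline`) are hypothetical and chosen only for illustration; the talk does not show this code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stage signature: the hot loop lives in a separate function. */
typedef void (*stage_fn)(const unsigned char *in, unsigned char *out, size_t n);

static void invert(const unsigned char *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)          /* heavyweight loop */
        out[i] = (unsigned char)(255 - in[i]);
}

static void halve(const unsigned char *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)          /* heavyweight loop */
        out[i] = (unsigned char)(in[i] / 2);
}

struct pipeline {
    stage_fn stage;                         /* chosen once at setup time */
};

void pipeline_init(struct pipeline *p, int mode)
{
    p->stage = mode ? halve : invert;
}

void pipeline_run(const struct pipeline *p,
                  const unsigned char *in, unsigned char *out, size_t n)
{
    assert(p->stage != NULL);
    /* Indirect call: from this site alone the callee is unknown. Only an
     * analysis that tracks what p->stage may point to can connect the call
     * to the loops in invert() and halve(). */
    p->stage(in, out, n);
}
```

From the call site in `pipeline_run`, a compiler without pointer analysis sees an opaque call; with it, the set of possible callees shrinks to the two functions stored in `pipeline_init`.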
8
How the next-generation compiler will do it (2)
To-do list:
[x] Identify acceleration opportunities
[ ] Localize memory
[ ] Stream data and overlap computation
Localize memory:
- Pointer analysis identifies indirect callees
- Pointer analysis identifies localizable memory objects
- Private tables inside the accelerator are initialized once, saving traffic
Figure callouts: Large constant lookup tables identified; Initialization code identified
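The pattern the compiler is looking for can be sketched as follows: a table written only by initialization code and read only by the hot loop. The names (`struct converter`, `converter_init`, `converter_run`) and the table contents are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 256

/* Hypothetical object holding a large lookup table. */
struct converter {
    int32_t scale_tab[TABLE_SIZE];
    int ready;
};

/* Initialization code: the only place the table is written. */
void converter_init(struct converter *c, int32_t gain)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        c->scale_tab[i] = gain * (i - 128);
    c->ready = 1;
}

/* Hot loop: reads the table but never writes it. If analysis proves that no
 * other pointer can modify scale_tab after converter_init, the table is
 * effectively constant here and a private copy could be kept in
 * accelerator-local memory, loaded once. */
void converter_run(const struct converter *c,
                   const uint8_t *in, int32_t *out, int n)
{
    assert(c->ready);
    for (int i = 0; i < n; i++)
        out[i] = c->scale_tab[in[i]];
}
```

The saving comes from loading the table into local memory once instead of fetching entries from shared memory on every iteration.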
9
How the next-generation compiler will do it (3)
To-do list:
[x] Identify acceleration opportunities
[x] Localize memory
[ ] Stream data and overlap computation
Streaming and computation overlap:
- Memory dataflow summarizes array/pointer access patterns
- Opportunities for streaming are automatically identified
- Unnecessary memory operations replaced with streaming
Figure callouts: Summarize input access pattern; Summarize output access pattern; Constant table privatized
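Once an access pattern has been summarized as a unit-stride stream, the loop can be restructured around double buffering. The sketch below shows only the structure, in plain sequential C: `fetch` is a hypothetical stand-in for an asynchronous transfer into local memory, and `CHUNK` is an arbitrary buffer size chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* hypothetical local buffer size, in elements */

/* Stand-in for a transfer into accelerator-local memory. Copies up to CHUNK
 * elements starting at src[pos] and returns how many were copied. */
static size_t fetch(const int *src, size_t pos, size_t n, int *local)
{
    size_t len = (n - pos < CHUNK) ? n - pos : CHUNK;
    memcpy(local, src + pos, len * sizeof *local);
    return len;
}

/* Sum of squares over a unit-stride input stream, double-buffered. */
long stream_sum_squares(const int *src, size_t n)
{
    assert(src != NULL || n == 0);
    int buf[2][CHUNK];
    long acc = 0;
    size_t pos = 0;
    int cur = 0;
    size_t len = (n > 0) ? fetch(src, pos, n, buf[cur]) : 0;
    while (len > 0) {
        size_t next_pos = pos + len;
        /* On real hardware this fetch would be issued asynchronously and
         * would overlap with the compute loop below. */
        size_t next_len = (next_pos < n)
                        ? fetch(src, next_pos, n, buf[cur ^ 1]) : 0;
        for (size_t i = 0; i < len; i++)
            acc += (long)buf[cur][i] * buf[cur][i];
        pos = next_pos;
        len = next_len;
        cur ^= 1;   /* swap the roles of the two buffers */
    }
    return acc;
}
```

The compute loop touches only the local buffer; all traffic to the original array is confined to `fetch`, which is the part a streaming engine would take over.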
10
How the next-generation compiler will do it (4)
To-do list:
[x] Identify acceleration opportunities
[x] Localize memory
[x] Stream data and overlap computation
Achieve macropipelining of parallelizable accelerators:
- Upsampling and color conversion can stream to each other
- Optimizations can have substantial effect on both efficiency and performance
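A rough sketch of two stages streaming to each other, row by row, instead of communicating through a full intermediate image. The function names, the 2x horizontal replication, and the arithmetic in `convert_row` are placeholders invented for this example, not the actual kernels discussed in the talk.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_W 64  /* hypothetical upper bound on row width */

/* Stage 1: 2x horizontal upsampling by sample replication. */
static void upsample_row(const uint8_t *in, int in_w, uint8_t *out)
{
    for (int x = 0; x < in_w; x++)
        out[2 * x] = out[2 * x + 1] = in[x];
}

static uint8_t clamp8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Stage 2: placeholder "color conversion" combining luma and chroma. */
static void convert_row(const uint8_t *y, const uint8_t *c, int w, uint8_t *out)
{
    for (int x = 0; x < w; x++)
        out[x] = clamp8(y[x] + c[x] - 128);
}

/* Macropipelined form: each upsampled row is consumed immediately by the
 * conversion stage, so only one row of intermediate data exists at a time. */
void upsample_convert(const uint8_t *luma, const uint8_t *chroma,
                      int w, int h, uint8_t *out)
{
    assert(w > 0 && w % 2 == 0 && w <= MAX_W);
    uint8_t row[MAX_W];
    for (int r = 0; r < h; r++) {
        upsample_row(chroma + r * (w / 2), w / 2, row);
        convert_row(luma + r * w, row, w, out + r * w);
    }
}
```

With the intermediate reduced to a single row buffer, the two stages could run as separate accelerators, with stage 1 producing row r+1 while stage 2 consumes row r.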
11
Memory dataflow in the pointer world
- Arrays are not true 3D arrays (unlike in Fortran)
- Actual implementation: array of pointers to array of samples
- New type of dataflow problem: understanding the semantics of memory structures instead of true arrays
Figure callouts: Array of constant pointers; Row arrays never overlap
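The data structure in question can be sketched in C as below. The type and function names are hypothetical; the sketch shows a single plane (rows by columns), and one further level of pointers, one plane per component, would give the "3D" shape the slide refers to. `rows_disjoint` is a runtime check of the same fact a memory dataflow analysis would have to establish statically.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t   sample_t;
typedef sample_t *row_t;     /* one row of samples */
typedef row_t    *plane_t;   /* array of row pointers */

/* Allocate a rows x cols plane as an array of pointers into one backing
 * store. The row pointers are set once and never reassigned. */
plane_t plane_alloc(int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    plane_t p = malloc((size_t)rows * sizeof *p);
    sample_t *store = calloc((size_t)rows * (size_t)cols, sizeof *store);
    if (!p || !store) {
        free(p);
        free(store);
        return NULL;
    }
    for (int r = 0; r < rows; r++)
        p[r] = store + (size_t)r * (size_t)cols;
    return p;
}

void plane_free(plane_t p)
{
    if (p) {
        free(p[0]);   /* backing store starts at row 0 */
        free(p);
    }
}

/* An access p[r][c] is a pointer load followed by a sample access. To treat
 * it like a true array access, analysis must establish that the row pointers
 * are constant and that distinct rows occupy disjoint storage. */
int rows_disjoint(plane_t p, int rows, int cols)
{
    for (int a = 0; a < rows; a++)
        for (int b = a + 1; b < rows; b++)
            if (p[a] < p[b] + cols && p[b] < p[a] + cols)
                return 0;
    return 1;
}
```

In Fortran the equivalent object would be a single array with known bounds, so dependence analysis applies directly; in C the same guarantees must be recovered from how the pointers are created and used.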