Slide 1: No Free Lunch, No Hidden Cost: How Can Co-Design Help?
X. Sharon Hu, Department of Computer Science and Engineering, University of Notre Dame
The Salishan Conference on High-Speed Computing

Slide 2: Theme: Exposing Hidden Execution Costs
- Costs of execution: performance and power
  - Computation
  - Communication
  - Data motion
  - Synchronization
  - ...
- How can we strike a balance between the extremes?
  - Hide as much as possible?
  - Explicitly manage "all" costs?
- My "position": expose widely and choose wisely
  - Focus on power

Slide 3: Why Take This Position?
- Expose widely
  - Better understanding of each component's contribution
  - Allows application-specific tradeoffs
  - Provides opportunities for powerful co-design tools
- Choose wisely
  - Requires sophisticated co-design tools
  - Explores more algorithm/software options

Slide 4: But Easier Said Than Done!
- Heterogeneity
  - Compute nodes: (multi-core) CPU, GP-GPU, FPGA, ...
  - Memory components: on-chip, on-board, disks, ...
  - Communication infrastructure: bus, NoC, networks, ...
- Parallelism ("non-determinism")
  - Data access: movement, coherence, ...
  - Resource contention, synchronization

Slide 5: Outline
- Why expose widely?
- How to benefit from exposing widely?
- How to choose wisely?
- Going forward

Slide 6: Why Expose Widely? (1)
- Different programs have different power distributions
- [Figure: GPU power distribution (NVIDIA GTX 280), broken down into Memory, ConstSM, ConstCache, TextCache, and GPU Cores; Hong and Kim, ISCA 2010]

Slide 7: Why Expose Widely? (2)
- Energy consumption of three sorting algorithms (Pentium 4 + GeForce 570)
- Data movement impacts different algorithms differently

Slide 8: Why Expose Widely? (3)
- Memory bus contention is application dependent
- [Figure: performance degradation due to memory bus contention; Masaaki Kondo et al., SIGARCH 2007]

Slide 9: Outline
- Why expose widely?
- How to benefit from exposing widely?
- How to choose wisely?
- Going forward

Slide 10: How to Benefit from "Exposing Widely"?
- Co-design is the key
- Expose all factors impacting the "execution model"
  - Computation: processing resources
  - Data motion: memory components and hierarchy
  - Communication: bus and network
  - Resource contention, synchronization, ...
- Some examples
  - Software macromodeling
  - Hardware module-based modeling
  - Optimize through power management
- Keep in mind Amdahl's law (see the sketch below)

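A minimal sketch (not from the slides) of why Amdahl's law matters here: reducing the energy of a single exposed component only helps in proportion to that component's share of the total. All numbers are illustrative.

```python
# Amdahl-style bound on system-level energy savings: optimizing one
# component only helps in proportion to that component's energy share.
def overall_energy_ratio(component_share: float, component_reduction: float) -> float:
    """Remaining fraction of total energy after reducing one component.

    component_share: fraction of total energy used by the optimized
        component (e.g. 0.3 if memory is 30% of the budget).
    component_reduction: factor by which that component's energy shrinks
        (e.g. 2.0 means it now uses half of its original energy).
    """
    return (1.0 - component_share) + component_share / component_reduction

if __name__ == "__main__":
    # Halving memory energy when memory is 30% of the total still leaves
    # 85% of the original energy: only a 15% system-level saving.
    print(overall_energy_ratio(0.3, 2.0))  # 0.85
```
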
Slide 11: Macromodeling: Algorithm-Complexity Based
- Relate the power/energy of a program to its complexity
- Example: E = C1·S + C2·S^2 + C3·S^3 (Tan et al., DAC'01), where S is the size of the array for a sorting algorithm
- Example: E_comm = C0 + C1·S (Loghi et al., ACM TECS'07), where S is the size of the exchanged messages
- More sophisticated models account for both computation and communication (see the fitting sketch below)
- How to handle resource contention?

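A hedged sketch of how a complexity-based energy macromodel of the form on this slide could be fit from measurements; the sizes, energy values, and use of NumPy's least-squares solver are illustrative choices, not Tan et al.'s procedure.

```python
# Fit E(S) = C1*S + C2*S^2 + C3*S^3 from measured (input size, energy) pairs.
import numpy as np

# Hypothetical measurements: array sizes and measured energy in joules.
sizes = np.array([1_000, 2_000, 4_000, 8_000, 16_000], dtype=float)
energy = np.array([0.012, 0.026, 0.058, 0.131, 0.310])

# Least-squares fit of the cubic macromodel (no constant term, matching
# the E = C1*S + C2*S^2 + C3*S^3 form on the slide).
A = np.column_stack([sizes, sizes**2, sizes**3])
coeffs, *_ = np.linalg.lstsq(A, energy, rcond=None)
C1, C2, C3 = coeffs

def predicted_energy(S: float) -> float:
    """Predict energy for a run on an array of size S."""
    return C1 * S + C2 * S**2 + C3 * S**3

print(predicted_energy(32_000))
```
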
Slide 12: Power Modeling of Bus Contention
- Penolazzi, Sander, and Hemani, DATE'11
- Characterization step
  - C%_{N,1}: percentage difference in cycles between the N-processor case and the 1-processor case
  - Can be done by IP providers on chosen benchmarks
- Prediction step (see the sketch below)

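A hedged sketch of the two steps, not the paper's actual equations: characterize the relative cycle overhead C%_{N,1} on reference benchmarks, then apply it to predict the contended cycle count of a new workload from its uncontended run. Function names and numbers are assumptions.

```python
# Characterization + prediction of bus-contention overhead.

def characterize_contention(cycles_1p: float, cycles_np: float) -> float:
    """C%_{N,1}: relative cycle increase of the N-processor run over the
    1-processor run of the same benchmark."""
    return (cycles_np - cycles_1p) / cycles_1p

def predict_contended_cycles(cycles_1p: float, c_percent: float) -> float:
    """Prediction step: scale an application's uncontended cycle count by
    the characterized contention overhead."""
    return cycles_1p * (1.0 + c_percent)

# Characterization on a reference benchmark (made-up numbers, 4 cores):
c41 = characterize_contention(cycles_1p=1.0e9, cycles_np=1.35e9)

# Prediction for a new application measured alone on one processor:
print(predict_contended_cycles(cycles_1p=2.4e9, c_percent=c41))
```
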
Slide 13: Hierarchical Module-Based Power Modeling
- Accumulate the energy/power of individual modules (see the sketch below)
- CPU+GPU example
  - Access rate: software dependent
  - Data movement contributes to memory power
  - Resource contention modifies access rates
- Adapted from Isci and Martonosi, MICRO'03

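A minimal sketch of access-rate-based module power accumulation in the spirit of Isci and Martonosi; the module list, power values, and access rates are made-up placeholders.

```python
# Accumulate per-module power estimates driven by software-dependent
# access rates; contention would show up as modified access rates.
from dataclasses import dataclass

@dataclass
class Module:
    name: str
    max_power_w: float   # power when the module is fully utilized
    idle_power_w: float  # non-gated / leakage power when idle

def module_power(m: Module, access_rate: float) -> float:
    """Estimate a module's power from the fraction of cycles it is active."""
    return m.idle_power_w + access_rate * (m.max_power_w - m.idle_power_w)

modules = [Module("cpu_core", 35.0, 6.0),
           Module("gpu_sm", 80.0, 12.0),
           Module("memory", 25.0, 4.0)]

# Access rates would come from hardware counters or profiles.
access_rates = {"cpu_core": 0.6, "gpu_sm": 0.8, "memory": 0.4}

total = sum(module_power(m, access_rates[m.name]) for m in modules)
print(f"Estimated total power: {total:.1f} W")
```
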
Slide 14: Outline
- Why expose widely?
- How to benefit from exposing widely?
- How to choose wisely?
- Going forward

Slide 15: Managing Bus Contention to Reduce Energy
- M. Kondo, H. Sasaki, and H. Nakamura, 2006
- Counter for memory requests
- Register for PU identification
- Thresholds for selecting which PU uses which Vdd value (see the policy sketch below)

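A hedged sketch of a threshold policy of this kind, not Kondo et al.'s exact mechanism: per-PU memory-request counters select a Vdd level, so memory-bound (bus-stalled) PUs run at lower voltage with little performance cost. The thresholds and voltage levels are assumptions.

```python
# Threshold-based Vdd selection driven by per-PU memory-request counters.

VDD_LEVELS = [1.2, 1.0, 0.9]     # assumed available supply voltages
THRESHOLDS = [2_000, 10_000]     # memory requests per control interval

def select_vdd(mem_requests: int) -> float:
    """Pick a Vdd level for one PU from its memory-request counter."""
    if mem_requests < THRESHOLDS[0]:
        return VDD_LEVELS[0]   # compute-bound: keep full voltage
    if mem_requests < THRESHOLDS[1]:
        return VDD_LEVELS[1]
    return VDD_LEVELS[2]       # heavily memory-bound: lowest voltage

# Counters read per PU (each PU identified by a register at the bus).
counters = {"PU0": 500, "PU1": 15_000, "PU2": 4_200}
print({pu: select_vdd(n) for pu, n in counters.items()})
```
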
Slide 16: Application Mapping to Reduce Energy (1)
- Application mapping for heterogeneous systems
- [Figure: tasks J1-J4, each with a rate range [minR_i, maxR_i] and deadline D_i, mapped onto processing elements PE1-PE4 and a shared memory]
- R. Racu, R. Ernst, A. Hamann, B. Mochocki, and X. Hu, "Methods for power optimization in distributed embedded systems with real-time requirements," CASES'06.

Slide 17: Application Mapping to Reduce Energy (2)
- Optimization:
  - Minimize power/energy dissipation
  - Satisfy timing properties (e.g., average path latency, average lateness)
  - ...
- Search space (see the sketch below):
  - Scheduling parameters, traffic shaping, ...
  - Task-level DVFS, i.e., task speed assignment
  - Resource-level DVFS, i.e., resource speed assignment
  - ...

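A toy sketch of this search space, assuming a two-task, two-PE system with task-level DVFS: it brute-forces mapping plus speed assignment under deadlines, whereas a real co-design tool would use smarter search. All task, PE, and power numbers are invented.

```python
# Exhaustive exploration of mapping + task-level DVFS under deadlines.
from itertools import product

TASKS = {"J1": {"cycles": 2e6, "deadline_s": 0.010},
         "J2": {"cycles": 5e6, "deadline_s": 0.020}}
PES = {"PE1": {"freq_hz": 1.0e9, "power_w": 2.0},
       "PE2": {"freq_hz": 0.5e9, "power_w": 0.7}}
SPEEDS = [1.0, 0.75, 0.5]   # task-level DVFS scaling factors

best = None
for assignment in product(product(PES, SPEEDS), repeat=len(TASKS)):
    energy = 0.0
    feasible = True
    for (pe, s), (task, spec) in zip(assignment, TASKS.items()):
        t = spec["cycles"] / (PES[pe]["freq_hz"] * s)
        if t > spec["deadline_s"]:
            feasible = False
            break
        # Crude model: power scales roughly with s^3 under DVFS.
        energy += PES[pe]["power_w"] * s**3 * t
    if feasible and (best is None or energy < best[0]):
        best = (energy, assignment)

print(best)
```

Resource-level DVFS would differ only in attaching the speed variable to the PE rather than to the task in this formulation.
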
Slide 18: Application Mapping (3): Sensitivity Analysis
- R. Racu, R. Ernst, A. Hamann, B. Mochocki, and X. Hu, "Methods for power optimization in distributed embedded systems with real-time requirements," CASES'06.

Slide 19: Application Mapping (4): GA-Based Approach
- [Figure: GA-based optimization loop with a power analyzer; 2'. scheduling trace → power analyzer → 3'. power dissipation]
- Power model needed (see the GA sketch below)

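A hedged sketch of a GA-based mapping search with a stand-in power model as the fitness function; the encoding, operators, and per-PE power/time figures are illustrative assumptions, not the flow from the referenced work.

```python
# GA over task-to-PE mappings; fitness is a simple energy estimate that
# stands in for the power analyzer applied to a scheduling trace.
import random

N_TASKS, PES = 6, ["PE1", "PE2", "PE3"]
PE_POWER = {"PE1": 2.0, "PE2": 1.2, "PE3": 0.7}   # W, assumed
TASK_TIME = {"PE1": 1.0, "PE2": 1.6, "PE3": 2.5}  # s per task, assumed

def fitness(individual):
    """Lower is better: estimated energy of the mapping."""
    return sum(PE_POWER[pe] * TASK_TIME[pe] for pe in individual)

def evolve(pop_size=20, generations=50, mutation_rate=0.1):
    pop = [[random.choice(PES) for _ in range(N_TASKS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]          # selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_TASKS)    # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:   # mutation
                child[random.randrange(N_TASKS)] = random.choice(PES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```
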
Slide 20: A Sample Result

Slide 21: Outline
- Why expose widely?
- How to benefit from exposing widely?
- How to choose wisely?
- Going forward

Slide 22: Going Forward: Systematic Co-Design Effort
- Expose more
  - More hardware counters/registers
  - More efficient/accurate high-level power models
  - Better models for resource contention and synchronization
- Choose better
  - Handling parallelism
    - Algorithm, OS, hardware
    - Resource contention, synchronization
  - Handling non-determinism
    - Worst-case bounds
    - Statistical analysis
    - Interval-based techniques (see the sketch below)

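A minimal sketch of what an interval-based technique could look like for the module-level power model used earlier: when access rates are only known to lie in an interval (e.g., because of contention), propagate the intervals to lower/upper power bounds instead of a single estimate. The intervals and power figures are illustrative.

```python
# Interval propagation through a simple access-rate power model.

def power_bounds(idle_w, max_w, rate_lo, rate_hi):
    """Interval of possible module power given an access-rate interval."""
    lo = idle_w + rate_lo * (max_w - idle_w)
    hi = idle_w + rate_hi * (max_w - idle_w)
    return lo, hi

modules = [
    # (name, idle W, max W, min access rate, max access rate)
    ("cpu",    6.0, 35.0, 0.50, 0.70),
    ("memory", 4.0, 25.0, 0.20, 0.60),   # wide interval: contention-dependent
]

total_lo = sum(power_bounds(i, m, lo, hi)[0] for _, i, m, lo, hi in modules)
total_hi = sum(power_bounds(i, m, lo, hi)[1] for _, i, m, lo, hi in modules)
print(f"System power in [{total_lo:.1f}, {total_hi:.1f}] W")
```
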
Slide 23: ES Design vs. HPCS Design
- Differences (maybe)
  - Application-specific workloads vs. domain-specific workloads
  - Constraints, objectives, desirables? Latency, throughput, energy, cost, reliability, fault tolerance, IP protection/privacy, ToM, ...
  - Other issues: homogeneous vs. heterogeneous, levels of complexity, user expertise, ...
- Similarities
  - Ever-increasing hardware capability: multi-core, multi-thread, complex communication fabrics, memory hierarchy, ...
  - Productivity gap
  - Common concerns: latency, throughput, energy, cost, reliability, fault tolerance, ...

Slide 24: Leverage Co-Design for HPC
- Systematic performance estimation
  - Formal methods: scenario-based, statistical analysis
  - Hybrid approaches: analytical + simulation
  - Seamless migration from one abstraction level to the next
- Efficient design space exploration
  - Efficient search techniques
  - Multi-level abstraction models
  - Multi-attribute optimization
- Others: memory and communication analysis and design

Slide 25: Thank you!