Published by Bailey Elwell. Modified over 9 years ago.
1
Cross-stack Energy Optimization: Fact or Fiction?
Kevin Skadron
University of Virginia, Dept. of Computer Science
2
Flavors of X-Stack
“Up” the stack
– Circuits → Microarchitecture
– HW → SW
  e.g., sensors → throttling
  Ideally, application itself can adapt (algorithm, precision, QoS, etc.)
– …
“Down” the stack
– Often overlooked, but OS, HW can benefit from application knowledge
– SW → HW
  e.g., access patterns, thread priorities, private/shared, etc.
– GPU example: texture (API → driver → HW)
  e.g., reconfigurable hardware
3
Up: Dymaxion: Index Transformation
SIMD/SIMT: Because SIMD requires contiguous access for efficiency, data layout/traversal needs to be transformed
Layers: user, middleware, (device driver), (hardware)
feature[index] → feature’[transform(index)]
4
Code Example

Original version

HOST
    cudaMemcpy(feature_d, feature, …);
    kmeans_kernel_orig<<<…>>>(feature_d, ...);

DEVICE
    __global__ void kmeans_kernel_orig(float *feature_d, ...) {
        int tid = BLOCK_SIZE * blockIdx.x + threadIdx.x;
        /* ... */
        for (int l = 0; l < nclusters; l++) {
            index = point_id * nfeatures + l;
            ... feature_d[index] ...
        }
    }

Dymaxion version

HOST
    map_row2col(feature_remap, feature, …);
    kmeans_kernel_map<<<…>>>(feature_remap, ...);

DEVICE
    __global__ void kmeans_kernel_map(float *feature_remap, ...) {
        int tid = BLOCK_SIZE * blockIdx.x + threadIdx.x;
        /* ... */
        for (int l = 0; l < nclusters; l++) {
            index = point_id * nfeatures + l;
            /* same logical index; only the physical location changes */
            ... feature_remap[transform_row2col(index, npoints, nfeatures)] ...
        }
    }
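The slide names transform_row2col and map_row2col but does not show their bodies. The following is a minimal host-side C++ sketch of what such a row-major to column-major remap could look like. The function bodies, the use of std::vector, and the return-by-value signature of map_row2col are assumptions made for illustration, not the Dymaxion implementation.

```cpp
#include <cstddef>
#include <vector>

// Assumed body: map a row-major (point-major) index,
//   index = point_id * nfeatures + l,
// to the column-major (feature-major) position,
//   l * npoints + point_id.
inline std::size_t transform_row2col(std::size_t index,
                                     std::size_t npoints,
                                     std::size_t nfeatures) {
    std::size_t point_id = index / nfeatures;  // which data point
    std::size_t l = index % nfeatures;         // which feature of that point
    return l * npoints + point_id;
}

// Assumed host-side remap: copy every element to its transformed position.
// Expects feature.size() == npoints * nfeatures.
inline std::vector<float> map_row2col(const std::vector<float>& feature,
                                      std::size_t npoints,
                                      std::size_t nfeatures) {
    std::vector<float> feature_remap(feature.size());
    for (std::size_t i = 0; i < feature.size(); ++i) {
        feature_remap[transform_row2col(i, npoints, nfeatures)] = feature[i];
    }
    return feature_remap;
}
```

With this layout, threads handling consecutive point_id values read consecutive addresses for a fixed feature l, which is the contiguous access pattern that SIMD/SIMT hardware favors. The kernel keeps computing the same logical index and only the lookup goes through the transform.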
5
Down: Lack of Sensors and Actuators
Feedback control: sensors and actuators
Chicken and egg problem
Lack of sensors is a big problem now
– Can’t control what we can’t measure
– Performance monitors not designed for this
  Too coarse-grained, can’t monitor enough
– Moving in the right direction
Need more actuators, too
– Currently mainly have just DVFS and scheduling/placement
– Some HDDs offer DRPM
– Reconfiguration is a form of actuation, too
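To make the sensor/actuator pairing concrete, here is a minimal sketch of one feedback step, assuming a single power reading as the sensor and a discrete DVFS level as the actuator. DvfsController, its fields, and the hysteresis margin are hypothetical names chosen for this illustration. The slide does not prescribe a control law.

```cpp
#include <algorithm>

// Hypothetical controller: sensor input is measured power in watts,
// actuator output is a DVFS level index (0 = slowest).
struct DvfsController {
    int level;        // current DVFS level
    int max_level;    // highest available level
    double budget_w;  // power budget in watts
    double margin_w;  // hysteresis band below the budget

    // One control step: read the sensor value, update the actuator setting.
    int step(double measured_w) {
        if (measured_w > budget_w) {
            // Over budget: slow down by one level, never below 0.
            level = std::max(0, level - 1);
        } else if (measured_w < budget_w - margin_w) {
            // Comfortably under budget: speed up by one level.
            level = std::min(max_level, level + 1);
        }
        // Inside the hysteresis band: hold the current level.
        return level;
    }
};
```

Even this trivial loop depends on a power reading that is accurate and frequent enough to act on, which is exactly the measurement the slide says many systems cannot provide.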
6
Wish List
Sensors/constraint communication
– Up: Structure occupancies, interval behavior, fine-grained/instruction-level responsiveness, physical location, etc.
  Expand perf-counter system, add informing loads (ISCA ~00), allow HW to query microarchitectural state, expose chip/rack/datacenter/geographic location, etc.
– Down: Access patterns, private/shared, priority/performance expectations, etc.
  Requires new programming constructs and new (possibly privileged) instructions
Actuators
– Many system components hard to control
  e.g., HDDs, DRAM, power supply
– Control memory behavior, light sleep modes
  Ordering/buffering/prefetching/contention
– More reconfigurability, coarse-grained architectures
  Why use cache when you can use scratchpad; registers, routed network when you can do direct producer-consumer, etc.?
7
Summary
Turn fiction into non-fiction!
Some good ideas already in papers
– Revisit: why weren’t they adopted?
New ideas:
– Imagine ideal sensing and actuation
– Show a promising control/adaptation/reconfiguration algorithm
– Propose plausible sensors/actuators
8
Backup
9
What is “Cross Stack”?
Layer X adapts based on information in Layer Y
– Example: OS uses hardware info
  e.g., temp sensors, structure occupancies, # pending cache misses guide thread co-location
– Or hardware uses OS info
  e.g., thread priorities, task deadlines guide hardware DVFS policy
– Important: leverage information across layers to make globally efficient decisions
– Ultimately: break down costly interfaces
  Unnecessary copies, extra state, redundant computation
Different from energy optimization happening independently in multiple layers
– e.g., hardware DVFS (based on instruction flow) + OS DVFS (based on task deadlines)
– Risky: control loops can fight
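As one illustration of hardware-level DVFS guided by OS information, the sketch below picks the lowest frequency that still meets a task deadline. pick_frequency, its parameters, and the cycles-remaining estimate are hypothetical and not taken from the talk. The point is that a single decision point sees both the deadline (OS knowledge) and the available frequency levels (hardware knowledge), instead of two independent loops each adjusting frequency.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical policy: return the index of the lowest frequency that
// finishes cycles_left within time_left_s. freqs_hz must be non-empty and
// sorted in ascending order. If no level meets the deadline (or the
// deadline has already passed), fall back to the highest level.
inline std::size_t pick_frequency(const std::vector<double>& freqs_hz,
                                  double cycles_left,
                                  double time_left_s) {
    if (time_left_s > 0.0) {
        for (std::size_t i = 0; i < freqs_hz.size(); ++i) {
            // Time to finish at this level = cycles / frequency.
            if (cycles_left / freqs_hz[i] <= time_left_s) {
                return i;
            }
        }
    }
    return freqs_hz.size() - 1;
}
```

For example, with levels of 1, 2, and 3 GHz, a task with 1.5e9 cycles left and 1 second until its deadline would run at the 2 GHz level: 1 GHz would miss the deadline and 3 GHz would spend more energy than needed.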
10
Fact or Fiction
Should be fact!
But mostly fiction
– Can’t measure power/energy effectively in many systems and components
– Control options are typically high-overhead
  DVFS, task migration, etc.
– Most solutions are single-layer
Baby steps
– Cluster/datacenter front end monitors per-node activity and temperature, then schedules accordingly
– Autotuning
– Reducing copies