
Cross-stack Energy Optimization: Fact or Fiction? Kevin Skadron University of Virginia Dept. of Computer Science.


1 Cross-stack Energy Optimization: Fact or Fiction? Kevin Skadron University of Virginia Dept. of Computer Science

2 Flavors of X-Stack
“Up” the stack
– Circuits → Microarchitecture
– HW → SW, e.g., sensors → throttling
  Ideally, the application itself can adapt (algorithm, precision, QoS, etc.)
– …
“Down” the stack
– Often overlooked, but OS and HW can benefit from application knowledge
– SW → HW, e.g., access patterns, thread priorities, private/shared, etc.
– GPU example: texture (API → driver → HW)
  e.g., reconfigurable hardware

3 Up: Dymaxion: Index Transformation
SIMD/SIMT: because SIMD requires contiguous access for efficiency, the data layout or traversal order needs to be transformed
User → middleware → (device driver) → (hardware)
feature[index] → feature’[transform(index)]

4 Code Example
Original version
HOST
    cudaMemcpy(feature_d, feature, …);
    kmeans_kernel_orig<<<…>>>(feature_d, ...);
DEVICE
    __global__ void kmeans_kernel_orig(float *feature_d, ...) {
        int tid = BLOCK_SIZE * blockIdx.x + threadIdx.x;
        /* ... */
        for (int l = 0; l < nclusters; l++) {
            index = point_id * nfeatures + l;
            ... feature_d[index] ...
        }
    }
Dymaxion version
HOST
    map_row2col(feature_remap, feature, …);
    kmeans_kernel_map<<<…>>>(feature_remap, ...);
DEVICE
    __global__ void kmeans_kernel_map(float *feature_remap, ...) {
        int tid = BLOCK_SIZE * blockIdx.x + threadIdx.x;
        /* ... */
        for (int l = 0; l < nclusters; l++) {
            index = point_id * nfeatures + l;
            ... feature_remap[transform_row2col(index, npoints, nfeatures)] ...
        }
    }
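
The slide names transform_row2col and map_row2col but does not show their bodies. The following is a minimal host-side sketch of what such a pair could look like, assuming a row-major (point-major) source array of npoints × nfeatures floats that is remapped to a column-major (feature-major) layout. The bodies and the trailing npoints/nfeatures arguments of map_row2col are assumptions for illustration, not Dymaxion's actual implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical index transform (body assumed, not taken from the slides):
// maps the row-major index of element (point, feat) to its index in a
// column-major layout, where all points' values for one feature are adjacent.
inline std::size_t transform_row2col(std::size_t index,
                                     std::size_t npoints,
                                     std::size_t nfeatures) {
    assert(index < npoints * nfeatures);
    std::size_t point = index / nfeatures;  // row in the original layout
    std::size_t feat  = index % nfeatures;  // column in the original layout
    return feat * npoints + point;
}

// Hypothetical host-side remap: copies every element of `feature` to its
// transformed position in `feature_remap`. The two buffers must not alias.
inline void map_row2col(float *feature_remap, const float *feature,
                        std::size_t npoints, std::size_t nfeatures) {
    for (std::size_t i = 0; i < npoints * nfeatures; ++i) {
        feature_remap[transform_row2col(i, npoints, nfeatures)] = feature[i];
    }
}
```

With this layout, threads that handle consecutive points read consecutive addresses for a given feature, which is the contiguous access pattern the previous slide says SIMD/SIMT hardware needs.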

5 Down: Lack of Sensors and Actuators
Feedback control: sensors and actuators
Chicken-and-egg problem
Lack of sensors is a big problem now
– Can’t control what we can’t measure
– Performance monitors were not designed for this: too coarse-grained, can’t monitor enough
– Moving in the right direction
Need more actuators, too
– Currently we mainly have just DVFS and scheduling/placement
– Some HDDs offer DRPM
– Reconfiguration is a form of actuation, too
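
To make the sensor/actuator pairing concrete, here is a minimal sketch of a feedback loop in which a power reading (the sensor) drives a discrete DVFS level (the actuator). The struct name, the cap and headroom parameters, and the one-step-per-sample policy are all assumptions chosen for illustration; the slides do not prescribe a controller.

```cpp
#include <cassert>

// Hypothetical step controller: one DVFS level change per sensor sample.
struct DvfsController {
    int level;               // current DVFS level (higher = faster, hungrier)
    int min_level;           // lowest level the actuator supports
    int max_level;           // highest level the actuator supports
    double power_cap_watts;  // assumed power budget
    double headroom_watts;   // hysteresis band below the cap

    // Takes one power measurement and returns the level to apply next.
    int step(double measured_watts) {
        assert(min_level <= level && level <= max_level);
        if (measured_watts > power_cap_watts && level > min_level) {
            --level;  // over budget: throttle
        } else if (measured_watts < power_cap_watts - headroom_watts &&
                   level < max_level) {
            ++level;  // comfortably under budget: speed up
        }
        return level;  // inside the hysteresis band: hold
    }
};
```

The loop is only as good as the measurement feeding it: if the power sensor is missing or too coarse-grained, step() has nothing meaningful to act on, which is the slide's point.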

6 Wish List
Sensors/constraint communication
– Up: structure occupancies, interval behavior, fine-grained/instruction-level responsiveness, physical location, etc.
  Expand the perf-counter system, add informing loads (ISCA ~00), allow HW to query microarchitectural state, expose chip/rack/datacenter/geographic location, etc.
– Down: access patterns, private/shared, priority/performance expectations, etc.
  Requires new programming constructs and new (possibly privileged) instructions
Actuators
– Many system components are hard to control, e.g., HDDs, DRAM, power supply
– Control memory behavior, light sleep modes
  Ordering/buffering/prefetching/contention
– More reconfigurability, coarse-grained architectures
  Why use a cache when you can use a scratchpad, or registers and a routed network when you can do direct producer-consumer, etc.?

7 Summary
Turn fiction into non-fiction!
Some good ideas are already in papers
– Revisit: why weren’t they adopted?
New ideas:
– Imagine ideal sensing and actuation
– Show a promising control/adaptation/reconfiguration algorithm
– Propose plausible sensors/actuators

8 Backup

9 What is “Cross Stack”?
Layer X adapts based on information in Layer Y
– Example: OS uses hardware info
  e.g., temperature sensors, structure occupancies, and the number of pending cache misses guide thread co-location
– Or hardware uses OS info
  e.g., thread priorities and task deadlines guide hardware DVFS policy
– Important: leverage information across layers to make globally efficient decisions
– Ultimately: break down costly interfaces
  Unnecessary copies, extra state, redundant computation
Different from energy optimization happening independently in multiple layers
– e.g., hardware DVFS (based on instruction flow) + OS DVFS (based on task deadlines)
– Risky: control loops can fight
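
One way to picture the difference between independent and cross-stack DVFS is a single arbiter that sees both layers' information instead of two loops overriding each other. The sketch below is an assumption-laden illustration: the HwHint and OsHint structs, their fields, and the max-then-clamp policy are hypothetical and are not taken from the slides.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical hardware-side information.
struct HwHint {
    int preferred_level;    // level suggested by observed instruction flow
    int thermal_max_level;  // highest level the hardware can sustain safely
};

// Hypothetical OS-side information.
struct OsHint {
    int deadline_min_level;  // lowest level that still meets task deadlines
};

// Assumed policy: honor the OS deadline floor when it exceeds the hardware's
// preference, but never exceed the hardware's thermal ceiling.
inline int arbitrate_dvfs(const HwHint &hw, const OsHint &os) {
    int level = std::max(hw.preferred_level, os.deadline_min_level);
    return std::min(level, hw.thermal_max_level);
}
```

Because one function makes the decision from both inputs, there is no second loop to undo the result, which avoids the fighting behavior the slide warns about.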

10 Fact or Fiction
Should be fact! But mostly fiction
– Can’t measure power/energy effectively in many systems and components
– Control options are typically high-overhead
  DVFS, task migration, etc.
– Most solutions are single-layer
Baby steps
– Cluster/datacenter front end monitors per-node activity and temperature, and schedules accordingly
– Autotuning
– Reducing copies
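
The front-end "baby step" can be sketched as a placement function that consumes per-node activity and temperature readings. Everything here is hypothetical: the NodeStatus fields, the temperature threshold, and the least-utilized-first rule are assumptions used only to illustrate monitoring-driven scheduling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-node report collected by the front end.
struct NodeStatus {
    double utilization;    // fraction of the node that is busy, 0.0 to 1.0
    double temperature_c;  // most recent temperature reading
};

// Returns the index of the least-utilized node whose temperature is below
// max_temp_c, or -1 if every node is too hot (or the list is empty).
inline int pick_node(const std::vector<NodeStatus> &nodes, double max_temp_c) {
    int best = -1;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i].temperature_c >= max_temp_c) {
            continue;  // skip nodes at or above the thermal limit
        }
        if (best < 0 ||
            nodes[i].utilization <
                nodes[static_cast<std::size_t>(best)].utilization) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

A caller would refresh the NodeStatus list from monitoring data before each placement decision and treat a return value of -1 as "defer or queue the job".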

