1 Integrated Management of Power Aware Computing & Communication Technologies Kickoff review meeting Nader Bagherzadeh, Pai H. Chou, Fadi Kurdahi University.


1 1 Integrated Management of Power Aware Computing & Communication Technologies Kickoff review meeting Nader Bagherzadeh, Pai H. Chou, Fadi Kurdahi University of California, Irvine, ECE Dept. DARPA Contract F33615-00-1-1719 September 27, 2000

2 2 Agenda Introduction and overview Management status, financial, milestones, schedule. Technical presentation  Task progress Architecture Applications CAD  Lessons learned, challenges, issues. Questions + action items review.

3 3 Outline Introduction  Program goals  Project overview Management status  Personnel and teaming plans  Plans and milestones  Financial information Technical presentation  Background  Technical approach  Status and accomplishments  Current detailed schedule Program impact and anticipated transitions

4 4 Introduction

5 5 Program Goals Power-aware system-level design  Enhance mission success (time, task)  Rapid customization for different missions Design tool  Exploration & evaluation  Optimization & specialization  Technique integration System architecture  Statically configurable  Dynamically adaptive  Use COTS parts & protocols

6 6 Technical approach High-level specification  Separate behavior from architecture  Explicit constraints (timing, power)  Library characterization System synthesis tool  Source-aware power usage scheduling  Bus topology transformation and communication scheduling Configurable architecture  Task migration & selective shutdown  Bus segmentation and voltage scaling Domain knowledge  Encompass mechanical / thermal power  Aware of power supply model

7 7 Quad Chart
Innovations: Component-based power-aware design  Exploit off-the-shelf components & protocols  Best price/performance, reliable, cheap to replace CAD tool for global power policy optimization  Optimal partitioning, scheduling, configuration  Manage entire system, including mechanical & thermal Power-aware reconfigurable architectures  Reusable platform for many missions  Bus segmentation, voltage / frequency scaling
Impact: Enhanced mission success  More tasks for the same power  Dramatic reduction in mission completion time Cost saving over a variety of missions  Reusable platform & design techniques  Fast turnaround time by configuration, not redesign Confidence in complex design points  Provably correct functional/power constraints  Retargetable optimization to eliminate overdesign  Power protocol for massive scale
[Diagram labels: Behavior; Architecture; high-level simulation; functional partitioning & scheduling; composition operators; high-level components; behavioral system model; parameterizable components; busses, protocols; system architecture; mapping; system integration & synthesis; static configuration; dynamic power management]
Schedule, Year 1 (2Q 00 kickoff to 2Q 01): Static & hybrid optimizations  partitioning / allocation  scheduling  bus segmentation  voltage scaling; COTS component library; FireWire and I2C bus models; Static composition authoring; Architecture definition; High-level simulation; Benchmark identification
Schedule, Year 2 (2Q 01 to 2Q 02): Dynamic optimizations  task migration  processor shutdown  bus segmentation  frequency scaling; Parameterizable components library; Generalized bus models; Dynamic reconfiguration authoring; Architecture reconfiguration; Low-level simulation; System benchmarking

8 8 Innovations Component-based power-aware design  Exploit off-the-shelf components & protocols  COTS offer best price/performance, reliable, cheap to replace CAD tool for global power policy optimization  Optimal partitioning, scheduling, configuration  Manage entire system, including mechanical & thermal Power-aware reconfigurable architectures  Reusable platform for many missions  Bus segmentation, voltage / frequency scaling

9 9 Impact Enhanced mission success  More tasks for the same power  Dramatic reduction in mission completion time Cost saving over a variety of missions  Reusable platform & design techniques  Fast turnaround time by configuration, not redesign Confidence in complex design points  Provably correct functional/power constraints  Retargetable optimization to eliminate overdesign  Power protocol for massive scale

10 10 Management Status

11 11 Personnel & teaming plans UC Irvine, co-PIs: Design tools  Nader Bagherzadeh  Pai Chou  Fadi Kurdahi UC Irvine, research assistants  Dexin Li  Jinfeng Liu  Afshin Niktash USC: Component power optimization  Jean-Luc Gaudiot  Seong-Won Lee JPL: Applications and benchmarking  Nazeeh Aranki  Nikzad “Benny” Toomarian

12 12 Previous work Design tools  System-level: the Chinook HW/SW codesign tool  Architectural synthesis (w/ physical design considerations) Components  Reconfigurable computing: the MorphoSys chip  Parameterizable components: PCL  Simultaneous MultiThreading vs. Chip MultiProcessing Architectural platform  Segmented bus: X-2000, Mars Pathfinder  Configurable SMP

13 13 Responsibilities Bagherzadeh, Chou, Kurdahi -- co-PIs  Oversee project operation  Integration into curriculum and related research efforts Li, Liu, Niktash -- RAs  Development of CAD tools  Modeling of demonstrator examples  Authoring of component / protocol library JPL  Furnish example specifications  Co-develop optimization techniques USC  Supporting link to low-level technologies

14 14 External collaborations JPL  X-2000 multi-mission architecture  Mars Pathfinder as baseline  JPL to provide COTS testbed  JPL to evaluate IMPACCT optimizations USC  Parameterizable components  Low-level power estimation Consystant Design Technologies (Seattle, WA)  Framework for component-based design  IMPACCT plugins to support power management

15 15 Technical Background

16 16 Background: MorphoSys project Reconfigurable processor array MIPS-like RISC processor High-bandwidth data interface 100 MHz clock 0.35 µm 4-metal CMOS Software support Platform for dynamic power management [Diagram: MorphoSys block diagram with advanced RISC processor, instruction/data cache (L1), reconfigurable processor array, high-bandwidth data interface, system bus, and external memory (e.g. SDRAM, RDRAM)]

17 17 RC Array and Context Memory RC 16 column block 16 row block Context Memory 2 blocks 8 sets in each block A set controls 1 row or column (SIMD) 16 contexts in 1 set. Possible to overlap ctx broadcast with ctx reloading

18 18 The M1 chip layout

19 19 M1 chip test fixture

20 20 Software environment [Diagram labels: App. (C code); mcc; TinyRISC code TR_app (a = b + c, p = a + 1); RC Array functions (Z=RC_F(X), W=RC_F(Y)); mLoad; mSched; mView; Context Lib.; configuration context; executable; MuLate, MorphoSim (C++, VHDL); MorphoSys chip]

21 21 Background on USC's SMT work High performance processors  Superscalar processor (SSP)  Single chip multiprocessor (CMP)  Very long instruction word (VLIW)  Simultaneous multithreading (SMT) Performance and power dissipation  High performance requires high power consumption Recent applications need low-power, high-performance processors

22 22 Microarchitectural tradeoffs Power tradeoffs between different architectures  SMT vs. SSP: SMT has more modules than SSP; SMT has better performance and consumes more power  SMT vs. CMP: SMT has better utilization; they have similar performance, but SMT consumes less power  SMT vs. VLIW: SMT consumes more power; SMT is compatible with conventional architectures Design of simple SMT  A simplified SMT may consume less power and still have the advantage of TLP Analysis of architectural features  Power drain of modern processors (control vs. data path)

23 23 SMT design methodology Measuring power consumption of a processor  Checking transitions of signals and module operations  Hardware implementation of the processor simulator Measuring performance of modules  The contribution of each module to the total performance  Performance-power ratio of each module Comparison between architectures Design of a low power processor

24 24 Measuring performance Finding the performance per power of each module  Simulate and measure the performance without a module  Calculate the performance per power for each module  Classify modules if more than two modules cooperate with each other Find the solution for the low power high performance processor

25 25 Background: Chinook project Component-based HW/SW codesign framework  Specification, simulation, synthesis  Motivated by IP reuse, system integration Problem: IP reuse forces modification  Reason: components have hardwired coordination protocols Approach  Adaptable components  Separate coordination protocols from components Benefits  Reuse without modification  Enable system-level optimizations

26 26 Example protocol: Subsumption Must handle three cases:  Subsuming, yielding, idle  Hardwired protocol Generalization:  Adaptable components (by mode mapping)  Separate protocols & components [Diagram labels: sensors (joystick, bumper, sonar); actuators (wheels); decision modules and decision composition (escape, avoid, override); Bumper process with modes idle, subsuming, yielding; subsumption interface; events bump and release; state-transition labels (W, B, T, F; 2s, 45d)]

27 27 Architectural mapping Single processor or multiple processors Multiple mappings to an architecture mode manager modal processes

28 28 Distributed mode managers Automatically partitioned among processors  Synthesized control communication  Comm. tradeoffs: synchronization, replication mode manager modal processes

29 29 Technical Presentation

30 30 “Sojourner” The Mars Pathfinder Microrover Flight Experiment Alpha Proton X-ray Spectrometer (APXS) Past missions – Mars Pathfinder

31 31 Application requirements System specification  6 wheel motors  4 steering motors  System health check  Hazard detection Power supply  Battery (non-rechargeable)  Solar panel Power consumption  Digital Computation, imaging, communication, control  Mechanical Driving, steering  Thermal Motors must be heated in low-temperature environment

32 32 System-level power budget (energy required per function, with time and calculation)
motor heating, 1 motor at a time: 7.51 W x 1 hr = 7.51 W-hr
motor heating, 2 motors at a time: 11.26 W x 0.5 hr = 5.63 W-hr
driving (extreme terrain @ -80 degC): 13.85 W x 0.5 hr = 6.92 W-hr
hazard detection: 7.33 W x 0.25 hr = 1.83 W-hr
imaging (3 images @ 2 min/image): 4.5 W x 0.1 hr = 0.45 W-hr
image compression (compress 3 images @ 6 min/image): 3.7 W x 0.3 hr = 1.2 W-hr
6 Mbit communication @ 50 min/sol: 6.27 W x 0.8 hr = 5.2 W-hr
42 10-sec health checks during day: 6.27 W x 0.1 hr = 0.63 W-hr
remainder of 7 hr daytime CPU operation: 3.7 W x 4 hr = 15.0 W-hr
WEB heating (as needed): 50 W-hr
Total: 95 W-hr
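Each budget entry on this slide is a power level multiplied by a duration. A minimal Python sketch that recomputes the entries from the listed calculations follows; the names `budget`, `energy_wh`, and `subtotal_wh` are illustrative and not from the slides. The slide rounds some entries (for example 3.7 W x 0.3 hr is listed as 1.2 W-hr), so small differences from the recomputed values are expected.

```python
# (function, power in W, duration in hr), copied from the slide's calculations.
# The 50 W-hr WEB heating entry has no power/time breakdown and is left out.
budget = [
    ("motor heating, 1 at a time", 7.51, 1.0),
    ("motor heating, 2 at a time", 11.26, 0.5),
    ("driving", 13.85, 0.5),
    ("hazard detection", 7.33, 0.25),
    ("imaging", 4.5, 0.1),
    ("image compression", 3.7, 0.3),
    ("communication", 6.27, 0.8),
    ("health checks", 6.27, 0.1),
    ("remaining daytime CPU", 3.7, 4.0),
]


def energy_wh(power_w, hours):
    """Energy in watt-hours for a constant power draw."""
    return power_w * hours


energies = {name: energy_wh(p, h) for name, p, h in budget}
subtotal_wh = sum(energies.values())  # excludes WEB heating

for name, e in energies.items():
    print(f"{name}: {e:.2f} W-hr")
print(f"subtotal without WEB heating: {subtotal_wh:.2f} W-hr")
```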

33 33 Design issues Timing constraints  System health check 10s/10min  Heating motor for 5s, 50s prior to driving  Hazard detection 10s – steering 5s – driving 10s Power management  Low-power electronics alone cannot yield significant power savings  No system-level management tool available Conservative hand-crafted schedule  Serialize all operations to avoid power surge  Long execution time  Solar power wasted

34 34 Pancam/Mini-TES Mini-Corer Instrument Arm Cluster : Raman Spectrometer Alpha-Proton-X-Ray Spectrometer (APXS) Mössbauer Spectrometer Microscopic Imager Present missions – Athena/Mars ’03 Rover configuration

35 35 Athena/Mars ‘03 Rovers - power subsystem Power utilization:  38 W = 19 W (CPU&I/O) + 9 W (accel and gyro) + 10 W (wheel motors) for driving.  75 W = 19 W (CPU&I/O) + 55 W (transmission) for orbiter communication  30 W = 19 W (CPU&I/O) + 10 W (transmission) for lander relay communication  55 W = 19 W (CPU&I/O) + 33 W (peak motor) for drilling  29 W = 23 W (CPU&I/O) + 6 W (cameras) required for imaging  11 W Raman, 1.4W APXS and 2.3 W for nighttime spectrometer operation  141Whr daily for housekeeping engineering  75Whr limit for nighttime operations

36 36 Present missions – MUSES-CN Asteroid NanoRover Completely solar powered  Requiring only 1 watt, including an RF telecommunications system for communications between the rover and a lander or small-body orbiter for relay to Earth. Power source  500 grams of commercial, non-rechargeable, replaceable lithium batteries, with energy density of 750 joules per gram.

37 37 Power-aware designs Subsume low power as a special case  Minimize power consumption  Minimal application specific knowledge, limited reconfiguration space  Conservative Make best use of available power  Use MAX solar power while it's available  Increase parallelism, perform more tasks, reduce mission time  Both MIN and MAX power constraints Application-specific knowledge  Multiple mission requirement  Adapt to run-time power supply, operating environment

38 38 System-level power management Amdahl's law -- extended to power  Component-level improvements must be scaled by % contributions  Synergy between inter-component interactions Scope of system power model  Digital, mechanical, thermal  Battery model - control power surge  Renewable source - solar panel, etc Mission-driven tradeoffs  Execution time vs. power saving  Adapt to operating environment
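One plausible reading of "Amdahl's law extended to power" is that a component's saving only counts in proportion to its share of total system power. The sketch below assumes that reading; the function name and the example numbers are hypothetical and not taken from the slides.

```python
def system_power_reduction(fraction, component_factor):
    """Overall power reduction factor when one component, responsible for
    `fraction` of system power, has its power divided by `component_factor`.

    Assumed model (Amdahl-style): the remaining (1 - fraction) is unchanged.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if component_factor <= 0:
        raise ValueError("component_factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / component_factor)


# Hypothetical example: a 10x better component that draws 30% of system power
# improves the whole system by well under 2x.
print(round(system_power_reduction(0.30, 10.0), 2))
```

Under this model, even an arbitrarily large component-level improvement is capped at 1 / (1 - fraction), which is why the slide argues for system-level management.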

39 39 What's needed? Reconfigurable system architecture  Statically configurable for different missions  Reconfiguration for dynamic power management  Support state-of-the-art power management policies System-level design tool  Support design space exploration  Take full advantage of COTS components  Optimize mission-specific system configuration  Synthesize system-level power manager  Support simulation for early validation

40 40 X2000 avionics system architecture Symmetric COTS multiprocessors  Low cost component with strong commercial support  Widely accepted specification, design, application and testing  Reduced development cost Dual system bus architecture  High speed data rate with moderate power  Low speed control with low power Industry standard bus protocols  FireWire (IEEE 1394) bus  I2C bus  Reconfigurable bus topology

41 41 PA system architecture The NASA X2000 Avionics System [Diagram labels: high-rate input (camera); high-speed bus (e.g. IEEE 1394); communication module (CDMA); bus power controller; symmetric multiprocessor modules; altimeter subnet; microcontroller-directed subnet (power regulations & control, analog telemetry sensors, safety inhibits, valve & pyro drive); reconfigurable hardware blocks; low-speed bus (e.g. I2C)]

42 42 Applicable power optimizations (both static & dynamic versions) Application level  Scheduling under timing and power constraints  Task partitioning, allocation, migration  Algorithm selection Architecture level  Bus segmentation / clustering  Communication scheduling Component level  Voltage / frequency scaling  Power down X-2000 goals  Digital electronics power: 10x decrease  Analog electronics power: 2x decrease  Computer performance: 10 to 20x increase

43 43 The need for a system-level CAD tool Avoid pitfalls with manual design  Overdesign (too conservative)  Hardwired assumptions in implementation (hard to change/adapt)  System integration (bottleneck in projects) Scalable methodology  Specification: separation of concerns Behavior vs. architecture Policy vs. mechanism Constraint vs. implementation  Exploration Framework for technique integration Rapid feedback  Manage complexity Knowledge base for component/bus details Consistent knowledge propagation through design stages

44 44 Design tool Library  Components and bus protocols  Provides power estimation  Defines configuration space Authoring  Behavioral description, architecture description  Mapping from behavior to architecture Synthesis  Scheduling, partitioning  Bus segmentation, voltage scaling  Synthesis of power manager with task scheduler Simulation  High-level: explore design space  Detailed-level: power/performance for a given design point

45 45 IMPACCT overview [Diagram labels: Behavior; Architecture; high-level simulation; functional partitioning & scheduling; composition operators; high-level components; behavioral system model; parameterizable components; busses, protocols; system architecture; mapping; system integration & synthesis; static configuration; dynamic power management]

46 46 Library: low-level components Supported components  COTS  Parameterizable Levels of abstraction  Parameterizable  Simulatable  Synthesizable  Reconfigurable [Diagram: VHDL code instantiated with bus width = 8 and bus width = 16]

47 47 Library: component definition Component interface  Physical: pin interface  Functional: data and control interface  Power, current, voltage Power/mode characterization  Mode governs power usage  Restrictions on mode changes allowed  High-level yet refined power estimation Aggregation  Smaller components combined into larger ones  New external parameters, interfaces, modes

48 48 Example components Processor  PowerPC, ARM, Pentium, MIPS Microcontroller  StrongARM, Intel 8051, Motorola 68HC11, 68332 Bus controller/transceiver  FireWire controller & transceiver  I2C bus controller, GPIB Memory  SRAM  DRAM  Flash memory

49 49 Example component definition FireWire bus transceiver: National Semi CS4103  Working voltage: 3.3 V  Power modes Full-on (400mW) PHY-on (150mW) Standby (50mW) CLK-disable (21mW) Crystal-disable (16mW) FireWire bus controller: National Semi CS4210  Working voltage: 3.3 V  Power modes Full-on (300mW) Standby (17mW) Aggregated bus transceiver/controller  Up to ten working modes to play with  Flexibility in power management
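The "up to ten working modes" figure is consistent with pairing each of the five transceiver modes with each of the two controller modes. A minimal sketch of such an aggregated component follows. It assumes that power simply adds and that every combination is legal, neither of which the slides state; the dictionary names are illustrative.

```python
from itertools import product

# Power modes in mW, as listed on the slide.
transceiver_mw = {
    "full-on": 400,
    "PHY-on": 150,
    "standby": 50,
    "CLK-disable": 21,
    "crystal-disable": 16,
}
controller_mw = {"full-on": 300, "standby": 17}

# Assumption: aggregated power is the sum of the two parts, and all
# (transceiver mode, controller mode) pairs are allowed.
aggregated_mw = {
    (t, c): transceiver_mw[t] + controller_mw[c]
    for t, c in product(transceiver_mw, controller_mw)
}

for modes, mw in sorted(aggregated_mw.items(), key=lambda kv: -kv[1]):
    print(modes, mw, "mW")
```

A real library entry would also need the "restrictions on mode changes" mentioned on the component-definition slide, for example as a set of permitted mode transitions.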

50 50 Library: bus protocols Architecture  Parallelism (parallel or serial)  Topology (serial, tree, ring)  Service layers (physical, link, transaction, application) Communication  Data transfer mode (asynchronous, isochronous)  Data transfer speed  Response mode (need acknowledgement or not)  Arbitration mode Configuration  Configuration process (deterministic or random)  Reconfigurability (static, hybrid, dynamic) Power  Power mode (full-on, standby, deep-sleep, shutdown)  Media (cable, wireless, backplane)

51 51 Bus protocols exploration Explore bus protocol dimensions  Protocol simulation Input: bus protocol model Output: sequence of events  Map events into relative power quantities  Compare and trade off between different design points Example: simulating FireWire bus configuration  Event-driven simulator  Compare two designs with different topology Pure tree topology (acyclic) Tree topology with bus segmentation  Tree-ID process, 9 nodes Tree: 37 events Segmented tree: 24 events

52 52 Bus optimization Bus: a significant power consumer  Up to 30% - 50% of the total system power consumption [Mehra97]  Bus power consumption determined by Capacitance (load C and bus C, proportional to bus length) Voltage (bus supply voltage and swing voltage) Bus access frequency Bus signal switching activity Why bus power optimization?  System performance requirements  Power constraints  Adapt to execution time variations  Bus segmentation for increased bandwidth  Enable other novel power management techniques
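The four factors listed on this slide match the usual first-order model of dynamic switching power, P = activity x C x V_supply x V_swing x f. The sketch below uses that textbook model as an assumption; the slide itself gives no formula, and the capacitance and activity values are made up for illustration.

```python
def bus_dynamic_power_w(activity, capacitance_f, supply_v, frequency_hz, swing_v=None):
    """First-order dynamic bus power (assumed textbook model).

    activity      -- average switching activity per cycle (0..1)
    capacitance_f -- total switched capacitance in farads (load + wire)
    supply_v      -- bus supply voltage
    swing_v       -- signal swing; defaults to full swing (= supply_v)
    """
    if swing_v is None:
        swing_v = supply_v
    return activity * capacitance_f * supply_v * swing_v * frequency_hz


# Hypothetical numbers: halving the switched capacitance, as bus
# segmentation aims to do, halves the dynamic power in this model.
full = bus_dynamic_power_w(0.5, 20e-12, 3.3, 100e6)
segmented = bus_dynamic_power_w(0.5, 10e-12, 3.3, 100e6)
print(full, segmented)
```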

53 53 Bus-level optimizations Bus encoding [Shin98][Benini97][Nakase98]  Minimize switching activity on bus  Makes sense mostly for parallel bus  Gray code, bus-invert code, T0 code and Beach code  Bus driver design Bus clustering (segmentation) [Mehra97][Zhang98]  Optimize bus topology by grouping components  Divide the global bus into multiple segments  Benefits: Reduced bus capacitance (power saving) Shorter bus latency, higher throughput, increased flexibility Partitioning [Hauck95][Yang94][Cong93]  Divide tasks among components  Minimize inter-cluster traffic  Clustering before partitioning
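Of the encodings named above, bus-invert coding is the simplest to illustrate: if more than half of the bus lines would toggle, the inverted word is sent together with an extra "invert" line. The sketch below follows the textbook formulation and is not taken from the cited papers; the initial bus state of 0 is an assumption.

```python
def bus_invert_encode(words, width):
    """Return a list of (bus_value, invert_bit) pairs for a parallel bus."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the data lines
    encoded = []
    for word in words:
        word &= mask
        toggles = bin(word ^ prev).count("1")
        if toggles > width // 2:
            bus_value, invert = (~word) & mask, 1
        else:
            bus_value, invert = word, 0
        encoded.append((bus_value, invert))
        prev = bus_value
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(~value) & mask if invert else value for value, invert in encoded]


words = [0x00, 0xFF, 0x0F, 0xF0]
encoded = bus_invert_encode(words, 8)
print(encoded)
print(bus_invert_decode(encoded, 8) == words)
```

For this input the data lines toggle 4 times in total instead of 20 without encoding, at the cost of one extra line, which is why the slide notes that encoding makes sense mostly for parallel buses.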

54 54 FireWire (IEEE 1394) High speed serial bus  100, 200, 400 Mbps in 1394a  800 Mbps, 1.6 Gbps in 1394b Advantages  Low power  Real-time bandwidth guarantee => important for media apps  Isochronous and asynchronous transfer modes  Hot-pluggable, self reconfiguring  Supports bus segmentation

55 55 X2000 architecture mapping Map Mars Rover application onto X2000 architecture Tasks:  MCs are responsible for sensing, drive control, steering control  Capture picture, compress in CPU1, and send data to RF modem  SCIs carry out scientific experiments, sending data to CPU2  After analysis, CPU2 stores data in HD/NVM Legend: CAM: camera; MC: microcontroller; HD: hard drive; NVM: non-volatile memory; SCI: scientific equipment; RF modem: radio frequency modem [Diagram: FireWire 1394 bus connecting CPU 1, CPU2 (bus controller), CAM, RF modem, HD/NVM, SCI 1, SCI 2, MC 1, MC 2, MC 3; I2C bus omitted]

56 56 Bottlenecks in an unsegmented architecture Contention for bus bandwidth  Camera, RF, harddisk  Forces serialization of communication globally All nodes must be kept awake  Prevents component shutdown  Global overhead for bus reconfiguration Long routing path  Power overhead on routing controllers

57 57 Segmentation example Three bus segments [Diagram nodes: SCI1, SCI2, RF modem, CAM, HD, MC1, MC2, MC3, CPU1/DSP, CPU2/bus controller] Task legend: MC: sensing, drive control, steering control; SCI: scientific experiment; CAM: picture capture; image compression; RF transmission Suppose bus bandwidth is 100 Mbps, image size 20 Mb each, 20 pictures to work on, SCI data volume 16 kbps x 10 Ks x 2 (4 hrs a day) Power numbers: CPU1: 4.0 W; CPU2: 240 mW; RF modem: 1.7 W; Camera: 2.6 W; SCI1: 0.8 W; SCI2: 3.2 W

58 58 Bus segmentation with FireWire Blue nodes can't be disabled  All nodes’ PHY layers must remain active.  Request packets are broadcast to all nodes Gray nodes can be safely disabled  They are in different segments from the active ones.  Request packets are broadcast to only active nodes. segmentation

59 59 Throughput improvement Unsegmented bus: 100 Mbps bandwidth, 9 s transfer time Segmented bus: 300 Mbps, 5 s transfer time Bus segmentation helps improve bus bandwidth. [Diagrams: unsegmented FireWire 1394 bus and three-segment bus with nodes SCI1, SCI2, RF modem, CAM, HD/NVM, MC1, MC2, MC3, CPU1/DSP, CPU2/bus controller; one part is annotated "No useful traffic"]

60 60 Bandwidth-enabled voltage scaling Use voltage scaling and clock scaling to decrease component power. Bandwidth 100 Mbps: power consumption = 12.3 W Bandwidth could be 300 Mbps but is kept at 100 Mbps: power consumption after voltage scaling = 9.2 W [Diagram nodes: SCI1, SCI2, RF modem, CAM, HD, MC1, MC2, MC3, CPU1/DSP, CPU2/bus controller]

61 61 Power/latency reduction Before: power consumption = 12.3 W, data transfer time = 9 s, energy consumption = 111 J After voltage scaling: power consumption = 9.2 W, data transfer time = 5 s, energy consumption = 46 J Energy saving 58%, power saving 25% Note: bus configuration power not counted
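The savings on this slide follow from energy = power x time. A short check using only the numbers given on the slide (variable names are illustrative):

```python
def energy_j(power_w, time_s):
    """Energy in joules for a constant power draw."""
    return power_w * time_s


before_j = energy_j(12.3, 9)  # unscaled: 12.3 W for 9 s
after_j = energy_j(9.2, 5)    # voltage-scaled: 9.2 W for 5 s

energy_saving = 1 - after_j / before_j
power_saving = 1 - 9.2 / 12.3

print(f"before: {before_j:.1f} J, after: {after_j:.1f} J")
print(f"energy saving: {energy_saving:.0%}, power saving: {power_saving:.0%}")
```

The energy saving exceeds the power saving because the transfer also finishes sooner.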

62 62 Segmentation-enabled shutdown All components’ bus interfaces are active. Entire bus is hot. Non-operating bus segments are disabled. Non-operating components are disabled. Bus power is saved. Drive control (10 min.) Drive control (20 min.) Picture capture (6 min.) Science experiment (20 min.)

63 63 Combined energy savings from static techniques Shutting down inactive nodes: 27 times of global bus configs. Only 11 bus configurations Config energy << 165 J Transceiver energy 1962 J Config energy + transceiver energy < 1962 + 165 = 2127 J Not shutting down inactive nodes: Bus transceiver active all the time. Transceiver energy: 150 mW x 10 x 3360 s = 5040 J Transceiver: National Semi CS4103, PHY-active only mode. 2.4 X energy reduction!
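The 2.4x figure can be reproduced from the numbers on this slide. The sketch below treats 165 J as an upper bound on configuration energy, as the slide does, and assumes the "x 10" factor is the number of transceivers that stay in PHY-active mode for 3360 s.

```python
# Without shutdown: transceivers (assumed to be ten) stay PHY-active throughout.
always_on_j = 0.150 * 10 * 3360  # 150 mW x 10 x 3360 s

# With shutdown: transceiver energy plus an upper bound on configuration energy.
with_shutdown_j = 1962 + 165

reduction = always_on_j / with_shutdown_j
print(f"{always_on_j:.0f} J vs < {with_shutdown_j} J: at least {reduction:.1f}x reduction")
```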

64 64 Dynamic bus reconfiguration New task: send data from HD to RF modem (continuing from the previous task) Solution: dynamically change bus topology [Diagrams: two bus topologies with nodes SCI1, SCI2, RF modem, CAM, HD, MC1, MC2, MC3, CPU1/DSP, CPU2/bus controller; activities shown: science experiments and radio frequency data transfer, SCI+RF (20+60 min)]

65 65 Energy savings from dynamic bus reconfiguration Without re-segmentation: local configuration: 3; global configuration: none; re-segmentation: none; active transceivers: 7; active bus segments: 2 Energy: 12.7 x 3 x 1 + 0.15 x 7 x 4800 = 5078 J With re-segmentation: local configuration: none; global configuration: 1; re-segmentation: 1; active transceivers: 3+2; active bus segments: 1 Energy: 23.7 x 1 x 1 + 0.15 x 3 x 4800 + 0.05 x 2 x 4800 = 2664 J Power number list: local config: 12.7 W; global config: 23.7 W; active transceiver: 150 mW; segmentation: software support; bus segment: proportional to bus length 1.9x energy reduction!
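The two energy expressions on this slide can be evaluated directly. The sketch below assumes each configuration step lasts 1 s (the "x 1" factor) and that the 0.05 W term is two transceivers in a lower-power mode; both are readings of the slide rather than stated facts.

```python
PERIOD_S = 4800          # 20 + 60 min of SCI + RF activity
LOCAL_CONFIG_W = 12.7
GLOBAL_CONFIG_W = 23.7
ACTIVE_XCVR_W = 0.15
LOW_POWER_XCVR_W = 0.05  # assumed: reduced-power mode for two transceivers
CONFIG_TIME_S = 1        # assumed duration of one configuration step

# Without re-segmentation: 3 local configurations, 7 active transceivers.
without_j = LOCAL_CONFIG_W * 3 * CONFIG_TIME_S + ACTIVE_XCVR_W * 7 * PERIOD_S

# With re-segmentation: 1 global configuration, 3 active + 2 low-power transceivers.
with_j = (GLOBAL_CONFIG_W * 1 * CONFIG_TIME_S
          + ACTIVE_XCVR_W * 3 * PERIOD_S
          + LOW_POWER_XCVR_W * 2 * PERIOD_S)

print(round(without_j), round(with_j), round(without_j / with_j, 1))
```

The one-off configuration cost is small next to the transceiver energy over 80 minutes, so reducing the number of active transceivers dominates the result.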

66 66 Summary of architecture optimization Towards loose coupling  Reduced bus contention  Increased parallel bandwidth  Enabling voltage/frequency scaling Application-driven clustering  Communication bandwidth requirements between processes  Knowledge from high-level behavioral model Static optimization: 2.4x energy reduction  Bus segmentation  Cluster shutdown Dynamic reclustering: 1.9x energy reduction

67 67 Power management & optimization Behavioral modeling  Extract power related attributes of all objects Architecture modeling  Use low-power devices or devices that can operate in low-power mode Partitioning  Migration – merge computations from under-utilized processors onto one processor to improve utilization  Segmentation – separate tightly coupled computations into clusters to localize communication Scheduling  Arrange operation sequences on multiple processors / multiple power consumers to meet both performance and power requirements

68 68 Behavioral model Application specific knowledge  Input, output and function  Dependency and precedence  Control and data flow  Timing and sequence Software architecture  Operating system features – real-time, centralized, distributed, etc.  Execution model – event driven, interrupt, distributed agent, client-server, etc.  Communication model – protocol stack and specification Power related attributes  Data rate, execution time, CPU speed, memory size, communication path, etc.

69 69 Allocation Map behavioral objects to hardware  Group related OS, communication, control and application objects into processing nodes  Extract data objects into storage nodes  Allocate components/packages for each processing node  Arrange data storage for data nodes and optimize storage location to reduce communication Map communication paths to busses  Setup working mode of each component/package to fit the behavioral requirement  Extract attribute of each structure Function – computation, control, communication CPU utilization Bus traffic Power consumption

70 70 Scheduling Mapping of tasks to time slots  Computation  Communication Mapping of power usage to time slots  Mechanical devices  Thermal subsystems  Other electronics subsystems Constraints  Real-time deadlines, periods, min/max separation  Power budget, power surge (min/max)  Potentially scenario-driven

71 71 Scheduling techniques Deadline based real-time scheduling on multiprocessors  Rate-monotonic scheduling – extend existing RM scheduling to multiprocessors  Timing constraint graph scheduling – multiple serializable sequences in a single heart beat
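The slides do not say how rate-monotonic (RM) scheduling is extended to multiprocessors. One common approach, shown here purely as an assumption, is partitioned RM: assign tasks to processors first-fit and accept a processor's task set only if it passes the Liu and Layland utilization bound n(2^(1/n) - 1). The bound is sufficient but not necessary, so this sketch is conservative.

```python
def rm_bound(n):
    """Liu-Layland utilization bound for n tasks under rate-monotonic priorities."""
    return n * (2 ** (1.0 / n) - 1)


def partition_first_fit(tasks, num_procs):
    """Assign (exec_time, period) tasks to processors, first-fit.

    Returns one task list per processor, or None if some task fits on no
    processor under the utilization bound.
    """
    procs = [[] for _ in range(num_procs)]
    # Shorter period first, i.e. rate-monotonic priority order.
    for task in sorted(tasks, key=lambda t: t[1]):
        for assigned in procs:
            candidate = assigned + [task]
            utilization = sum(c / p for c, p in candidate)
            if utilization <= rm_bound(len(candidate)):
                assigned.append(task)
                break
        else:
            return None
    return procs


tasks = [(1, 4), (2, 5), (3, 10)]  # hypothetical (execution time, period) pairs
print(partition_first_fit(tasks, 2))
print(partition_first_fit(tasks, 1))
```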

72 72 Novel IMPACCT scheduler A novel graphical tool  Timing and power constraint visualization  Transforms them into graph problems  Gives designers a view of power surges at run time Complete system-level model  All power sources  All power consumers Power-aware scheduling  Schedule operations based on power source output  Both performance requirement and power constraint  Regulate power surge  Optimize for power efficiency and reduce execution time

73 73 Demo IMPACCT scheduler Extended Gantt chart from real-time scheduling for a single processor  Event – bins Timing – horizontal size Power – vertical size Energy – area of the bin  Power surge – compacting bins downward [Figure: power vs. time chart; a bin is annotated with starting time, ending time, power level, and energy consumption]

74 74 Demo IMPACCT scheduler Scheduling chart for multi-processor and multiple power consumers  Events can overlap vertically Multi-processor Multiple power consumers – electronics, mechanical, thermal  Power awareness – min and max power supply [Figure: power vs. time chart with constant task A, periodic task B, periodic task C, and task D, which follows B]

75 75 Demo IMPACCT scheduler Timing constraints – bin packing problem to satisfy horizontal constraints  Independent tasks – moving bins horizontally  Dependent tasks – moving grouped bins horizontally  Power/voltage/clock scaling – extending/squeezing bins [Figure: power vs. time chart for tasks A, B, C, D showing the deadlines of B and C (scheduling space), the min and max timing constraints of D, and the scheduling space of D; annotations: slide bin within timing space; squeeze/extend bin to available time slot]

76 76 Demo IMPACCT scheduler Power constraints – bin packing problem to satisfy vertical constraints  Automatic optimization – let the tool do everything  Manual optimization – visualizing power in manual scheduling [Figures: (1) manual scheduling while monitoring power surge; (2) automated global scheduling to meet min-max power, with annotations: attack spike, improve utilization, Max, Min]
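The bin view of the scheduler demo can be sketched as follows: each bin has a start time, an end time, and a power level; stacking overlapping bins gives the power profile, which is then checked against the min and max power constraints. The data layout and function names are illustrative only and are not the IMPACCT tool's actual representation.

```python
def power_profile(bins):
    """bins: list of (start, end, power_w). Returns [(t0, t1, total_power_w)]
    for each interval between consecutive bin boundaries."""
    times = sorted({t for start, end, _ in bins for t in (start, end)})
    profile = []
    for t0, t1 in zip(times, times[1:]):
        total = sum(p for start, end, p in bins if start <= t0 and end >= t1)
        profile.append((t0, t1, total))
    return profile


def violations(profile, p_min, p_max):
    """Intervals whose stacked power falls outside [p_min, p_max]."""
    return [(t0, t1, p) for t0, t1, p in profile if p < p_min or p > p_max]


def total_energy(bins):
    """Energy is the summed area of the bins (power x duration)."""
    return sum((end - start) * p for start, end, p in bins)


# Hypothetical schedule: one long task with two overlapping shorter ones.
bins = [(0, 10, 2.0), (2, 5, 4.0), (4, 8, 3.0)]
profile = power_profile(bins)
print(profile)
print(violations(profile, p_min=0.0, p_max=8.0))
print(total_energy(bins))
```

In this picture, "attacking a spike" means sliding or stretching a bin so that the interval reported by `violations` disappears while the timing constraints still hold.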

77 77 Example revisited – Mars Rover System specification  6 wheel motors  4 steering motors  System health check  Hazard detection Power supply  Battery (non-rechargeable)  Solar panel Power consumption  Digital Computation, imaging, communication, control  Mechanical Driving, steering  Thermal Motors must be heated in low-temperature environment

78 78 Timing constraints – Mars Rover

79 79 Scheduling method Constraint graph construction  Nodes: operations  Edges: precedence relationship between operations Resource specification  Resource: an executing unit that can perform operations independently Six thermal resources for wheel heating Four thermal resources for steer motor heating One mechanical resource for driving One mechanical resource for steering One computation resource for control  Operations on one resource must be serialized Scheduling  Primary resource selection  Schedule primary resource by applying graph algorithms  Auxiliary resources and power requirement are considered as scheduling constraints
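The slide says only that "graph algorithms" are applied to the constraint graph. One standard choice for graphs with minimum and maximum separation constraints, assumed here, is a longest-path computation in the Bellman-Ford style: an edge (u, v, w) means start[v] >= start[u] + w, a maximum separation becomes a negative-weight back edge, and a positive cycle means the constraints are inconsistent. The example constraints are an illustrative reading of the rover timing numbers, not the authors' graph.

```python
def earliest_starts(nodes, edges, source):
    """Earliest start times satisfying start[v] >= start[u] + w for every
    edge (u, v, w). Returns None if the constraints are inconsistent."""
    start = {n: float("-inf") for n in nodes}
    start[source] = 0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if start[u] + w > start[v]:
                start[v] = start[u] + w
                changed = True
        if not changed:
            break
    # Any further possible relaxation implies a positive cycle.
    for u, v, w in edges:
        if start[u] + w > start[v]:
            return None
    return start


nodes = ["begin", "hazard", "steer", "drive", "heat"]
edges = [
    ("begin", "hazard", 0),
    ("begin", "heat", 0),
    ("hazard", "steer", 10),  # hazard detection takes 10 s
    ("steer", "drive", 5),    # steering takes 5 s
    ("heat", "drive", 5),     # heating takes 5 s before driving
    ("drive", "heat", -50),   # assumed: drive within 50 s of heating start
]
print(earliest_starts(nodes, edges, "begin"))
```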

80 80 Constraint graph [Figure: constraint graph. Nodes (operation / duration): System health check / T_hc; Heat wheel 1 to 6 / T_hw each; Heat steer 1 to 4 / T_hs each; Hazard detection / T_hd; Steer / T_s; Drive / T_d. Edge weights shown: t_hc, -(t_hc + T_hc), -t_hw, -t_hs]

81 81 Resource specification [Figure: operations (Heat steer i, Heat wheel j, Health check, Steer, Drive, Hazard detection) split by resource, each labeled operation / duration / power. Computation: Hazard detection (C) / T_hc / P_hc_C; Health check (C) / T_hc / P_hc_C; Heat steer i (C) / T_hs_C / P_hs_C; Heat wheel j (C) / T_hw_C / P_hw_C; Steer (C) / T_s_C / P_s_C; Drive (C) / T_d_C / P_d_C. Mechanical: Steer (M) / T_s_M / P_s_M; Drive (M) / T_d_M / P_d_M. Thermal: Heat steer i (T) / T_hs_T / P_hs_T; Heat wheel j (T) / T_hw_T / P_hw_T. Edge weights shown: t_hc, -(t_hc + T_hc), -t_hs + T_hs_E, -t_hw + T_hw_E]

82 82 Scheduling graph [Figure: scheduling graph. Primary resource, Computation: Health check (C) / T_hc / P_hc_C; Hazard detection (C) / T_hc / P_hc_C; Heat steer i (C) / T_hs_E / P_hs_E; Heat wheel j (C) / T_hw_E / P_hw_E; Steer (C) / T_s_C / P_s_C; Drive (C) / T_d_C / P_d_C. Auxiliary resource, Mechanical: Steer (M) / T_s_M / P_s_M; Drive (M) / T_d_M / P_d_M. Auxiliary resource, Thermal: Heat steer i (T) / T_hs_T / P_hs_T; Heat wheel j (T) / T_hw_T / P_hw_T. Edge weights shown: t_hc, -(t_hc + T_hc), -t_hs, -t_hs + T_hs_E, -t_hw, -t_hw + T_hw_E, -T_s_C + T_s_M]

83 83 Example – Mars Rover Power constraints  Different solar power supply over time  Different power consumption over temperature/time

84 84 Previous solution by JPL Over-constrained, conservative  Serialize every operation to satisfy power constraint  Longer execution time and under-utilization of solar power  No scheduling tool is used – manual scheduling Not power-aware  Scheduling without considering power sources and consumers [Figure: system heart-beat, moving two steps: (a) beginning with health check, (b) no health check]

85 85 System heart-beat - moving two steps (a) Begin with health check (b) no health check Solution 1: high solar power (14.9W) Max solar power: 14.9W at noon  Improved utilization of solar power  Automated scheduling – use scheduling tools Aggressive – do as much as possible  heating motors while doing other operations  Fastest moving speed – no waiting on heating

86 86 System heart-beat - moving two steps (a) Begin with health check (b) no health check Solution 2: typical solar power (12W) Moderate solar power output – 12W  Improved utilization of solar power  Automated scheduling – use scheduling tools Moderately aggressive – avoid exceeding power limit  Relaxed constraint – heating motors while doing other operations  Faster moving speed – some waiting time on heating

87 87 System heart-beat - moving two steps (a) Begin with health check (b) no health check Solution 3: low solar power (9W) Minimum solar power output – 9W  Restricted constraint – serialize operations  Automated scheduling – use scheduling tools Conservative – same as JPL solution  Slow moving speed  Full utilization of low solar power

88 88 Comparison JPL's previous solution  Conservative – long execution time, low solar power utilization  Not power-aware – same schedule for all cases  Not intended to use battery energy Our solution  Adaptive – speeds up when solar power supply is high  Power-aware – smart scheduling for different power supply/consumption conditions  Uses battery energy when necessary

89 89 Application-level evaluation Mission description  Target location – 48 (distance-) steps away from current location Power condition  14.9W solar power for first 10 minutes, 12W for next 10 minutes, 9W thereafter Metrics  Execution time  Total energy drawn from battery
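A minimal sketch of the "total energy drawn from battery" metric under the stated power condition. The piecewise solar profile comes from this slide; the helper names, the 1-second integration step, and the rule that the battery covers any load above the solar supply are assumptions for illustration. The example load combines the CPU (3.7W) and driving (12.4W) figures quoted in the Mars Rover scheduling example.

```python
def solar_power(t_s):
    """Solar supply in watts at mission time t_s (seconds)."""
    if t_s < 600:        # first 10 minutes
        return 14.9
    if t_s < 1200:       # next 10 minutes
        return 12.0
    return 9.0           # thereafter


def battery_energy(load_w, start_s, duration_s, dt=1.0):
    """Joules drawn from the battery by a constant load over an interval.

    Assumes the battery supplies only the part of the load that exceeds
    the solar supply at each instant.
    """
    energy = 0.0
    t = start_s
    end = start_s + duration_s
    while t < end:
        step = min(dt, end - t)
        deficit = load_w - solar_power(t)
        if deficit > 0:
            energy += deficit * step
        t += step
    return energy


# A 10 s drive (CPU + driving = 16.1 W) early and late in the mission.
drive_load_w = 3.7 + 12.4
print(battery_energy(drive_load_w, 0, 10))      # during the 14.9 W phase
print(battery_energy(drive_load_w, 1200, 10))   # during the 9 W phase
```

The same drive costs far more battery energy once the supply drops to 9W, which is the effect the execution-time versus battery-energy metrics are meant to expose.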

90 90 Application-level evaluation Power-awareness  Execution speed scales with power condition adaptively Smart schedule  Maximize best case  Avoid worst case Tradeoff  Power vs. performance  Energy renewability Application-specific  Application-level knowledge  Working mode parameters of components

91 91 Program plans and milestones

92 92 Development plans Web-based CAD tool  Perl/CGI scripts for configuration  Java applets for interactive scheduling UI  Interface with database engine Interface with commercial CAD backend  Detailed power estimation tools  Functional simulation with proprietary models Rationale  No software installation needed by end user  Ready to use by everyone on the Internet  Open source with all publicly available development tools

93 93 Status & accomplishments to date

94 94 IMPACCT schedule (legend: planned, in progress) July 2000 Aug 2000 Sept 2000 Oct 2000 Nov 2000 Dec 2000 Jan 2001 core tool UI Library Authoring Partitioning Scheduling Segmentation Volt. Scaling Simulation

95 95 Original schedule 2Q 00 Kickoff 2Q 01 2Q 02 System modeling Coordination synthesis Architecture definition Static partitioning Component partitioning Component simulator PCL benchmarking Synthesizable components System benchmarking Power aware design techniques PCL definition Simulatable components Benchmark Identification Authoring tool v1.0 Dynamic partitioning Simulator v1.0 Component partitioning network option

96 96 Updated schedule 2Q 00 Kickoff 2Q 01 2Q 02 Static & hybrid optimizations  Partitioning / allocation  Scheduling  Bus segmentation  Voltage scaling Library  COTS components  FireWire and I2C bus models Static composition authoring High-level simulation Benchmark Identification Architecture definition Dynamic optimizations  Task migration  Processor shutdown  Bus segmentation  Frequency scaling Library  Parameterizable components  Parameterizable bus models Reconfiguration authoring Architecture reconfiguration Low-level simulation System benchmarking option Year 1 Year 2

97 97 Quarterly schedule 3Q 2001 FireWire and I2C bus models Static bus segmentation Architecture definition Low-level simulation System benchmarking Frequency scaling High-level simulation Hybrid partitioning / allocation Voltage scaling Parameterizable components Dynamic scheduling Parameterizable bus models 2000 4Q 1Q 2Q 3Q 4Q 2002 1Q 2Q COTS components library Static scheduling Benchmark identification Static partitioning / allocation Hybrid scheduling Static composition authoring Dynamic processor shutdown Dynamic bus segmentation Dynamic reconfig. authoring Hybrid bus segmentation Architecture reconfiguration Dynamic task migration 2001

98 98 Financial information

99 99 IMPACCT budget Months 1-6: $180,000 Months 7-12: $180,000 Second year: $400,000

100 100 Budget distribution

101 101 http://www.ece.uci.edu/impacct/

102 102 Bibliography
[Mehra97] R. Mehra et al., "A partitioning scheme for optimizing interconnect power", IEEE Journal of Solid-State Circuits, vol. 32, no. 3, March 1997.
[Shin98] Y. Shin et al., "Reduction of bus transitions with partial bus-invert coding", Electronics Letters, vol. 34, no. 7, IEE, 2 April 1998, pp. 642-3.
[Benini97] L. Benini et al., "Asymptotic zero-transition activity encoding for address buses in low-power microprocessor-based systems", Proceedings Great Lakes Symposium on VLSI, Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 1997, pp. 77-82.
[Nakase98] Y. Nakase et al., "Complementary half-swing bus architecture and its application for wide band SRAM macros", IEE Proceedings - Circuits, Devices and Systems, vol. 145, no. 5, IEE, Oct. 1998, pp. 337-42.
[Zhang98] Y. Zhang et al., "An alternative architecture for on-chip global interconnect: segmented bus power modeling", Thirty-Second Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1-4 Nov. 1998.
[Kernighan70] B. Kernighan et al., "An Efficient Heuristic Procedure for Partitioning Graphs", Bell System Technical Journal, vol. 49, no. 2, Feb. 1970, pp. 291-307.
[Hauck95] S. Hauck et al., "Logic Partition Orderings for Multi-FPGA Systems", International Symposium on Field-Programmable Gate Arrays, 1995.

103 103 Program Goals Evaluation, exploration  power usage, performance, cost  alternative configurations, algorithms Optimization  achieve most effective power usage  high-level, global knowledge Tool integration  many point tools, independent techniques Specialization  configurable platform Reuse  take advantage of rich collection of COTS  not to re-design from scratch

104 104 Technical approach High-level abstraction  component vs. composition  Separate models for architecture and behavior Synthesis and optimization of power manager  Architecture reconfiguration  Scheduling for optimal power usage  adaptable to different power management policies Aggressive, domain-knowledge  Encompass mechanical / thermal power  Aware of power supply model

105 105 System level modeling Architectural modeling  COTS components  component encapsulation  bus architecture  system interconnect Behavioral modeling  Application specific knowledge  Software architecture  Mission goals  High level constraints

106 106 Power-aware coordination Protocols  Coordinate power usage e.g. peak power, resource arbitration  Multiple versions of given algorithm Components  Adaptable to different power management policies, not hardwired  Usable in new applications even if not designed to be power aware! Synthesis  Coordination controller (“mode manager”)  Optimization to minimize control dependency  Optimality depends on architectural mapping

107 107 Measuring power consumption (1) Different levels of analysis by  # of operations: (+) easy to implement (-) neglects the different sizes of modules; appropriate for comparing two different architectures with similar modules  # of lines of code: (+) approximates the size of the hardware to be implemented (-) may be too simple to estimate power consumption; together with the number of operations, gives an indication of the power consumption of each module  # of F/F: (+) more accurate measure (-) requires finding the relationship between # of F/F and # of lines of code; the number of F/F is the lowest-level hardware characteristic in the high-level simulator; control unit and data path have different power dissipation patterns even with the same number of gates

108 108 Measuring power consumption (2)  # of gates: (+) Makes accurate power estimation possible (-) needs Register transfer level (RTL) description and power analysis tools To get accurate hardware information, we have to implement RTL modules Input/output statistics of each module are also necessary

109 109 USC's Work in Progress Select a processor simulator Analyze the hardware description of each module Estimate the power consumption of each module Find performance-power ratio Design a minimum power processor model

110 110 Program impact & transitions Productivity  Fully exploit off-the-shelf components  Rapid turnaround time to architecture Massive Scalability  Protocol based power management  System architecture platform Robust methodology  Unified functional/power correctness  Confidence in complex design points

111 111 Bus Architecture Perspectives (X) Parallelism  Parallel: high cost, high throughput, enable design exploration  Serial: low cost, constrained throughput, simple bus interface Locality  Functional  Spatial Adaptivity  Adaptive  Deterministic

112 112 FireWire (IEEE 1394) bus Communication model  Asynchronous transfer  Isochronous transfer Arbitration model  Fair gap arbitration  Priority arbitration Configuration model  Bus initialization  Tree identification  Self identification Service model  Physical layer  Link layer  Transaction layer

113 113 Architectural Model Component – parameterized COTS  Type – processor, memory, I/O, DSP, bus, etc.  Interface – how the components can be connected to each other  Modes – operation mode parameters: voltage, clock speed, bandwidth, power consumption, etc. Package – a bundle of connected components that performs a certain operation  A set of connected components  Internal/external interface – how components are connected  Modes – configuration space of the collected components, specified by each component's working mode and collective attributes, e.g., voltage, speed, power, etc.
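A minimal sketch of the component/package model described on this slide, assuming a package's configuration space is the Cartesian product of its components' modes and that power is the collective attribute of interest. The class names, fields, and every numeric value are hypothetical; the slides do not give a concrete data model.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mode:
    name: str
    voltage_v: float
    clock_mhz: float
    power_w: float


@dataclass
class Component:
    name: str
    kind: str          # processor, memory, I/O, DSP, bus, ...
    interfaces: list   # names of the buses or ports it can attach to
    modes: list        # available Mode objects


@dataclass
class Package:
    name: str
    components: list

    def configurations(self):
        """Yield (modes, total power) for every combination of component modes."""
        for combo in product(*(c.modes for c in self.components)):
            yield combo, sum(m.power_w for m in combo)


# Hypothetical two-component package.
cpu = Component("cpu", "processor", ["i2c"], [
    Mode("run", 3.3, 100.0, 2.0),
    Mode("sleep", 3.3, 0.0, 0.1),
])
radio = Component("radio", "I/O", ["i2c"], [
    Mode("tx", 3.3, 0.0, 1.5),
    Mode("off", 0.0, 0.0, 0.0),
])
node = Package("node", [cpu, radio])
for modes, power_w in node.configurations():
    print([m.name for m in modes], power_w)
```

Enumerating the product is only practical for small packages; a real tool would presumably prune configurations that violate interface or power constraints.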

114 114 Approach: system-level modeling High-level abstractions  Employ application-specific knowledge in system models  Encompass multiple domains – electronic, mechanical, thermal System modeling  Behavioral modeling – software architecture, application-specific knowledge  Architectural modeling – hardware platform built on top of parameterized components  Partitioning – mapping behavioral objects to architectural structures  Scheduling – a valid sequence of concurrent/parallel operations on multiple processors that satisfies real-time requirements

115 115 Example – Mars Rover System specification  6 wheel motors  4 steering motors  System health check  Hazard detection Power supply  Battery (non-rechargeable)  Solar panel Power consumption  Digital computation, imaging, communication, control  Mechanical driving, steering  Thermal motors must be heated in low-temperature environment

116 116 Scheduling example – Mars Rover Power constraints  Solar panel: 14.9W peak power @ noon, 11W for 6hr/sol  Battery: 10W max power output, 150W-hr energy storage  CPU: 3.7W, constant for 4hr/sol  Health check: 6.3W, 10s  Hazard detection: 7.3W, 10s  Heating: 7.5W (1 motor) or 11.3W (2 motors), 5s  Steering: 6.8W, 5s (7º/s)  Driving: 12.4W, 10s (7cm) Existing solution  Serialize each operation to satisfy power constraint  Conservative – longer execution time and under-utilization of solar power  No scheduling tool is used
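A minimal sketch of a power-feasibility check using the figures on this slide. It assumes that solar and battery supplies are additive, that the battery covers only the shortfall above the solar supply, and that a set of operations may overlap whenever the shortfall stays within the battery's 10W limit. The function and dictionary names are illustrative.

```python
SOLAR_PEAK_W = 14.9
BATTERY_MAX_W = 10.0

# Power draw per operation, in watts, as listed on the slide.
POWER_W = {
    "cpu": 3.7,
    "health_check": 6.3,
    "hazard_detection": 7.3,
    "heat_1_motor": 7.5,
    "heat_2_motors": 11.3,
    "steering": 6.8,
    "driving": 12.4,
}


def battery_draw(active, solar_w=SOLAR_PEAK_W):
    """Watts the battery must supply for a set of concurrent operations."""
    load = sum(POWER_W[op] for op in active)
    return max(0.0, load - solar_w)


def feasible(active, solar_w=SOLAR_PEAK_W):
    """True if the shortfall stays within the battery's maximum output."""
    return battery_draw(active, solar_w) <= BATTERY_MAX_W


print(feasible(["cpu", "driving", "heat_1_motor"]))        # at peak solar
print(feasible(["cpu", "driving", "heat_2_motors"]))       # at peak solar
print(feasible(["cpu", "driving", "heat_1_motor"], 9.0))   # at low solar
```

Under these assumptions, heating one motor while driving fits at peak solar power but not at 9W, which illustrates why overlapping operations needs an explicit power check rather than blanket serialization.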

117 117 Scheduling techniques Constraint logic solving  Translate all constraints into a pure mathematical form  Use tools to solve the problem in the mathematical domain Example – CLP(R)  Constraints C1 > 3, C1 < […], C2 > 2, C2 < 4 # two power consumers C1 + C2 < S, S > 6, S < 12 # one power source  Inputs C1 = 4.5, S = 7  Results C2 < 2.5, 2 < C2
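A minimal sketch of the reasoning behind the example's result, written in plain Python rather than CLP(R). It assumes the constraints on the second consumer are C2 > 2, C2 < 4, and C1 + C2 < S, which is the reading consistent with the inputs and results shown; the `c2_range` function is hypothetical.

```python
def c2_range(c1, s, c2_low=2.0, c2_high=4.0):
    """Open interval (low, high) of feasible C2 values, or None if empty.

    Encodes C2 > c2_low, C2 < c2_high, and C1 + C2 < S.
    """
    low = c2_low
    high = min(c2_high, s - c1)   # C2 < S - C1 tightens the upper bound
    return (low, high) if low < high else None


print(c2_range(4.5, 7.0))   # -> (2.0, 2.5), i.e. 2 < C2 < 2.5
```

A constraint logic solver performs this kind of bound tightening over all variables at once; the sketch only propagates the bounds for C2 once C1 and S are fixed.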

118 118 Evaluation Application-level evaluation  Metrics based on overall mission objectives  Constraint-driven solutions Power-related scenario  Power constraints (supply/consumption) vary over different stages of the application  Power-aware adaptive scheduling for different stages

