Published by Winfred Doyle. Modified over 9 years ago.
1 Integrated Management of Power Aware Computing & Communication Technologies Kickoff review meeting Nader Bagherzadeh, Pai H. Chou, Fadi Kurdahi University of California, Irvine, ECE Dept. DARPA Contract F33615-00-1-1719 September 27, 2000
2 Agenda Introduction and overview Management status, financial, milestones, schedule. Technical presentation Task progress Architecture Applications CAD Lessons learned, challenges, issues. Questions + action items review.
3 Outline Introduction Program goals Project overview Management status Personnel and teaming plans Plans and milestones Financial information Technical presentation Background Technical approach Status and accomplishments Current detailed schedule Program impact and anticipated transitions
4 Introduction
5 Program Goals Power-aware system-level design Enhance mission success (time, task) Rapid customization for different missions Design tool Exploration & evaluation Optimization & specialization Technique integration System architecture Statically configurable Dynamically adaptive Use COTS parts & protocols
6 Technical approach High-level specification Separate behavior from architecture Explicit constraints (timing, power) Library characterization System synthesis tool Source-aware power usage scheduling Bus topology transformation and communication scheduling Configurable architecture Task migration & selective shutdown Bus segmentation and voltage scaling Domain knowledge Encompass mechanical / thermal power Aware of power supply model
7 Quad Chart
Innovations: Component-based power-aware design (exploit off-the-shelf components & protocols; best price/performance, reliable, cheap to replace). CAD tool for global power policy optimization (optimal partitioning, scheduling, configuration; manage entire system, including mechanical & thermal). Power-aware reconfigurable architectures (reusable platform for many missions; bus segmentation, voltage / frequency scaling).
Impact: Enhanced mission success (more tasks for the same power; dramatic reduction in mission completion time). Cost saving over a variety of missions (reusable platform & design techniques; fast turnaround time by configuration, not redesign). Confidence in complex design points (provably correct functional/power constraints; retargetable optimization to eliminate overdesign). Power protocol for massive scale.
[Figure labels: Behavior; Architecture; high-level simulation; functional partitioning & scheduling; composition operators; high-level components; behavioral system model; busses, protocols; system architecture; mapping; system integration & synthesis; static configuration; dynamic power management; parameterizable components]
Schedule, Year 1 (2Q 00 kickoff to 2Q 01): Static & hybrid optimizations (partitioning / allocation, scheduling, bus segmentation, voltage scaling); COTS component library; FireWire and I2C bus models; Static composition authoring; Architecture definition; High-level simulation; Benchmark identification.
Schedule, Year 2 (2Q 01 to 2Q 02): Dynamic optimizations (task migration, processor shutdown, bus segmentation, frequency scaling); Parameterizable components library; Generalized bus models; Dynamic reconfiguration authoring; Architecture reconfiguration; Low-level simulation; System benchmarking.
8 Innovations Component-based power-aware design Exploit off-the-shelf components & protocols COTS offer best price/performance, reliable, cheap to replace CAD tool for global power policy optimization Optimal partitioning, scheduling, configuration Manage entire system, including mechanical & thermal Power-aware reconfigurable architectures Reusable platform for many missions Bus segmentation, voltage / frequency scaling
9 Impact Enhanced mission success More task for the same power Dramatic reduction in mission completion time Cost saving over a variety of missions Reusable platform & design techniques Fast turnaround time by configuration, not redesign Confidence in complex design points Provably correct functional/power constraints Retargetable optimization to eliminate overdesign Power protocol for massive scale
10 Management Status
11 Personnel & teaming plans UC Irvine, co-PIs - Design tools: Nader Bagherzadeh, Pai Chou, Fadi Kurdahi. UC Irvine, research assistants: Dexin Li, Jinfeng Liu, Afshin Niktash. USC - Component power optimization: Jean-Luc Gaudiot, Seong-Won Lee. JPL - Applications and benchmarking: Nazeeh Aranki, Nikzad “Benny” Toomarian.
12 Previous work Design tools System-level: the Chinook HW/SW codesign tool Architectural synthesis (w/ physical design considerations) Components Reconfigurable computing: the MorphoSys chip Parameterizable components: PCL Simultaneous MultiThreading vs. Chip MultiProcessing Architectural platform Segmented bus: X-2000, Mars Pathfinder Configurable SMP
13 Responsibilities Bagherzadeh, Chou, Kurdahi -- co-PIs Oversee project operation Integration into curriculum and related research efforts Li, Liu, Niktash -- RAs Development of CAD tools Modeling of demonstrator examples Authoring of component / protocol library JPL Furnish example specifications Co-develop optimization techniques USC Supporting link to low-level technologies
14 External collaborations JPL X-2000 multi-mission architecture Mars Pathfinder as baseline JPL to provide COTS testbed JPL to evaluate IMPACCT optimizations USC Parameterizable components Low-level power estimation Consystant Design Technologies (Seattle, WA) Framework for component-based design IMPACCT plugins to support power management
15 Technical Background
16 Background: MorphoSys project Reconfigurable processor array MIPS-like RISC processor High-bandwidth data interface 100 MHz clock 0.35 µm 4-metal CMOS Software support Platform for dynamic power management [Figure: MorphoSys block diagram. Labels: Advanced RISC Processor; External Memory (e.g. SDRAM, RDRAM); System Bus; Instr./Data Cache (L1); Reconfigurable Processor Array; High Bandwidth Data Interface]
17 RC Array and Context Memory RC 16 column block 16 row block Context Memory 2 blocks 8 sets in each block A set controls 1 row or column (SIMD) 16 contexts in 1 set. Possible to overlap ctx broadcast with ctx reloading
18 The M1 chip layout
19 M1 chip test fixture
20 Software environment [Figure: MorphoSys software flow. Labels: App. (C code); mcc; TinyRISC code (TR_app: a = b + c, p = a + 1); RC Array functions (Z=RC_F(X), W=RC_F(Y)); Context Lib.; mLoad; mSched; Configuration context; Executable; MuLate, MorphoSim; mView; C++, VHDL; TinyRISC; RC Array; MorphoSys Chip]
21 Background on USC's SMT work High performance processors Superscalar processor (SSP) Single chip multiprocessor (CMP) Very long instruction word (VLIW) Simultaneous multithreading (SMT) Performance and power dissipation High performance requires high power consumption Recent applications need low-power, high-performance processors
22 Microarchitectural tradeoffs Power tradeoffs between different architectures SMT vs. SSP: SMT has more modules than SSP SMT has better performance and consumes more power SMT vs. CMP: SMT has better utilization They have similar performance, but SMT consumes less power SMT vs. VLIW: SMT consumes more power SMT is compatible with conventional architectures Design of simple SMT A simplified SMT may consume less power and still have the advantage of TLP Analysis of architectural features Power drain of modern processors (control vs. data path)
23 SMT design methodology Measuring power consumption of a processor Checking transitions of signals and module operations Hardware implementation of the processor simulator Measuring performance of modules The contribution of each module to the total performance Performance-power ratio of each module Comparison between architectures Design of a low power processor
24 Measuring performance Finding the performance per power of each module Simulate and measure the performance without a module Calculate the performance per power for each module Classify modules if more than two modules cooperate with each other Find the solution for the low power high performance processor
25 Background: Chinook project Component-based HW/SW codesign framework Specification, simulation, synthesis Motivated by IP reuse, system integration Problem: IP reuse forces modification Reason: components have hardwired coordination protocols Approach Adaptable components Separate coordination protocols from components Benefits Reuse without modification Enable system-level optimizations
26 Example protocol: Subsumption Must handle three cases: Subsuming, yielding, idle Hardwired protocol Generalization: Adaptable components (by mode mapping) Separate protocols & components [Figure: subsumption example. Labels: sensors (joystick, bumper, sonar); decision modules (escape, avoid, override); decision composition; actuators (wheels). Bumper process state diagram with modes yielding, subsuming, and idle, a subsumption interface, and transitions on bump and release (2s, 45d)]
27 Architectural mapping Single processor or multiple processors Multiple mappings to an architecture mode manager modal processes
28 Distributed mode managers Automatically partitioned among processors Synthesized control communication Comm. tradeoffs: synchronization, replication mode manager modal processes
29 Technical Presentation
30 Past missions – Mars Pathfinder “Sojourner”: the Mars Pathfinder Microrover Flight Experiment Alpha Proton X-ray Spectrometer (APXS)
31 Application requirements System specification 6 wheel motors 4 steering motors System health check Hazard detection Power supply Battery (non-rechargeable) Solar panel Power consumption Digital Computation, imaging, communication, control Mechanical Driving, steering Thermal Motors must be heated in low-temperature environment
32 System-level power budget
Function | Time and calculation | Energy required
motor heating: 1 motor at a time | 7.51 W x 1 hr | 7.51 W-hr
motor heating: 2 motors at a time | 11.26 W x 0.5 hr | 5.63 W-hr
driving (extreme terrain @ -80 degC) | 13.85 W x 0.5 hr | 6.92 W-hr
hazard detection | 7.33 W x 0.25 hr | 1.83 W-hr
imaging (3 images @ 2 min/image) | 4.5 W x 0.1 hr | 0.45 W-hr
image compression (compress 3 images @ 6 min/image) | 3.7 W x 0.3 hr | 1.2 W-hr
6 Mbit communication @ 50 min/sol | 6.27 W x 0.8 hr | 5.2 W-hr
42 10-sec health checks during day | 6.27 W x 0.1 hr | 0.63 W-hr
remainder of 7 hr daytime CPU operation | 3.7 W x 4 hr | 15.0 W-hr
WEB heating (as needed) | 50 W-hr | 50 W-hr
Total | | 95 W-hr
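Each row of this budget is a power draw multiplied by a duration. The following is a minimal sketch that recomputes the rows from the power and time figures on the slide. The names `BUDGET`, `energy_whr`, and `total_energy_whr` are illustrative, not part of any IMPACCT tool, and a few recomputed values differ slightly from the slide's entries, which appear to use unrounded durations.

```python
# (function, power in W, duration in hr), copied from the slide's calculation column
BUDGET = [
    ("motor heating, 1 motor at a time", 7.51, 1.0),
    ("motor heating, 2 motors at a time", 11.26, 0.5),
    ("driving", 13.85, 0.5),
    ("hazard detection", 7.33, 0.25),
    ("imaging", 4.5, 0.1),
    ("image compression", 3.7, 0.3),
    ("communication", 6.27, 0.8),
    ("health checks", 6.27, 0.1),
    ("remaining daytime CPU operation", 3.7, 4.0),
]
WEB_HEATING_WHR = 50.0  # given directly as energy on the slide


def energy_whr(power_w, hours):
    """Energy in watt-hours for a constant power draw."""
    return power_w * hours


def total_energy_whr():
    """Sum of all recomputed rows plus the WEB heating allowance."""
    return sum(energy_whr(p, h) for _, p, h in BUDGET) + WEB_HEATING_WHR


if __name__ == "__main__":
    for name, p, h in BUDGET:
        print(f"{name:36s} {energy_whr(p, h):6.2f} W-hr")
    print(f"{'total (incl. WEB heating)':36s} {total_energy_whr():6.2f} W-hr")
```

WEB heating alone accounts for more than half of the total, which is why the later slides treat thermal loads as part of the system power model.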
33 Design issues Timing constraints System health check 10s/10min Heating motor for 5s, 50s prior to driving Hazard detection 10s – steering 5s – driving 10s Power management Low-power electronics alone cannot deliver significant power savings No system-level management tool available Conservative hand-crafted schedule Serialize all operations to avoid power surge Long execution time Solar power wasted
34 Present missions – Athena/Mars ’03 Rover configuration Pancam/Mini-TES Mini-Corer Instrument Arm Cluster: Raman Spectrometer, Alpha-Proton-X-Ray Spectrometer (APXS), Mössbauer Spectrometer, Microscopic Imager
35 Athena/Mars ‘03 Rovers - power subsystem Power utilization: 38 W = 19 W (CPU&I/O) + 9 W (accel and gyro) + 10 W (wheel motors) for driving. 75 W = 19 W (CPU&I/O) + 55 W (transmission) for orbiter communication 30 W = 19 W (CPU&I/O) + 10 W (transmission) for lander relay communication 55 W = 19 W (CPU&I/O) + 33 W (peak motor) for drilling 29 W = 23 W (CPU&I/O) + 6 W (cameras) required for imaging 11 W Raman, 1.4W APXS and 2.3 W for nighttime spectrometer operation 141Whr daily for housekeeping engineering 75Whr limit for nighttime operations
36 Present missions – MUSES-CN Asteroid NanoRover Completely solar powered Requiring only 1 watt, including an RF telecommunications system for communications between the rover and a lander or small-body orbiter for relay to Earth. Power source 500 grams of commercial, non-rechargeable, replaceable lithium batteries, with energy density of 750 joules per gram.
37 Power-aware designs Subsume low power as a special case Minimize power consumption Minimal application specific knowledge, limited reconfiguration space Conservative Make best use of available power Use MAX solar power while it's available Increase parallelism, perform more tasks, reduce mission time Both MIN and MAX power constraints Application-specific knowledge Multiple mission requirement Adapt to run-time power supply, operating environment
38 System-level power management Amdahl's law -- extended to power Component-level improvements must be scaled by % contributions Synergy between inter-component interactions Scope of system power model Digital, mechanical, thermal Battery model - control power surge Renewable source - solar panel, etc Mission-driven tradeoffs Execution time vs. power saving Adapt to operating environment
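The "Amdahl's law extended to power" point can be made concrete with a small calculation. This is a sketch under the assumption that the analogy is the direct one: a component drawing a fraction of total power is improved by some factor while everything else stays the same. The function name `system_power_reduction` is hypothetical.

```python
def system_power_reduction(fraction, component_factor):
    """Overall power reduction factor for the whole system.

    fraction:         share of total system power drawn by the component (0..1)
    component_factor: factor by which that component's power is divided (> 0)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if component_factor <= 0:
        raise ValueError("component_factor must be positive")
    # Power left after the improvement, as a share of the original total.
    remaining = (1.0 - fraction) + fraction / component_factor
    return 1.0 / remaining


if __name__ == "__main__":
    # A 10x improvement on a component that draws 30% of total power
    # yields well under a 1.5x improvement for the whole system.
    print(round(system_power_reduction(0.3, 10.0), 3))
```

However large `component_factor` becomes, the result is bounded by `1 / (1 - fraction)`, which is the reason component-level gains must be scaled by their contribution.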
39 What's needed? Reconfigurable system architecture Statically configurable for different missions Reconfiguration for dynamic power management Support state-of-the-art power management policies System-level design tool Support design space exploration Take full advantage of COTS components Optimize mission-specific system configuration Synthesize system-level power manager Support simulation for early validation
40 X2000 avionics system architecture Symmetric COTS multiprocessors Low cost component with strong commercial support Widely accepted specification, design, application and testing Reduced development cost Dual system bus architecture High speed data rate with moderate power Low speed control with low power Industry standard bus protocols FireWire (IEEE 1394) bus I2C bus Reconfigurable bus topology
41 PA system architecture The NASA X2000 Avionics System [Figure labels: high-rate input (camera); high-speed bus (e.g. IEEE 1394); communication module (CDMA); bus power controller; symmetric multiprocessor modules; altimeter subnet; microcontroller-directed subnet (power regulations & control, analog telemetry sensors, safety inhibits, valve & pyro drive); reconfigurable hardware blocks; low-speed bus (e.g. I2C)]
42 Applicable power optimizations Application level Scheduling under timing and power constraints Task partitioning, allocation, migration Algorithm selection Architecture level Bus segmentation / clustering Communication scheduling Component level Voltage / frequency scaling Power down X-2000 goals Digital electronics power: 10x decrease Analog electronics power: 2x decrease Computer performance: 10 to 20x increase [Annotation on slide: both static & dynamic versions]
43 The need for a system-level CAD tool Avoid pitfalls with manual design Overdesign (too conservative) Hardwired assumptions in implementation (hard to change/adapt) System integration (bottleneck in projects) Scalable methodology Specification: separation of concerns Behavior vs. architecture Policy vs. mechanism Constraint vs. implementation Exploration Framework for technique integration Rapid feedback Manage complexity Knowledge base for component/bus details Consistent knowledge propagation through design stages
44 Design tool Library Components and bus protocols Provides power estimation Defines configuration space Authoring Behavioral description, architecture description Mapping from behavior to architecture Synthesis Scheduling, partitioning Bus segmentation, voltage scaling Synthesis of power manager with task scheduler Simulation High-level: explore design space Detailed-level: power/performance for a given design point
45 IMPACCT overview [Figure labels: Behavior; Architecture; high-level simulation; functional partitioning & scheduling; composition operators; high-level components; behavioral system model; busses, protocols; system architecture; mapping; system integration & synthesis; static configuration; dynamic power management; parameterizable components]
46 Library: low-level components Supported components COTS Parameterizable Levels of abstraction Parameterizable Simulatable Synthesizable Reconfigurable VHDL code (e.g. bus width = 8, bus width = 16)
47 Library: component definition Component interface Physical: pin interface Functional: data and control interface Power, current, voltage Power/mode characterization Mode governs power usage Restrictions on mode changes allowed High-level yet refined power estimation Aggregation Smaller components combined into larger ones New external parameters, interfaces, modes
48 Example components Processor: PowerPC, ARM, Pentium, MIPS Microcontroller: StrongARM, Intel 8051, Motorola 68HC11, 68332 Bus controller/transceiver: FireWire controller & transceiver, I2C bus controller, GPIB Memory: SRAM, DRAM, Flash memory
49 Example component definition FireWire bus transceiver: National Semi CS4103 Working voltage: 3.3 V Power modes Full-on (400mW) PHY-on (150mW) Standby (50mW) CLK-disable (21mW) Crystal-disable (16mW) FireWire bus controller: National Semi CS4210 Working voltage: 3.3 V Power modes Full-on (300mW) Standby (17mW) Aggregated bus transceiver/controller Up to ten working modes to play with Flexibility in power management
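The "up to ten working modes" figure follows from pairing the five transceiver modes with the two controller modes. A minimal sketch of that aggregation is shown below, assuming that every pairing is permitted and that the aggregate power is the simple sum of the two parts. Neither assumption is stated on the slide, and the dictionary and function names are illustrative.

```python
from itertools import product

# Power modes in mW, as listed on the slide
TRANSCEIVER_MW = {
    "Full-on": 400,
    "PHY-on": 150,
    "Standby": 50,
    "CLK-disable": 21,
    "Crystal-disable": 16,
}
CONTROLLER_MW = {"Full-on": 300, "Standby": 17}


def aggregated_modes():
    """Map (transceiver mode, controller mode) to combined power in mW.

    Assumes all pairings are legal and that power adds linearly.
    """
    return {
        (t_mode, c_mode): t_mw + c_mw
        for (t_mode, t_mw), (c_mode, c_mw)
        in product(TRANSCEIVER_MW.items(), CONTROLLER_MW.items())
    }


if __name__ == "__main__":
    for (t_mode, c_mode), mw in sorted(aggregated_modes().items(), key=lambda kv: kv[1]):
        print(f"{t_mode:16s} + {c_mode:8s} = {mw} mW")
```

A real library entry would likely prune pairings that the hardware does not allow, which is consistent with the slide's wording "up to ten".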
50 Library: bus protocols Architecture Parallelism (parallel or serial) Topology (serial, tree, ring) Service layers (physical, link, transaction, application) Communication Data transfer mode (asynchronous, isochronous) Data transfer speed Response mode (need acknowledgement or not) Arbitration mode Configuration Configuration process (deterministic or random) Reconfigurability (static, hybrid, dynamic) Power Power mode (full-on, standby, deep-sleep, shutdown) Media (cable, wireless, backplane)
51 Bus protocols exploration Explore bus protocol dimensions Protocol simulation Input: bus protocol model Output: sequence of events Map events into relative power quantities Compare and trade off between different design points Example: simulating FireWire bus configuration Event-driven simulator Compare two designs with different topology Pure tree topology (acyclic) Tree topology with bus segmentation Tree-ID process, 9 nodes Tree: 37 events Segmented tree: 24 events
52 Bus optimization Bus: a significant power consumer Up to 30% - 50% of the total system power consumption [Mehra97] Bus power consumption determined by Capacitance (load C and bus C, proportional to bus length) Voltage (bus supply voltage and swing voltage) Bus access frequency Bus signal switching activity Why bus power optimization? System performance requirements Power constraints Adapt to execution time variations Bus segmentation for increased bandwidth Enable other novel power management techniques
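The four factors listed on this slide combine in the usual first-order model of dynamic switching power on a bus. The relation below is the standard textbook CMOS approximation, given here as an assumption rather than as the model used inside the IMPACCT tool; the symbols are illustrative.

```latex
P_{\text{bus}} \approx \alpha \, f \, C \, V_{dd} \, V_{swing},
\qquad C = C_{\text{load}} + C_{\text{bus}}
```

Here \alpha is the switching activity, f the bus access frequency, V_{dd} the bus supply voltage, and V_{swing} the signal swing. With full-swing signalling (V_{swing} = V_{dd}) this reduces to the familiar \alpha f C V_{dd}^2. Under this model, bus segmentation lowers C_{\text{bus}}, bus encoding lowers \alpha, and voltage scaling lowers V_{dd}.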
53 Bus-level optimizations Bus encoding [Shin98][Benini97][Nakase98] Minimize switching activity on bus Makes sense mostly for parallel bus Gray code, bus-invert code, T0 code and Beach code Bus driver design Bus clustering (segmentation) [Mehra97][Zhang98] Optimize bus topology by grouping components Divide the global bus into multiple segments Benefits: Reduced bus capacitance (power saving) Shorter bus latency, higher throughput, increased flexibility Partitioning [Hauck95][Yang94][Cong93] Divide tasks among components Minimize inter-cluster traffic Clustering before partitioning
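Of the encodings named on this slide, bus-invert coding is the simplest to illustrate. The sketch below is a simplified version written for this note, not code from the cited papers: a word is sent inverted, with an extra invert line raised, whenever more than half of the data lines would otherwise toggle. The function names and the all-zero initial bus state are assumptions.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Return a list of (bus_value, invert_bit) pairs for the given data words."""
    mask = (1 << width) - 1
    prev = 0  # assumption: data lines start at all zeros
    encoded = []
    for word in words:
        word &= mask
        if hamming(prev, word) > width // 2:
            bus, invert = (~word) & mask, 1  # fewer toggles if sent inverted
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
        prev = bus
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original data words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(bus ^ mask) if invert else bus for bus, invert in encoded]


def count_transitions(values, start=0):
    """Total number of line toggles when values are driven in sequence."""
    total, prev = 0, start
    for value in values:
        total += hamming(prev, value)
        prev = value
    return total


if __name__ == "__main__":
    words = [0x00, 0xFF, 0x0F, 0xF0]
    enc = bus_invert_encode(words)
    print("raw toggles:    ", count_transitions(words))
    print("encoded toggles:", count_transitions([b for b, _ in enc])
          + count_transitions([i for _, i in enc]))
```

The encoded count includes toggles on the extra invert line, so the comparison is between total wire activity in both cases.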
54 FireWire (IEEE 1394) High speed serial bus 100, 200, 400 Mbps in 1394a 800 Mbps, 1.6 Gbps in 1394b Advantages Low power Real-time bandwidth guarantee => important for media apps Isochronous and asynchronous transfer modes Hot-pluggable, self-reconfiguring Supports bus segmentation
55 X2000 architecture mapping Map Mars Rover application onto X2000 architecture Tasks: MCs are responsible for sensing, drive control, steering control Capture picture, compress in CPU1, and send data to RF Modem SCIs carry out scientific experiments, sending data to CPU2 After analysis, CPU2 stores data in HD/NVM Legend CAM: camera MC: microcontroller HD: hard drive NVM: non-volatile memory SCI: scientific equipment RF modem: radio frequency modem [Figure: nodes on the FireWire 1394 bus: CPU 1, CPU2 (bus controller), CAM, RF Modem, HD / NVM, SCI 1, SCI 2, MC 1, MC 2, MC 3; I2C bus omitted on this diagram]
56 Bottlenecks in an unsegmented architecture Contention for bus bandwidth Camera, RF, hard disk Forces serialization of communication globally All nodes must be kept awake Prevents component shutdown Global overhead for bus reconfiguration Long routing path Power overhead on routing controllers
57 Segmentation example Three bus segments MC: sensing, drive control, steering control SCI: scientific experiment CAM: picture capture, image compression RF: transmission Suppose bus bandwidth is 100 Mbps, image size 20 Mb each, 20 pictures to work on, SCI data volume 16 kbps x 10 Ks x 2 (4 hrs a day) Power numbers: CPU1: 4.0 W CPU2: 240 mW RF modem: 1.7 W Camera: 2.6 W SCI1: 0.8 W SCI2: 3.2 W [Figure: segmented bus with nodes SCI2, RF Modem, CAM, MC1, HD, MC2, SCI1, CPU2/bus controller, CPU1/DSP, MC3]
58 Bus segmentation with FireWire Blue nodes can't be disabled All nodes’ PHY layers must remain active. Request packets are broadcast to all nodes Gray nodes can be safely disabled They are in different segments from the active ones. Request packets are broadcast to only active nodes. segmentation
59 Throughput improvement 100 Mbps bandwidth: 9 s transfer time 300 Mbps: 5 s transfer time Bus segmentation helps improve bus bandwidth. [Figure: unsegmented FireWire 1394 bus, with links marked "no useful traffic", compared with the segmented bus; nodes SCI1, SCI2, RF Modem, CAM, HD / NVM, CPU1/DSP, CPU2/bus controller, MC1, MC2, MC3]
60 Bandwidth-enabled voltage scaling Use voltage scaling and clock scaling to decrease component power. Bandwidth 100 Mbps: power consumption = 12.3 W Could be 300 Mbps, keep it at 100 Mbps: power consumption after voltage scaling = 9.2 W [Figure: segmented bus with nodes SCI2, RF Modem, CAM, MC1, HD, MC2, SCI1, CPU2/bus controller, CPU1/DSP, MC3]
61 Power/latency reduction Before: power consumption = 12.3 W, data transfer time = 9 s, energy consumption = 111 J After voltage scaling: power consumption = 9.2 W, data transfer time = 5 s, energy consumption = 46 J Power saving 25%, energy saving 58% Note: bus configuration power not counted
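The percentages on this slide follow directly from energy = power x time. A short check, using only the numbers on the slide (the helper names are illustrative):

```python
def energy_j(power_w, seconds):
    """Energy in joules for a constant power draw over a duration."""
    return power_w * seconds


def relative_saving(before, after):
    """Fractional reduction going from `before` to `after`."""
    return 1.0 - after / before


E_BEFORE = energy_j(12.3, 9)  # about 111 J
E_AFTER = energy_j(9.2, 5)    # 46 J
POWER_SAVING = relative_saving(12.3, 9.2)
ENERGY_SAVING = relative_saving(E_BEFORE, E_AFTER)

if __name__ == "__main__":
    print(f"energy before: {E_BEFORE:.1f} J, after: {E_AFTER:.1f} J")
    print(f"power saving:  {POWER_SAVING:.1%}")
    print(f"energy saving: {ENERGY_SAVING:.1%}")
```

The energy saving is larger than the power saving because the shorter transfer time and the lower power multiply.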
62 Segmentation-enabled shutdown Without segmentation: all components’ bus interfaces are active; entire bus is hot. With segmentation: non-operating bus segments are disabled; non-operating components are disabled; bus power is saved. Task phases shown: Drive control (10 min.) Drive control (20 min.) Picture capture (6 min.) Science experiment (20 min.)
63 Combined energy savings from static techniques Not shutting down inactive nodes: Bus transceivers active all the time. Transceiver energy: 150 mW x 10 x 3360 s = 5040 J (Transceiver: National Semi CS4103, PHY-active only mode.) Shutting down inactive nodes: 27 times of global bus configs. Only 11 bus configurations Config energy << 165 J Transceiver energy 1962 J Config energy + transceiver energy < 1962 + 165 = 2127 J Result: 5040 J / 2127 J, about a 2.4x energy reduction
64 Dynamic bus reconfiguration New task: send data from HD to RF modem (continued from previous task) Solution: dynamically change bus topology Tasks: science experiments and radio frequency data transfer, SCI+RF (20+60 min) [Figure: bus topology before and after reconfiguration; nodes SCI1, SCI2, RF Modem, CAM, HD, CPU1/DSP, CPU2/bus controller, MC1, MC2, MC3]
65 Energy savings from dynamic bus reconfiguration Power number list: Local config: 12.7 W Global config: 23.7 W Active transceiver: 150 mW Segmentation: software support Bus segment: proportional to bus length Without re-segmentation: Local configuration: 3 Global configuration: none Re-segmentation: none Active transceivers: 7 Active bus segments: 2 Energy: 12.7 x 3 x 1 + 0.15 x 7 x 4800 = 5078 J With re-segmentation: Local configuration: none Global configuration: 1 Re-segmentation: 1 Active transceivers: 3+2 Active bus segments: 1 Energy: 23.7 x 1 x 1 + 0.15 x 3 x 4800 + 0.05 x 2 x 4800 = 2664 J 1.9x energy reduction
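Both energy expressions on this slide share one shape: a one-time configuration cost plus steady transceiver power over the 4800 s task. The sketch below reproduces them with a small helper. Reading the "x 1" factor as one second per configuration and the 0.05 W term as two transceivers at 50 mW are interpretations of the slide's formulas, and `bus_energy_j` is a hypothetical name.

```python
def bus_energy_j(config_events, transceiver_groups, duration_s):
    """Total bus energy in joules.

    config_events:      list of (power_w, count, seconds_each)
    transceiver_groups: list of (power_w, count), powered for duration_s
    """
    config = sum(p * n * t for p, n, t in config_events)
    steady = sum(p * n * duration_s for p, n in transceiver_groups)
    return config + steady


DURATION_S = 4800  # SCI+RF task: 20 + 60 min

# Three local configurations, seven transceivers active throughout.
FIXED_J = bus_energy_j([(12.7, 3, 1)], [(0.15, 7)], DURATION_S)

# One global configuration; three transceivers at 150 mW and two at 50 mW.
RESEGMENTED_J = bus_energy_j([(23.7, 1, 1)], [(0.15, 3), (0.05, 2)], DURATION_S)

if __name__ == "__main__":
    print(f"fixed topology: {FIXED_J:.1f} J")
    print(f"re-segmented:   {RESEGMENTED_J:.1f} J")
    print(f"reduction:      {FIXED_J / RESEGMENTED_J:.2f}x")
```

In both cases the configuration term is under 1% of the total, so the saving comes almost entirely from keeping fewer transceivers fully active.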
66 Summary of architecture optimization Towards loose coupling Reduced bus contention Increased parallel bandwidth Enabling voltage/frequency scaling Application-driven clustering Communication bandwidth requirements between processes Knowledge from high-level behavioral model Static optimization: 2.4x energy reduction Bus segmentation Cluster shutdown Dynamic reclustering: 1.9x energy reduction
67 Power management & optimization Behavioral modeling Extract power-related attributes of all objects Architecture modeling Use low-power devices or devices that can operate in a low-power mode Partitioning Migration – merge computations from under-utilized processors onto one processor to improve utilization Segmentation – separate tightly coupled computations into clusters to localize communication Scheduling Arrange operation sequences on multiple processors / multiple power consumers to meet both performance and power requirements
68 Behavioral model Application-specific knowledge Input, output and function Dependency and precedence Control and data flow Timing and sequence Software architecture Operating system features – real-time, centralized, distributed, etc. Execution model – event driven, interrupt, distributed agent, client-server, etc. Communication model – protocol stack and specification Power-related attributes Data rate, execution time, CPU speed, memory size, communication path, etc.
69 Allocation Map behavioral objects to hardware Group related OS, communication, control and application objects into processing nodes Extract data objects into storage nodes Allocate components/packages for each processing node Arrange data storage for data nodes and optimize storage location to reduce communication Map communication paths to busses Setup working mode of each component/package to fit the behavioral requirement Extract attribute of each structure Function – computation, control, communication CPU utilization Bus traffic Power consumption
70 Scheduling Mapping of tasks to time slots Computation Communication Mapping of power usage to time slots Mechanical devices Thermal subsystems Other electronics subsystems Constraints Real-time deadlines, periods, min/max separation Power budget, power surge (min/max) Potentially scenario-driven
71 Scheduling techniques Deadline based real-time scheduling on multiprocessors Rate-monotonic scheduling – extend existing RM scheduling to multiprocessors Timing constraint graph scheduling – multiple serializable sequences in a single heart beat
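The slide does not say how rate-monotonic (RM) scheduling is extended to multiprocessors. One common approach, shown here purely as an illustration and not as the IMPACCT method, is to partition tasks across processors with first-fit and apply the classic Liu and Layland utilization bound on each processor. All names below are hypothetical.

```python
def rm_bound(n):
    """Liu and Layland utilization bound for n periodic tasks on one processor."""
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable(tasks):
    """Sufficient (not necessary) RM test. tasks: list of (wcet, period)."""
    if not tasks:
        return True
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= rm_bound(len(tasks))


def first_fit_partition(tasks, n_procs):
    """Assign tasks to processors, shortest period first.

    Returns one task list per processor, or None if some task fits nowhere.
    """
    procs = [[] for _ in range(n_procs)]
    for task in sorted(tasks, key=lambda t: t[1]):
        for assigned in procs:
            if rm_schedulable(assigned + [task]):
                assigned.append(task)
                break
        else:
            return None
    return procs


if __name__ == "__main__":
    tasks = [(1, 4), (2, 5), (3, 10)]  # made-up (execution time, period) pairs
    print(first_fit_partition(tasks, 1))
    print(first_fit_partition(tasks, 2))
```

Because the bound is only sufficient, a `None` result means this particular test could not place the task set, not that no feasible schedule exists.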
72 Novel IMPACCT scheduler A novel graphical tool Timing and power constraint visualization Transforms them into graph problems Gives designers a view of power surges at run-time Complete system-level model All power sources All power consumers Power-aware scheduling Schedule operations based on power source output Both performance requirement and power constraint Regulate power surge Optimize for power efficiency and reduce execution time
73 Demo IMPACCT scheduler Extended Gantt chart, as used in real-time scheduling for a single processor Event – bins Timing – horizontal size Power – vertical size Energy – area of the bin Power surge – compacting bins downward [Figure: one bin on power vs. time axes, annotated with starting time, ending time, power level, and energy consumption]
74 Demo IMPACCT scheduler Scheduling chart for multi-processor and multiple power consumers Events can overlap vertically Multi-processor Multiple power consumers – electronics, mechanical, thermal Power awareness – min and max power supply [Figure: power vs. time chart with constant task A, periodic task B, periodic task C, and task D following B]
75 Demo IMPACCT scheduler Timing constraints – bin packing problem to satisfy horizontal constraints Independent tasks – moving bins horizontally Dependent tasks – moving grouped bins horizontally Power/voltage/clock scaling – extending/squeezing bins [Figure: power vs. time chart for tasks A, B, C, D showing the deadlines of B and C (scheduling space), the min and max timing constraints of D, and the scheduling space of D; annotations: slide bin within timing space, squeeze/extend bin to available time slot]
76 Demo IMPACCT scheduler Power constraints – bin packing problem to satisfy vertical constraints Automatic optimization – let the tool do everything Manual optimization – visualizing power in manual scheduling [Figure: two power vs. time charts for tasks A, B, C, D. Left: manual scheduling while monitoring power surge. Right: automated global scheduling to meet min-max power; annotations: attack spike, improve utilization, Max, Min]
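The bin view of the scheduler demo can be sketched as a small greedy placement: each task is a bin with a width (duration) and a height (power), and bins are slid right until the stacked power stays under the supply limit. This is an illustrative heuristic under made-up task data, not the IMPACCT algorithm; it handles only the max-power constraint and release times, and `schedule` and `TASKS` are hypothetical names.

```python
def schedule(tasks, max_power, horizon):
    """Greedy power-constrained placement on integer time slots.

    tasks: list of (name, duration, power_w, release_time), in priority order.
    Returns (start times by task name, power profile per time slot).
    """
    profile = [0.0] * horizon
    starts = {}
    for name, duration, power, release in tasks:
        for t in range(release, horizon - duration + 1):
            slots = range(t, t + duration)
            # Place the bin at the first position where it never exceeds the limit.
            if all(profile[s] + power <= max_power for s in slots):
                for s in slots:
                    profile[s] += power
                starts[name] = t
                break
        else:
            raise ValueError(f"cannot place task {name} within the horizon")
    return starts, profile


# Made-up tasks: (name, duration in slots, power in W, earliest start)
TASKS = [("A", 10, 3.0, 0), ("B", 4, 5.0, 0), ("C", 4, 6.0, 0)]

if __name__ == "__main__":
    for limit in (14.9, 9.0):  # solar power levels quoted elsewhere in the deck
        starts, profile = schedule(TASKS, limit, horizon=20)
        print(limit, starts, "peak:", max(profile))
```

With the higher limit all three tasks overlap from time 0; with the lower limit task C is pushed back until B finishes, which mirrors the trade between execution time and power that the slides describe.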
77 Example revisited – Mars Rover System specification 6 wheel motors 4 steering motors System health check Hazard detection Power supply Battery (non-rechargeable) Solar panel Power consumption Digital Computation, imaging, communication, control Mechanical Driving, steering Thermal Motors must be heated in low-temperature environment
78 Timing constraints – Mars Rover
79 Scheduling method Constraint graph construction Nodes: operations Edges: precedence relationship between operations Resource specification Resource: an executing unit that can perform operations independently Six thermal resources for wheel heating Four thermal resources for steer motor heating One mechanical resource for driving One mechanical resource for steering One computation resource for control Operations on one resource must be serialized Scheduling Primary resource selection Schedule primary resource by applying graph algorithms Auxiliary resources and power requirement are considered as scheduling constraints
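A constraint graph of this kind is commonly solved with a longest-path computation: an edge (u, v, w) means v starts at least w after u, and a maximum separation is written as a negative-weight edge in the reverse direction. The sketch below uses Bellman-Ford-style relaxation for that purpose. It is a generic illustration rather than the tool's algorithm, and the operations and weights in the example are assumptions loosely based on the timing figures quoted earlier in the deck.

```python
def earliest_starts(nodes, edges, source):
    """Earliest start times satisfying start[v] >= start[u] + w for every edge.

    edges: list of (u, v, w). "v at most d after u" is encoded as (v, u, -d).
    Returns a dict of start times, or None if the constraints are inconsistent
    (the graph contains a positive-weight cycle).
    """
    neg_inf = float("-inf")
    start = {n: neg_inf for n in nodes}
    start[source] = 0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if start[u] != neg_inf and start[u] + w > start[v]:
                start[v] = start[u] + w
                changed = True
        if not changed:
            break
    # Any edge that can still be relaxed lies on a positive cycle.
    for u, v, w in edges:
        if start[u] != neg_inf and start[u] + w > start[v]:
            return None
    return start


NODES = ["hazard", "heat", "steer", "drive"]
EDGES = [
    ("hazard", "steer", 10),  # steering follows 10 s of hazard detection
    ("steer", "drive", 5),    # driving follows 5 s of steering
    ("hazard", "heat", 0),    # heating may begin with hazard detection
    ("heat", "drive", 5),     # motors heated for 5 s before driving
    ("drive", "heat", -50),   # driving starts at most 50 s after heating starts
]

if __name__ == "__main__":
    print(earliest_starts(NODES, EDGES, "hazard"))
```

Returning `None` on a positive cycle gives early feedback that the timing constraints contradict each other before any power constraint is considered.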
80 Constraint graph [Figure: constraint graph. Nodes (operation / duration): System health check / T_hc; Heat wheel 1 to 6 / T_hw; Heat steer 1 to 4 / T_hs; Hazard detection / T_hd; Steer / T_s; Drive / T_d. Edge weights shown: t_hc, -(t_hc + T_hc), -t_hw, -t_hs]
81 Resource specification Resources: Computation (C), Mechanical (M), Thermal (T) Operations: Heat steer i, Heat wheel j, Health check, Steer, Drive, Hazard detection [Figure: operations split by resource, each node labeled operation / time / power: Hazard detection (C) / T_hc / P_hc_C; Health check (C) / T_hc / P_hc_C; Heat steer i (C) / T_hs_C / P_hs_C; Heat steer i (T) / T_hs_T / P_hs_T; Heat wheel j (C) / T_hw_C / P_hw_C; Heat wheel j (T) / T_hw_T / P_hw_T; Steer (C) / T_s_C / P_s_C; Steer (M) / T_s_M / P_s_M; Drive (C) / T_d_C / P_d_C; Drive (M) / T_d_M / P_d_M. Edge weights shown: t_hc, -(t_hc + T_hc), -t_hs + T_hs_E, -t_hw + T_hw_E]
82 Scheduling graph Primary resource: Computation Auxiliary resource: Mechanical Auxiliary resource: Thermal [Figure: scheduling graph, each node labeled operation / time / power: Hazard detection (C) / T_hc / P_hc_C; Health check (C) / T_hc / P_hc_C; Heat steer i (C) / T_hs_E / P_hs_E; Heat steer i (T) / T_hs_T / P_hs_T; Heat wheel j (C) / T_hw_E / P_hw_E; Heat wheel j (T) / T_hw_T / P_hw_T; Steer (C) / T_s_C / P_s_C; Steer (M) / T_s_M / P_s_M; Drive (C) / T_d_C / P_d_C; Drive (M) / T_d_M / P_d_M. Edge weights shown: t_hc, -(t_hc + T_hc), -t_hs, -t_hs + T_hs_E, -t_hw, -t_hw + T_hw_E, -T_s_C + T_s_M]
83 Example – Mars Rover Power constraints Different solar power supply over time Different power consumption over temperature/time
84 System heart-beat - moving two steps (a) Begin with health check (b) no health check Previous solution by JPL Over-constrained, conservative Serialize every operation to satisfy power constraint Longer execution time and under-utilization of solar power No scheduling tool is used – manual scheduling Not power-aware Scheduling without considering power sources and consumers
85 System heart-beat - moving two steps (a) Begin with health check (b) no health check Solution 1: high solar power (14.9W) Max solar power: 14.9W at noon Improved utilization of solar power Automated scheduling – use scheduling tools Aggressive – do as much as possible heating motors while doing other operations Fastest moving speed – no waiting on heating
86 System heart-beat - moving two steps (a) Begin with health check (b) no health check Solution 2: typical solar power (12W) Moderate solar power output – 12W Improved utilization of solar power Automated scheduling – use scheduling tools Moderately aggressive – avoid exceeding power limit Relaxed constraint – heating motors while doing other operations Faster moving speed – some waiting time on heating
87 System heart-beat - moving two steps (a) Begin with health check (b) no health check Solution 3: low solar power (9W) Minimum solar power output – 9W Restricted constraint – serialize operations Automated scheduling – use scheduling tools Conservative – same as JPL solution Slow moving speed Full utilization of low solar power
88
88 Comparison JPL's previous solution Conservative – long execution time, low solar power utilization Not power-aware – same schedule for all cases Not intended to use battery energy Our solution Adaptive – speeds up when solar power supply is high Power-aware – smart scheduling for different power supply/consumption Use battery energy when necessary
89
89 Application-level evaluation Mission description Target location – 48 (distance-) steps away from current location Power condition 14.9W solar power for first 10 minutes, 12W for next 10 minutes, 9W thereafter Metrics Execution time Total energy drawn from battery
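A rough sketch of how the two metrics could be tallied for this mission. The solar profile is the one on the slide; the per-heartbeat durations and battery energies in `HEARTBEAT` are hypothetical placeholders, not measured results, and the sketch assumes a schedule is chosen at the start of each two-step heartbeat and runs to completion.

```python
def solar_level(t_s):
    """Solar condition at time t_s (seconds), per the mission description."""
    if t_s < 600:
        return "high"      # 14.9 W for the first 10 minutes
    if t_s < 1200:
        return "typical"   # 12 W for the next 10 minutes
    return "low"           # 9 W thereafter


# HYPOTHETICAL figures for one heartbeat (two steps) under each condition.
HEARTBEAT = {
    "high":    {"duration_s": 50, "battery_j": 0.0},
    "typical": {"duration_s": 60, "battery_j": 100.0},
    "low":     {"duration_s": 75, "battery_j": 200.0},
}


def run_mission(steps=48, table=HEARTBEAT):
    """Return (execution time in s, energy drawn from battery in J)."""
    t_s, battery_j = 0, 0.0
    for _ in range(steps // 2):          # one heartbeat moves two steps
        entry = table[solar_level(t_s)]  # schedule picked for current power
        t_s += entry["duration_s"]
        battery_j += entry["battery_j"]
    return t_s, battery_j


print(run_mission())
```

Replacing the table with a single conservative entry for all three conditions gives the non-adaptive baseline to compare against.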
90
90 Application-level evaluation Power-awareness Execution speed scales with power condition adaptively Smart schedule Maximize best case Avoid worst case Tradeoff Power vs. performance Energy renewability Application-specific Application-level knowledge Working mode parameters of components
91
91 Program plans and milestones
92
92 Development plans Web-based CAD tool Perl/CGI scripts for configuration Java applets for interactive scheduling UI Interface with database engine Interface with commercial CAD backend Detailed power estimation tools Functional simulation with proprietary models Rationale No software installation needed by end user Ready to use by everyone on the Internet Open source with all publicly available development tools
93
93 Status & accomplishments to date
94
94 IMPACCT schedule July 2000 Aug 2000 Sept 2000 Oct 2000 Nov 2000 Dec 2000 Jan 2001 core tool UI Library Authoring Partitioning Scheduling Segmentation Volt. Scaling Simulation Legend: planned, in progress
95
95 Original schedule 2Q 00 Kickoff 2Q 01 2Q 02 System modeling, Coordination synthesis, Architecture definition, Static partitioning, Component partitioning; Component simulator, PCL benchmarking, Synthesizable components, System benchmarking; Power aware design techniques, PCL definition, Simulatable components, Benchmark identification; Authoring tool v1.0, Dynamic partitioning, Simulator v1.0, Component partitioning; network option
96
96 Updated schedule 2Q 00 Kickoff 2Q 01 2Q 02 Year 1: Static & hybrid optimizations (Partitioning / allocation, Scheduling, Bus segmentation, Voltage scaling); Library (COTS components, FireWire and I2C bus models); Static composition authoring; High-level simulation; Benchmark identification; Architecture definition. Year 2: Dynamic optimizations (Task migration, Processor shutdown, Bus segmentation, Frequency scaling); Library (Parameterizable components, Parameterizable bus models); Reconfiguration authoring; Architecture reconfiguration; Low-level simulation; System benchmarking; option
97
97 Quarterly schedule Quarters: 3Q 2000, 4Q 2000, 1Q 2001, 2Q 2001, 3Q 2001, 4Q 2001, 1Q 2002, 2Q 2002 Tasks: FireWire and I2C bus models, Static bus segmentation, Architecture definition, Low-level simulation, System benchmarking, Frequency scaling, High-level simulation, Hybrid partitioning / allocation, Voltage scaling, Parameterizable components, Dynamic scheduling, Parameterizable bus models, COTS components library, Static scheduling, Benchmark identification, Static partitioning / allocation, Hybrid scheduling, Static composition authoring, Dynamic processor shutdown, Dynamic bus segmentation, Dynamic reconfig. authoring, Hybrid bus segmentation, Architecture reconfiguration, Dynamic task migration
98
98 Financial information
99
99 IMPACCT budget Months 1-6: $180,000 Months 7-12: $180,000 Second year: $400,000
100
100 Budget distribution
101
101 http://www.ece.uci.edu/impacct/
102
102 Bibliography
[Mehra97] R. Mehra et al. "A partitioning scheme for optimizing interconnect power", IEEE Journal of Solid-State Circuits, vol. 32, no. 3, March 1997.
[Shin98] Y. Shin et al. "Reduction of bus transitions with partial bus-invert coding", Electronics Letters, vol. 34, no. 7, IEE, 2 April 1998, pp. 642-643.
[Benini97] L. Benini et al. "Asymptotic zero-transition activity encoding for address buses in low-power microprocessor-based systems", Proceedings Great Lakes Symposium on VLSI, Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 1997, pp. 77-82.
[Nakase98] Y. Nakase et al. "Complementary half-swing bus architecture and its application for wide band SRAM macros", IEE Proceedings - Circuits, Devices and Systems, vol. 145, no. 5, IEE, Oct. 1998, pp. 337-342.
[Zhang98] Y. Zhang et al. "An alternative architecture for on-chip global interconnect: segmented bus power modeling", Thirty-Second Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1-4 Nov. 1998.
[Kernighan70] B. Kernighan et al. "An Efficient Heuristic Procedure for Partitioning Graphs", Bell System Technical Journal, vol. 49, no. 2, Feb. 1970, pp. 291-307.
[Hauck95] S. Hauck et al. "Logic Partition Orderings for Multi-FPGA Systems", International Symposium on Field-Programmable Gate Arrays, 1995.
103
103 Program Goals Evaluation, exploration power usage, performance, cost alternative configurations, algorithms Optimization achieve most effective power usage high-level, global knowledge Tool integration many point tools, independent techniques Specialization configurable platform Reuse take advantage of the rich collection of COTS components rather than redesigning from scratch
104
104 Technical approach High-level abstraction component vs. composition Separate models for architecture and behavior Synthesis and optimization of power manager Architecture reconfiguration Scheduling for optimal power usage adaptable to different power management policies Aggressive, domain-knowledge Encompass mechanical / thermal power Aware of power supply model
105
105 System level modeling Architectural modeling COTS components component encapsulation bus architecture system interconnect Behavioral modeling Application specific knowledge Software architecture Mission goals High level constraints
106
106 Power-aware coordination Protocols Coordinate power usage e.g. peak power, resource arbitration Multiple versions of given algorithm Components Adaptable to different power management policies, not hardwired Usable in new applications even if not designed to be power aware! Synthesis Coordination controller (“mode manager”) Optimization to minimize control dependency Optimality depends on architectural mapping
107
107 Measuring power consumption (1) Different levels of analysis By # of operations: (+) easy to implement (-) neglects the different sizes of modules Appropriate for comparing two different architectures with similar modules By # of lines of code: (+) approximates the size of the hardware to be implemented (-) may be too simple to estimate power consumption Together with the number of operations, gives an indication of the power consumption of each module By # of F/F: (+) more accurate measure (-) requires finding the relationship between # of F/F and # of lines of code The number of F/F is the lowest-level hardware characteristic in the high-level simulator Control unit and data path have different power dissipation patterns even with the same number of gates
108
108 Measuring power consumption (2) By # of gates: (+) makes accurate power estimation possible (-) needs a register-transfer-level (RTL) description and power analysis tools To get accurate hardware information, we have to implement RTL modules Input/output statistics of each module are also necessary
109
109 USC's Work in Progress Select a processor simulator Analyze the hardware description of each module Estimate the power consumption of each module Find performance-power ratio Design a minimum power processor model
110
110 Program impact & transitions Productivity Fully exploit off-the-shelf components Rapid turnaround time to architecture Massive Scalability Protocol based power management System architecture platform Robust methodology Unified functional/power correctness Confidence in complex design points
111
111 Bus Architecture Perspectives (X) Parallelism Parallel: high cost, high throughput, enable design exploration Serial: low cost, constrained throughput, simple bus interface Locality Functional Spatial Adaptivity Adaptive Deterministic
112
112 Communication model asynchronous transfer isochronous transfer Arbitration model Fair gap arbitration Priority arbitration Configuration model Bus initialization Tree identification Self identification FireWire (IEEE 1394) bus Service model Physical layer Link layer Transaction layer
113
113 Architectural Model Component – parameterized COTS Type – processor, memory, I/O, DSP, bus, etc. Interface – how the components can be connected to each other Modes – operation mode parameters: voltage, clock speed, bandwidth, power consumption, etc. Package – a bundle of connected components that performs a certain operation A set of connected components Internal/external interface – how components are connected Modes – configuration space of the collected components, specified by each component's working mode and collective attributes, e.g., voltage, speed, power, etc.
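One possible way to express the component / package model as data, shown only as a sketch. The class and field names are hypothetical, as are all numeric values; the idea taken from the slide is that a package's modes are the combinations of its components' working modes, with collective attributes such as total power.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mode:
    name: str
    voltage_v: float
    clock_mhz: float
    power_w: float


@dataclass
class Component:
    name: str
    kind: str          # processor, memory, I/O, DSP, bus, ...
    interfaces: list   # names of the interfaces it can connect through
    modes: list        # list of Mode


@dataclass
class Package:
    name: str
    components: list   # connected components

    def configurations(self):
        """Yield (mode names, total power in W) for every mode combination."""
        for combo in product(*(c.modes for c in self.components)):
            yield tuple(m.name for m in combo), sum(m.power_w for m in combo)


# HYPOTHETICAL parts and numbers, for illustration only.
cpu = Component("cpu", "processor", ["i2c"],
                [Mode("run", 3.3, 20.0, 2.0), Mode("sleep", 3.3, 0.0, 0.25)])
bus = Component("i2c", "bus", ["i2c"],
                [Mode("active", 3.3, 0.1, 0.5), Mode("off", 0.0, 0.0, 0.0)])
board = Package("board", [cpu, bus])

for names, power in board.configurations():
    print(names, power)
```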
114
114 Approach: system-level modeling High-level abstractions Employ application-specific knowledge in system models Encompass multiple domains – electronics, mechanical, thermal System modeling Behavioral modeling – software architecture, application-specific knowledge Architectural modeling – hardware platform built on top of parameterized components Partitioning – mapping behavioral objects to architectural structures Scheduling – a valid sequence of concurrent/parallel operations on multiple processors that satisfies real-time requirements
115
115 Example – Mars Rover System specification 6 wheel motors 4 steering motors System health check Hazard detection Power supply Battery (non-rechargeable) Solar panel Power consumption Digital computation, imaging, communication, control Mechanical driving, steering Thermal motors must be heated in low-temperature environment
116
116 Scheduling example – Mars Rover Power constraints Solar panel: 14.9W peak power @ noon, 11W for 6hr/sol Battery: 10W max power output, 150W-hr energy storage CPU: 3.7W, constant for 4h/sol Health check: 6.3W, 10s Hazard detection: 7.3W, 10s Heating: 7.5W (1 motor) or 11.3W (2 motors), 5s Steering: 6.8W, 5s (7°/s) Driving: 12.4W, 10s (7cm) Existing solution Serialize each operation to satisfy power constraint Conservative – longer execution time and under-utilization of solar power No scheduling tool is used
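A small sketch of the power check a scheduler would apply before overlapping operations, using the wattages listed on this slide. Two assumptions are made here that the slide does not state: the CPU's 3.7W is added on top of every task set, and the available budget is the current solar output plus the battery's 10W maximum.

```python
# Power figures in watts, taken from the slide.
CPU_W = 3.7
BATTERY_MAX_W = 10.0
TASK_W = {
    "health_check": 6.3,
    "hazard_detection": 7.3,
    "heat_1_motor": 7.5,
    "heat_2_motors": 11.3,
    "steering": 6.8,
    "driving": 12.4,
}


def battery_draw(tasks, solar_w):
    """Battery power (W) needed to run `tasks` concurrently.

    ASSUMPTION: the CPU is always on and its power adds to the task powers.
    """
    load = CPU_W + sum(TASK_W[t] for t in tasks)
    return max(0.0, load - solar_w)


def feasible(tasks, solar_w):
    """True if the battery can cover whatever the solar panel cannot."""
    return battery_draw(tasks, solar_w) <= BATTERY_MAX_W


for solar_w in (14.9, 12.0, 9.0):
    print(solar_w, feasible(["driving", "heat_1_motor"], solar_w))
```

Under these assumptions, driving while heating one motor fits the budget at 14.9W solar, while driving while heating two motors does not; this is the kind of overlap decision the serialized existing solution never attempts.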
117
117 Scheduling techniques Constraint logic solving Transfer all constraints into a pure mathematical form Use tools to solve the problem in mathematical domain Example – CLPR Constraints C1 > 3, C1 2, C2 < 4 # two power consumers C1 + C2 6, S < 12 # one power source Inputs C1 = 4.5, S = 7 Results C2 < 2.5 2 < C2
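The bound derivation in the CLPR example can be mimicked in a few lines of plain Python (this is not CLP(R) itself). Several comparison operators are missing from the slide text, so the constraint forms used here are assumptions chosen to be consistent with the stated inputs and results: 2 < C2 < 4 for the second consumer and C1 + C2 < S for the source.

```python
def c2_bounds(c1, s, c2_low=2.0, c2_high=4.0):
    """Open interval (low, high) of feasible C2 values, or None if empty.

    ASSUMED constraints: c2_low < C2 < c2_high and C1 + C2 < S.
    """
    high = min(c2_high, s - c1)   # C2 < S - C1 follows from C1 + C2 < S
    low = c2_low
    if low >= high:
        return None               # no power left for the second consumer
    return (low, high)


# Inputs from the slide: C1 = 4.5, S = 7
print(c2_bounds(4.5, 7.0))
```

With the slide's inputs this yields the interval between 2 and 2.5, matching the slide's results C2 < 2.5 and 2 < C2.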
118
118 Evaluation Application level evaluation Metrics based on overall mission objectives Constraint-driven solutions Power related scenario Various power constraint (supply/consumption) over different stages of application Power-aware adaptive scheduling for different stages