Kickoff review meeting


1 Kickoff review meeting
Integrated Management of Power Aware Computing & Communication Technologies Kickoff review meeting Nader Bagherzadeh, Pai H. Chou, Fadi Kurdahi University of California, Irvine, ECE Dept. DARPA Contract F September 27, 2000

2 Agenda Introduction and overview
Management status, financial, milestones, schedule. Technical presentation Task progress Architecture Applications CAD Lessons learned, challenges, issues. Questions + action items review.

3 Outline Introduction Management status Technical presentation
Program goals Project overview Management status Personnel and teaming plans Plans and milestones Financial information Technical presentation Background Technical approach Status and accomplishments Current detailed schedule Program impact and anticipated transitions

4 Introduction

5 Program Goals Power-aware system-level design Design tool
Enhance mission success (time, task) Rapid customization for different missions Design tool Exploration & evaluation Optimization & specialization Technique integration System architecture Statically configurable Dynamically adaptive Use COTS parts & protocols

6 Technical approach High-level specification System synthesis tool
Separate behavior from architecture Explicit constraints (timing, power) Library characterization System synthesis tool Source-aware power usage scheduling Bus topology transformation and communication scheduling Configurable architecture Task migration & selective shutdown Bus segmentation and voltage scaling Domain knowledge Encompass mechanical / thermal power Aware of power supply model

7 Quad Chart Innovations Impact Year 1 Year 2
Innovations: Component-based power-aware design (exploit off-the-shelf components & protocols; best price/performance, reliable, cheap to replace). CAD tool for global power policy optimization (optimal partitioning, scheduling, configuration; manage entire system, including mechanical & thermal). Power-aware reconfigurable architectures (reusable platform for many missions; bus segmentation, voltage / frequency scaling).
Diagram labels: Behavior: high-level components, behavioral system model, high-level simulation, composition operators, functional partitioning & scheduling. Architecture mapping: system integration & synthesis, parameterizable components, system architecture, static configuration, busses, protocols, dynamic power management.
Timeline: Kickoff 2Q 00 2Q 01 2Q 02
Year 1: Static & hybrid optimizations (partitioning / allocation, scheduling, bus segmentation, voltage scaling). COTS component library. FireWire and I2C bus models. Static composition authoring. Architecture definition. High-level simulation. Benchmark identification.
Year 2: Dynamic optimizations (task migration, processor shutdown, bus segmentation, frequency scaling). Parameterizable components library. Generalized bus models. Dynamic reconfiguration authoring. Architecture reconfiguration. Low-level simulation. System benchmarking.
Impact: Enhanced mission success (more tasks for the same power; dramatic reduction in mission completion time). Cost saving over a variety of missions (reusable platform & design techniques; fast turnaround time by configuration, not redesign). Confidence in complex design points (provably correct functional/power constraints; retargetable optimization to eliminate overdesign; power protocol for massive scale).

8 Innovations Component-based power-aware design
Exploit off-the-shelf components & protocols COTS offer best price/performance, reliable, cheap to replace CAD tool for global power policy optimization Optimal partitioning, scheduling, configuration Manage entire system, including mechanical & thermal Power-aware reconfigurable architectures Reusable platform for many missions Bus segmentation, voltage / frequency scaling

9 Impact Enhanced mission success Cost saving over a variety of missions
More task for the same power Dramatic reduction in mission completion time Cost saving over a variety of missions Reusable platform & design techniques Fast turnaround time by configuration, not redesign Confidence in complex design points Provably correct functional/power constraints Retargetable optimization to eliminate overdesign Power protocol for massive scale

10 Management Status

11 Personnel & teaming plans
UC Irvine, Co-PI's - Design tools Nader Bagherzadeh Pai Chou Fadi Kurdahi UC Irvine, research assistants Dexin Li Jinfeng Liu Afshin Niktash USC - Component power optimization Jean-Luc Gaudiot Seong-Won Lee JPL - Applications and benchmarking Nazeeh Aranki Nikzad “Benny” Toomarian

12 Previous work Design tools Components Architectural platform
System-level: the Chinook HW/SW codesign tool Architectural synthesis (w/ physical design considerations) Components Reconfigurable computing: the MorphoSys Chip Parameterizable components: PCL Simultaneous MultiThreading vs. Chip MultiProcessing Architectural platform Segmented bus X-2000, Mars Pathfinder Configurable SMP Design tools At UCI we have experience in system-level design tools and architectural synthesis. Components (list them) both at UCI and USC. - MorphoSys project, which is another DARPA-funded reconfigurable computing project. - It is one of the components we will be using to gain experience - parameterizable component library - SMT

13 Responsibilities Bagherzadeh, Chou, Kurdahi -- co-PIs
Oversee project operation Integration into curriculum and related research efforts Li, Liu, Niktash -- RA's Development of CAD tools Modeling of demonstrator examples Authoring of component / protocol library JPL Furnish example specifications Co-develop optimization techniques USC Supporting link to low-level technologies

14 External collaborations
JPL X-2000 multi-mission architecture Mars Pathfinder as baseline JPL to provide COTS testbed JPL to evaluate IMPACCT optimizations USC Parameterizable components Low-level power estimation Consystant Design Technologies (Seattle, WA) Framework for component-based design IMPACCT plugins to support power management

15 Technical Background

16 Background: MorphoSys project
Advanced RISC Processor MorphoSys Reconfigurable Processor Array Reconfigurable processor array MIPS-like RISC processor High-bandwidth data interface 100 MHz clock 0.35µm 4metal CMOS Software support Platform for dynamic power management System Bus Instr./Data Cache (L1) High Bandwidth Data Interface External Memory (e.g. SDRAM, RDRAM) what are these components? - hardware or software processes - high-level logical components vs. concrete components.

17 RC Array and Context Memory
16 column block Context Memory 2 blocks 8 sets in each block A set controls 1 row or column (SIMD) 16 contexts in 1 set. Possible to overlap ctx broadcast with ctx reloading 16 row block RC

18 The M1 chip layout

19 M1 chip test fixture

20 Software environment mView App. (C Code) mLoad mcc mSched MuLate,
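[Figure: MorphoSys software environment. Labels: application C code; TR_app (a = b + c, p = a + 1); configuration context (Z=RC_F(X), W=RC_F(Y)); RC Array functions; context library; tools mView, mLoad, mcc, mSched; executable; MuLate, MorphoSim (C++, VHDL); MorphoSys chip with TinyRISC and RC Array.]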

21 Background on USC's SMT work
High performance processors Superscalar processor (SSP) Single chip multiprocessor (CMP) Very long instruction word (VLIW) Simultaneous multithreading (SMT) Performance and power dissipation High performance needs high power consumption Recent applications need low-power, high-performance processors

22 Microarchitectural tradeoffs
Power tradeoffs between different architectures SMT vs. SSP: SMT has more modules than SSP SMT has better performance and consumes more power SMT vs. CMP: SMT has better utilization They have similar performance, but SMT consumes less power SMT vs. VLIW: SMT consumes more power SMT is compatible with conventional architectures Design of simple SMT A simplified SMT may consume less power and still have the advantage of TLP Analysis of architectural features Power drain of modern processors (control vs. data path)

23 SMT design methodology
Measuring power consumption of a processor Checking transitions of signals and module operations Hardware implementation of the processor simulator Measuring performance of modules The contribution of each module to the total performance Performance-power ratio of each module Comparison between architectures Design of a low power processor

24 Measuring performance
Finding the performance per power of each module Simulate and measure the performance without a module Calculate the performance per power for each module Classify modules if more than two modules cooperate with each other Find the solution for the low power high performance processor

25 Background: Chinook project
Component-based HW/SW codesign framework Specification, simulation, synthesis Motivated by IP reuse, system integration Problem: IP reuse forces modification Reason: components have hardwired coordination protocols Approach Adaptable components Separate coordination protocols from components Benefits Reuse without modification Enable system-level optimizations Previously, I worked on a hardware/software codesign tool called Chinook at the Univ. of Washington. It was jointly funded by DARPA and NSF. The distinguishing feature of Chinook is that it is component-based. It helps designers with system integration by interfacing the components. We don't go inside the components. IP-based design is one of the goals of JPL: they wanted to use only COTS, because it not only helps them cut down on cost but, more importantly, helps them meet their schedule. But one of the fundamental obstacles to IP-based design is that components can't be integrated without modification. These are not just simple modifications to the interfaces; they require you to go inside and make intricate modifications. We believe we've made a breakthrough in this area. We came up with a new way of packaging the components and a new way of composing them. We made the observation that in order to compose components, they need to "speak" the same protocol; otherwise, modification is necessary. I am a cofounder of a startup company that is commercializing this work.

26 Example protocol: Subsumption
Must handle three cases: Subsuming, yielding, idle Hardwired protocol Generalization: Adaptable components (by mode mapping) Separate protocols & components [Figure: subsumption composition for a mobile robot. Labels: joystick, bumper, sonar, wheels, escape, avoid, override; sensors, actuators, decision modules; subsumption interface with states idle (i), subsuming (s), yielding (y); bumper process with modes F, B, W, T and transitions bump, release, 2s, 45d.] As an example of a coordination protocol, consider a mobile robot, which is a simpler way to illustrate concepts similar to those found in the X-2000. - The system can be organized as a set of coordinated concurrent processes. - Subsumption architecture [Rodney Brooks]: for the purpose of composition, each process must behave like a 3-state FSM: idle, yielding, and subsuming. As long as everybody behaves according to this protocol, they can be composed. - Of course the controller for coordinating these processes must be synthesized, but there is a lot of freedom; in fact it can be optimized for centralized or distributed control. - The earlier claim about being able to compose without modification: how? - Package up a process by exposing modes as an interface, not just data ports. - Example: a module for controlling the robot has modes forward, back, wait, turn. - It does not speak subsumption, but it can be adapted to the required interface by mapping: F maps to idle, and a detected event leading to back/wait/turn maps to subsuming. - Once adapted, it can be composed just like any other subsumption component, without concern about whether it's a bumper or an arm underneath.
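The mode mapping described in the notes above can be sketched roughly as follows. This is a minimal illustration, not Chinook's actual mechanism: the names `Subsumption`, `BUMPER_MODE_MAP`, and `adapt` are hypothetical, and treating "yielding" as a state forced by a higher-priority module is an assumption. Only the F to idle and B/W/T to subsuming mapping comes from the slide notes.

```python
from enum import Enum


class Subsumption(Enum):
    """The three states every composable process must expose."""
    IDLE = "idle"
    SUBSUMING = "subsuming"
    YIELDING = "yielding"


# Native modes of the bumper process: Forward, Back, Wait, Turn.
# Mapping from the notes: F -> idle; B, W, T -> subsuming.
BUMPER_MODE_MAP = {
    "F": Subsumption.IDLE,
    "B": Subsumption.SUBSUMING,
    "W": Subsumption.SUBSUMING,
    "T": Subsumption.SUBSUMING,
}


def adapt(native_mode, mode_map, overridden=False):
    """Translate a component's native mode into a subsumption state.

    Assumption: when a higher-priority module overrides this one,
    the component is reported as yielding regardless of its native mode.
    """
    if overridden:
        return Subsumption.YIELDING
    return mode_map[native_mode]
```

The component itself is untouched; only the mapping table changes when a different module (an arm instead of a bumper, say) is adapted to the same interface.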

27 Architectural mapping
Single processor or multiple processors Multiple mappings to an architecture mode manager modal processes

28 Distributed mode managers
Automatically partitioned among processors Synthesized control communication Comm. tradeoffs: synchronization, replication mode manager modal processes

29 Technical Presentation

30 Past missions – Mars Pathfinder
“Sojourner” The Mars Pathfinder Microrover Flight Experiment Alpha Proton X-ray Spectrometer (APXS)

31 Application requirements
System specification 6 wheel motors 4 steering motors System health check Hazard detection Power supply Battery (non-rechargeable) Solar panel Power consumption Digital Computation, imaging, communication, control Mechanical Driving, steering Thermal Motors must be heated in low-temperature environment

32 System-level power budget
Function: time and calculation = energy required
motor heating, 1 motor at a time: 7.51W x 1hr = 7.51W-hr
motor heating, 2 motors at a time: 11.26W x 0.5hr = 5.63W-hr
driving (extreme -80degC): 13.85W x 0.5hr = 6.92W-hr
hazard detection: 7.33W x 0.25hr = 1.83W-hr
imaging (3 images, 2 min/image): 4.5W x 0.1hr = 0.45W-hr
image compression (compress 3 images, 6 min/image): 3.7W x 0.3hr = 1.2W-hr
6Mbit 50min/sol: 6.27W x 0.8hr = 5.2W-hr
42, 10 sec health checks during day: 6.27W x 0.1hr = 0.63W-hr
remainder of 7 hr daytime CPU operation: 3.7W x 4hr = 15.0W-hr
WEB heating (as needed): 50W-hr
Total: 95W-hr
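The budget arithmetic above is power times duration per function, summed over the day. A minimal sketch that recomputes it from the slide's power and time figures follows; the names `BUDGET`, `WEB_HEATING_WH`, and `energy_wh` are illustrative. The recomputed rows differ slightly from the slide's rounded energies, so the total comes out a little under the 95 W-hr shown.

```python
# (function, power in W, duration in h), taken from the calculation column.
BUDGET = [
    ("motor heating, 1 motor at a time", 7.51, 1.0),
    ("motor heating, 2 motors at a time", 11.26, 0.5),
    ("driving", 13.85, 0.5),
    ("hazard detection", 7.33, 0.25),
    ("imaging", 4.5, 0.1),
    ("image compression", 3.7, 0.3),
    ("6Mbit 50min/sol", 6.27, 0.8),
    ("health checks", 6.27, 0.1),
    ("remaining daytime CPU operation", 3.7, 4.0),
]
WEB_HEATING_WH = 50.0  # given directly as energy on the slide


def energy_wh(power_w, hours):
    """Energy in watt-hours for a constant power draw."""
    return power_w * hours


total_wh = sum(energy_wh(p, h) for _, p, h in BUDGET) + WEB_HEATING_WH

for name, p, h in BUDGET:
    print(f"{name}: {energy_wh(p, h):.2f} W-hr")
print(f"total: {total_wh:.1f} W-hr")
```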

33 Design issues Timing constraints Power management
System health check 10s/10min Heating motor for 5s, 50s prior to driving Hazard detection 10s – steering 5s – driving 10s Power management Low-power electronics cannot make significant power saving No system-level management tool available Conservative hand-crafted schedule Serialize all operations to avoid power surge Long execution time Solar power wasted

34 Present missions – Athena/Mars ’03 Rover configuration
Pancam/Mini-TES Instrument Arm Cluster : Raman Spectrometer Alpha-Proton-X-Ray Spectrometer (APXS) Mössbauer Spectrometer Microscopic Imager Mini-Corer

35 Athena/Mars ‘03 Rovers - power subsystem
Power utilization: 38 W = 19 W (CPU&I/O) + 9 W (accel and gyro) + 10 W (wheel motors) for driving. 75 W = 19 W (CPU&I/O) + 55 W (transmission) for orbiter communication 30 W = 19 W (CPU&I/O) + 10 W (transmission) for lander relay communication 55 W = 19 W (CPU&I/O) + 33 W (peak motor) for drilling 29 W = 23 W (CPU&I/O) + 6 W (cameras) required for imaging 11 W Raman, 1.4W APXS and 2.3 W for nighttime spectrometer operation 141Whr daily for housekeeping engineering 75Whr limit for nighttime operations

36 Present missions – MUSES-CN Asteroid NanoRover
Completely solar powered Requiring only 1 watt, including an RF telecommunications system for communications between the rover and a lander or small-body orbiter for relay to Earth. Power source 500 grams of commercial, non-rechargeable, replaceable lithium batteries, with energy density of 750 joules per gram.

37 Power-aware designs Subsume low power as a special case
Minimize power consumption Minimal application specific knowledge, limited reconfiguration space Conservative Make best use of available power Use MAX solar power while it's available Increase parallelism, perform more tasks, reduce mission time Both MIN and MAX power constraints Application-specific knowledge Multiple mission requirement Adapt to run-time power supply, operating environment

38 System-level power management
Amdahl's law -- extended to power Component-level improvements must be scaled by % contributions Synergy between inter-component interactions Scope of system power model Digital, mechanical, thermal Battery model - control power surge Renewable source - solar panel, etc Mission-driven tradeoffs Execution time vs. power saving Adapt to operating environment
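The point that component-level improvements must be scaled by their percentage contribution can be illustrated with an Amdahl-style formula. The slide does not state one, so the form below is an assumed reading: if a component accounts for a fraction `fraction` of system power and its power is cut by `component_factor`, the system-level reduction is 1 / ((1 - fraction) + fraction / component_factor). The function name and the example numbers are hypothetical.

```python
def system_power_reduction(fraction, component_factor):
    """Overall power reduction factor when one component improves.

    fraction: share of total system power drawn by the component (0..1).
    component_factor: how many times less power the improved component draws.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if component_factor <= 0:
        raise ValueError("component_factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / component_factor)


# A 10x better component that draws half the system power
# yields less than a 2x improvement at system level.
print(system_power_reduction(0.5, 10.0))
```

This is why the slide argues for modeling digital, mechanical, and thermal consumers together: a large gain on a small contributor barely moves the system total.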

39 What's needed? Reconfigurable system architecture
Statically configurable for different missions Reconfiguration for dynamic power management Support state-of-the-art power management policies System-level design tool Support design space exploration Take full advantage of COTS components Optimize mission-specific system configuration Synthesize system-level power manager Support simulation for early validation

40 X2000 avionics system architecture
Symmetric COTS multiprocessors Low cost component with strong commercial support Widely accepted specification, design, application and testing Reduced development cost Dual system bus architecture High speed data rate with moderate power Low speed control with low power Industry standard bus protocols FireWire (IEEE 1394) bus I2C bus Reconfigurable bus topology

41 PA system architecture
The NASA X2000 Avionics System high-rate input symmetric multiprocessor modules reconfigurable hardware blocks communication module (CDMA) (camera) high-speed bus (e.g. IEEE 1394) low-speed bus (e.g. I2C ) bus power controller microcontroller-directed subnet - power regulations & control - analog telemetry sensors - safety inhibits - valve & pyro drive altimeter subnet

42 Applicable power optimizations
Application level Scheduling under timing and power constraints Task partitioning, allocation, migration Algorithm selection Architecture level Bus segmentation / clustering Communication scheduling Component level Voltage / frequency scaling Power down X-2000 goals Digital electronics power: 10x decrease Analog electronics power: 2x decrease Computer performance: 10 to 20x increase both static & dynamic versions

43 The need for a system-level CAD tool
Avoid pitfalls with manual design Overdesign (too conservative) Hardwired assumptions in implementation (hard to change/adapt) System integration (bottleneck in projects) Scalable methodology Specification: separation of concerns Behavior vs. architecture Policy vs. mechanism Constraint vs. implementation Exploration Framework for technique integration Rapid feedback Manage complexity Knowledge base for component/bus details Consistent knowledge propagation through design stages

44 Design tool Library Authoring Synthesis Simulation
Components and bus protocols Provides power estimation Defines configuration space Authoring Behavioral description, architecture description Mapping from behavior to architecture Synthesis Scheduling, partitioning Bus segmentation, voltage scaling Synthesis of power manager with task scheduler Simulation High-level: explore design space Detailed-level: power/performance for a given design point

45 IMPAC2T overview Behavior Architecture mapping
high-level components behavioral system model high-level simulation composition operators functional partitioning & scheduling Architecture mapping system integration & synthesis IMPACCT is a hardware/software codesign tool: behavioral (architecture-independent) level vs. system-architectural level. parameterizable components system architecture static configuration busses, protocols dynamic power management

46 Library: low-level components
Supported components COTS Parameterizable Levels of abstraction Simulatable Synthesizable Reconfigurable Bus width = 8 Bus width = 16 VHDL code

47 Library: component definition
Component interface Physical: pin interface Functional: data and control interface Power, current, voltage Power/mode characterization Mode governs power usage Restrictions on mode changes allowed High-level yet refined power estimation Aggregation Smaller components combined into larger ones New external parameters, interfaces, modes

48 Example components Processor : Microcontroller
PowerPC, ARM, Pentium, MIPS Microcontroller StrongARM, Intel 8051, Motorola 68HC11, 68332 Bus controller/transceiver: FireWire controller & transceiver I2C bus controller, GPIB Memory SRAM DRAM Flash memory

49 Example component definition
FireWire bus transceiver: National Semi CS4103 Working voltage: 3.3 V Power modes Full-on (400mW) PHY-on (150mW) Standby (50mW) CLK-disable (21mW) Crystal-disable (16mW) FireWire bus controller: National Semi CS4210 Full-on (300mW) Standby (17mW) Aggregated bus transceiver/controller Up to ten working modes to play with Flexibility in power management
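The aggregation of the two parts into one component with "up to ten working modes" can be sketched as a cross product of their mode tables. The power figures are the ones listed on the slide; the dictionary layout and the name `aggregate_modes` are illustrative, and the sketch assumes every combination is legal, which the real parts may not allow.

```python
from itertools import product

# Mode -> power in mW, as listed on the slide.
CS4103_TRANSCEIVER_MW = {
    "full-on": 400,
    "PHY-on": 150,
    "standby": 50,
    "CLK-disable": 21,
    "crystal-disable": 16,
}
CS4210_CONTROLLER_MW = {
    "full-on": 300,
    "standby": 17,
}


def aggregate_modes(a, b):
    """Combine two components into one whose modes are pairs of sub-modes.

    The aggregate power of a pair is the sum of the two parts.
    """
    return {
        (mode_a, mode_b): power_a + power_b
        for (mode_a, power_a), (mode_b, power_b) in product(a.items(), b.items())
    }


combined = aggregate_modes(CS4103_TRANSCEIVER_MW, CS4210_CONTROLLER_MW)
for modes, mw in sorted(combined.items(), key=lambda kv: kv[1]):
    print(modes, mw, "mW")
```

Five transceiver modes times two controller modes gives the ten combinations mentioned on the slide, spanning 33 mW to 700 mW.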

50 Library: bus protocols
Architecture Parallelism (parallel or serial) Topology (serial, tree, ring) Service layers (physical, link, transaction, application) Communication Data transfer mode (asynchronous, isochronous) Data transfer speed Response mode (need acknowledgement or not) Arbitration mode Configuration Configuration process (deterministic or random) Reconfigurability (static, hybrid, dynamic) Power Power mode (full-on, standby, deep-sleep, shutdown) Media (cable, wireless, backplane)

51 Bus protocols exploration
Explore bus protocol dimensions Protocol simulation Input: bus protocol model Output: sequence of events Map events into relative power quantities Compare and trade off between different design points Example: simulating FireWire bus configuration Event-driven simulator Compare two designs with different topology Pure tree topology (acyclic) Tree topology with bus segmentation Tree-ID process, 9 nodes Tree 37 events Segmented tree 24 events

52 Bus optimization Bus: a significant power consumer
Up to 30% - 50% of the total system power consumption[Mehra97] Bus power consumption determined by Capacitance (load C and bus C, proportional to bus length) Voltage (bus supply voltage and swing voltage) Bus access frequency Bus signal switching activity Why bus power optimization? System performance requirements Power constraints Adapt to execution time variations Bus segmentation for increased bandwidth Enable other novel power management techniques

53 Bus-level optimizations
Bus encoding [Shin98][Benini97][Nakase98] Minimize switching activity on bus Makes sense mostly for parallel bus Gray code, bus-invert code, T0 code and Beach code Bus driver design Bus clustering (segmentation) [Mehra97][Zhang98] Optimize bus topology by grouping components Divide the global bus into multiple segments Benefits: Reduced bus capacitance (power saving) Shorter bus latency, higher throughput, increased flexibility Partitioning [Hauck95][Yang94][Cong93] Divide tasks among components Minimize inter-cluster traffic Clustering before partitioning
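As an illustration of how bus encoding cuts switching activity, here is a minimal sketch of bus-invert coding on a parallel bus. It is a simplified textbook variant, not code from the cited papers: the invert decision looks only at the data lines, the bus is assumed to start at all zeros, and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Encode words as (bus_value, invert_bit) pairs.

    If more than half of the data lines would toggle, send the
    complement instead and raise the extra invert line.
    """
    mask = (1 << width) - 1
    prev_bus = 0
    encoded = []
    for word in words:
        word &= mask
        if hamming(word, prev_bus) > width // 2:
            bus, invert = (~word) & mask, 1
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(~bus) & mask if invert else bus for bus, invert in encoded]


def raw_transitions(words, width=8):
    """Line toggles when words are driven onto the bus unencoded."""
    mask = (1 << width) - 1
    prev, total = 0, 0
    for word in words:
        total += hamming(word & mask, prev)
        prev = word & mask
    return total


def encoded_transitions(encoded):
    """Line toggles with bus-invert coding, counting the invert line too."""
    prev_bus, prev_inv, total = 0, 0, 0
    for bus, invert in encoded:
        total += hamming(bus, prev_bus) + (invert ^ prev_inv)
        prev_bus, prev_inv = bus, invert
    return total


words = [0x00, 0xFF, 0x00, 0xFF]
enc = bus_invert_encode(words)
print(raw_transitions(words), encoded_transitions(enc))
```

The extra invert line is the cost of the scheme, which is one reason the slide notes that encoding makes sense mostly for parallel buses rather than a serial bus such as FireWire.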

54 FireWire (IEEE 1394) High speed serial bus Advantages
100, 200, 400 Mbps in 1394a 800M, 1.6Gbps in 1394b Advantages Low power Real-time bandwidth guarantee => important for media apps Isochronous and asynchronous transfer modes Hot-pluggable, self reconfiguring Supports bus segmentation

55 X2000 architecture mapping
Map Mars Rover application onto X2000 architecture Legend CAM: camera MC: micro controller HD: hard drive NVM: non-volatile memory SCI: scientific equipment RF modem: radio frequency modem I2C bus omitted on this diagram CPU 1 CPU2 (Bus controller) HD / NVM SCI SCI1 SCI2 SCI FireWire 1394 Bus MC1 MC2 MC3 RF Modem CAM Tasks: Capture picture, compress in CPU1, and send data to RF Modem MC's are responsible for sensing, drive control, steering control SCI's carry out scientific experiments, sending data to CPU2 After analysis, CPU2 stores data in HD/ NVM

56 Bottlenecks in an unsegmented architecture
Contention for bus bandwidth Camera, RF, harddisk Forces serialization of communication globally All nodes must be kept awake Prevents component shutdown Global overhead for bus reconfiguration Long routing path Power overhead on routing controllers

57 Segmentation example
Three bus segments. Nodes: SCI1, SCI2, RF Modem, CAM, HD, MC1, MC2, MC3, CPU1/DSP, CPU2/Bus controller. Suppose bus bandwidth is 100Mbps, image size 20Mb each, 20 pictures to work on, SCI data volume 16kbps X 10 Ks X 2 (4 hrs a day). Power numbers: CPU1: 4.0W, CPU2: 240mW, RF modem: 1.7 W, Camera: 2.6 W, SCI1: 0.8 W, SCI2: 3.2 W. Power number details: CAM: picture capture, image compression, RF transmission. MC: sensing, drive control, steering control. SCI: scientific experiment.

58 Bus segmentation with FireWire
Blue nodes can't be disabled All nodes’ PHY layers must remain active. Request packets are broadcast to all nodes Gray nodes can be safely disabled They are in different segments from the active ones. Request packets are broadcast to only active nodes.

59 Throughput improvement
Bus segmentation helps improve bus bandwidth. [Figure: unsegmented FireWire 1394 bus, annotated "No useful traffic", vs. segmented bus; nodes SCI1, SCI2, RF Modem, CAM, HD / NVM, MC1, MC2, MC3, CPU1/DSP, CPU2 (Bus controller).] Unsegmented: 100Mbps bandwidth, 9s transfer time. Segmented: 300Mbps, 5s transfer time.

60 Bandwidth-enabled voltage scaling
Use voltage scaling and clock scaling to decrease component power. Bandwidth 100Mbps Power consumption = 12.3 W Power consumption after voltage scaling = 9.2 W SCI2 RF Modem CAM MC1 HD MC2 SCI1 CPU2/ Bus controller CPU1/DSP MC3 Could be 300Mbps, keep it at 100Mbps

61 Power/latency reduction
Power consumption = 12.3 W Data transfer time = 9 s energy consumption = 111 J Power saving 25% energy saving 58% Power consumption after voltage scaling = 9.2 W Data transfer time = 5 s energy consumption = 46 J Note: bus configuration power not counted
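The savings quoted above follow from energy = power x time. A short sketch reproducing the arithmetic from the slide's numbers (the function and variable names are illustrative):

```python
def energy_j(power_w, time_s):
    """Energy in joules for a constant power draw."""
    return power_w * time_s


def relative_saving(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


e_before = energy_j(12.3, 9)   # unsegmented bus, 100 Mbps
e_after = energy_j(9.2, 5)     # segmented bus with voltage scaling

power_saving = relative_saving(12.3, 9.2)
energy_saving = relative_saving(e_before, e_after)

print(f"energy before: {e_before:.1f} J, after: {e_after:.1f} J")
print(f"power saving: {power_saving:.1%}, energy saving: {energy_saving:.1%}")
```

The energy saving is larger than the power saving because the transfer both draws less power and finishes sooner.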

62 Segmentation-enabled shutdown
Picture capture (6 min.) Science experiment (20 min.) Drive control (10 min.) Drive control (20 min.) Non-operating bus segments are disabled. Non-operating components are disabled. Bus power is saved. All components’ bus interfaces are active. Entire bus is hot.

63 Combined energy savings from static techniques
Shutting down inactive nodes: 27 times of global bus configs. Only 11 bus configurations Config energy << 165 J Transceiver energy 1962 J Config energy + transceiver energy <= 2127 J Not shutting down inactive nodes: Bus transceiver active all the time. Transceiver energy: 150 mW x 10 x 3360 s = 5040 J Transceiver: National Semi CS4103, PHY-active only mode. 2.4x energy reduction!
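The 2.4x figure can be checked from the numbers on this and the preceding slide. In the sketch below, reading 3360 s as the sum of the four task phases (6 + 20 + 10 + 20 min) and 10 as the node count are inferences from the earlier slides, not statements in the deck; variable names are illustrative.

```python
PHY_ON_W = 0.150             # CS4103 transceiver, PHY-active only mode
NODES = 10                   # nodes on the bus (inferred from the diagram labels)
PHASES_MIN = [6, 20, 10, 20] # picture capture, science, drive, drive

total_s = sum(PHASES_MIN) * 60

# All transceivers kept on for the whole run.
always_on_j = PHY_ON_W * NODES * total_s

# With shutdown: slide gives transceiver energy plus an upper bound on config energy.
TRANSCEIVER_J = 1962
CONFIG_UPPER_BOUND_J = 165
segmented_upper_j = TRANSCEIVER_J + CONFIG_UPPER_BOUND_J

reduction = always_on_j / segmented_upper_j
print(f"{always_on_j:.0f} J vs <= {segmented_upper_j} J -> {reduction:.1f}x")
```

Because the configuration energy is only bounded from above, the 2.4x ratio is a conservative estimate of the reduction.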

64 Dynamic bus reconfiguration
New task: send data from HD to RF modem! (continue from previous task ) SCI2 RF Modem CAM MC1 HD MC2 SCI1 CPU2/ Bus controller CPU1 MC3 Science experiments Radio frequency data transfer SCI+RF (20+60 min) Solution: dynamically change bus topology SCI2 RF Modem CAM MCS1 HD MCS2 SCI1 CPU2/ Bus controller CPU1/DSP MCS3 Science experiments Radio frequency data transfer SCI+RF (20+60 min)

65 Energy savings from dynamic bus reconfiguration
Power number list: Local config: 12.7W; Global config: 23.7W; Active transceiver: 150mW; Segmentation: software support; Bus segment: proportional to bus length.
Without re-segmentation: Local configuration: 3; Global configuration: none; re-segmentation: none; Active transceiver: 7; Active bus segment: 2. Energy: 12.7 x 3 x 1 + 0.15 x 7 x 4800 = 5078 J
With re-segmentation: Local configuration: none; Global configuration: 1; re-segmentation: 1; Active transceiver: 3+2; Active bus segment: 1. Energy: 23.7 x 1 x 1 + 0.15 x 3 x 4800 + 0.05 x 2 x 4800 = 2664 J
1.9x energy reduction!

66 Summary of architecture optimization
Towards loose coupling Reduced bus contention Increased parallel bandwidth Enabling voltage/frequency scaling Application-driven clustering Communication bandwidth requirements between processes Knowledge from high-level behavioral model Static optimization 2.4x energy reduction Bus segmentation Cluster shutdown Dynamic reclustering 1.9x energy reduction take over from Dexin, transition to application level

67 Power management & optimization
Behavioral modeling Extract power-related attributes of all objects Architecture modeling Use low-power devices or devices that can operate in a low-power mode Partitioning Migration – merge computations from under-utilized processors onto one processor to improve utilization Segmentation – separate tightly coupled computations into clusters to localize communication Scheduling Arrange operation sequences on multiple processors / multiple power consumers to meet both performance and power requirements

68 Behavioral model Application specific knowledge Software architecture
Input, output and function Dependency and precedence Control and data flow Timing and sequence Software architecture Operating system features – real-time, centralized, distributed, etc. Execution model – event driven, interrupt, distributed agent, client-server, etc. Communication model – protocol stack and specification Power-related attributes Data rate, execution time, CPU speed, memory size, communication path, etc.

69 Allocation Map behavioral objects to hardware
Group related OS, communication, control and application objects into processing nodes Extract data objects into storage nodes Allocate components/packages for each processing node Arrange data storage for data nodes and optimize storage location to reduce communication Map communication paths to busses Setup working mode of each component/package to fit the behavioral requirement Extract attribute of each structure Function – computation, control, communication CPU utilization Bus traffic Power consumption

70 Scheduling Mapping of tasks to time slots
Computation Communication Mapping of power usage to time slots Mechanical devices Thermal subsystems Other electronics subsystems Constraints Real-time deadlines, periods, min/max separation Power budget, power surge (min/max) Potentially scenario-driven

71 Scheduling techniques
Deadline based real-time scheduling on multiprocessors Rate-monotonic scheduling – extend existing RM scheduling to multiprocessors Timing constraint graph scheduling – multiple serializable sequences in a single heart beat
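For reference, the classic single-processor rate-monotonic test that the multiprocessor extension would build on can be sketched as below. This is the standard Liu and Layland sufficient bound, not the IMPACCT algorithm. The health-check task uses the 10 s every 10 min figure from the design-issues slide; the second task is hypothetical.

```python
def rm_utilization_bound(n):
    """Liu-Layland bound n * (2^(1/n) - 1) for n periodic tasks."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable(tasks):
    """Sufficient (not necessary) RM schedulability test on one processor.

    tasks: list of (execution_time, period) pairs in the same time unit.
    Returns (passes_test, total_utilization).
    """
    utilization = sum(c / t for c, t in tasks)
    return utilization <= rm_utilization_bound(len(tasks)), utilization


tasks = [
    (10, 600),  # system health check: 10 s every 10 min (from the slides)
    (5, 60),    # hypothetical periodic control task
]
print(rm_schedulable(tasks))
```

A task set that fails this bound may still be schedulable; the test is only sufficient, and it says nothing about power, which is what the scheduler on the following slides adds.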

72 Novel IMPACCT scheduler
A novel graphical tool Timing and power constraint visualization Transforms them into graph problems Gives designers a view of power surges at run time Complete system-level model All power sources All power consumers Power-aware scheduling Schedule operations based on power source output Both performance requirement and power constraint Regulate power surge Optimize for power efficiency and reduce execution time hand off to Jinfeng after this

73 IMPACCT scheduler Extended Gantt-chart in real-time scheduling for single processor Event – bins Timing – horizontal size Power – vertical size Energy – area of the bin Power surge – compacting bins downward Power Time Starting time Ending time Power level Energy consumption Demo
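The bin view described on this slide (width is time, height is power, area is energy) can be modeled in a few lines. This is a rough sketch of the data model only, with hypothetical names (`Bin`, `power_profile`, `peak_power`) and made-up task values; it is not the tool's implementation.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """One scheduled event on the extended Gantt chart."""
    name: str
    start: float     # starting time, s
    duration: float  # horizontal size, s
    power: float     # vertical size, W

    @property
    def end(self):
        return self.start + self.duration

    @property
    def energy(self):
        """Area of the bin, in joules."""
        return self.power * self.duration


def power_profile(bins):
    """Total power at every instant where the profile can change."""
    times = sorted({b.start for b in bins} | {b.end for b in bins})
    return [(t, sum(b.power for b in bins if b.start <= t < b.end)) for t in times]


def peak_power(bins):
    """Highest stacked power level, i.e. the power surge to watch."""
    return max((p for _, p in power_profile(bins)), default=0)


# Hypothetical bins: A runs throughout, B and C run back to back.
bins = [Bin("A", 0, 10, 4), Bin("B", 0, 5, 7), Bin("C", 5, 5, 3)]
print(power_profile(bins), peak_power(bins))
```

Stacking the bins vertically at each instant gives the power curve that the tool compares against the available supply.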

74 IMPACCT scheduler Scheduling chart for multi-processor and multiple power consumers Events can overlap vertically Multi-processor Multiple power consumer – electronics, mechanical, thermal Power awareness – min and max power supply A B C D Constant task A Periodic task B Periodic task C Task D follows B Power Time Demo

75 IMPACCT scheduler Timing constraints – bin packing problem to satisfy horizontal constraints Independent tasks – moving bins horizontally Dependent tasks – moving grouped bins horizontally Power/voltage/clock scaling – extending/squeezing bins A B C D Power Time Deadline of B (scheduling space) Deadline of B Min timing constraint of D Max timing constraint of D Deadline of C (scheduling space) Deadline of C Scheduling space of D Slide bin within timing space Squeeze/extend bin to available time slot Demo

76 IMPACCT scheduler Power constraints – bin packing problem to satisfy vertical constraints Automatic optimization – let the tool do everything Manual optimization – visualizing power in manual scheduling A B C D Power Time Manual scheduling while monitoring power surge Attack spike Automated global scheduling to meet min-max power Max Min Improve utilization Demo
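One simple way to "attack a spike" automatically is to slide each bin to the earliest slot where the stacked power stays under the maximum. The sketch below is a greedy heuristic under stated assumptions (integer time steps, tasks placed in list order, only the max constraint enforced); it is not the IMPACCT algorithm, and the task values are hypothetical.

```python
def schedule_under_power_cap(tasks, max_power, horizon):
    """Greedy placement of tasks under a maximum power budget.

    tasks: list of (name, earliest_start, duration, power), integer time steps.
    Returns (starts, usage) where starts maps name -> start step and
    usage[t] is the total power drawn during step t.
    """
    usage = [0.0] * horizon
    starts = {}
    for name, earliest, duration, power in tasks:
        if power > max_power:
            raise ValueError(f"{name} alone exceeds the power budget")
        for start in range(earliest, horizon - duration + 1):
            window = range(start, start + duration)
            if all(usage[t] + power <= max_power for t in window):
                for t in window:
                    usage[t] += power
                starts[name] = start
                break
        else:
            raise ValueError(f"cannot place {name} within the horizon")
    return starts, usage


tasks = [
    ("heat", 0, 5, 8),    # hypothetical: 8 W for 5 steps
    ("drive", 0, 10, 10), # hypothetical: 10 W for 10 steps
    ("image", 0, 3, 5),   # hypothetical: 5 W for 3 steps
]
starts, usage = schedule_under_power_cap(tasks, max_power=15, horizon=30)
print(starts, max(usage))
```

Here driving is pushed back until heating ends, while imaging still fits alongside heating. A min-power constraint (using solar power while it is available) would need an additional pass that pulls work into under-used slots.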

77 Example revisited – Mars Rover
System specification 6 wheel motors 4 steering motors System health check Hazard detection Power supply Battery (non-rechargeable) Solar panel Power consumption Digital Computation, imaging, communication, control Mechanical Driving, steering Thermal Motors must be heated in low-temperature environment

78 Timing constraints – Mars Rover

79 Scheduling method Constraint graph construction Resource specification
Nodes: operations Edges: precedence relationship between operations Resource specification Resource: an executing unit that can perform operations independently Six thermal resources for wheel heating Four thermal resources for steer motor heating One mechanical resource for driving One mechanical resource for steering One computation resource for control Operations on one resource must be serialized Scheduling Primary resource selection Schedule primary resource by applying graph algorithms Auxiliary resources and power requirement are considered as scheduling constraints
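Scheduling on a constraint graph of this kind is commonly done with a longest-path computation, where an edge (u, v, w) means start[v] >= start[u] + w and a maximum separation becomes a backward edge with negative weight. The sketch below uses Bellman-Ford relaxation for that; it is an assumed reading of "applying graph algorithms", not the authors' code. The 10 s hazard detection and the 5 s heating within 50 s of driving come from the design-issues slide; the node names and the `begin` anchor are hypothetical.

```python
def asap_schedule(nodes, edges, source):
    """Earliest start times satisfying all min/max separation constraints.

    edges: list of (u, v, w) meaning start[v] >= start[u] + w.
    Raises ValueError if the constraints contain a positive cycle.
    """
    neg_inf = float("-inf")
    start = {n: neg_inf for n in nodes}
    start[source] = 0
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if start[u] != neg_inf and start[u] + w > start[v]:
                start[v] = start[u] + w
                changed = True
        if not changed:
            break
    # Any edge that can still be relaxed indicates a positive cycle.
    for u, v, w in edges:
        if start[u] != neg_inf and start[u] + w > start[v]:
            raise ValueError("infeasible timing constraints")
    return start


nodes = ["begin", "hazard_detection", "heat_wheel", "drive"]
edges = [
    ("begin", "hazard_detection", 0),
    ("begin", "heat_wheel", 0),
    ("hazard_detection", "drive", 10),  # driving waits for 10 s hazard detection
    ("heat_wheel", "drive", 5),         # heating takes 5 s before driving
    ("drive", "heat_wheel", -50),       # heating starts at most 50 s before driving
]
print(asap_schedule(nodes, edges, "begin"))
```

If another operation delayed driving by more than 50 s, the backward edge would push the heating start later instead of letting the heat dissipate, which is the kind of coupling the serialized hand schedule cannot exploit.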

80 Constraint graph
[Figure: constraint graph. Nodes (operation / execution time): Hazard detection / Thd; System health check / Thc; Heat steer 1-4 / Ths; Steer / Ts; Heat wheel 1-6 / Thw; Drive / Td. Edge weights: thc, -(thc + Thc), -ths, -thw.]

81 Resource specification
Operation (resource) / execution time / power:
Hazard detection (C) / Thc / Phc_C
Health check (C) / Thc / Phc_C
Heat steer i (C) / Ths_C / Phs_C
Heat steer i (T) / Ths_T / Phs_T
Heat wheel j (C) / Thw_C / Phw_C
Heat wheel j (T) / Thw_T / Phw_T
Steer (C) / Ts_C / Ps_C
Steer (M) / Ts_M / Ps_M
Drive (C) / Td_C / Pd_C
Drive (M) / Td_M / Pd_M
Resources: C = computation, M = mechanical, T = thermal
[Figure: operations Heat steer i, Heat wheel j, Health check, Steer, Drive, and Hazard detection placed on the Computation, Mechanical, and Thermal resources. Edge weights: -ths + Ths_E, -thw + Thw_E, thc, -(thc + Thc).]

82 Scheduling graph
Primary resource: Computation
Hazard detection (C) / Thc / Phc_C
Health check (C) / Thc / Phc_C
Heat steer i (C) / Ths_E / Phs_E
Heat wheel j (C) / Thw_E / Phw_E
Steer (C) / Ts_C / Ps_C
Drive (C) / Td_C / Pd_C
Auxiliary resource: Mechanical
Steer (M) / Ts_M / Ps_M
Drive (M) / Td_M / Pd_M
Auxiliary resource: Thermal
Heat steer i (T) / Ths_T / Phs_T
Heat wheel j (T) / Thw_T / Phw_T
Edge weights: -ths + Ths_E, -thw, thc, -(thc + Thc), -ths, -thw + Thw_E, -Ts_C + Ts_M

83 Example – Mars Rover Power constraints
Different solar power supply over time Different power consumption over temperature/time

84 Previous solution by JPL
Over-constrained, conservative: serializes every operation to satisfy the power constraint; longer execution time and under-utilization of solar power
No scheduling tool is used: manual scheduling
Not power-aware: schedules without considering power sources and consumers
[Figure: system heartbeat, moving two steps. (a) Begins with health check. (b) No health check.]

85 Solution 1: high solar power (14.9W)
Max solar power: 14.9W at noon
Improved utilization of solar power
Automated scheduling: uses scheduling tools
Aggressive: heats motors as much as possible while doing other operations
Fastest moving speed: no waiting on heating
[Figure: system heartbeat, moving two steps. (a) Begins with health check. (b) No health check.]

86 Solution 2: typical solar power (12W)
Moderate solar power output: 12W
Improved utilization of solar power
Automated scheduling: uses scheduling tools
Moderately aggressive: avoids exceeding the power limit
Relaxed constraint: heats motors while doing other operations
Faster moving speed: some waiting time on heating
[Figure: system heartbeat, moving two steps. (a) Begins with health check. (b) No health check.]

87 Solution 3: low solar power (9W)
Minimum solar power output: 9W
Restricted constraint: serializes operations
Automated scheduling: uses scheduling tools
Conservative: same as JPL solution
Slow moving speed
Full utilization of low solar power
[Figure: system heartbeat, moving two steps. (a) Begins with health check. (b) No health check.]

88 Comparison
JPL's previous solution
Conservative: long execution time, low solar power utilization
Not power-aware: same schedule for all cases
Does not intend to use battery energy
Our solution
Adaptive: speeds up when solar power supply is high
Power-aware: smart scheduling for different power supply/consumption conditions
Uses battery energy when necessary

89 Application-level evaluation
Mission description: target location is 48 (distance) steps away from the current location
Power condition: 14.9W solar power for the first 10 minutes, 12W for the next 10 minutes, 9W thereafter
Metrics: execution time; total energy drawn from battery

90 Application-level evaluation
Power-awareness: execution speed scales adaptively with the power condition
Smart schedule: maximize best case; avoid worst case
Tradeoff: power vs. performance; energy renewability
Application-specific: application-level knowledge; working mode parameters of components
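The adaptive behavior described here can be sketched as a lookup from the current solar supply to one of the three schedules shown earlier (high, typical, and low solar power). The supply profile follows the mission scenario in the evaluation slide; the function names and the exact switching thresholds are assumptions made for illustration, not part of the presented tool.

```python
def solar_power(t_seconds):
    """Solar supply in watts for the evaluation scenario: 14.9 W for the
    first 10 minutes, 12 W for the next 10 minutes, 9 W thereafter."""
    if t_seconds < 10 * 60:
        return 14.9
    if t_seconds < 20 * 60:
        return 12.0
    return 9.0


def select_schedule(supply_watts):
    """Pick a schedule for the current supply.

    Assumed thresholds: each schedule is used only when the supply is at
    least the level it was designed for.
    """
    if supply_watts >= 14.9:
        return "aggressive"     # solution 1: heat motors during other operations
    if supply_watts >= 12.0:
        return "moderate"       # solution 2: some waiting on heating
    return "conservative"       # solution 3: serialized operations


# Schedule chosen at the start of the mission and after 10 and 20 minutes.
choices = [select_schedule(solar_power(t)) for t in (0, 600, 1200)]
```

A fuller evaluation would also integrate the battery draw of each schedule over the mission, which needs per-schedule timing data that the transcript does not include.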

91 Program plans and milestones

92 Development plans
Web-based CAD tool
Perl/CGI scripts for configuration
Java applets for interactive scheduling UI
Interface with database engine
Interface with commercial CAD backend: detailed power estimation tools; functional simulation with proprietary models
Rationale
No software installation needed by end user
Ready to use by everyone on the Internet
Open source with all publicly available development tools

93 Status & accomplishments to date

94 IMPACCT schedule
[Figure: Gantt chart, July 2000 through Jan 2001. Rows: Library, Authoring, Partitioning, Scheduling, Segmentation, Volt. Scaling, Simulation. Legend: planned, in progress; core tool, UI.]

95 Original schedule
[Figure: two timelines, each marked Kickoff, 2Q 00, 2Q 01, 2Q 02 and labeled "network" and "option". Items, in transcript order: System modeling; Coordination synthesis; Architecture definition; Static partitioning; Component partitioning; Authoring tool v1.0; Dynamic partitioning; Simulator v1.0; Component partitioning; Power aware design techniques; PCL definition; Simulatable components; Benchmark identification; Component simulator; PCL benchmarking; Synthesizable components; System benchmarking.]

96 Updated schedule
[Timeline: Kickoff, 2Q 00, 2Q 01, 2Q 02, option]
Year 1
Static & hybrid optimizations: partitioning / allocation, scheduling, bus segmentation, voltage scaling
Library: COTS components; FireWire and I2C bus models
Static composition authoring
High-level simulation
Benchmark identification
Architecture definition
Year 2
Dynamic optimizations: task migration, processor shutdown, bus segmentation, frequency scaling
Library: parameterizable components; parameterizable bus models
Reconfiguration authoring
Architecture reconfiguration
Low-level simulation
System benchmarking

97 Quarterly schedule
[Table: quarterly schedule in two columns, 3Q 2000 through 2Q 2001 and 3Q 2001 through 2Q 2002. Items, in transcript order: COTS components library; Static scheduling; Benchmark identification; Parameterizable components; Dynamic scheduling; Parameterizable bus models; FireWire and I2C bus models; Static bus segmentation; Architecture definition; Hybrid bus segmentation; Architecture reconfiguration; Dynamic task migration; Static partitioning / allocation; Hybrid scheduling; Static composition authoring; Dynamic processor shutdown; Dynamic bus segmentation; Dynamic reconfig. authoring; High-level simulation; Hybrid partitioning / allocation; Voltage scaling; Low-level simulation; System benchmarking; Frequency scaling.]

98 Financial information

99 IMPACCT budget Months 1-6 $180,000 Months $180,000 Second year $400,000

100 Budget distribution

101

102 Bibliography
[Mehra97] R. Mehra et al., "A partitioning scheme for optimizing interconnect power", IEEE Journal of Solid-State Circuits, vol. 32, no. 3, March 1997
[Shin98] Y. Shin et al., "Reduction of bus transitions with partial bus-invert coding", Electronics Letters, vol. 34, no. 7, IEE, 2 April 1998
[Benini97] L. Benini et al., "Asymptotic zero-transition activity encoding for address buses in low-power microprocessor-based systems", Proceedings Great Lakes Symposium on VLSI, Los Alamitos, CA, USA: IEEE Comput. Soc. Press, 1997, pp. 77-82
[Nakase98] Y. Nakase et al., "Complementary half-swing bus architecture and its application for wide band SRAM macros", IEE Proceedings - Circuits, Devices and Systems, vol. 145, no. 5, IEE, Oct 1998, pp. 337-42
[Zhang98] Y. Zhang et al., "An alternative architecture for on-chip global interconnect: segmented bus power modeling", Thirty-Second Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1-4 Nov
[Kernighan70] B. Kernighan et al., "An Efficient Heuristic Procedure for Partitioning Graphs", Bell System Technical Journal, vol. 49, no. 2, Feb
[Hauck95] S. Hauck et al., "Logic Partition Orderings for Multi-FPGA Systems", International Symposium on Field-Programmable Gate Arrays, 1995

103 Program Goals
Evaluation, exploration: power usage, performance, cost; alternative configurations, algorithms
Optimization: achieve most effective power usage; high-level, global knowledge
Tool integration: many point tools, independent techniques
Specialization: configurable platform
Reuse: take advantage of the rich collection of COTS; do not re-design from scratch

104 Technical approach
High-level abstraction: component vs. composition; separate models for architecture and behavior
Synthesis and optimization of power manager: architecture reconfiguration; scheduling for optimal power usage; adaptable to different power management policies
Aggressive, domain-knowledge: encompass mechanical / thermal power; aware of power supply model

105 System level modeling
Architectural modeling: COTS components; component encapsulation; bus architecture; system interconnect
Behavioral modeling: application specific knowledge; software architecture; mission goals; high level constraints

106 Power-aware coordination
Protocols Coordinate power usage e.g. peak power, resource arbitration Multiple versions of given algorithm Components Adaptable to different power management policies, not hardwired Usable in new applications even if not designed to be power aware! Synthesis Coordination controller (“mode manager”) Optimization to minimize control dependency Optimality depends on architectural mapping

107 Measuring power consumption (1)
Different levels of analysis
By # of operations: (+) easy to implement; (-) neglects the different sizes of modules. Appropriate for comparing two different architectures with similar modules.
By # of lines of code: (+) indicates the size of the hardware to be implemented; (-) may be too simple to estimate power consumption. Together with the number of operations, gives an indication of the power consumption of each module.
By # of F/F (flip-flops): (+) more accurate measure; (-) requires finding the relationship between # of F/F and # of lines of code. The number of F/F is the lowest-level hardware characteristic in the high-level simulator. Control unit and data path have different power dissipation patterns even with the same number of gates.

108 Measuring power consumption (2)
# of gates: (+) makes accurate power estimation possible; (-) needs register transfer level (RTL) descriptions and power analysis tools. To get accurate hardware information, RTL modules have to be implemented. Input/output statistics of each module are also necessary.

109 USC's Work in Progress
Select a processor simulator
Analyze the hardware description of each module
Estimate the power consumption of each module
Find performance-power ratio
Design a minimum power processor model

110 Program impact & transitions
Productivity Fully exploit off-the-shelf components Rapid turnaround time to architecture Massive Scalability Protocol based power management System architecture platform Robust methodology Unified functional/power correctness Confidence in complex design points

111 Bus Architecture Perspectives (X)
Parallelism Parallel: high cost, high throughput, enable design exploration Serial: low cost, constrained throughput, simple bus interface Locality Functional Spatial Adaptivity Adaptive Deterministic

112 FireWire (IEEE 1394) bus
Service model: physical layer; link layer; transaction layer
Communication model: asynchronous transfer; isochronous transfer
Arbitration model: fair gap arbitration; priority arbitration
Configuration model: bus initialization; tree identification; self identification

113 Architectural Model
Component: parameterized COTS
Type: processor, memory, I/O, DSP, bus, etc.
Interface: how the components can be connected to each other
Modes: operating-mode parameters such as voltage, clock speed, bandwidth, and power consumption
Package: a bundle of connected components that performs a certain operation
A set of connected components
Internal/external interface: how components are connected
Modes: configuration space of the collected components, specified by each component's working mode and collective attributes, e.g., voltage, speed, and power
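The component and package model above maps naturally onto a small data structure. The sketch below is one possible encoding, not the IMPACCT library format: the class names, fields, and all numeric mode values are hypothetical. It treats a package's configuration space as every combination of one mode per component, with total power as the collective attribute.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mode:
    name: str
    voltage: float     # volts
    clock_mhz: float   # clock speed in MHz
    power: float       # power consumption in watts


@dataclass
class Component:
    name: str
    kind: str          # e.g. "processor", "memory", "bus"
    interface: str     # how it connects to other components
    modes: list        # list of Mode


@dataclass
class Package:
    name: str
    components: list   # list of Component

    def configurations(self):
        """Yield (modes, total_power) for one mode per component."""
        for combo in product(*(c.modes for c in self.components)):
            yield combo, sum(m.power for m in combo)

    def within_budget(self, max_power):
        """Configurations whose collective power fits the budget."""
        return [combo for combo, p in self.configurations() if p <= max_power]


# Hypothetical parts and mode values, for illustration only.
cpu = Component("cpu", "processor", "bus", [
    Mode("fast", 3.3, 100.0, 2.0),
    Mode("slow", 2.5, 50.0, 0.8),
])
bus = Component("bus", "bus", "I2C", [
    Mode("on", 3.3, 50.0, 0.5),
    Mode("off", 0.0, 0.0, 0.0),
])
package = Package("controller", [cpu, bus])
all_configs = list(package.configurations())
low_power = package.within_budget(1.5)
```

Enumerating the full product is only practical for small packages; a real tool would prune or search the configuration space rather than list it.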

114 Approach: system-level modeling
High-level abstractions Employ application specific knowledge in system models Encompass multiple domains – electronics, mechanical, thermal System modeling Behavioral modeling – software architecture, application specific knowledge Architectural modeling – hardware platform built on top of parameterized components Partitioning – mapping behavioral objects to architectural structures Scheduling – a valid sequence of concurrent/parallel operations on multiple processors that satisfies real-time requirement

115 Example – Mars Rover
System specification: 6 wheel motors; 4 steering motors; system health check; hazard detection
Power supply: battery (non-rechargeable); solar panel
Power consumption
Digital: computation, imaging, communication, control
Mechanical: driving, steering
Thermal: motors must be heated in low-temperature environments

116 Scheduling example – Mars Rover
Power constraints
Solar panel: 14.9W peak at noon, 11W for 6hr/sol
Battery: 10W max power output, 150W-hr energy storage
CPU: 3.7W, constant for 4h/sol
Health check: 6.3W, 10s
Hazard detection: 7.3W, 10s
Heating: 7.5W (1 motor) or 11.3W (2 motors), 5s
Steering: 6.8W, 5s (7°/s)
Driving: 12.4W, 10s (7cm)
Existing solution
Serialize each operation to satisfy power constraint
Conservative: longer execution time and under-utilization of solar power
No scheduling tool is used
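The figures above allow a quick feasibility check for running operations in parallel. The sketch below assumes the CPU's 3.7 W is drawn in addition to each listed operation, which the slide does not state; the function and dictionary names are hypothetical.

```python
CPU_WATTS = 3.7          # assumed to be drawn on top of every operation
BATTERY_MAX_WATTS = 10.0

OPERATION_WATTS = {
    "health_check": 6.3,
    "hazard_detection": 7.3,
    "heat_1_motor": 7.5,
    "heat_2_motors": 11.3,
    "steering": 6.8,
    "driving": 12.4,
}


def battery_draw(active_ops, solar_watts):
    """Watts the battery must supply for the given concurrent operations.

    Returns 0.0 if solar power alone suffices, or None if the demand
    exceeds solar supply plus the battery's maximum output.
    """
    demand = CPU_WATTS + sum(OPERATION_WATTS[op] for op in active_ops)
    shortfall = max(0.0, demand - solar_watts)
    return shortfall if shortfall <= BATTERY_MAX_WATTS else None


# At the 14.9 W noon peak, heating one motor fits within solar power alone,
# while driving needs about 1.2 W from the battery.
heat_at_noon = battery_draw(["heat_1_motor"], 14.9)
drive_at_noon = battery_draw(["driving"], 14.9)
# At 9 W, driving while heating one motor exceeds solar plus battery.
drive_and_heat_low = battery_draw(["driving", "heat_1_motor"], 9.0)
```

Under these assumptions, serializing every operation leaves solar power unused at noon, which is the under-utilization the existing solution is criticized for.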

117 Scheduling techniques
Constraint logic solving
Transfer all constraints into a pure mathematical form
Use tools to solve the problem in the mathematical domain
Example – CLPR
Constraints:
C1 > 3, C1 < 5, C2 > 2, C2 < 4   # two power consumers
C1 + C2 < S, S > 6, S <   # one power source
Inputs: C1 = 4.5, S = 7
Results: C2 < 2.5, 2 < C2
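The result of the CLPR example can be reproduced by hand: with C1 and S fixed, C1 + C2 < S gives C2 < S - C1 = 7 - 4.5 = 2.5, and C2 > 2 supplies the lower bound. The sketch below does the same bound propagation in plain Python instead of a constraint solver. The function name is hypothetical, and the upper bound on S is left out because its value is missing from the transcript.

```python
def c2_bounds(c1, s):
    """Open interval (low, high) allowed for C2, or None if infeasible.

    Encodes: 3 < C1 < 5, 2 < C2 < 4, C1 + C2 < S, S > 6.
    The upper bound on S is not modeled (missing in the source).
    """
    if not (3 < c1 < 5) or not (s > 6):
        return None
    low = 2.0
    high = min(4.0, s - c1)   # C2 < 4 and C2 < S - C1
    return (low, high) if low < high else None


bounds = c2_bounds(4.5, 7)   # (2.0, 2.5), matching 2 < C2 < 2.5
```

A constraint logic solver generalizes this by propagating bounds through all variables at once, so the same model also answers questions with C1 or S left open.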

118 Evaluation
Application level evaluation: metrics based on overall mission objectives; constraint-driven solutions
Power related scenario: various power constraints (supply/consumption) over different stages of the application; power-aware adaptive scheduling for different stages

