CPRE 583 Reconfigurable Computing Lecture 26: Fri 12/9/2011 (Dr. Jones: PhD research) Instructor: Dr. Phillip Jones (phjones@iastate.edu) Reconfigurable Computing Laboratory Iowa State University Ames, Iowa, USA http://class.ee.iastate.edu/cpre583/
Announcements/Reminders: Presentation and demo Friday 12/16 (9:00am – 11:30am): 20 minutes for presentation, 5 minutes Q&A. Each team: send me 2 questions about your project. Class will be held in Howe 1324! Final write-up and submission due Monday after demos at midnight. HW3 will be assigned as extra credit. Weekly project updates due Fridays (midnight).
Project Grading Breakdown: 50% Final Project Demo; 30% Final Project Report (20% of your project report grade will come from your 5-6 project updates, due Fridays at midnight); 20% Final Project Presentation.
Projects Ideas: Relevant conferences FPL FPT FCCM FPGA DAC ICCAD Reconfig RTSS RTAS ISCA Micro Super Computing HPCA IPDPS
Projects: Target Timeline Teams Formed and Topic: Mon 10/10 Project idea in Power Point 3-5 slides Motivation (why is this interesting, useful) What will be the end result High-level picture of final product Project team list: Name, Responsibility High-level Plan/Proposal: Fri 10/14 Power Point 5-10 slides (presentation to class Wed 10/19) System block diagrams High-level algorithms (if any) Concerns Implementation Conceptual Related research papers (if any)
Projects: Target Timeline Work on projects: 10/19 - 12/9 Weekly update reports More information on updates will be given Presentations: Finals week Present / Demo what is done at this point 15-20 minutes (depends on number of projects) Final write up and Software/Hardware turned in: Day of final (TBD)
Initial Project Proposal Slides (5-10 slides): Project team list: Name, Responsibility (who is project leader); team size: 3-4 (5 case-by-case). Project idea: Motivation (why is this interesting, useful); What will be the end result; High-level picture of final product. High-level plan: Break project into milestones; Provide initial schedule (I would initially schedule aggressively to have the project complete by Thanksgiving; issues will pop up and cause the schedule to slip); System block diagrams; High-level algorithms (if any). Concerns: Implementation, Conceptual. Research papers related to your project idea.
Weekly Project Updates: The current state of your project write-up (even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation section). The current state of your final presentation (your initial project proposal presentation, due Wed 10/19, should make a starting point for your final presentation). What things are working and not working. What roadblocks you are running into.
Adaptive Thermoregulation for Applications on Reconfigurable Devices Phillip Jones Applied Research Laboratory Washington University Saint Louis, Missouri, USA http://www.arl.wustl.edu/arl/~phjones Iowa State University Seminar April 2008 Funded by NSF Grant ITR 0313203
What are FPGAs? FPGA: Field Programmable Gate Array Sea of general purpose logic gates CLB Configurable Logic Block
FPGA Usage Models: Fast Prototyping (Experimental ISA, Experimental Micro Architectures); Partial Reconfiguration (Run-time adaptation, Run-time customization); System on Chip (SoC) (CPU + Specialized HW - Sparc-V8 Leon); Parallel Applications (Image Processing, Computational Biology); Full Reconfiguration (Remote Update, Fault Tolerance)
Some FPGA Details CLB CLB CLB CLB
Some FPGA Details: each CLB contains a 4-input Look-Up Table (LUT) with inputs A, B, C, D and output Z. The truth table holds one Z entry for each input pattern ABCD, from 0000 through 1111.
Some FPGA Details: 4-input LUT configured as an AND gate. Z = 1 only for ABCD = 1111.
Some FPGA Details: 4-input LUT configured as an OR gate. Z = 0 only for ABCD = 0000.
Some FPGA Details: 4-input LUT configured as a 2:1 mux on B, C, D. Input A is a don't-care (table rows X000, X001, ..., X110, ...).
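The LUT configurations on the preceding slides can be sketched in software as a 16-entry table indexed by the input pattern ABCD. This is a minimal illustration, not vendor tooling: the helper names `make_lut` and `lut_read` are hypothetical, and the choice of B as the mux select line is an assumption, since the slide does not say which of B, C, D is the select.

```python
def make_lut(func, n_inputs=4):
    """Build the 2**n-entry configuration table for a boolean function.

    Entry i holds the output for the input pattern whose bits, read
    most-significant first, are A, B, C, D.
    """
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        table.append(int(bool(func(*bits))))
    return table


def lut_read(table, a, b, c, d):
    """Look up output Z for inputs A, B, C, D (each 0 or 1)."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Same hardware structure, three different configurations.
and_lut = make_lut(lambda a, b, c, d: a & b & c & d)
or_lut = make_lut(lambda a, b, c, d: a | b | c | d)
# Assumption: B is the select line, C and D are the data inputs, A is unused.
mux_lut = make_lut(lambda a, b, c, d: c if b else d)

print("AND:", and_lut)
print("OR: ", or_lut)
print("MUX:", mux_lut)
```

Only the table contents change between the three functions, which is the point of the slides: the logic function is data loaded into the LUT, not fixed wiring.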
Some FPGA Details CLB CLB CLB Z A LUT B C D
Some FPGA Details CLB CLB PIP Programmable Interconnection Point CLB Z LUT DFF B C D
Outline Why Thermal Management? Measuring Temperature Thermally Driven Adaptation Experimental Results Temperature-Safe Real-time Systems Future Directions
Why Thermal Management?
Why Thermal Management? Location? Hot Cold Regulated
Why Thermal Management? Mobile? Hot Cold Regulated
Why Thermal Management? Reconfigurability FPGA Plasma Physics Microcontroller
Why Thermal Management? Exceptional Events
Local Experience Thermally aggressive application Disruption of air flow
Damaged Board (bottom view) Thermally aggressive application Disruption of air flow
Damaged Board (side view) Thermally aggressive application Disruption of air flow
Response to catastrophic thermal events Easy Fix Not Feasible!! Very Inconvenient
Solutions: Over provision (Large heat sinks and fans); Restrict performance (Limit operating frequency, Limit amount of chip utilization); Use thermal feedback, my approach (Dynamic operating frequency, Adaptive computation, Shutdown device)
Outline Why Thermal Management? Measuring Temperature Thermally Driven Adaptation Experimental Results Temperature-Safe Real-time Systems Future Directions
Measuring Temperature FPGA
Measuring Temperature FPGA A/D 60 C
Background: Measuring Temperature [Figure: FPGA; plot of Temperature vs. Period] S. Lopez-Buedo, J. Garrido, and E. Boemo, “Thermal testing on reconfigurable computers,” IEEE Design and Test of Computers, vol. 17, pp. 84-91, 2000.
Background: Measuring Temperature [Figure: FPGA; plot of Temperature vs. Period, annotated with Voltage] “Adaptive Thermoregulation for Applications on Reconfigurable Devices”, by Phillip H. Jones, James Moscola, Young H. Cho, and John W. Lockwood; Field Programmable Logic and Applications (FPL’07), Amsterdam, Netherlands
Background: Measuring Temperature [Figure sequence: FPGA with Core 1 through Core 4 running in application Modes 1, 2, and 3 at high and low frequency; Temperature vs. Period plot with axis labels 40C and 70C and periods 8,000 and 8,300; a Sample Controller with a Time out Counter drives Pause and Sample Mode]
Temperature Benchmark Circuits: Desired Properties: Scalable (works over a wide range of frequencies; can easily increase or decrease circuit size). Simple to analyze (regular structure). Distributes evenly over chip (helps reduce thermal gradients that may cause damage to the chip). May serve as a standard (further experimentation; repeatability of results). “A Thermal Management and Profiling Method for Reconfigurable Hardware Applications”, by Phillip H. Jones, John W. Lockwood, and Young H. Cho; Field Programmable Logic and Applications (FPL’06), Madrid, Spain
Temperature Benchmark Circuits: Core Block (CB): array of 48 LUTs and 48 DFFs, placed with RLOC (Row, Col) from 0,0 to 7,5. Each LUT is configured as a 4-input AND gate. Thermal workload unit: a Computation Row made of an Input Gen (1 LUT, 1 DFF) driving an array of 18 core blocks, CB 0 through CB 17 (864 LUTs, 864 DFFs), over 8-bit connections. The unit is positioned with RLOC_ORIGIN (Row, Col). 100% activation rate.
Example Circuit Layout (Configuration 1x, 9% LUTs and DFFs) RLOC_ORIGIN: Row, Col (27,6) Thermal Workload Unit
Example Circuit Layout (Configuration 4x, 36% LUTs and DFFs)
Observed Temperature vs. Frequency [Figure: steady-state temperatures vs. frequency for configurations Cfg1x, Cfg2x, Cfg4x, Cfg10x] T ~ P, P ~ F*C*V^2
Observed Temperature vs. Active Area [Figure: steady-state temperatures vs. active area at 10, 25, 50, 100, and 200 MHz; max rated Tj 85 C] T ~ P, P ~ F*C*V^2
Projecting Thermal Trajectories: Estimate steady-state temperature: Tj_ss = Power * θjA + TA, where θjA is the FPGA thermal resistance (ºC/W) and TA is the ambient temperature; use measured power at t=0 (figure annotation: 5.4±.5). Exponential specific equation: Temperature(t) = ½*(-41*e^(-t/20) + 71) + ½*(-41*e^(-t/180) + 71). How long until 60 C? Exploit this phase for performance.
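The slide's question, "How long until 60 C?", can be answered numerically from the fitted curve. The sketch below evaluates the slide's exponential equation and finds the crossing time by bisection. The function names, the bisection bounds, and the example values passed to `steady_state` are illustrative assumptions, not values from the lecture.

```python
import math


def temperature(t):
    """Fitted junction temperature (C) at time t (s), as given on the slide."""
    fast = -41 * math.exp(-t / 20) + 71    # 20 s time constant
    slow = -41 * math.exp(-t / 180) + 71   # 180 s time constant
    return 0.5 * fast + 0.5 * slow


def steady_state(power_w, theta_ja, ambient_c):
    """Tj_ss = Power * theta_jA + TA."""
    return power_w * theta_ja + ambient_c


def time_to_reach(target_c, lo=0.0, hi=2000.0, tol=1e-6):
    """Bisection on the monotonically rising curve; bounds are assumptions."""
    if not (temperature(lo) < target_c < temperature(hi)):
        raise ValueError("target is outside the range reached on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if temperature(mid) < target_c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t60 = time_to_reach(60.0)
print(f"start: {temperature(0):.1f} C, 60 C reached after about {t60:.0f} s")
# Hypothetical inputs: 5 W, 8 C/W thermal resistance, 25 C ambient.
print(f"example steady state: {steady_state(5.0, 8.0, 25.0):.1f} C")
```

With this curve the device starts at 30 C, settles toward 71 C, and crosses 60 C after roughly 113 s, which is the window the slide suggests exploiting for performance.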
Thermal Shutdown Max Tj (70C)
Outline Why Thermal Management? Measuring Temperature Thermally Driven Adaptation Experimental Results Temperature-Safe Real-time Systems Future Directions
Image Correlation Application Template
Image Correlation Application: Heats FPGA a lot! (> 85 C)
Virtex-4 100FX Resource Utilization
  Max Frequency:          200 MHz
  Block RAM:              44 (11%)
  Occupied Slices:        32,868 (77%)
  D Flip Flops (DFFs):    49,148 (58%)
  Lookup Tables (LUTs):   57,461 (68%)
Application Infrastructure Temperature Sample Controller Thermoregulation Controller Pause 65 C Application Mode “Adaptive Thermoregulation for Applications on Reconfigurable Devices”, by Phillip H. Jones, James Moscola, Young H. Cho, and John W. Lockwood; Field Programmable Logic and Applications (FPL’07), Amsterdam, Netherlands
Application Specific Adaptation [Figure sequence: the Temperature Sample Controller and Thermoregulation Controller (65 C) drive Pause, Mode, Frequency, and Quality; an Image Buffer feeds Image Processor Cores 1 through 4, each applying Mask 1 and Mask 2 to High Priority Features and Low Priority Features, producing Score Out. Frequency values shown: 200, 180, 150, 100, 75, 50 MHz. Quality values shown: 8, 7, 6, 5, 4.]
Thermally Adaptive Frequency [Figure: Junction Temperature, Tj (C) vs. Time (s), alternating between High Frequency and Low Frequency; Thermal Budget = 72 C, Low Threshold = 67 C] “An Adaptive Frequency Control Method Using Thermal Feedback for Reconfigurable Hardware Applications”, by Phillip H. Jones, Young H. Cho, and John W. Lockwood; Field Programmable Technology (FPT’06), Bangkok, Thailand. Related: S. Wang (“Reactive Speed Control”, ECRTS06)
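The two thresholds on these slides suggest a hysteresis controller. The sketch below assumes one reading of the figure: run at high frequency until Tj reaches the thermal budget, then drop to low frequency until Tj falls back to the low threshold. The function name and the sample temperature trace are hypothetical, and the published controller may differ in detail.

```python
THERMAL_BUDGET_C = 72.0   # from the slide
LOW_THRESHOLD_C = 67.0    # from the slide


def next_mode(current_mode, tj_c,
              budget_c=THERMAL_BUDGET_C, low_c=LOW_THRESHOLD_C):
    """Return "high" or "low" frequency mode for the next interval.

    Between the two thresholds the current mode is kept, which avoids
    rapid toggling around a single set point.
    """
    if tj_c >= budget_c:
        return "low"
    if tj_c <= low_c:
        return "high"
    return current_mode


# Hypothetical sampled junction temperatures (C).
trace = [60.0, 70.0, 72.0, 70.0, 68.0, 67.0, 69.0]
mode = "high"
modes = []
for tj in trace:
    mode = next_mode(mode, tj)
    modes.append(mode)
print(list(zip(trace, modes)))
```

The 5 C gap between the thresholds sets how long the device stays in each mode: a wider gap means fewer frequency switches but a lower average temperature.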
Outline Why Thermal Management? Measuring Temperature Thermally Driven Adaptation Experimental Results Temperature-Safe Real-time Systems Future Directions
Platform Overview Virtex-4 FPGA Temperature Probe
Thermal Budget Efficiency [Figure: Junction Temperature (C), 30 to 70, by Thermal Condition, Adaptive vs. Fixed; Adaptive Thermal Budget (65 C); annotation: 25 C unused]
  Thermal Condition   Adaptive (features, MHz)
  40 C, 0 fans        4, 50
  35 C, 0 fans        4, 50
  30 C, 0 fans        6, 50
  25 C, 0 fans        8, 65
  25 C, 1 fan         8, 106
  25 C, 2 fans        8, 184
  Fixed: 8 features, 200 MHz
Conclusions: Motivated the need for thermal management. Measuring temperature: application-dependent voltage variation effects; temperature benchmark circuits. Examined application-specific adaptation for improving performance in dynamic thermal environments.
Outline Why Thermal Management? Measuring Temperature Thermally Driven Adaptation Experimental Results Temperature-Safe Real-time Systems Future Directions
Thermally Constrained Systems Space Craft Sun Earth
Thermally Constrained Systems
Temperature-Safe Real-time Systems Task scheduling is a concern in many embedded systems Goal: Satisfy thermal constraints without violating real-time constraints
How to manage temperature? Static frequency scaling (too hot? deadlines could be missed). Sleep while idle. Generalization: idle task insertion. [Figure: timelines of tasks T1, T2, T3, first back to back, then with Idle intervals inserted between them]
Idle Task Insertion: More Powerful
a. Tasks scheduled at F_max (100 MHz), no idle task inserted
  Period (s)   Cost (s)   Deadline (s)   Utilization (%)
  30           10.0       10.0           33.33
  120          30.0       120            25.00
  480          30.0       480            6.25
  960          20.0       960            2.08
  Total                                  66.66
  For the first task the deadline equals the cost, so frequency cannot be scaled or the task schedule becomes infeasible.
b. Tasks scheduled at F_max (100 MHz), 1 idle task inserted
  Period (s)   Cost (s)   Deadline (s)   Utilization (%)
  30           10.0       10.0           33.33
  60           20.0       60.0           33.33   (idle task)
  120          30.0       120            25.00
  480          30.0       480            6.25
  960          20.0       960            2.08
  Total                                  99.99
Idle task insertion: no impact on tasks’ cost; higher priority task response times unaffected; allows control over distribution of idle time.
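The utilization columns can be reproduced directly from the period and cost of each task. The sketch below recomputes both totals; the tuple layout and the function name are illustrative choices.

```python
# (period_s, cost_s) for each task, taken from the slide's table.
tasks = [(30, 10.0), (120, 30.0), (480, 30.0), (960, 20.0)]
idle_task = (60, 20.0)  # the single inserted idle task


def utilization(task_set):
    """Total processor utilization: sum of cost / period over all tasks."""
    return sum(cost / period for period, cost in task_set)


u_without = utilization(tasks)
u_with = utilization(tasks + [idle_task])
print(f"no idle task:  {100 * u_without:.2f}%")
print(f"one idle task: {100 * u_with:.2f}%")
```

The slide's totals of 66.66 and 99.99 come from adding per-task percentages that were already cut to two decimals; computed without that truncation, the sums are 2/3 and exactly 1. The inserted idle task therefore uses up all remaining processor time while leaving every real task's cost unchanged.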
Sleep when idle is insufficient Temperature constraint = 65 C Peak Temperature = 70 C
Idle-task inserted Temperature constraint = 65 C Peak Temperature = 61 C
Idle-Task Insertion [Flowchart: System (task set) + Idle tasks, into Scheduler (e.g. RMS), then check: Deadlines + Temperature met? Yes / No] a. Original schedule does not meet temperature constraints. b. Use idle tasks to redistribute device idle time in order to reduce peak device temperature.
Related Research: Power Management: EDF, Dynamic Frequency Scaling, Yao (FOCS’95); Worst Case Execution Time, Shin (DAC’99). Thermal Management: EDF, Minimize Temperature, Bansal (FOCS’04); RMS, Reactive Frequency, CIA, Wang (RTSS’06, ECRTS’06).
Outline Why Thermal Management? Measuring Temperature Thermally Driven Adaptation Experimental Results Conclusions Temperature-Safe Real-time Systems Future Directions
Research Fronts: Near term: Exploration of adaptation techniques (advanced FPGA reconfiguration capabilities; other frequency adaptation techniques); Integration of temperature into real-time systems. Longer term: Cyber physical systems (NSF initiative).
Questions/Comments?
Temperature per Processing Core [Figure: Temperature vs. Number of Processing Cores; Junction Temperature (C), 35 to 70, vs. Number of Processing Cores, 1 to 4]
Linear fits (x = number of processing cores, y = junction temperature in C):
  S1: y = 2.21x + 60.1
  S2: y = 2.24x + 57.1
  S3: y = 2.23x + 52.1
  S4: y = 2.07x + 44.2
  S5: y = 1.43x + 37.5
  S6: y = 1.22x + 34.0
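The linear fits can be used to estimate how many cores a scenario can keep active under a temperature limit. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the 65 C limit is borrowed from the thermal budget used elsewhere in the talk, and the fits are only evaluated over the 1 to 4 cores covered by the plot.

```python
# (slope C per core, intercept C) for scenarios S1-S6, from the slide.
FITS = {
    "S1": (2.21, 60.1),
    "S2": (2.24, 57.1),
    "S3": (2.23, 52.1),
    "S4": (2.07, 44.2),
    "S5": (1.43, 37.5),
    "S6": (1.22, 34.0),
}


def predicted_tj(scenario, cores):
    """Junction temperature (C) predicted by the scenario's linear fit."""
    slope, intercept = FITS[scenario]
    return slope * cores + intercept


def max_cores(scenario, budget_c, limit=4):
    """Largest core count in 1..limit whose prediction stays within budget.

    Returns 0 if even one core exceeds the budget.
    """
    allowed = [n for n in range(1, limit + 1)
               if predicted_tj(scenario, n) <= budget_c]
    return max(allowed, default=0)


for name in FITS:
    print(name, max_cores(name, budget_c=65.0))
```

Under this reading, the hottest scenario (S1) supports two cores within 65 C while the coolest (S6) supports all four.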
Temperature Sample Mode
Ring Oscillator Thermometer Characteristics
  Thermometer size:           ~100 LUTs
  Ring oscillator size:       48 LUTs (47 NOT + 1 OR)
  Oscillation period:         ~40 ns
  Incrementer cycle period:   ~0.16 ms (40 ns * 4096)
  Temperature resolution:     0.1 ºC/count, or 0.1 ºC per 20 ns
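The arithmetic behind these characteristics can be checked with a few lines. The sketch below recomputes the incrementer cycle period and converts a raw count to a temperature using the stated resolution. The calibration point passed to `count_to_temperature` is a hypothetical reference chosen for illustration; a real thermometer needs a measured reference, and the count also shifts with the active application mode.

```python
RING_PERIOD_NS = 40.0          # approximate oscillation period from the slide
INCREMENTER_COUNTS = 4096      # factor used on the slide (2**12)
RESOLUTION_C_PER_COUNT = 0.1   # temperature resolution from the slide


def incrementer_cycle_ms():
    """Time for the incrementer to complete one full cycle, in ms."""
    return RING_PERIOD_NS * INCREMENTER_COUNTS * 1e-6


def count_to_temperature(count, ref_count, ref_temp_c):
    """Linear conversion around a calibration point (an assumption)."""
    return ref_temp_c + (count - ref_count) * RESOLUTION_C_PER_COUNT


print(f"incrementer cycle: {incrementer_cycle_ms():.3f} ms")
# Hypothetical calibration: a count of 8000 corresponds to 40 C.
print(f"count 8300 -> {count_to_temperature(8300, 8000, 40.0):.1f} C")
```

At 0.1 ºC per count, a 300-count change in the measured period corresponds to a 30 ºC change in temperature.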
Temperature vs. Incrementer Period (Measuring Temperature while Application Active) [Figure: Temperature (C), 10 to 90, vs. Incrementer Period (20ns/count), 8100 to 8700; one curve each for Application Modes A, B, and C; marked counts: 8235, 8425, 8620]
Application implementation statistics
Virtex-4 100FX Resource Utilization
  Max Frequency:          200 MHz
  Block RAM:              44 (11%)
  Occupied Slices:        32,868 (77%)
  D Flip Flops (DFFs):    49,148 (58%)
  Lookup Tables (LUTs):   57,461 (68%)
Image Correlation Characteristics
  Image Processing Rate (frames per second):   40.6 (at 200 MHz)
  # of Features:                               1 - 8
  Pixel Resolution:                            8-bit (grey scale)
  Image Size (# pixels):                       320x480
Application implementation statistics
a.) VirtexE 2000 Resource Utilization
  Max Frequency:          125 MHz
  Block RAM:              26% (43)
  Occupied Slices:        32,868 (15,808)
  D Flip Flops (DFFs):    49,148 (58%)
  Lookup Tables (LUTs):   57,461 (68%)
b.) Image Correlation Characteristics
  Image Processing Rate:   12.7/second (at 125 MHz)
  # of Templates:          10 (in parallel)
  # of Mask Patterns:      1 - 4
  Pixel Resolution:        8-bit (grey scale)
  Image Size (# pixels):   640x480
Scenario Descriptions (S1 – S6)
  Scenario   Ambient Temperature   # of Fans
  S1         40 C (104 F)          0
  S2         35 C (95 F)           0
  S3         30 C (86 F)           0
  S4         25 C (77 F)           0
  S5         25 C (77 F)           1
  S6         25 C (77 F)           2
High Level Architecture Application Pause Thermal Manager Frequency & Quality Controller Frequency mode Quality Temperature
Periodic Temperature Sampling Application Pause Thermal Manager 50 ms Event Counter Event Ring Oscillator Based Thermometer ready Sample Mode Controller Temperature Frequency & Quality capture Frequency mode Quality
Ring Oscillator Based Thermometer Reset 12-bit incrementer ring_clk MSB Edge Detect 14-bit Clk DFF reset 14 Temperature sel Ready mux
ASIC, GPP, FPGA Comparison Cost Performance Power Flexibility
Frequency Multiplexing Circuit Frequency Control Clk Multiplier (DLLs) clk clk to global clock tree 2:1 MUX 4xclk BUFG Current Virtex-4 platform uses glitch free BUFGMUX component
Thermally Adaptive Frequency High Frequency Thermal Budget = 72 C Junction Temperature, Tj (C) Low Frequency Low Threshold = 67 C Time (s)
Worst Case Thermal Condition: Thermal Budget = 70 C. Thermally safe frequency: 50 MHz. 30/120 MHz adaptive frequency: 48.5 MHz.
Typical Thermal Condition: Thermal Budget = 70 C. Thermally safe frequency: 50 MHz. 30/120 MHz adaptive frequency: 95 MHz.
Best Case Thermal Condition: Thermal Budget = 70 C. Thermally safe frequency: 50 MHz. 30/120 MHz adaptive frequency: 119 MHz, a 2.4x performance increase over the thermally safe frequency.