COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.

Slides:



Advertisements
Similar presentations
© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.
Advertisements

Mani Srivastava UCLA - EE Department Room: 6731-H Boelter Hall Tel: WWW: Copyright 2003.
CALTECH CS137 Fall DeHon 1 CS137: Electronic Design Automation Day 19: November 21, 2005 Scheduling Introduction.
ECE 667 Synthesis and Verification of Digital Circuits
Example of Scheduling and Allocation based on Jaap Hofstede IIR Filter.
EDA (CS286.5b) Day 10 Scheduling (Intro Branch-and-Bound)
COE 561 Digital System Design & Synthesis Scheduling Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals.
Introduction to Data Flow Graphs and their Scheduling Sources: Gang Quan.
Winter 2005ICS 252-Intro to Computer Design ICS 252 Introduction to Computer Design Lecture 5-Scheudling Algorithms Winter 2005 Eli Bozorgzadeh Computer.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 10: RC Principles: Software (3/4) Prof. Sherief Reda.
Modern VLSI Design 3e: Chapter 10 Copyright  2002 Prentice Hall Adapted by Yunsi Fei ECE 300 Advanced VLSI Design Fall 2006 Lecture 24: CAD Systems &
ECE Synthesis & Verification - Lecture 2 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.
Courseware Path-Based Scheduling Sune Fallgaard Nielsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens Plads,
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
ECE Synthesis & Verification - Lecture 3 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.
Courseware High-Level Synthesis an introduction Prof. Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.
COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals.
Architectural-Level Synthesis Giovanni De Micheli Integrated Systems Centre EPF Lausanne This presentation can be used for non-commercial purposes as long.
Simulated-Annealing-Based Solution By Gonzalo Zea s Shih-Fu Liu s
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 08: RC Principles: Software (1/4) Prof. Sherief Reda.
System Partitioning Kris Kuchcinski
1 EECS Components and Design Techniques for Digital Systems Lec 21 – RTL Design Optimization 11/16/2004 David Culler Electrical Engineering and Computer.
COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals.
Mahapatra-Texas A&M-Fall'001 Partitioning - I Introduction to Partitioning.
ICS 252 Introduction to Computer Design Fall 2006 Eli Bozorgzadeh Computer Science Department-UCI.
COE 561 Digital System Design & Synthesis Resource Sharing and Binding Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
SCHEDULING SOURCES- Mark Manwaring Kia Bazargan Giovanni De Micheli Gupta Youn-Long Lin M. Balakrishnan Camposano, J. Hofstede, Knapp, MacMillen Lin.
ECE Synthesis & Verification - Lecture 4 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Allocation:
Interconnect Efficient LDPC Code Design Aiman El-Maleh Basil Arkasosy Adnan Al-Andalusi King Fahd University of Petroleum & Minerals, Saudi Arabia Aiman.
ICS 252 Introduction to Computer Design
ECE Synthesis & Verification - LP Scheduling 1 ECE 667 ECE 667 Synthesis and Verification of Digital Circuits Scheduling Algorithms Analytical approach.
Fall 2006EE VLSI Design Automation I VII-1 EE 5301 – VLSI Design Automation I Kia Bazargan University of Minnesota Part VII: High Level Synthesis.
King Fahd University of Petroleum and Minerals King Fahd University of Petroleum and Minerals Computer Engineering Department Computer Engineering Department.
Introduction to Data Flow Graphs and their Scheduling Sources: Gang Quan.
COE 405 Introduction to Digital Design Methodology
Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Hardware/software partitioning  Functionality to be implemented in software.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
L11: Lower Power High Level Synthesis(2) 성균관대학교 조 준 동 교수
COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals.
UNIT 1 Introduction. 1-2 OutlineOutline n Course Topics n Microelectronics n Design Styles n Design Domains and Levels of Abstractions n Digital System.
Resource Mapping and Scheduling for Heterogeneous Network Processor Systems Liang Yang, Tushar Gohad, Pavel Ghosh, Devesh Sinha, Arunabha Sen and Andrea.
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
High-Level Synthesis-II Virendra Singh Indian Institute of Science Bangalore IEP on Digital System IIT Kanpur.
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
L13 :Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Muhammad Elrabaa Computer Engineering Department King Fahd University of Petroleum.
L12 : Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
A High-Level Synthesis Flow for Custom Instruction Set Extensions for Application-Specific Processors Asia and South Pacific Design Automation Conference.
Mapping of Regular Nested Loop Programs to Coarse-grained Reconfigurable Arrays – Constraints and Methodology Presented by: Luis Ortiz Department of Computer.
High Performance Embedded Computing © 2007 Elsevier Lecture 10: Code Generation Embedded Computing Systems Michael Schulte Based on slides and textbook.
VADA Lab.SungKyunKwan Univ. 1 L5:Lower Power Architecture Design 성균관대학교 조 준 동 교수
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
High-Level Synthesis Creating Custom Circuits from High-Level Code Hao Zheng Comp Sci & Eng University of South Florida.
Scheduling Determines the precise start time of each task.
Register Transfer Specification And Design
ECE 565 High-Level Synthesis—An Introduction
High-Level Synthesis Creating Custom Circuits from High-Level Code
Reconfigurable Computing
COE 561 Digital System Design & Synthesis Introduction
Lesson 4 Synchronous Design Architectures: Data Path and High-level Synthesis (part two) Sept EE37E Adv. Digital Electronics.
Modeling Languages and Abstract Models
Architectural-Level Synthesis
Architecture Synthesis
ICS 252 Introduction to Computer Design
HIGH LEVEL SYNTHESIS.
Resource Sharing and Binding
Integrated Systems Centre © Giovanni De Micheli – All rights reserved
ICS 252 Introduction to Computer Design
ICS 252 Introduction to Computer Design
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals [Adapted from slides of Prof. G. De Micheli: Synthesis & Optimization of Digital Circuits] Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals [Adapted from slides of Prof. G. De Micheli: Synthesis & Optimization of Digital Circuits]

2 OutlineOutline n Motivation n Dataflow graphs & Sequencing graphs n Resources n Synthesis in temporal domain: Scheduling n Synthesis in spatial domain: Binding n Scheduling Models Unconstrained scheduling Scheduling with timing constraints Scheduling with resource constraints n Algorithmic Solution to the Optimum Binding Problem n Register Binding Problem n Motivation n Dataflow graphs & Sequencing graphs n Resources n Synthesis in temporal domain: Scheduling n Synthesis in spatial domain: Binding n Scheduling Models Unconstrained scheduling Scheduling with timing constraints Scheduling with resource constraints n Algorithmic Solution to the Optimum Binding Problem n Register Binding Problem

3 SynthesisSynthesis n Transform behavioral into structural view. n Architectural-level synthesis Architectural abstraction level. Determine macroscopic structure. Example: major building blocks like adder, register, mux. n Logic-level synthesis Logic abstraction level. Determine microscopic structure. Example: logic gate interconnection. n Transform behavioral into structural view. n Architectural-level synthesis Architectural abstraction level. Determine macroscopic structure. Example: major building blocks like adder, register, mux. n Logic-level synthesis Logic abstraction level. Determine microscopic structure. Example: logic gate interconnection.

4 Synthesis and Optimization

5 Architectural-Level Synthesis Motivation n Raise input abstraction level. Reduce specification of details. Extend designer base. Self-documenting design specifications. Ease modifications and extensions. n Reduce design time. n Explore and optimize macroscopic structure Series/parallel execution of operations. n Raise input abstraction level. Reduce specification of details. Extend designer base. Self-documenting design specifications. Ease modifications and extensions. n Reduce design time. n Explore and optimize macroscopic structure Series/parallel execution of operations.

6 Architectural-Level Synthesis n Translate HDL models into sequencing graphs. n Behavioral-level optimization Optimize abstract models independently from the implementation parameters. n Architectural synthesis and optimization Create macroscopic structure data-path and control-unit. Consider area and delay information of the implementation. n Translate HDL models into sequencing graphs. n Behavioral-level optimization Optimize abstract models independently from the implementation parameters. n Architectural synthesis and optimization Create macroscopic structure data-path and control-unit. Consider area and delay information of the implementation.

7 Dataflow Graphs … n Behavioral views of architectural models. n Useful to represent data- paths. n Graph Vertices = operations. Edges = dependencies. n Dependencies arise due Input to an operation is result of another operation. Serialization constraints in specification. Two tasks share the same resource. n Behavioral views of architectural models. n Useful to represent data- paths. n Graph Vertices = operations. Edges = dependencies. n Dependencies arise due Input to an operation is result of another operation. Serialization constraints in specification. Two tasks share the same resource.

8 … Dataflow Graphs n Assumes the existence of variables who store information required and generated by operations. n Each variable has a lifetime which is the interval from birth to death. n Variable birth is the time at which the value is generated. n Variable death is the latest time at which the value is referenced as input to an operation. n Values must be preserved during life-time. n Assumes the existence of variables who store information required and generated by operations. n Each variable has a lifetime which is the interval from birth to death. n Variable birth is the time at which the value is generated. n Variable death is the latest time at which the value is referenced as input to an operation. n Values must be preserved during life-time.

9 Sequencing Graphs n Useful to represent data-path and control. n Extended dataflow graphs Control Data Flow Graphs (CDFGs). Polar: source and sink. Operation serialization. Hierarchy. Control-flow commands branching and iteration. n Paths in the graph represent concurrent streams of operations. n Useful to represent data-path and control. n Extended dataflow graphs Control Data Flow Graphs (CDFGs). Polar: source and sink. Operation serialization. Hierarchy. Control-flow commands branching and iteration. n Paths in the graph represent concurrent streams of operations.

10 Behavioral-level optimization n Tree-height reduction using commutativity and associativity n x = a + b * c + d => x = (a + d) + b * c n Tree-height reduction using distributivity n x = a * (b * c * d + e) => x = a * b * c * d + a * e n Tree-height reduction using commutativity and associativity n x = a + b * c + d => x = (a + d) + b * c n Tree-height reduction using distributivity n x = a * (b * c * d + e) => x = a * b * c * d + a * e

11 Architectural Synthesis and Optimization n Synthesize macroscopic structure in terms of building- blocks. n Explore area/performance trade-offs maximum performance implementations subject to area constraints. minimum area implementations subject to performance constraints. n Determine an optimal implementation. n Create logic model for data-path and control. n Synthesize macroscopic structure in terms of building- blocks. n Explore area/performance trade-offs maximum performance implementations subject to area constraints. minimum area implementations subject to performance constraints. n Determine an optimal implementation. n Create logic model for data-path and control.

12 Circuit Specification for Architectural Synthesis n Circuit behavior Sequencing graphs. n Building blocks Resources. Functional resources: process data (e.g. ALU). Memory resources: store data (e.g. Register). Interface resources: support data transfer (e.g. MUX and Buses). n Constraints Interface constraints Format and timing of I/O data transfers. Implementation constraints Timing and resource usage. Area Cycle-time and latency n Circuit behavior Sequencing graphs. n Building blocks Resources. Functional resources: process data (e.g. ALU). Memory resources: store data (e.g. Register). Interface resources: support data transfer (e.g. MUX and Buses). n Constraints Interface constraints Format and timing of I/O data transfers. Implementation constraints Timing and resource usage. Area Cycle-time and latency

13 ResourcesResources n Functional resources: perform operations on data. Example: arithmetic and logic blocks. Standard resources Existing macro-cells. Well characterized (area/delay). Example: adders, multipliers, ALUs, Shifters,... Application-specific resources Circuits for specific tasks. Yet to be synthesized. Example: instruction decoder. n Memory resources: store data. Example: memory and registers. n Interface resources Example: busses and ports. n Functional resources: perform operations on data. Example: arithmetic and logic blocks. Standard resources Existing macro-cells. Well characterized (area/delay). Example: adders, multipliers, ALUs, Shifters,... Application-specific resources Circuits for specific tasks. Yet to be synthesized. Example: instruction decoder. n Memory resources: store data. Example: memory and registers. n Interface resources Example: busses and ports.

14 Resources and Circuit Families n Resource-dominated circuits. Area and performance depend on few, well-characterized blocks. Example: DSP circuits. n Non resource-dominated circuits. Area and performance are strongly influenced by sparse logic, control and wiring. Example: some ASIC circuits. n Resource-dominated circuits. Area and performance depend on few, well-characterized blocks. Example: DSP circuits. n Non resource-dominated circuits. Area and performance are strongly influenced by sparse logic, control and wiring. Example: some ASIC circuits.

15 Synthesis in the Temporal Domain: Scheduling n Scheduling Associate a start-time with each operation. Satisfying all the sequencing (timing and resource) constraint. n Goal Determine area/latency trade-off. Determine latency and parallelism of the implementation. n Scheduled sequencing graph Sequencing graph with start-time annotation. n Unconstrained scheduling. n Scheduling with timing constraints n Scheduling with resource constraints. n Scheduling Associate a start-time with each operation. Satisfying all the sequencing (timing and resource) constraint. n Goal Determine area/latency trade-off. Determine latency and parallelism of the implementation. n Scheduled sequencing graph Sequencing graph with start-time annotation. n Unconstrained scheduling. n Scheduling with timing constraints n Scheduling with resource constraints.

16 Scheduling … 4 Multipliers, 2 ALUs 1 Multiplier, 1 ALU

17 … Scheduling … Scheduling 2 Multipliers, 3 ALUs 2 Multipliers, 2 ALUs

18 Synthesis in the Spatial Domain: Binding n Binding Associate a resource with each operation with the same type. Determine area of the implementation. n Sharing Bind a resource to more than one operation. Operations must not execute concurrently. n Bound sequencing graph Sequencing graph with resource annotation. n Binding Associate a resource with each operation with the same type. Determine area of the implementation. n Sharing Bind a resource to more than one operation. Operations must not execute concurrently. n Bound sequencing graph Sequencing graph with resource annotation.

19 Example: Bound Sequencing Graph

20 Performance and Area Estimation n Resource-dominated circuits Area = sum of the area of the resources bound to the operations. Determined by binding. Latency = start time of the sink operation (minus start time of the source operation). Determined by scheduling n Non resource-dominated circuits Area also affected by registers, steering logic, wiring and control. Cycle-time also affected by steering logic, wiring and (possibly) control. n Resource-dominated circuits Area = sum of the area of the resources bound to the operations. Determined by binding. Latency = start time of the sink operation (minus start time of the source operation). Determined by scheduling n Non resource-dominated circuits Area also affected by registers, steering logic, wiring and control. Cycle-time also affected by steering logic, wiring and (possibly) control.

21 SchedulingScheduling n Circuit model Sequencing graph. Cycle-time is given. Operation delays expressed in cycles. n Scheduling Determine the start times for the operations. Satisfying all the sequencing (timing and resource) constraint. n Goal Determine area/latency trade-off. n Scheduling affects Area: maximum number of concurrent operations of same type is a lower bound on required hardware resources. Performance: concurrency of resulting implementation. n Circuit model Sequencing graph. Cycle-time is given. Operation delays expressed in cycles. n Scheduling Determine the start times for the operations. Satisfying all the sequencing (timing and resource) constraint. n Goal Determine area/latency trade-off. n Scheduling affects Area: maximum number of concurrent operations of same type is a lower bound on required hardware resources. Performance: concurrency of resulting implementation.

22 Scheduling Models n Unconstrained scheduling. n Scheduling with timing constraints Latency. Detailed timing constraints. n Scheduling with resource constraints. n Simplest scheduling model All operations have bounded delays. All delays are in cycles. Cycle-time is given. No constraints - no bounds on area. Goal Minimize latency. n Unconstrained scheduling. n Scheduling with timing constraints Latency. Detailed timing constraints. n Scheduling with resource constraints. n Simplest scheduling model All operations have bounded delays. All delays are in cycles. Cycle-time is given. No constraints - no bounds on area. Goal Minimize latency.

23 Minimum-Latency Unconstrained Scheduling Problem n Given a set of operations V with integer delays D and a partial order on the operations E n Find an integer labeling of the operations  : V  Z +, such that t i =  (v i ), t i  t j + d j  i, j s.t. (v j, v i )  E and t n is minimum. n Unconstrained scheduling used when Dedicated resources are used. Operations differ in type. Operations cost is marginal when compared to that of steering logic, registers, wiring, and control logic. Binding is done before scheduling: resource conflicts solved by serializing operations sharing same resource. Deriving bounds on latency for constrained problems. n Given a set of operations V with integer delays D and a partial order on the operations E n Find an integer labeling of the operations  : V  Z +, such that t i =  (v i ), t i  t j + d j  i, j s.t. (v j, v i )  E and t n is minimum. n Unconstrained scheduling used when Dedicated resources are used. Operations differ in type. Operations cost is marginal when compared to that of steering logic, registers, wiring, and control logic. Binding is done before scheduling: resource conflicts solved by serializing operations sharing same resource. Deriving bounds on latency for constrained problems.

24 ASAP Scheduling Algorithm n Denote by t s the start times computed by the as soon as possible (ASAP) algorithm. n Yields minimum values of start times. n Denote by t s the start times computed by the as soon as possible (ASAP) algorithm. n Yields minimum values of start times.

25 ALAP Scheduling Algorithm n Denote by t L the start times computed by the as late as possible (ALAP) algorithm. n Yields maximum values of start times. n Latency upper bound n Denote by t L the start times computed by the as late as possible (ALAP) algorithm. n Yields maximum values of start times. n Latency upper bound

26 Latency-Constrained Scheduling n ALAP solves a latency-constrained problem. n Latency bound can be set to latency computed by ASAP algorithm. n Mobility Defined for each operation. Difference between ALAP and ASAP schedule. Zero mobility implies that an operation can be started only at one given time step. Mobility greater than 0 measures span of time interval in which an operation may start. n Slack on the start time. n ALAP solves a latency-constrained problem. n Latency bound can be set to latency computed by ASAP algorithm. n Mobility Defined for each operation. Difference between ALAP and ASAP schedule. Zero mobility implies that an operation can be started only at one given time step. Mobility greater than 0 measures span of time interval in which an operation may start. n Slack on the start time.

27 ExampleExample n Operations with zero mobility {v1, v2, v3, v4, v5}. Critical path. n Operations with mobility one {v6, v7}. n Operations with mobility two {v8, v9, v10, v11} n Operations with zero mobility {v1, v2, v3, v4, v5}. Critical path. n Operations with mobility one {v6, v7}. n Operations with mobility two {v8, v9, v10, v11}

28 Minimum Latency Resource-Constrained Scheduling Problem n Given a set of ops V with integer delays D, a partial order on the operations E, and upper bounds {a k ; k = 1, 2, …, n res } n Find an integer labeling of the operations  : V  Z +, such that t i =  (v i ), t i  t j + d j  i, j s.t. (v j, v i )  E and t n is minimum. n Number of operations of any given type in any schedule step does not exceed bound. n Given a set of ops V with integer delays D, a partial order on the operations E, and upper bounds {a k ; k = 1, 2, …, n res } n Find an integer labeling of the operations  : V  Z +, such that t i =  (v i ), t i  t j + d j  i, j s.t. (v j, v i )  E and t n is minimum. n Number of operations of any given type in any schedule step does not exceed bound. :V  {1,2, …n res }

29 List Scheduling Algorithms n Heuristic method for Minimum latency subject to resource bound. Minimum resource subject to latency bound. n Greedy strategy. n Priority list heuristics. Assign a weight to each vertex indicating its scheduling priority Longest path to sink. Longest path to timing constraint. n Heuristic method for Minimum latency subject to resource bound. Minimum resource subject to latency bound. n Greedy strategy. n Priority list heuristics. Assign a weight to each vertex indicating its scheduling priority Longest path to sink. Longest path to timing constraint.

30 List Scheduling Algorithm for Minimum Latency …

31 … List Scheduling Algorithm for Minimum Latency n Candidate Operations U l,k Operations of type k whose predecessors are scheduled and completed at time step before l n Unfinished operations T l,k are operations of type k that started at earlier cycles and whose execution is not finished at time l Note that when execution delays are 1, T l,k is empty. n Candidate Operations U l,k Operations of type k whose predecessors are scheduled and completed at time step before l n Unfinished operations T l,k are operations of type k that started at earlier cycles and whose execution is not finished at time l Note that when execution delays are 1, T l,k is empty.

32 ExampleExample n Assumptions a 1 = 2 multipliers with delay 1. a 2 = 2 ALUs with delay 1. n First Step U 1,1 = {v 1, v 2, v 6, v 8 } Select {v 1, v 2 } U 1,2 = {v 10 }; selected n Second step U 2,1 = {v 3, v 6, v 8 } select {v 3, v 6 } U 2,2 = {v 11 }; selected n Third step U 3,1 = {v 7, v 8 } Select {v 7, v 8 } U 3,2 = {v 4 }; selected n Fourth step U 4,2 = {v 5, v 9 }; selected n Assumptions a 1 = 2 multipliers with delay 1. a 2 = 2 ALUs with delay 1. n First Step U 1,1 = {v 1, v 2, v 6, v 8 } Select {v 1, v 2 } U 1,2 = {v 10 }; selected n Second step U 2,1 = {v 3, v 6, v 8 } select {v 3, v 6 } U 2,2 = {v 11 }; selected n Third step U 3,1 = {v 7, v 8 } Select {v 7, v 8 } U 3,2 = {v 4 }; selected n Fourth step U 4,2 = {v 5, v 9 }; selected

33 ExampleExample n Assumptions a 1 = 3 multipliers with delay 2. a 2 = 1 ALU with delay 1. n Assumptions a 1 = 3 multipliers with delay 2. a 2 = 1 ALU with delay 1.

34 List Scheduling Algorithm for Minimum Resource Usage

35 ExampleExample n Assume =4 n Let a = [1, 1] T n First Step U 1,1 = {v 1, v 2, v 6, v 8 } Operations with zero slack {v 1, v 2 } a = [2, 1] T U 1,2 = {v 10 } n Second step U 2,1 = {v 3, v 6, v 8 } Operations with zero slack {v 3, v 6 } U 2,2 = {v 11 } n Third step U 3,1 = {v 7, v 8 } Operations with zero slack {v 7, v 8 } U 3,2 = {v 4 } n Fourth step U 4,2 = {v 5, v 9 } Both have zero slack; a = [2, 2] T n Assume =4 n Let a = [1, 1] T n First Step U 1,1 = {v 1, v 2, v 6, v 8 } Operations with zero slack {v 1, v 2 } a = [2, 1] T U 1,2 = {v 10 } n Second step U 2,1 = {v 3, v 6, v 8 } Operations with zero slack {v 3, v 6 } U 2,2 = {v 11 } n Third step U 3,1 = {v 7, v 8 } Operations with zero slack {v 7, v 8 } U 3,2 = {v 4 } n Fourth step U 4,2 = {v 5, v 9 } Both have zero slack; a = [2, 2] T

36 Allocation and Binding n Allocation Determine number of resources needed n Binding Mapping between operations and resources. n Sharing Assignment of a resource to more than one operation. n Optimum binding/sharing Minimize the resource usage. n Allocation Determine number of resources needed n Binding Mapping between operations and resources. n Sharing Assignment of a resource to more than one operation. n Optimum binding/sharing Minimize the resource usage.

37 Compatibility and Conflicts n Operation compatibility Same resource type. Non concurrent. n Compatibility graph Vertices: operations. Edges: compatibility relation. n Conflict graph Complement of compatibility graph. n Operation compatibility Same resource type. Non concurrent. n Compatibility graph Vertices: operations. Edges: compatibility relation. n Conflict graph Complement of compatibility graph. Multiplier ALU

38 Algorithmic Solution to the Optimum Binding Problem n Compatibility graph. Partition the graph into a minimum number of cliques. Find clique cover number. n Conflict graph. Color the vertices by a minimum number of colors. Find chromatic number. n NP-complete problems - Heuristic algorithms. n Compatibility graph. Partition the graph into a minimum number of cliques. Find clique cover number. n Conflict graph. Color the vertices by a minimum number of colors. Find chromatic number. n NP-complete problems - Heuristic algorithms.

39 ExampleExample ALU1: 1, 3, 5 ALU2: 2,

40 Register Binding Problem n Given a schedule Lifetime intervals for variables. Lifetime overlaps. n Conflict graph (interval graph). Vertices  variables. Edges  overlaps. n Find minimum number of registers storing all the variables. n Compatibility graph. n Given a schedule Lifetime intervals for variables. Lifetime overlaps. n Conflict graph (interval graph). Vertices  variables. Edges  overlaps. n Find minimum number of registers storing all the variables. n Compatibility graph.

41 ExampleExample n Six intermediate variables that need to be stored in registers {z1, z2, z3, z4, z5, z6} n Six variables can be stored in two registers n Six intermediate variables that need to be stored in registers {z1, z2, z3, z4, z5, z6} n Six variables can be stored in two registers

42 ExampleExample n 7 intermediate variables, 3 loop variables, 3 loop invariants n 5 registers suffice to store 10 intermediate loop variables n 7 intermediate variables, 3 loop variables, 3 loop invariants n 5 registers suffice to store 10 intermediate loop variables