The Hardware / Software Tradeoff -John Burnette-


Partitioning: The Hardware / Software Tradeoff -John Burnette-

Systems Requiring Codesign
ASIP (Application-Specific Instruction-set Processor) synthesis
Custom computing machines, for execution acceleration
System on a chip (ASICs with an embedded processor)
Embedded systems design

Goal of Partitioning Task
Obvious: find an implementation that meets the specifications at minimal cost.
Q: How was it done originally?
A: Developers had to make decisions based on their design experience and expertise.
Q: How can the partitioning task be automated?
A: Lots of crazy math.

Varying Environments
It is difficult to develop an all-encompassing partitioning technique:
Many different applications
Many different specifications

Varying Environments cont’d
[Figure] Taken from Lopez-Vallejo and Lopez, “On the Hardware-Software Partitioning Problem: System Modeling and Partitioning Techniques”

System Model
A system model must be specified. Consider:
Available hardware area
Available software memory size
Hardware execution time
Software execution time
How often each task is scheduled
Communication time
Data transfer synchronization

System Model cont’d
How many coprocessors are available?
How is everything connected?
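The quantities listed on the two System Model slides can be collected into a small data structure. This is only a sketch: the class names, field names, and the `time_improvement` helper are hypothetical, not taken from the presentation or the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    hw_area: float       # area needed if the task goes to hardware
    sw_memory: float     # memory needed if the task stays in software
    hw_time: float       # execution time in hardware
    sw_time: float       # execution time in software
    executions: int = 1  # how often the task is scheduled

@dataclass
class SystemModel:
    tasks: list
    comm_time: dict = field(default_factory=dict)  # (task_a, task_b) -> transfer time
    max_hw_area: float = 0.0
    max_sw_memory: float = 0.0

    def time_improvement(self, name):
        """Total time saved by moving one task from software to hardware."""
        task = next(t for t in self.tasks if t.name == name)
        return (task.sw_time - task.hw_time) * task.executions

model = SystemModel([Task("fir", hw_area=120, sw_memory=4, hw_time=2, sw_time=10, executions=3)])
print(model.time_improvement("fir"))
```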

Design Quality Attributes
Cost and performance parameters that describe the solution. For example:
What is the required hardware area for the co-processor?
What is the design latency?
What is the required memory space?

A Few Partitioning Methods
Some basic partitioning methods that are considered when developing codesign partitioning tools:
Simulated Annealing
Kernighan & Lin Heuristic
Hierarchical Clustering
Knowledge-Based Partitioning

1st Method: Simulated Annealing, Kirkpatrick, Gelatt, and Vecchi (1983)

It’s All About the Cost Function
Main goal: “to measure the quality of a solution and guide the algorithm to the best solution.” [1]
Fixed costs and variable costs should be treated separately.
If you already have an FPGA, you might as well use all of it, since its cost is fixed.
You can’t do the same thing with an ASIC.

Cost Function 1
Cost function proposed by Lopez-Vallejo and Lopez:
[equation not included in transcript]
Fc( ) are correction terms.
Ci is the design constraint applied to the i-th quality attribute of solution P; it is also used as a normalization parameter.
kci is the weight factor for the correction terms.

Cost Function 2
Some techniques for correcting the cost function:
Mean Square Error Minimization
Barrier Techniques
Penalty Methods

Mean Square Error Minimization
The objective is to minimize the mean square error between the quality attributes and their corresponding constraints.
Recall the FPGA example: maximum exploitation of the device improves performance at no added cost.
This technique should be applied to goals that should be completely fulfilled rather than minimized.
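A minimal sketch of such a term, assuming each attribute is normalized by its own constraint (the function name and the dictionary layout are illustrative, not the paper's notation):

```python
def mse_cost(attributes, constraints):
    """Mean square error between attributes and constraints, each normalized by its constraint."""
    terms = [((attributes[k] - constraints[k]) / constraints[k]) ** 2 for k in constraints]
    return sum(terms) / len(terms)

# An FPGA used at 80% of its area leaves an error term; one used fully does not.
partial = mse_cost({"hw_area": 80, "latency": 10}, {"hw_area": 100, "latency": 10})
full = mse_cost({"hw_area": 100, "latency": 10}, {"hw_area": 100, "latency": 10})
```

The cost is zero only when every attribute exactly meets its target, which is why this form suits goals that should be fully used up, such as the area of an FPGA that is already paid for.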

Barrier Techniques
Asymptotes are placed at the constraint-defined boundaries => the cost of any solution outside the design space is infinity.
Example of a barrier function: [equation not included in transcript]
Ensures that no hard design constraints are violated.
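The slide's own barrier function is not available here, so the following is a stand-in: an inverse barrier, one common textbook choice. The name `barrier` and its arguments are assumptions.

```python
import math

def barrier(value, limit):
    """Inverse barrier: grows without bound as value approaches limit from below."""
    if value >= limit:
        return math.inf  # outside the design space
    return 1.0 / (limit - value)

print(barrier(50, 100), barrier(99, 100), barrier(100, 100))
```

Because the term is infinite at and beyond the boundary, a search that starts inside the design space can never accept a solution that violates the constraint.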

Penalty Methods
Suitable when constraints are not too hard.
They do not contribute to the cost function when the solution is within the allowable search space.
Solutions around the border of the exploration region can be accepted if they are really close.

Penalty Methods cont’d
[equation not included in transcript]
The weight factor kci is most important here.
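As an illustration of the idea (not the slide's formula, which is missing), a quadratic penalty is zero inside the allowed region and grows with the relative violation. The weight `k` plays the role the slide assigns to kci; the quadratic shape is an assumption.

```python
def penalty(value, limit, k):
    """Quadratic penalty: zero when value <= limit, k * (relative excess)^2 otherwise."""
    excess = max(0.0, (value - limit) / limit)
    return k * excess ** 2

inside = penalty(90, 100, k=10)    # within the constraint, no contribution
near = penalty(101, 100, k=10)     # just over the border, small cost
far = penalty(150, 100, k=10)      # well outside, large cost
```

A small `k` lets the search accept solutions slightly over the border; a large `k` makes the constraint behave almost like a hard one.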

Cooling Schedule
For completeness: for simulated annealing to work, a cooling schedule must be provided.
The schedule sets how the temperature parameter of the optimization routine is lowered from one stage of the search to the next.
Without it, that tuning would have to be done manually.

Simulated Annealing cont’d
Advantage: generality; it can be used to optimize in many environments.
Disadvantage: long computation time required.
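A compact sketch of simulated annealing applied to a toy hardware/software split. Everything specific here is an assumption made for illustration: the task tuples, the geometric cooling factor `alpha`, the penalty weight of 100, and the single-task flip move.

```python
import math
import random

def anneal(tasks, max_area, t_start=10.0, t_end=0.01, alpha=0.95, moves_per_temp=50, seed=0):
    """tasks: name -> (hw_time, sw_time, hw_area). Returns (hardware set, cost)."""
    rng = random.Random(seed)
    names = sorted(tasks)

    def cost(hw):
        latency = sum(tasks[n][0] if n in hw else tasks[n][1] for n in names)
        area = sum(tasks[n][2] for n in hw)
        return latency + 100.0 * max(0.0, area - max_area)  # assumed penalty weight

    current, best = set(), set()
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            candidate = current ^ {rng.choice(names)}  # flip one task HW <-> SW
            delta = cost(candidate) - cost(current)
            # Always accept improvements; accept worse moves with probability exp(-delta/T).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current = candidate
                if cost(current) < cost(best):
                    best = set(current)
        temp *= alpha  # geometric cooling schedule
    return best, cost(best)

tasks = {"a": (1, 10, 5), "b": (2, 4, 5), "c": (1, 2, 5)}
hw, total = anneal(tasks, max_area=10)
```

The many cost evaluations per temperature step are where the long computation time comes from.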

2nd Method: Kernighan & Lin Heuristic (1970)

K&L Heuristic
Based on iterative improvement:
Start with a random initial partition.
Swap nodes between the two sides of the partition.
The best solution found from swapping is used as the new initial partition for the next iteration.
Finishes when no more improvements are achieved.
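The loop above can be sketched as follows. This is a simplified variant, not the full Kernighan & Lin procedure: it commits the single best improving swap per iteration and does not lock nodes or evaluate sequences of swaps. The cut-weight cost and the function names are assumptions.

```python
from itertools import product

def cut_cost(edges, side_a):
    """Total weight of edges that cross the partition."""
    return sum(w for (u, v), w in edges.items() if (u in side_a) != (v in side_a))

def swap_improve(nodes, edges, side_a):
    side_a = set(side_a)
    side_b = set(nodes) - side_a
    best = cut_cost(edges, side_a)
    while True:
        best_swap = None
        for a, b in product(sorted(side_a), sorted(side_b)):
            trial_cost = cut_cost(edges, (side_a - {a}) | {b})
            if trial_cost < best:
                best, best_swap = trial_cost, (a, b)
        if best_swap is None:  # no swap improves the cost: finished
            return side_a, best
        a, b = best_swap
        side_a = (side_a - {a}) | {b}
        side_b = (side_b - {b}) | {a}

edges = {(1, 2): 5, (3, 4): 5, (1, 3): 1, (2, 4): 1}
side, cost = swap_improve([1, 2, 3, 4], edges, side_a={1, 3})
```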

K&L Heuristic cont’d
The SLIF access graph is used to compute the design quality parameters.
The values of the attributes associated with the nodes and edges of the system graph are added up.

K&L Heuristic cont’d
Drawbacks:
The design cannot be scheduled, so time estimation is difficult.
The process adds up node attributes => the parallelism of a multiprocessor architecture is ignored.

3rd Method: Hierarchical Clustering

Hierarchical Clustering
This method groups pairs of partitioning objects based on proximity values between these objects.
The algorithm is characterized by:
Closeness function
Cut level in the cluster tree (based on closeness)
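A generic agglomerative sketch showing how a closeness function and a cut level interact. The closeness values are made up, and averaging closeness over cluster members is an assumed linkage rule, not something the slides specify.

```python
def cluster(objects, closeness, cut_level):
    """Merge the closest pair of clusters until no pair reaches cut_level."""
    clusters = [frozenset([o]) for o in objects]

    def link(c1, c2):
        # Average closeness between members of two clusters (assumed linkage rule).
        return sum(closeness(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(link(c1, c2), i, j)
                 for i, c1 in enumerate(clusters)
                 for j, c2 in enumerate(clusters) if i < j]
        score, i, j = max(pairs)
        if score < cut_level:  # the cut level stops the tree from growing further
            break
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

table = {frozenset("ab"): 0.9, frozenset("ac"): 0.1, frozenset("bc"): 0.2}
groups = cluster(["a", "b", "c"], lambda x, y: table[frozenset((x, y))], cut_level=0.5)
```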

Hierarchical Clustering cont’d
The algorithm selects the two objects with the best time improvement when implemented as hardware; the rest are assigned to the software process.
Objects are gradually extracted to hardware until all constraints are met.
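The gradual extraction step can be sketched on its own. This simplification ignores clustering and communication, treats latency as the plain sum of task times, and uses a single latency constraint; all of that is assumed for the example.

```python
def extract_to_hardware(tasks, max_latency):
    """tasks: name -> (sw_time, hw_time). Move the biggest time savers to hardware first."""
    hardware = []
    latency = sum(sw for sw, _ in tasks.values())  # start with everything in software
    by_gain = sorted(tasks, key=lambda n: tasks[n][0] - tasks[n][1], reverse=True)
    for name in by_gain:
        if latency <= max_latency:  # constraint met: stop extracting
            break
        sw, hw = tasks[name]
        latency -= sw - hw
        hardware.append(name)
    return hardware, latency

tasks = {"fft": (20, 4), "ctrl": (5, 4), "filt": (10, 3)}
hw, latency = extract_to_hardware(tasks, max_latency=15)
```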

Hierarchical Clustering cont’d
[Figure] Taken from Lopez-Vallejo and Lopez, “On the Hardware-Software Partitioning Problem: System Modeling and Partitioning Techniques”
Once these primary constraints are met, secondary constraints (e.g., memory) can be checked before finishing.

Clustering Process Control Scheme
[Figure] Taken from Lopez-Vallejo and Lopez, “On the Hardware-Software Partitioning Problem: System Modeling and Partitioning Techniques”

Closeness Function
This function reflects the latency of information exchange between different processes.
[equation not included in transcript]
∆ti = sti – hti represents the time improvement when object i is moved from software to hardware.
tcomm(vi, vj): communication time between vi and vj.
qT, qC, qA are weight factors.
The closeness value is greater for objects with a bigger difference between hardware and software execution time.

Closeness Function cont’d
Really we want to cluster objects that meet our constraints but still use less hardware area, thus the third term.
The term's value is greater when the area of the resulting cluster is less than the average system area.
The fourth term helps us group objects that need a large amount of memory; this matters when memory space is a constraint.
After closeness is determined and clusters are formed, the hardware, software, and memory space can be determined.

4th Method: Knowledge-Based Partitioning, proposed by Lopez-Vallejo and Lopez (2003)

Knowledge-Based Partitioning
Knowledge acquired by designers can be conserved as partitioning technologies become obsolete.
The knowledge is included in the system.
The knowledge base can be expanded as new cases are considered.

Inference Structure

Inference Structure cont’d
Four inferences:
Match (Heuristic Classification)
Assign
Evaluate
Select

Match
Heuristic rules are stored in the knowledge base to match input and output variables.
Variables can be: Hardware, Quite-Hardware, Unknown, Quite-Software, Software.
Example: if hw-area is small and time-improvement is high and number-executions is not few, then implementation is very hardware.
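The example rule can be written as a tiny rule matcher. Crisp string labels are used here as a simplification, the second rule is hypothetical and only there to show a software outcome, and the returned labels reuse the variable values named on the slide.

```python
def match(hw_area, time_improvement, executions):
    """Map linguistic input values to an implementation tendency."""
    # Rule from the slide: small area, high improvement, not few executions.
    if hw_area == "small" and time_improvement == "high" and executions != "few":
        return "Hardware"
    # Hypothetical rule, not from the source.
    if hw_area == "large" and time_improvement == "low":
        return "Software"
    return "Unknown"

print(match("small", "high", "many"))
```

New rules can be appended as new cases are considered, which is how the knowledge base grows.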

Assign
Provides the first solution proposal.
Allocates some of the processes to hardware and the rest to software.
This step looks at the system blocks, their implementation values, and the system constraints (such as maximum hardware/software area and available memory).
After the “hardness” of the constraints is considered, a system configuration is composed and an initial partition is determined.

Evaluate
After assignment, computes the parameters that will characterize the design.
Based on these parameters we get an idea of the quality and acceptability of the design.
Parameters include:
Estimation of required hardware area
Estimation of required memory space
Execution time (latency)
Communication costs (TX and penalty)
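A sketch of the four estimates for a given partition. The additive model (areas, memory, and times simply summed; communication charged only across the HW/SW boundary) is an assumption made for the example, as are the dictionary keys.

```python
def evaluate(tasks, comm, hardware):
    """tasks: name -> dict of estimates; comm: (a, b) -> transfer time; hardware: set of names."""
    area = sum(t["hw_area"] for n, t in tasks.items() if n in hardware)
    memory = sum(t["sw_memory"] for n, t in tasks.items() if n not in hardware)
    latency = sum(t["hw_time"] if n in hardware else t["sw_time"] for n, t in tasks.items())
    # Only transfers that cross the hardware/software boundary are charged.
    communication = sum(c for (a, b), c in comm.items() if (a in hardware) != (b in hardware))
    return {"hw_area": area, "memory": memory, "latency": latency, "communication": communication}

tasks = {
    "fft": {"hw_area": 100, "sw_memory": 8, "hw_time": 4, "sw_time": 20},
    "ctrl": {"hw_area": 30, "sw_memory": 2, "hw_time": 4, "sw_time": 5},
}
report = evaluate(tasks, {("fft", "ctrl"): 3}, hardware={"fft"})
```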

Select
Purpose: revise the proposed solution and search for another proposal.
Two-step process:
Diagnosis: how close is the solution to the optimum, and what corrections are needed? Maybe correct to lower the cost, or maybe the design doesn’t meet spec yet.
Operation: select a new proposal based on the previous information and the knowledge base of prior experience.

Select Process

Knowledge-Based Partitioning cont’d
Advantage: can be used over a variety of environments; takes advantage of system knowledge and past experience.
Disadvantage: newer, so not many tools out there do it this way.

Conclusion
Main goal is to find an optimized solution:
Minimize cost
Minimize latency
Minimize power use
There is no all-encompassing partitioning tool out there.
You must identify your constraints and environment to choose an appropriate partitioning method.

Emerging Studies
Lots of attention is given to partitioning methods.
Verification is open for study.
Currently it is hard to verify an entire HW/SW system.
It is hard to find incompatibilities across HW/SW boundaries (they are usually found at the prototyping stage).
Codesign tools need to allow us to synthesize controls, events, response times, and scheduling.
CoWare: http://www.coware.com/portal/page?_pageid=167,105683&_dad=cust_portal&_schema=STAGE
Polis Project: http://www-cad.eecs.berkeley.edu/~polis/
ImpulseC: http://www.impulsec.com/
Lots of partitioning studies: http://www-lsi.die.upm.es/publications/publications.html

About the Math
For more info on the math for the partitioning methods presented, see:
Lopez-Vallejo, M. and Lopez, J. C. “On the Hardware-Software Partitioning Problem: System Modeling and Partitioning Techniques.” ACM Transactions on Design Automation of Electronic Systems, Vol. 8, No. 3, July 2003.
Also online at: http://www.ee.washington.edu/class/590/peckol/doc/papers/partition1.pdf

References
Lopez-Vallejo, M. and Lopez, J. C. “On the Hardware-Software Partitioning Problem: System Modeling and Partitioning Techniques.” ACM Transactions on Design Automation of Electronic Systems, Vol. 8, No. 3, July 2003.
Kirkpatrick, S., Gelatt, C., Vecchi, M. “Optimization by Simulated Annealing.” Science 220, 4598, 671-680, 1983.
http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK2/NODE92.HTM
http://www.queryplus.com/newsletter_2003q3.htm
http://www.birc.dk/Research/expdata.html
Silva, L., Sampaio, A., Barros, E. “A Constructive Approach to Hardware/Software Partitioning.” Formal Methods in System Design, 24, pp. 45-90, 2004.
http://www.ida.ing.tu-bs.de/research/projects/cosyma/overview/node4.html
http://polimage.polito.it/~lavagno/publications/talk/same99.ppt.gz
F. Balarin, E. Sentovich, M. Chiodo, P. Giusto, H. Hsieh, B. Tabbara, A. Jurecska, L. Lavagno, C. Passerone, K. Suzuki, and A. Sangiovanni-Vincentelli. “Hardware-Software Co-design of Embedded Systems: The POLIS Approach.” Kluwer Academic Publishers, 1997.

References cont’d
M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, and A. Sangiovanni-Vincentelli. “Hardware/software codesign of embedded systems.” IEEE Micro, 14(4):26-36, August 1994.