On-Chip Communication Architectures: Models for Performance Exploration. ICS 295, Sudeep Pasricha and Nikil Dutt. Slides based on book chapter 4.


On-Chip Communication Architectures: Models for Performance Exploration
ICS 295
Sudeep Pasricha and Nikil Dutt
Slides based on book chapter 4
© 2008 Sudeep Pasricha & Nikil Dutt

Outline
• Introduction
• Static Performance Estimation Models
  ◦ Analytical/Estimation-based
• Dynamic Performance Estimation Models
  ◦ Simulation-based
• Hybrid Performance Estimation Models
  ◦ Static/dynamic-based

Introduction
• On-chip communication architectures have numerous sources of delay
  ◦ signal propagation
  ◦ synchronization (e.g., handshaking)
  ◦ transfer modes
    ▪ pipeline access, burst transfer, etc.
  ◦ arbitration mechanisms
  ◦ cross-bridge or cross-clock domain transfers
  ◦ data packing/unpacking at interfaces
• These significantly influence SoC performance and are a major bottleneck in many designs
  ◦ important to consider these during SoC exploration

Communication Architecture Performance Estimation in ESL Design Flow
[Figure not captured in the transcript]

Static Communication Architecture Performance Estimation
• Attempts to determine the performance of a system through analysis
  ◦ closed form expressions that capture system performance as a function of parameters
• Key challenge: determine the right set of system parameters and their interactions
• Next few slides
  ◦ Review of static performance estimation methods

Static Communication Architecture Performance Estimation
• Knudsen et al. [CODES 1998] presented a high level estimation model for communication throughput for a given protocol
• Delays are estimated for the following components
  ◦ Transmitting drivers
  ◦ Receiving drivers
  ◦ Channel
• Approach assumes pipelined transfers and estimates
  ◦ burst time
  ◦ data packet splitting/joining time at interface

Static Communication Architecture Performance Estimation
[Equations not captured in the transcript. Labels: transmission delay, channel delay]

Static Communication Architecture Performance Estimation
[Equations not captured in the transcript. Labels: receiver delay, total transmission delay, maximum total delay (assuming pipelined operation)]

Static Communication Architecture Performance Estimation
• Renner et al. [RSP 1999] presented more detailed communication performance estimation models
  ◦ transmitter, channel, and receiver delays
  ◦ also considers software, wire delay, protocol latencies

Static Communication Architecture Performance Estimation
• Transmitter/Receiver delay model [equation not captured in the transcript]
  ◦ n = number of cycles to put data on channel
  ◦ f = frequency of core
• Example timing results of transmitter/receiver part [table not captured in the transcript]
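The equation for this slide is missing from the transcript. A minimal sketch, assuming the transmitter/receiver delay is simply the n cycles divided by the core frequency f (t = n / f). The function name and the picosecond unit are hypothetical choices for illustration and are not taken from Renner et al.

```cpp
#include <cstdint>

// Assumed model (not from the slides): a core that needs n cycles to put
// data on the channel and runs at f_hz takes t = n / f seconds.
// The result is returned in picoseconds so the arithmetic stays in integers.
// Note: n_cycles * 1e12 must fit in 64 bits, so keep n_cycles below ~1.8e7.
std::uint64_t core_delay_ps(std::uint64_t n_cycles, std::uint64_t f_hz) {
    return n_cycles * 1000000000000ULL / f_hz;  // 1 s = 1e12 ps
}
```

For example, 4 cycles on a 100 MHz core give 40000 ps, i.e. 40 ns, under this assumption.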

Static Communication Architecture Performance Estimation
• Channel delay model
  ◦ Delay for one bit link [equation not captured in the transcript], where
    ▪ t_WIRE = wire delay
    ▪ t_SW = switch delay
    ▪ t_FPGA = FPGA delay
    ▪ t_DPR = memory access time
• Example timing results of channel part [table not captured in the transcript]

Static Communication Architecture Performance Estimation
• Protocol delay model [equation not captured in the transcript]

Static Communication Architecture Performance Estimation
• Total communication delay [equations not captured in the transcript]
  ◦ for a single transmission
  ◦ for pipelined transmission
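The two total-delay equations are missing from the transcript. The sketch below shows the generic textbook relationship between single and pipelined transfers, assuming three stages (transmitter, channel, receiver) with fixed per-item delays. It is an illustration under those assumptions, not the model of Renner et al.; all names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical per-item stage delays, all in the same time unit (e.g. ns).
struct StageDelays {
    std::uint64_t tx;       // transmitter
    std::uint64_t channel;  // channel
    std::uint64_t rx;       // receiver
};

// Single transmission: the item passes through the stages one after another.
std::uint64_t single_delay(const StageDelays& d) {
    return d.tx + d.channel + d.rx;
}

// Pipelined transmission of `items` data items: the first item pays the full
// latency, every further item adds only the delay of the slowest stage.
std::uint64_t pipelined_delay(const StageDelays& d, std::uint64_t items) {
    if (items == 0) return 0;
    const std::uint64_t bottleneck = std::max({d.tx, d.channel, d.rx});
    return single_delay(d) + (items - 1) * bottleneck;
}
```

With stage delays of 10, 30 and 20, four items take 60 + 3 × 30 = 150 time units when pipelined, compared with 4 × 60 = 240 when sent one at a time.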

Static Communication Architecture Performance Estimation
• Cho et al. [SLIP 2006] proposed analytical performance model for AMBA 2.0 AHB single shared bus and hierarchical shared bus architectures
• Latency of shared bus [equation not captured in the transcript]
    ▪ N_d = number of data items to be transferred
    ▪ N_m = number of masters on the bus
    ▪ B = fixed burst size
    ▪ S = probability of single mode transfers on shared bus
    ▪ U = usage of the bus, and is a probability of continuing single transfers, in a pipelined manner (helping to reduce L_s)

Static Communication Architecture Performance Estimation
• Latency of hierarchical shared bus [equation not captured in the transcript]
    ▪ N_l = number of layers (or buses) in hierarchical shared bus architecture
    ▪ A = probability of the path of the data transfer passing through a bridge
    ▪ [symbol not captured] = bridge factor; represents latency overhead caused by using bridge
• Assumptions of model:
  ◦ slave does not introduce any wait states
  ◦ request and address phases occur in the same cycle
• Using appropriate A, S and U values, an accuracy of 96% and 85% was obtained compared to a simulation-based approach for shared bus and hierarchical bus, respectively

Limitations of Static Performance Estimation Methods
• Require several assumptions that depend on application functionality and are not so easy to model
  ◦ e.g., probabilistic values for parameters, single cycle arbitration for all transfers, etc.
• Unable to account for non-deterministic traffic generation by the components on the buses
  ◦ cannot predict dynamic component (e.g., memory access) delays
• Cannot easily account for other sources of dynamic delays, due to
  ◦ complex arbitration and traffic congestion, cache misses, burst interruptions, interface buffer overflows, the effects of advanced bus architecture features such as SPLIT/OO transaction completion, etc.
• Limited applicability for most medium- to large-scale SoCs
  ◦ useful for obtaining worst case performance bounds
  ◦ can provide (conservative) performance estimates early in design flow

Dynamic (Simulation-based) Communication Architecture Performance Estimation
• Simulate application; capture application specific effects
• Several modeling abstractions used by designers
  ◦ trade-off simulation speed, modeling effort and accuracy

Cycle Accurate (CA) Models
• Detailed system debug and analysis
• Time consuming to model - /1 to /3 RTL
• Too slow for exploring SoC designs - 100x RTL
[Figure: master and slave attached through a pin interface to a bus with arbiter; abstraction levels shown: Algorithm, TLM, T-BCA, PA-BCA, CA]
Master fragment: var1 = a + b; wait(); REG = d << var1; wait(); HREQ.set(1); e = REG4 | 0xff; wait();
Slave fragment: case CTR_WR: CTR_WR = in; wait(); CTR_WR |= 0xf; wait(); ST_RG = in | 0x1; wait();

Cycle Accurate (CA) Models
• Loghi et al. [DATE 2004] used CA models written in SystemC to explore AMBA2 and STBus communication architectures for MPSoCs

Pin Accurate Bus Cycle Accurate (PA-BCA) Models
• High level system exploration
• Still time consuming to model - /5 to /10 RTL
• Still slow for exploring SoC designs - 100x to 500x RTL
[Figure: master and slave attached through a pin interface to a bus with arbiter; abstraction levels shown: Algorithm, TLM, T-BCA, PA-BCA, CA]
Master fragment: … var1 = a + b; REG = d << var1; HREQ.set(1); e = REG4 | 0xff; wait(3, SC_NS); …
Slave fragment: … case CTR_WR: CTR_WR = in; CTR_WR |= 0xf; ST_RG = in | 0x1; wait(3, SC_NS); …
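The slide fragments differ from the cycle-accurate ones mainly in how often a process suspends: one wait() per clock cycle versus a single timed wait (wait(3, SC_NS)) covering several cycles. The plain C++ sketch below, which deliberately avoids SystemC so it stays self-contained, counts scheduler wake-ups under both styles. The names and the wake-up accounting are assumptions for illustration, not a model of any real simulation kernel.

```cpp
#include <cstdint>

// Simulated time reached and number of times the process had to be resumed.
struct SimStats {
    std::uint64_t sim_time_ns;
    std::uint64_t wakeups;
};

// Cycle-accurate style: the process suspends after every clock cycle.
SimStats run_cycle_accurate(std::uint64_t cycles, std::uint64_t period_ns) {
    SimStats s{0, 0};
    for (std::uint64_t c = 0; c < cycles; ++c) {
        s.sim_time_ns += period_ns;  // corresponds to one wait();
        ++s.wakeups;
    }
    return s;
}

// Bus-cycle-accurate style: several cycles of behaviour are executed in one
// go and followed by a single timed wait, e.g. wait(3, SC_NS).
SimStats run_bus_cycle_accurate(std::uint64_t cycles, std::uint64_t period_ns,
                                std::uint64_t cycles_per_wait) {
    if (cycles_per_wait == 0) cycles_per_wait = 1;  // avoid a zero-length step
    SimStats s{0, 0};
    std::uint64_t remaining = cycles;
    while (remaining > 0) {
        const std::uint64_t step =
            remaining < cycles_per_wait ? remaining : cycles_per_wait;
        s.sim_time_ns += step * period_ns;
        ++s.wakeups;
        remaining -= step;
    }
    return s;
}
```

Both functions reach the same simulated time; the second one simply resumes the process less often, which is where the simulation speedup in this sketch comes from.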

Pin Accurate Bus Cycle Accurate (PA-BCA) Models
• Séméria et al. [ASPDAC 2000] used PA-BCA models (also called bus functional models or BFM) to improve simulation speed over CA models
  ◦ for the purpose of HW/SW co-verification
  ◦ modeled in SystemC
  ◦ 20x speedup if processor ISS model granularity raised
• Kalla et al. [ASPDAC 2005] executed traces of component behavior on a PA-BCA simulator
  ◦ as much as a 94% speedup over CA simulation model

Transaction-based Bus Cycle Accurate (T-BCA) Models
• Uses Transaction Level Modeling (TLM) techniques to speed up BCA model simulation
• Time to model varies
• Simulation speed generally faster than PA-BCA
[Figure: master and slave attached through a pin and transaction interface to a bus with arbiter; abstraction levels shown: Algorithm, TLM, T-BCA, PA-BCA, CA]
Master fragment: … var1 = a + b; d = d << var1; request(port1); e = REG4 | 0xff; wait(3, SC_NS); HSEL.set(1); …
Slave fragment: … case CTR_WR: CTR_WR = in; CTR_WR |= 0xf; ST_RG = in | 0x1; wait(3, SC_NS); …

Transaction-based Bus Cycle Accurate (T-BCA) Models
• Caldari et al. [DATE 2003] modeled AMBA2 AHB, APB using function calls for reads/writes
  ◦ used SystemC 2.0, with clocked threads to capture components
  ◦ in addition to read() and write() transaction functions, signals such as HREADY and HRESP were also captured
    ▪ to maintain cycle accuracy
  ◦ compared PA-BCA model of the STBus and a T-BCA model of the AMBA AHB and APB buses
    ▪ showed a speedup of between 3x and 7x for the T-BCA model
    ▪ for different traffic profiles on a small SoC testbench
  ◦ 100x speedup for T-BCA model over a CA model of AMBA AHB

Transaction-based Bus Cycle Accurate (T-BCA) Models
• Ogawa et al. [DATE 2004] created another T-BCA model variant for the AMBA AHB bus architecture
  ◦ using C as the modeling language
  ◦ explicit low level handshaking semantics with request, response signaling captured
  ◦ speedup of about 30x compared to CA model during design space exploration of an AMBA AHB based graphics display SoC
• Kim et al. [30] used another approach for T-BCA modeling
  ◦ capture signals as function calls, which enables simulation speedup while still maintaining bus cycle accuracy
  ◦ used in the Synopsys Cycle Accurate SystemC models for AMBA AHB and APB

Transaction-based Bus Cycle Accurate (T-BCA) Models
• Pasricha et al. [DAC 2004] proposed the Cycle Count Accurate at Transaction Boundaries (CCATB) modeling abstraction
• can be modeled in SystemC, or any other modeling language (C, C++, Java, etc.)
• raises modeling abstraction above T-BCA
• maintains overall cycle accuracy, essential for system exploration
• uses concepts of transactions from TLM
  ◦ no pins modeled
  ◦ extension of TLM read(), write() interface

Transaction-based Bus Cycle Accurate (T-BCA) Models
• CCATB read and write (SystemC 2.0)
[Code listing not captured in the transcript]

Transaction-based Bus Cycle Accurate (T-BCA) Models
• Control token structure in CCATB
[Figure not captured in the transcript]

Transaction-based Bus Cycle Accurate (T-BCA) Models
• CCATB model captures all delays encountered by transaction
  ◦ clusters timing delays & minimizes no. of actively simulating IPs
  ◦ maximizes opportunity to increment simulation time in bursts
• Delay components shown: initiator delay, arbitration delay, communication protocol delay, interface delay, target delay
[Figure: AMBA 2.0 bus with arbiter connecting ARM processor, MASTER 1, DMA, ITC, TIMER, MEM CONTROLLER and MEM1 through interfaces; MEM2 and MEM3 also shown]
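The idea of clustering the delays of one transaction and advancing simulation time in a single burst can be sketched in plain C++ as follows. This is a minimal illustration under assumed names (TransactionDelays, CcatbBusSketch) and does not reproduce the authors' SystemC implementation or their control token structure.

```cpp
#include <cstdint>

// Hypothetical delay components of one transaction, in bus cycles.
struct TransactionDelays {
    std::uint32_t initiator;
    std::uint32_t arbitration;
    std::uint32_t protocol;
    std::uint32_t interface_delay;
    std::uint32_t target;
};

class CcatbBusSketch {
public:
    // Completes one transaction: all delays are summed and simulated time is
    // advanced once, at the transaction boundary, instead of cycle by cycle.
    std::uint64_t transact(const TransactionDelays& d) {
        const std::uint64_t total = static_cast<std::uint64_t>(d.initiator) +
                                    d.arbitration + d.protocol +
                                    d.interface_delay + d.target;
        now_cycles_ += total;
        ++time_advances_;
        return total;
    }

    std::uint64_t now_cycles() const { return now_cycles_; }
    std::uint64_t time_advances() const { return time_advances_; }

private:
    std::uint64_t now_cycles_ = 0;     // simulated time in bus cycles
    std::uint64_t time_advances_ = 0;  // how often time was incremented
};
```

In this sketch the cycle count at each transaction boundary matches what a per-cycle model would report for the same delays, but the state inside a transaction is not observable.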

Contrasting CCATB with Detailed Pin Accurate Abstraction
• CCATB model takes the same amount of time to complete a read/write transaction as a detailed pin-accurate model
• CCATB trades off intra-transaction visibility for simulation speed

Comparing CCATB with Other Abstractions
• Compared CCATB performance with PA-BCA and T-BCA models
• Explore effect of changing system complexity on simulation speed
  ◦ start with simple SoC system
  ◦ iteratively add components to increase complexity
  ◦ measure simulation speed at each iteration
[Figure: test SoC. Labels: Switch; AHB System bus 1; ARM926EJ-S; ROM; SDRAM I/F; Arbiter; DMA; RAM; AHB/APB Bridge; APB peripheral bus; ITC; Timer; UART; EMC; USB; AHB/AHB Bridge; AHB System bus 2; RAM; Traffic gen1; Arbiter; AHB System bus 3; RAM; Traffic gen2; Arbiter; Traffic gen3]

Comparing CCATB with Other Abstractions
Model Abstraction | Average CCATB speedup (x times) | Modeling Effort
CCATB             | 1                               | ~3 days
T-BCA             | 1.67                            | ~4 days
PA-BCA            | 2.2                             | ~1.5 wks
• CCATB takes less time to model than other abstractions
• CCATB consistently faster than PA-BCA and T-BCA

Transaction Level Models
• High level system validation and embedded software development
• Fast to model - /10 to /50 RTL
• Fast simulation speed, but model not too detailed for exploring SoC designs - >>1000x RTL
[Figure: master and slave attached through a generic channel interface to a channel with bus arbiter; abstraction levels shown: Algorithm, TLM, T-BCA, PA-BCA, CA]
Master fragment: … var1 = a + b; d = d << var1; request(port1); e = REG4 | 0xff; wait(); …
Slave fragment: … case CTR_WR: CTR_WR = in; CTR_WR |= 0xf; ST_RG = in | 0x1; wait(); …

Transaction Level Models
• TLM can be thought of as a P2P, zero-time interconnection between system components
• To enable comm. architecture exploration at the TLM level, some approaches incorporate bus protocol structural and timing details in TLM
  ◦ not guaranteed to be very accurate in estimating performance
• Arbitrated-TLM (ATLM) adds support for arbitration and shared buses, to capture contention during communication
  ◦ Pasricha et al. [SNUG 2002]
  ◦ Ariyamparambath et al. [ISSOC 2003]
  ◦ Schirner et al. [DATE 2006]

Transaction Level Models
• Ariyamparambath et al. [ISSOC 2003] annotated ATLM models with bus-protocol-specific timing details
  ◦ Introduced the near cycle accurate (NCA) bus that has timing annotation to capture bus protocol specific delays
  ◦ NCA abstract bus model automatically calculates the time delay associated with the data transfer
  ◦ Waits for that time delay before calling the slave interface and writing the data to it
  ◦ Delay information captures
    ▪ Internal bus delay cycles (e.g., request, grant, etc.)
    ▪ Pipeline delay cycles
    ▪ Burst length cycles
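The NCA behaviour described above (compute the delay, wait for it, then call the slave) can be sketched in plain C++ as follows. The class names, the simple sum of the three delay components, and the 4-byte address step per beat are all assumptions made for illustration; the slides do not give the actual formula.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical timing annotation of the bus, in cycles.
struct NcaTiming {
    std::uint32_t internal_bus_cycles;  // e.g. request, grant
    std::uint32_t pipeline_cycles;
};

// Minimal slave: a sparse word-addressed memory.
class SlaveSketch {
public:
    void write(std::uint32_t addr, std::uint32_t data) { mem_[addr] = data; }
    std::uint32_t read(std::uint32_t addr) const {
        auto it = mem_.find(addr);
        return it == mem_.end() ? 0u : it->second;
    }
private:
    std::map<std::uint32_t, std::uint32_t> mem_;
};

class NcaBusSketch {
public:
    NcaBusSketch(NcaTiming timing, std::uint64_t clock_period_ns)
        : timing_(timing), period_ns_(clock_period_ns) {}

    // Assumed delay: internal bus cycles + pipeline cycles + one cycle per
    // burst beat. Time is advanced first, then the slave is written.
    std::uint64_t write_burst(SlaveSketch& slave, std::uint32_t addr,
                              const std::vector<std::uint32_t>& burst) {
        const std::uint64_t cycles =
            std::uint64_t{timing_.internal_bus_cycles} +
            timing_.pipeline_cycles + burst.size();
        const std::uint64_t delay_ns = cycles * period_ns_;
        now_ns_ += delay_ns;  // stands in for a timed wait in the simulator
        for (std::size_t i = 0; i < burst.size(); ++i) {
            slave.write(addr + static_cast<std::uint32_t>(4 * i), burst[i]);
        }
        return delay_ns;
    }

    std::uint64_t now_ns() const { return now_ns_; }

private:
    NcaTiming timing_;
    std::uint64_t period_ns_;
    std::uint64_t now_ns_ = 0;
};
```

With 2 internal bus cycles, 1 pipeline cycle, a 4-beat burst and a 10 ns clock, the sketch charges (2 + 1 + 4) × 10 = 70 ns before the slave sees the data.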

Transaction Level Models
• Viaud et al. [DATE 2006] proposed TLM/T (transaction level model with time) abstraction level
  ◦ each component modeled as a thread, and has a local clock
  ◦ communication via packets transferred on P2P channels
  ◦ effect of arbitration modeled by global interconnect model, which includes all the P2P links interconnecting components
  ◦ local clocks of two threads are synchronized every time a packet is sent from one thread to the other
  ◦ simulation speed is improved because each (master) component has a local clock, with no need for global synchronization at every system cycle
  ◦ experimental results on a generic OCP/VCI comm. architecture showed a speedup of 10x to 60x compared to a PA-BCA model, at a slight loss in accuracy of less than 1%
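One plausible reading of the local-clock synchronization is sketched below: each thread advances its own clock freely, and when a packet is sent the receiver's clock is moved forward to the packet's arrival time if it is behind. The max-based rule, the link latency parameter and all names are assumptions for illustration; the slides do not specify the exact mechanism used by Viaud et al.

```cpp
#include <algorithm>
#include <cstdint>

// Each component thread keeps its own local time, in cycles.
struct ThreadClock {
    std::uint64_t local_time = 0;

    // Local computation advances only this thread's clock.
    void compute(std::uint64_t cycles) { local_time += cycles; }
};

// Assumed synchronization rule: a packet sent at the sender's local time
// arrives after link_latency cycles; the receiver cannot process it earlier,
// so its clock is raised to the arrival time if it lags behind.
void send_packet(const ThreadClock& sender, ThreadClock& receiver,
                 std::uint64_t link_latency) {
    receiver.local_time =
        std::max(receiver.local_time, sender.local_time + link_latency);
}
```

Because clocks are only reconciled at packet exchanges, threads that communicate rarely can run long stretches without any global synchronization, which matches the speedup argument on the slide.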

Transaction Level Models
• Schirner et al. [CODES+ISSS 2006] proposed result oriented modeling (ROM)
  ◦ model initially predicts time taken to complete a transaction, and corrects prediction if required at the end of prediction period
  ◦ correction accounts for disturbing influences such as transactions from higher priority masters that can lengthen transaction completion time
  ◦ due to the correction mechanism, the model complexity is higher than CCATB and other T-BCA models
  ◦ can provide speedup for statically scheduled, predictable applications such as real-time CAN-based systems
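The predict-then-correct idea can be illustrated with a small plain C++ sketch. It assumes a deliberately simplified interference model in which every higher-priority transaction that starts inside the current prediction window extends the completion time by its full duration; that model and all names are hypothetical and not taken from Schirner et al.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A transaction from a higher-priority master (hypothetical representation).
struct Interference {
    std::uint64_t start;
    std::uint64_t duration;
};

// Optimistic prediction: the transaction takes `duration` time units alone
// on the bus. Correction: at the end of each prediction period, any
// higher-priority transaction that started inside the window extends the
// completion time; the check repeats until the prediction is stable.
std::uint64_t completion_time(std::uint64_t start, std::uint64_t duration,
                              const std::vector<Interference>& higher_priority) {
    std::uint64_t predicted_end = start + duration;
    std::vector<bool> counted(higher_priority.size(), false);
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < higher_priority.size(); ++i) {
            const Interference& hp = higher_priority[i];
            if (!counted[i] && hp.start >= start && hp.start < predicted_end) {
                counted[i] = true;
                predicted_end += hp.duration;  // corrected prediction
                changed = true;
            }
        }
    }
    return predicted_end;
}
```

When no higher-priority traffic appears, the initial prediction stands and no correction work is done, which is why the approach suits predictable, statically scheduled traffic.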

Multiple Abstraction Modeling Flows
• The modeling abstractions described so far have different strengths and weaknesses, stemming from the inherent trade-off between
  ◦ complexity of details captured
  ◦ estimation accuracy
  ◦ simulation speed
• Useful to have a communication-centric exploration flow that integrates several abstraction levels
  ◦ allow performance exploration with different levels of captured details, accuracy, and simulation speed in an SoC design flow
• A few pieces of work have proposed such communication-centric design space exploration flows

Multiple Abstraction Modeling Flows
• Rowson et al. [DAC 1997] illustrated the use of multiple abstraction levels for communication architecture exploration of an ATM packet network
[Figure not captured in the transcript]

Multiple Abstraction Modeling Flows
• Hines et al. [DAC 1997] proposed using multiple levels of abstraction for comm. architecture exploration, with the ability to dynamically switch between them
  ◦ for greater exploration flexibility in terms of simulation speed and accuracy
  ◦ approach allows a designer to switch from a detailed PA-BCA model to less detailed TLM-like models to speed up exploration
• Beltrame et al. [DATE 2006] proposed a similar approach
  ◦ dynamic switching between BCA, untimed TLM, timed TLM
  ◦ to improve simulation speed for exploration

Multiple Abstraction Modeling Flows
• Haverinen et al. [OCP White Paper 2003] proposed a stack of comm. abstraction layers, each having a different level of detail for modeling comm. in a design flow
  ◦ adapted for use in the LISA Processor Design Platform, to jointly design and explore processor architecture with an on-chip communication architecture

Multiple Abstraction Modeling Flows
• Kogel et al. [CODES+ISSS 2003] made use of 3 of the abstraction levels from the comm. layer stack to explore design of a network processing unit for IP forwarding

Multiple Abstraction Modeling Flows
• Pasricha et al. [DAC 2004] proposed another variant of communication-centric design flow
[Figure not captured in the transcript]

Hybrid Performance Estimation Approaches
• Hybrid performance estimation techniques
  ◦ combine static and dynamic performance estimation strategies
  ◦ speed up comm. architecture performance estimation while generating accurate performance exploration results

Hybrid Performance Estimation Approaches
• Lahiri et al. [VLSID 2000] proposed a hybrid trace-based comm. architecture performance exploration technique
[Figure: flow with a dynamic phase and a static phase; not captured in the transcript]

Hybrid Performance Estimation Approaches
• Trace generated from simulation phase
[Figure not captured in the transcript]

Hybrid Performance Estimation Approaches
• CAG generated from simulation trace
[Figure not captured in the transcript]

Hybrid Performance Estimation Approaches
• Augmenting CAG with comm. protocol details in static phase
[Figure not captured in the transcript]

Hybrid Performance Estimation Approaches
• Accuracy comparisons
[Figure not captured in the transcript]

Hybrid Performance Estimation Approaches
• Speedup comparisons
[Figure not captured in the transcript]

Hybrid Performance Estimation Approaches
• Kim et al. [CODES+ISSS 2003] proposed another hybrid performance estimation approach
  ◦ static performance-estimation technique based on a queuing analysis as the first step to prune the design space
  ◦ simulation-based approach to accurately explore the reduced design space as the second step
  ◦ Limitations
    ▪ static queuing approach insufficient to handle complex bus protocol features (e.g., SPLIT/OO transactions, OO transaction completion)
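The first, static step of such a two-step flow can be sketched as a queuing-based filter over candidate configurations. The sketch below uses the standard M/M/1 result for mean time in the system, W = 1 / (mu - lambda), purely as a stand-in; the slides do not say which queuing model Kim et al. used, and the BusConfig type and the latency threshold are hypothetical.

```cpp
#include <vector>

// Hypothetical candidate: an identifier plus the rate at which the bus can
// serve transactions (e.g. transactions per microsecond).
struct BusConfig {
    int id;
    double service_rate;
};

// Mean time in an M/M/1 system: W = 1 / (mu - lambda).
// Returns false when the queue is unstable (lambda >= mu) or inputs are bad.
bool mm1_latency(double arrival_rate, double service_rate, double& latency_out) {
    if (arrival_rate < 0.0 || service_rate <= arrival_rate) return false;
    latency_out = 1.0 / (service_rate - arrival_rate);
    return true;
}

// Static pruning step: keep only the configurations whose estimated latency
// meets the bound. The survivors would then go to detailed simulation.
std::vector<int> prune(const std::vector<BusConfig>& candidates,
                       double arrival_rate, double max_latency) {
    std::vector<int> kept;
    for (const BusConfig& c : candidates) {
        double latency = 0.0;
        if (mm1_latency(arrival_rate, c.service_rate, latency) &&
            latency <= max_latency) {
            kept.push_back(c.id);
        }
    }
    return kept;
}
```

Only the configurations returned by prune would be simulated in the second step, which is where the overall time saving of a hybrid flow comes from.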

Summary
• Static performance estimation techniques
  ◦ + enable fast, early performance estimation
  ◦ - unable to account for dynamic effects that can have a significant effect on performance
• Dynamic performance estimation techniques
  ◦ + provide accurate and reliable performance results
  ◦ - can become time consuming for large applications
• Hybrid performance estimation techniques
  ◦ combine static and dynamic performance estimation strategies
  ◦ can speed up communication architecture performance estimation while generating accurate performance exploration results
