Yikes! Why is my SystemVerilog Testbench So Slooooow?

Yikes! Why is my SystemVerilog Testbench So Slooooow?
Presented by Justin Sprague, Senior Applications Engineer, Cadence Design Systems

SV Performance and Productivity
Follow SW engineering guidelines

- IEEE 1800 SystemVerilog (SV) enables productivity
  - Classes, dynamic data types, randomization, etc. simplify creation of sophisticated verification environments
  - Accellera UVM simplifies creation of tests and reuse of VIP, speeding verification
- … but can quickly yield large, accurate, and sloooow envs
  - SW engineering != HW engineering because the former involves dynamic code and data
  - SV != Verilog, meaning new coding concepts are needed for performance

Justin Sprague, Cadence

The "Meh, It's Just Software" Attitude Leads to Accurate, Slow SV TB

Hardware (Design):
- Static code structure
- Physical data: registers, memory space
- Physically limited loops
- Deterministic behavior
- Optimize for device performance

Software (Testbench):
- Dynamic code structure
- Abstract data: multi-dimensional arrays, dynamic types with no simple physical equivalent
- Unlimited loops
- Environment built around randomization
- Optimize for simulation performance

Loop Invariants

- Software looping can range into millions of cycles
- Loop limits are often dynamically set
- Loop invariants are executed on each cycle, yielding identical results every time
- Solution: Move invariants outside of the loop

Low-performance coding style (the red box on the slide), where length*count is recomputed on every iteration:

    int i, a[256], b[256];
    int length=4, count=6, l_end;
    for (i=0; i < length*count; i++)
      a[i] = b[i];

With the invariant hoisted out of the loop:

    l_end = length * count;
    for (i=0; i < l_end; i++)
      a[i] = b[i];

Short-Circuiting Branches

- Simulators optimize code to branch as soon as the minimum conditions are met
- The optimization follows the order of operations, so if the left-most operand is the minimum condition the code runs fast
- Solution: Where possible, order the branch terms left to right (L2R) so that the term most likely to decide the branch comes first

    if (nearly_everytime() || sometimes_happens() || rarely_happens())
      code_to_execute

    size = millions_of_pkts.size();
    for (i = 0; i < size; i++) begin
      data = millions_of_pkts[i].randomize();
      live = millions_of_pkts[i].live;
      if (live == TRUE) inject(data);
    end
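
The effect of term ordering can be made visible with call counters. The following is a minimal sketch, not code from the slides: the package name, the counters, and the modulo-based predicate bodies are assumptions chosen only so the result is easy to check. It relies on the short-circuit evaluation that IEEE 1800 specifies for the `||` operator.

```systemverilog
package short_circuit_pkg;
  // Call counters so the effect of operand ordering is observable.
  int unsigned common_calls = 0;
  int unsigned rare_calls   = 0;

  // Hypothetical predicate that is true on most iterations.
  function automatic bit nearly_everytime(int i);
    common_calls++;
    return (i % 10) != 0;
  endfunction

  // Hypothetical predicate that is rarely true (and, in a real
  // testbench, might be the expensive one).
  function automatic bit rarely_happens(int i);
    rare_calls++;
    return (i % 10) == 0;
  endfunction

  // Count the iterations that take the branch, likely-true term first.
  // rarely_happens() only runs when nearly_everytime() returned 0.
  function automatic int count_hits(int n);
    int hits = 0;
    for (int i = 0; i < n; i++)
      if (nearly_everytime(i) || rarely_happens(i)) hits++;
    return hits;
  endfunction
endpackage
```

With this ordering, a call to `count_hits(100)` evaluates the rare predicate only on the ten iterations where the common one is false. Swapping the operands would evaluate the rare predicate on every iteration.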

Static Versus Dynamic Classes

- Static allocation is the object-oriented extension to the general programming concept of macros
- When classes are created and destroyed repeatedly in large loops, memory can fragment and the OS can become overhead
- Solution: Statically allocate classes in these situations, or define a pool of classes deep enough to support the maximum working set
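
The pooling idea above might look like the following. This is a minimal sketch under stated assumptions: the `packet` and `packet_pool` classes, their fields, and the `created` counter are hypothetical names introduced for illustration, and a real pool would also reset an object's state before reuse.

```systemverilog
package pkt_pool_pkg;
  // Hypothetical transaction class; the fields are illustrative only.
  class packet;
    bit [7:0] payload[$];
    bit       live;
  endclass

  // Fixed-depth pool: objects are constructed once and then recycled.
  class packet_pool;
    local packet free_list[$];
    int unsigned created = 0;   // total number of packet objects constructed

    function new(int unsigned depth);
      repeat (depth) begin
        packet p = new();
        free_list.push_back(p);
        created++;
      end
    endfunction

    // Hand out a recycled object; construct a new one only if the
    // pool was sized too small for the working set.
    function packet get();
      packet p;
      if (free_list.size() > 0) p = free_list.pop_front();
      else begin
        p = new();
        created++;
      end
      return p;
    endfunction

    // Return an object to the pool instead of dropping the handle.
    function void put(packet p);
      free_list.push_back(p);
    endfunction

    // Number of objects currently waiting in the pool.
    function int available();
      return free_list.size();
    endfunction
  endclass
endpackage
```

As long as the pool depth covers the maximum number of packets in flight, `created` stays at the initial depth no matter how many get/put cycles the loop performs.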

Know the Class Hierarchy

- Undocumented classes can be a performance nightmare
  - Methods and data in base classes exist in all derived classes
  - Reimplementation in derived classes carries dead code or unused/misused data
  - Memory grows faster than expected, and performance slows
- Solution: Demand documentation! And supply it. Also, only access information via standard interfaces

Track Class and Data Handles

- Garbage collection in SV works if properly managed
  - An object is reclaimed when the number of references to its class handle falls to zero
  - The simulation engine's tracing algorithm detects object graphs that are self-referencing but lack direct user references
- Poor handle control is the leading cause of memory leaks
  - Dynamic and associative arrays are most susceptible
  - Bugs can be insidiously hard to trace
- Solution: Add checks to code to detect overflows, and be especially wary of global data, as multiple threads operating on a single data structure can create unexpected side effects
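
One way to apply this advice to an associative array is to delete each entry as soon as it is consumed and to check the array size against a bound. The following is a minimal sketch: the `txn` and `scoreboard` classes, the method names, and the `max_pending` limit of 1024 are all assumptions made for illustration.

```systemverilog
package handle_track_pkg;
  // Hypothetical transaction; only an id is needed for the illustration.
  class txn;
    int unsigned id;
    function new(int unsigned id);
      this.id = id;
    endfunction
  endclass

  // Scoreboard that stores expected transactions until they are matched.
  class scoreboard;
    txn pending[int unsigned];          // keyed by transaction id
    int unsigned max_pending = 1024;    // assumed bound on outstanding items

    function void expect_txn(txn t);
      pending[t.id] = t;
      // Growth past the bound usually means handles are never released.
      if (pending.num() > max_pending)
        $error("scoreboard holds %0d pending txns (limit %0d)",
               pending.num(), max_pending);
    endfunction

    // Returns 1 on a match and deletes the stored handle, so the
    // object can be reclaimed once no other references remain.
    function bit observe(int unsigned id);
      if (!pending.exists(id)) return 0;
      pending.delete(id);
      return 1;
    endfunction
  endclass
endpackage
```

Without the `delete` call, every matched transaction would stay reachable through `pending`, and memory would grow for the life of the simulation.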

Thread Pool Vs. Create/Destroy

- Execution threads, like data and classes, can experience handle issues and memory fragmentation
- Solution: Follow similar techniques (static allocation or pooling) when implementing very high thread-count environments

    forever begin
      @(posedge clk);
      fork
        for (int i = 0; i < 32; i++) begin
          automatic int idx = i;
          wait (bus[idx] == 1);
        end
      join
    end
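
A pooled alternative forks a fixed set of worker threads once and feeds them through a queue, rather than creating threads on every clock. The following is a minimal sketch under stated assumptions: the module name, the mailbox-based job queue, `NUM_WORKERS`, `NUM_JOBS`, and the `#1` delay standing in for real work are all illustrative and not from the slides.

```systemverilog
module thread_pool_sketch;
  localparam int NUM_WORKERS = 4;    // assumed pool size
  localparam int NUM_JOBS    = 100;  // assumed workload

  mailbox #(int) jobs = new();       // unbounded job queue
  int done = 0;                      // number of completed jobs

  // Worker threads are forked once and live for the whole simulation.
  initial begin
    for (int w = 0; w < NUM_WORKERS; w++) begin
      fork
        forever begin
          automatic int job;   // per-thread storage for the fetched job
          jobs.get(job);       // blocks until a job is available
          #1;                  // placeholder for the real work
          done++;
        end
      join_none
    end
  end

  // The producer hands out work without creating any new threads.
  initial begin
    for (int j = 0; j < NUM_JOBS; j++) jobs.put(j);
    wait (done == NUM_JOBS);
    $display("all %0d jobs done at time %0t", done, $time);
  end
endmodule
```

The thread count stays at `NUM_WORKERS` regardless of how many jobs are queued. Once the queue drains, the workers block in `get()` and the simulation ends on its own because no events remain.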

Unforeseen Library Overhead

- Libraries like UVM enable fast creation of verification envs
- Be aware of the library implementation when scaling up the env
  - Example: a testbench channel may be clocked, so sending many tiny pieces of data may be much more expensive than aggregating the data and sending a single, large structure
- Solution: Always implement functional code first. If performance issues arise, profile and consider alternate algorithms that better utilize standard library interfaces

Summary

- Simulators are built to run legal SystemVerilog
  - Two algorithms can have equivalent functionality and vastly different performance
  - Coding errors and poor code awareness can lead to unforeseen and hard-to-debug performance issues
- Start from a set of best practices
  - Think about performance from the beginning
  - Understand and apply SW engineering principles
  - Use profiling as an algorithm development tool

Thank You. Questions?