Good Programming Practices for Building Less Memory-Intensive EDA Applications Alan Mishchenko University of California, Berkeley.

2 Outline
- Introduction
  - What is special about programming for EDA
  - Why much of industrial code is not efficient
  - Why saving memory also saves runtime
  - When to optimize for memory
  - Simplicity wins
- Suggestions for improvement
  - Design custom data structures
  - Store objects in a topological order
  - Make fanout representation optional
  - Use 4-byte integers instead of 8-byte pointers
  - Never use linked lists
- Conclusions

3 EDA Programming
- Programming for EDA is different from
  - programming for the web
  - programming databases, etc.
- EDA deals with
  - very complex computations (NP-hard problems)
  - very large datasets (designs with 100M+ objects)
- Programming for EDA requires knowledge of algorithms and data structures, and careful hand-crafting of efficient solutions
- Finding an efficient solution is often the result of laborious, time-consuming trial and error

4 Why Industrial Code Is Often Bad
- Legacy code
  - Designed long ago by somebody who did not know, did not care, or both
- Overdesigned code
  - Designed for the most general case, which rarely or never happens
- Underdesigned code
  - Designed for small netlists, while the size of a typical netlist doubles every few years, making scalability an elusive target

5 Less Memory = Less Runtime
- Although not true in general, in most EDA applications dealing with large datasets, using less memory results in faster code
- Because most EDA computations are memory-intensive, CPU cache misses dominate their runtime
- Keep this in mind when designing new data structures

6 When to Optimize Memory?
- Optimize memory when storing many similar entries (nodes in a graph, timing objects, placement locations, etc.)
- For example, a netlist typically stores millions of individual objects, so the object data structure is very important
- However, if only a few instances of a netlist exist at the same time, the netlist data structure itself is less important

7 Design Custom Data Structures
- Figure out what is needed in each application and design a custom data structure for it
  - The lowest possible memory usage
  - The fastest possible runtime
  - Simpler and cleaner code
  - Often, good data structures can be reused elsewhere
- Translation to and from a custom data structure rarely takes more than 3% of runtime
- Example: in a typical synthesis/mapping application, it is enough to have 'node'; there is no need for 'net', 'edge', 'pin', etc.

8 Store Objects in a Topological Order
- Topological order
  - The fanins (incoming edges) of a node precede the node itself
- Using a topological order makes it unnecessary to recompute it when performing local or global changes
  - Saves runtime
- Using a topological order reduces CPU cache misses, which occur when a computation jumps all over memory
  - Saves runtime
- It is best to have a specialized procedure or command that establishes a topological order of the network (graph, etc.)
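
Such a specialized procedure can be a short iterative DFS. The sketch below (not from the slides; names and the fanin representation are illustrative) renumbers the nodes of a DAG so that every fanin precedes its fanout:

```cpp
#include <cassert>
#include <vector>

// Iterative post-order DFS over a DAG. fanins[v] lists the IDs of
// the fanins of node v. Returns the node IDs in a topological order:
// every fanin appears before the node that uses it.
std::vector<int> TopoOrder(const std::vector<std::vector<int>>& fanins) {
    int n = (int)fanins.size();
    std::vector<int> order;
    std::vector<char> visited(n, 0);
    std::vector<int> stack;
    for (int root = 0; root < n; ++root) {
        if (visited[root]) continue;
        stack.push_back(root);
        while (!stack.empty()) {
            int v = stack.back();
            if (visited[v]) { stack.pop_back(); continue; } // duplicate entry
            bool ready = true;                 // all fanins already ordered?
            for (int f : fanins[v])
                if (!visited[f]) { stack.push_back(f); ready = false; }
            if (ready) {
                visited[v] = 1;
                order.push_back(v);
                stack.pop_back();
            }
        }
    }
    return order;
}
```

Running the pass once and then storing the objects in the resulting order is what makes later linear sweeps over the netlist cache-friendly.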

9 Fanout Representation
- Traditionally, each object (node) in a netlist has both fanins (incoming edges) and fanouts (outgoing edges)
- In most applications, fanins alone are enough
  - Reduces memory ~2x
  - Reduces runtime
- Fanouts can be computed on demand
  - Exercise: implement the computation of required times of all nodes in a combinational netlist without fanouts
- In many cases, 'static fanout' is enough
  - If the netlist is fixed, fanouts are never added or removed
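
One solution to the exercise (a sketch, assuming unit node delays and nodes stored in topological order): sweep the nodes in reverse topological order, and let each node push its own required time onto its fanins. No fanout lists are ever consulted:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// fanins[v] lists the fanin IDs of node v; nodes are in topological
// order. reqInit holds the required time at each primary output and
// INT_MAX ("unconstrained") everywhere else. Unit delay per node.
std::vector<int> RequiredTimes(const std::vector<std::vector<int>>& fanins,
                               const std::vector<int>& reqInit) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> req = reqInit;
    // Reverse topological sweep: when node v is reached, its required
    // time is final, so it can be propagated to the fanins directly.
    for (int v = (int)fanins.size() - 1; v >= 0; --v) {
        if (req[v] == INF) continue;   // node reaches no constrained output
        for (int f : fanins[v])
            req[f] = std::min(req[f], req[v] - 1);  // unit delay
    }
    return req;
}
```

The trick generalizes: any "push to fanouts" computation can be rewritten as "pull from the node onto its fanins" during a reverse sweep, which is why fanins alone are enough.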

10 Use Integers Instead of Pointers
- In the old days, an integer (int) and a pointer (void *) used the same amount of memory (4 bytes)
- In recent years, most EDA companies and their customers switched to 64-bit platforms
  - One pointer now takes 8 bytes!
- However, most code uses a lot of pointers
  - This leads to a 2x memory increase for no good reason
- Suggestion: design your code to store attributes of objects as integers, rather than as pointers

11 Avoiding Pointers (Example)
- A node points to its fanins
  - Fanins can be integer IDs instead of pointers
- Instead of a linked list of node pointers, use an array of integer IDs
  - A linked list uses at least 6x more memory
  - Iterating through a linked list is also slower
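
The 6x figure can be checked with sizeof on a typical 64-bit machine (the type names below are illustrative): a doubly-linked list spends three 8-byte pointers per element, while an array of fanin IDs spends 4 bytes per fanin:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Node;  // forward declaration; the payload the list would point to

// One element of a doubly-linked list of node pointers:
// payload pointer + prev + next = 3 pointers = 24 bytes on 64-bit.
struct Link {
    Node* data;
    Link* prev;
    Link* next;
};

// The alternative: a packed array of 4-byte integer IDs.
typedef std::vector<int32_t> FaninIds;
```

On a 64-bit platform, `sizeof(Link)` is 24 bytes against 4 bytes per array entry, which is where the "at least 6x" comes from, and the array is contiguous in memory, so iterating over it does not chase pointers across the heap.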

12 Integer IDs for Indexing Attributes
- Each node in the netlist can have an integer ID
- The node structure can be as simple as possible:

    struct Node {
        int   ID;
        int   nFanins;
        int * pFanins;
    };

- Any attribute of a node can be represented as an entry in an array, with the node's ID used as an index:

    Vec<int>   Type;
    Vec<int>   Level;
    Vec<float> Slack;

- Attributes can be allocated/freed on demand, which helps control memory usage
- A light-weight basic data structure makes often-used computations (such as traversals) very fast
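
A minimal sketch of the pattern (using std::vector in place of the slides' Vec; the function name and the use of a fanin vector inside Node are illustrative): the attribute vector is created only by the pass that needs it, and every lookup is a plain array index by node ID:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Light-weight node: just an ID and its fanin IDs.
struct Node {
    int ID;
    std::vector<int> Fanins;
};

// Computes the logic level of every node; 'topo' is the netlist in
// topological order (fanins first). 'Level' is an on-demand attribute
// indexed by node ID. Returns the maximum level.
int ComputeLevel(const std::vector<Node>& topo, std::vector<int>& Level) {
    Level.assign(topo.size(), 0);   // attribute allocated on demand
    int maxLevel = 0;
    for (const Node& n : topo) {    // single cache-friendly sweep
        for (int f : n.Fanins)
            Level[n.ID] = std::max(Level[n.ID], Level[f] + 1);
        maxLevel = std::max(maxLevel, Level[n.ID]);
    }
    return maxLevel;
}
```

When the pass finishes, the attribute vector can simply be freed; the node structure itself never grows.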

13 Avoid Linked Lists
- Each link, in addition to the user's data, has 'previous' and 'next' fields
  - Potentially a 3x increase in memory usage
- Most linked lists use pointers
  - Potentially a 2x increase in memory usage
- Other drawbacks
  - Allocating numerous links leads to memory fragmentation
- Most data structures can be efficiently implemented without linked lists

14 Simplicity Wins
- Whenever possible, keep data structures simple and light-weight
- It is better to have on-demand attributes associated with objects than an overly complex object data structure

15 Case Study: Storage for Many Similar Entries
- Same-size entries (for example, AIG or BDD nodes) are best stored in an array
  - A node's index is its position in the array
- Different-size entries (for example, nodes in a logic network) are best stored in a custom memory manager
  - The manager allocates memory in pages (e.g., 1 MB per page)
  - Each page can store entries of different sizes
  - Each entry is assigned an integer number (called an ID)
  - A vector maps IDs to the memory location of each object
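
A minimal sketch of such a paged manager (class and method names are illustrative, not the slides' actual code; entries are assumed to be no larger than one page): memory is grabbed one page at a time, entries are packed back-to-back inside the current page, and a vector maps each integer ID to its entry's memory:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class PagedStore {
    static const size_t kPageSize = 1 << 20;  // 1 MB per page
    std::vector<std::vector<char>> pages;     // the pages themselves
    std::vector<char*> id2ptr;                // ID -> entry memory
    size_t used = kPageSize;                  // forces a page on first Alloc
public:
    // Carves out nBytes (nBytes <= kPageSize) and returns the entry's ID.
    int Alloc(size_t nBytes) {
        if (used + nBytes > kPageSize) {      // current page is full
            pages.push_back(std::vector<char>(kPageSize));
            used = 0;
        }
        // Pointers into a page stay valid: moving a std::vector<char>
        // (when 'pages' reallocates) does not move its heap buffer.
        id2ptr.push_back(&pages.back()[used]);
        used += nBytes;
        return (int)id2ptr.size() - 1;
    }
    char* Ptr(int id) { return id2ptr[id]; }  // recover entry memory by ID
};
```

Compared with one malloc per entry, this avoids per-allocation overhead and fragmentation, and freeing the whole netlist is just dropping the pages.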

16 Conclusion
- Reviewed several reasons for inefficient memory usage in industrial code
- Offered several suggestions and good coding practices
- Vowed to think carefully about memory when designing new data structures