
1 Performance
Jim Fawcett
CSE 691 – Software Modeling and Analysis
Fall 2000

2 Importance
Performance is the MOST IMPORTANT thing !!! … after correctness, robustness, reusability, …
No point in generating incorrect answers very quickly.
No point in responding quickly except when we crash.
No point in writing responsive code that is so hard to use that no one uses it.

3 Performance Issues
Sources of performance problems (in order of importance):
- Thrashing and starvation
- Blocking
- Queuing delays
- Algorithms
- Using a data structure matched to the problem
- Copying
- Use of the heap
- Creating temporary objects
- Cost of constructions
- Code optimizations not including the above – forget about it!
Another important issue: technology, e.g., are we using C++ STL, MFC, ATL, WTL, COM, …?

4 Measurement Accuracy
How much of what we measure is measurement overhead? Output to console or file is slow.
How granular is our time reference? The Win32 clock ticks 100 times per second, at least that's what clock() measures.
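A minimal sketch of the kind of timing loop the slide has in mind, using the standard clock() function; doWork() is a hypothetical workload, not something from the course notes. Note that no console output happens inside the timed region.

```cpp
#include <cstdio>
#include <ctime>

// Hypothetical workload; stands in for whatever we actually want to measure.
void doWork() {
    volatile double sum = 0.0;
    for (int i = 0; i < 1000000; ++i)
        sum = sum + i * 0.5;
}

int main() {
    std::clock_t start = std::clock();
    doWork();
    std::clock_t stop = std::clock();

    // clock() advances in coarse ticks (CLOCKS_PER_SEC of them per second),
    // so very short intervals can measure as zero.  Make the measured work
    // long enough, or repeat it, to stay well above one tick.
    double seconds = static_cast<double>(stop - start) / CLOCKS_PER_SEC;
    std::printf("elapsed: %f s (tick = %f s)\n", seconds, 1.0 / CLOCKS_PER_SEC);
}
```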

5 Thrashing and Starvation
Thrashing occurs when some fixed overhead dominates the time available to do something useful. Switching context between threads or processes is a frequent source of thrashing. When two or more processes are using lots of CPU, the OS may switch between them so frequently that switching overhead dominates the available time.
Starvation may occur when two concurrent activities are running and one is CPU-intensive while the other does a lot of I/O. The I/O-intensive processing spends a lot of its time blocked, while the CPU-intensive processing is always ready to run.

6 Blocking
Any time a thread has to wait for a resource it becomes blocked and is put to sleep until the resource becomes available.
Waiting in the blocked state is efficient: no CPU cycles are consumed polling for resource availability, at least not in the application program.
Threads block when:
- Waiting for I/O completion
- Waiting for synchronous interprocess communication to complete
- Waiting for completion of a synchronous function call serviced by another thread or process
- Waiting for another thread at a critical section or a resource guarded by a mutex
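A minimal sketch of the last case in the list above, using standard C++ threads; the names are illustrative only. Whichever thread arrives at the mutex second is put to sleep, without polling, until the first releases it.

```cpp
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex guard;   // protects the shared resource

void worker(int id) {
    // The second thread to arrive here blocks until the first releases the
    // mutex; it consumes no CPU cycles while it waits.
    std::lock_guard<std::mutex> lock(guard);
    std::cout << "thread " << id << " owns the resource\n";
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}

int main() {
    std::thread t1(worker, 1);
    std::thread t2(worker, 2);
    t1.join();
    t2.join();
}
```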

7 Blocking (continued)
If a component with “bursty” activity sends data synchronously to a component with more uniform activity in time, the sending process will spend a lot of time blocked.
Assume the receiver has a constant service time which is much longer than the interval between the sender's requests during a burst, but short enough to handle the sender's average request traffic. Then the sender will spend a lot of time blocked during its busy periods, when it could otherwise be doing useful work.

8 Queuing Delays
A bursty sender can be freed from blocking, waiting for a receiver, by enqueuing its messages into a thread-safe queue.
This works fine until the sender's average request interval decreases to nearly equal the receiver's average service time. When that happens, the time a message spends waiting in the queue becomes very large.
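One conventional way to build such a queue, sketched with standard library primitives (the class name and interface are assumptions, not the course's version). The sender's push() returns immediately; only the receiver blocks, and only when the queue is empty. The caveat above still applies: if arrivals approach the service rate, the queue and its waiting times grow without bound.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// Minimal thread-safe queue decoupling a bursty sender from a slower receiver.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            q_.push(std::move(item));
        }
        cv_.notify_one();            // wake the receiver if it is waiting
    }
    T pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !q_.empty(); });  // block only when empty
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }
private:
    std::queue<T> q_;
    std::mutex mtx_;
    std::condition_variable cv_;
};

int main() {
    BlockingQueue<std::string> q;
    q.push("hello");                 // sender returns immediately
    std::string msg = q.pop();       // receiver gets the queued message
    return msg.empty() ? 1 : 0;
}
```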

9 Algorithms
For non-distributed, single-threaded applications, choosing a good algorithm is the single most important design decision.
Choosing a good algorithm is often a matter of answering one of the following:
- When should we sort?
- How should we search?
- What is a good data structure to use?
Seeking the BEST algorithm is often counter-productive. It is very often the case that an optimal solution is complex, hard to implement correctly, and hard to keep maintainable. And it is very often the case that a simple, nearly optimal solution exists that is easy to implement and easy to maintain.
The C++ Standard Template Library uses red-black binary trees, rather than the more strictly balanced AVL trees, as the basis for its associative containers (sets, multisets, maps, and multimaps) for exactly these reasons.
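A small illustration of the "when should we sort, how should we search" questions, not taken from the course notes: if the same data is queried many times, paying for one sort and then using binary search per lookup usually beats repeated linear search.

```cpp
#include <algorithm>
#include <vector>

// Pay O(n log n) for one sort, then O(log n) per lookup, instead of O(n)
// per lookup with repeated linear search.  data is taken by value on purpose,
// since we reorder it.
bool containsAll(std::vector<int> data, const std::vector<int>& queries) {
    std::sort(data.begin(), data.end());
    for (int q : queries)
        if (!std::binary_search(data.begin(), data.end(), q))
            return false;
    return true;
}

int main() {
    std::vector<int> data{5, 3, 9, 1, 7};
    std::vector<int> queries{3, 7};
    return containsAll(data, queries) ? 0 : 1;
}
```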

10 Data Structures
Using a good data structure is similar to using a good algorithm.
STL vectors: random access, sequential search; elements must be moved for insertion and deletion. Fast access; slow insertion and deletion except at the end.
STL lists: sequential access and search; no need to move elements. Fast access, insertion, and deletion at the current element; slow access, insertion, and deletion far from the current element.
STL sets and multisets: binary access and search; no need to move elements, though links may be moved to balance the tree. Fairly fast access, insertion, and deletion anywhere.
STL maps and multimaps: same properties as sets. A map is a set of key-value pairs; fairly fast access to keys, very fast access to a value from its key.
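A short sketch of how the container choice shows up in code; the values and keys are illustrative only.

```cpp
#include <list>
#include <map>
#include <string>
#include <vector>

int main() {
    // vector: fast random access, but inserting at the front shifts every element.
    std::vector<int> v{2, 3, 4};
    v.insert(v.begin(), 1);          // O(n): existing elements are moved

    // list: no element moves; insertion at a known position is O(1),
    // but reaching a distant position requires a sequential walk.
    std::list<int> l{2, 3, 4};
    l.push_front(1);                 // O(1)

    // map: fairly fast (logarithmic) insertion and lookup anywhere,
    // and direct access to the value once the key is found.
    std::map<std::string, int> ages;
    ages["alice"] = 33;
    int a = ages["alice"];
    return a == 33 ? 0 : 1;
}
```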

11 Copying
If data sizes are large, copying can be a severe burden. Whenever possible:
- Pass by reference, which copies only a (hidden) pointer; pass by value copies the entire data set.
- Pass by const reference whenever program semantics allow.
- Return references whenever the returned object existed before the function call.
- Partition modules to minimize the size of data transferred.
- Lay out classes to minimize communication costs.
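A minimal sketch of the first two points; the function names and data sizes are assumptions for illustration.

```cpp
#include <vector>

// Pass by const reference: only a hidden pointer is copied, and the callee
// cannot modify the caller's data.
double average(const std::vector<double>& samples) {
    double sum = 0.0;
    for (double s : samples) sum += s;
    return samples.empty() ? 0.0 : sum / samples.size();
}

// Pass by value: the entire vector is copied on every call.
double averageCopy(std::vector<double> samples) { return average(samples); }

int main() {
    std::vector<double> data(1000000, 1.0);
    double a = average(data);        // no copy of the million elements
    double b = averageCopy(data);    // copies all one million elements
    return (a == b) ? 0 : 1;
}
```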

12 Use of the Heap
Using the C++ operators new and delete to allocate memory is slow.
They are based on malloc, so choosing malloc instead is a lose-lose tradeoff: you don't gain more speed and you don't get type safety.
new and delete call the allocated object's constructor and destructor, respectively; malloc(…) simply returns uninitialized memory.
They provide a very general, widely useful service. Overloading the new and delete operators can speed up the allocation process when a lot is known about the types of allocations required, e.g., fixed-size objects.
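One conventional way to overload new and delete for fixed-size objects is a class-specific free list, sketched below under stated assumptions: the Message class is hypothetical, it is not meant to be derived from, the sketch is not thread-safe, and it relies on C++17 inline static members.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-size message class with a class-specific allocator.
// Freed blocks are cached on a free list and reused, avoiding a trip to the
// general-purpose heap in the common case.  Not thread-safe; sketch only.
class Message {
public:
    void* operator new(std::size_t size) {
        if (freeList_ == nullptr)
            return ::operator new(size);   // no cached block: use the global heap
        FreeNode* block = freeList_;       // reuse a previously freed block
        freeList_ = block->next;
        return block;
    }
    void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        FreeNode* node = static_cast<FreeNode*>(p);   // recycle the block
        node->next = freeList_;
        freeList_ = node;
    }
private:
    struct FreeNode { FreeNode* next; };
    inline static FreeNode* freeList_ = nullptr;
    char payload_[64];                     // fixed-size state (assumed)
};

int main() {
    Message* a = new Message;   // first allocation comes from the global heap
    delete a;                   // block is cached on the free list
    Message* b = new Message;   // served from the free list: no heap call
    delete b;
}
```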

13 Creating Temporary Objects
C++ will often create temporary objects to allow convenient syntax for the programmer. You need to know when that happens and choose wisely.
Resolving type mismatches between function arguments and formal parameters causes temporary objects to be created whenever appropriate constructors exist; otherwise compilation fails. The same is often true for return values being assigned to an object of some other type.
Explicitly declared local objects incur the same cost: if a function with a local user-defined type is called 1000 times, that type's constructor is called 1000 times.
You may choose to allow temporaries to make your code readable and maintainable, but in time-critical applications you will want to do some performance measurements to be sure that trade-off is appropriate.
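A small sketch of the argument/parameter mismatch case; the function and loop counts are illustrative only.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// The parameter is const std::string&, so passing a const char* compiles:
// the compiler constructs a temporary std::string for the duration of each
// call and destroys it afterward.
std::size_t countCommas(const std::string& text) {
    std::size_t n = 0;
    for (char c : text)
        if (c == ',') ++n;
    return n;
}

int main() {
    const char* csv = "a,b,c,d";

    // A temporary std::string is constructed from csv on every call.
    for (int i = 0; i < 1000; ++i)
        countCommas(csv);

    // Constructing the string once avoids 1000 temporary constructions.
    std::string s(csv);
    for (int i = 0; i < 1000; ++i)
        countCommas(s);

    std::cout << countCommas(s) << "\n";
}
```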

14 Cost of Constructions
Declaring an instance of a user-defined type will incur a construction cost, as discussed on the previous slide. How much that costs depends entirely on how you design its class:
- How big is the object's state?
- Are other, possibly expensive, objects used as data members?
- Are heap allocations required?
- Are other resources allocated at construction time?
  - Mutexes and semaphores
  - Files and streams
  - Database locks
  - Transaction locks
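A sketch of a deliberately expensive class, illustrating several of the items above; the Session class, its members, and the log path are hypothetical.

```cpp
#include <fstream>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical class whose constructor pays for a large heap allocation,
// an expensive data member, and a file opened at construction time.
class Session {
public:
    explicit Session(const std::string& logPath)
        : buffer_(1 << 20),          // 1 MB heap allocation
          log_(logPath)              // opens a file stream
    {}
private:
    std::vector<char> buffer_;       // expensive data member
    std::ofstream log_;              // resource acquired during construction
    std::mutex lock_;                // another resource owned by the object
};

int main() {
    // Declared inside a loop, the full construction cost is paid on each pass.
    for (int i = 0; i < 3; ++i) {
        Session s("session.log");    // hypothetical path
        (void)s;
    }
}
```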

