Writing High Performance Delphi Applications


Writing High Performance Delphi Applications
Primož Gabrijelčič

About me
  Primož Gabrijelčič, http://primoz.gabrijelcic.org
  programmer, MVP, writer, blogger, consultant, speaker
  Blog: http://thedelphigeek.com
  Twitter: @thedelphigeek
  Skype: gabr42
  LinkedIn: gabr42
  GitHub: gabr42
  SO: gabr
  Google+: Primož Gabrijelčič

Performance

Performance
  What is performance?
  How do we “add it to the program”?
  There is no silver bullet!

What is performance?
  Running “fast enough”
  Raw speed
  Responsiveness
  Non-blocking

Improving performance
  Analyzing algorithms
  Measuring execution time
  Fixing algorithms
  Fine tuning the code
  Mastering memory manager
  Writing parallel code
  Importing libraries

Algorithm Complexity

Algorithm complexity
  Tells us how much an algorithm slows down when the data size is increased by a factor of n
  O() notation: O(n), O(n^2), O(n log n) …
  Time and space complexity

  O(1): accessing array elements
  O(log n): searching in an ordered list
  O(n): linear search
  O(n log n): quick sort (average)
  O(n^2): quick sort (worst), naive sort (bubblesort, insertion, selection)
  O(c^n): recursive Fibonacci, travelling salesman
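To make the gap between O(n) and O(log n) concrete, here is a minimal sketch that counts comparisons in a linear and a binary search over the same sorted array. The function names and the `steps` counter are illustrative and not part of the talk.

```delphi
program SearchComplexity;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// O(n): checks elements one by one; steps receives the number of comparisons.
function LinearSearch(const data: TArray<Integer>; value: Integer;
  out steps: Integer): Integer;
var
  i: Integer;
begin
  steps := 0;
  for i := 0 to High(data) do
  begin
    Inc(steps);
    if data[i] = value then
      Exit(i);
  end;
  Result := -1;
end;

// O(log n): halves the search range on every step; data must be sorted.
function BinarySearch(const data: TArray<Integer>; value: Integer;
  out steps: Integer): Integer;
var
  lo, hi, mid: Integer;
begin
  steps := 0;
  lo := 0;
  hi := High(data);
  while lo <= hi do
  begin
    Inc(steps);
    mid := lo + (hi - lo) div 2;
    if data[mid] = value then
      Exit(mid);
    if data[mid] < value then
      lo := mid + 1
    else
      hi := mid - 1;
  end;
  Result := -1;
end;

var
  data: TArray<Integer>;
  i, idx, steps: Integer;
begin
  SetLength(data, 1000000);
  for i := 0 to High(data) do
    data[i] := 2 * i; // ascending values, so the array is sorted

  // Search for the last element: the worst case for the linear search.
  idx := LinearSearch(data, 2 * High(data), steps);
  Writeln('Linear search: index ', idx, ', comparisons: ', steps);

  idx := BinarySearch(data, 2 * High(data), steps);
  Writeln('Binary search: index ', idx, ', comparisons: ', steps);
end.
```

The linear search needs one comparison per element, while the binary search needs only about log2(n) of them.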

Comparing complexities

  Data size  O(1)  O(log n)  O(n)  O(n log n)  O(n^2)  O(c^n)
  1          1     1         1     1           1       1
  10         1     4         10    43          100     512
  100        1     8         100   764         10,000  10^29
  300        1     9         300   2,769       90,000  10^90

O(RTL)
  http://bigocheatsheet.com/
  Lists
    Access: O(1)
    Search: O(n), or O(log n) in a sorted list
    Sort: O(n log n)
  Dictionary
    All operations (*): O(1)
    Search by value: O(n)
    Unordered!
    Spring4D 1.3?
  Trees
    All operations (*): O(log n)
    Spring4D 1.2.2
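The dictionary costs above can be sketched with `TDictionary<K,V>` from `System.Generics.Collections`. The sample data and the `BuildAges` helper are made up for illustration.

```delphi
program DictionaryLookup;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

// Hypothetical helper that fills a small dictionary for the demo.
function BuildAges: TDictionary<string, Integer>;
begin
  Result := TDictionary<string, Integer>.Create;
  Result.Add('Ana', 31);
  Result.Add('Boris', 45);
  Result.AddOrSetValue('Ana', 32); // replaces the existing value
end;

var
  ages: TDictionary<string, Integer>;
  pair: TPair<string, Integer>;
  age: Integer;
begin
  ages := BuildAges;
  try
    // Lookup by key goes through the hash table: O(1) on average.
    if ages.TryGetValue('Boris', age) then
      Writeln('Boris: ', age);

    // Lookup by value has to scan all entries: O(n).
    if ages.ContainsValue(32) then
      Writeln('Somebody is 32');

    // Enumeration order is not defined; do not rely on insertion order.
    for pair in ages do
      Writeln(pair.Key, ' = ', pair.Value);
  finally
    ages.Free;
  end;
end.
```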

Measuring Performance

Measuring
  Manual
    GetTickCount
    QueryPerformanceCounter
    TStopwatch
  Automated - Profilers
    Sampling
    Instrumenting
      Binary
      Source
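For manual measurement, `TStopwatch` from `System.Diagnostics` is usually the most convenient of the three options. A minimal sketch follows; the `SumTo` workload is only a stand-in for the code being measured.

```delphi
program MeasureTime;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

// Stand-in workload (hypothetical): sums the integers from 1 to n.
function SumTo(n: Integer): Int64;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to n do
    Inc(Result, i);
end;

var
  sw: TStopwatch;
  sum: Int64;
begin
  sw := TStopwatch.StartNew; // creates the stopwatch and starts timing
  sum := SumTo(100000000);
  sw.Stop;

  Writeln('Sum: ', sum);
  Writeln('Elapsed: ', sw.ElapsedMilliseconds, ' ms');
end.
```

Run the measurement several times and on a release build; a single timing of a debug build can be misleading.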

Free profilers
  AsmProfiler
    Instrumenting and sampling
    32-bit
    https://github.com/andremussche/asmprofiler
  Sampling Profiler
    Sampling
    https://www.delphitools.info

Commercial profilers
  AQTime
    Instrumenting and sampling
    32- and 64-bit
    https://smartbear.com/
  Nexus Quality Suite
    Instrumenting
    https://www.nexusdb.com
  ProDelphi
    Instrumenting (source)
    http://www.prodelphi.de/

Fixing the Algorithm

Fixing the algorithm
  Find a better algorithm
  If a part of the program is slow, don’t execute it so much
  If a part of the program is slow, don’t execute it at all

“Don’t execute it so much”
  Don’t update UI thousands of times per second
  Call BeginUpdate/EndUpdate
  Don’t send around millions of messages per second
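The effect of BeginUpdate/EndUpdate can be observed without a UI. The sketch below uses a `TStringList` as a stand-in for a visual control (an assumption made so that the example runs as a console program) and counts how often its `OnChange` event fires. `TChangeCounter` and `CountNotifications` are illustrative names.

```delphi
program BatchUpdates;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  // Hypothetical helper that counts change notifications.
  TChangeCounter = class
  public
    Count: Integer;
    procedure HandleChange(Sender: TObject);
  end;

procedure TChangeCounter.HandleChange(Sender: TObject);
begin
  Inc(Count);
end;

// Adds 1000 items and returns how many OnChange notifications were raised.
function CountNotifications(batch: Boolean): Integer;
var
  counter: TChangeCounter;
  list: TStringList;
  i: Integer;
begin
  counter := TChangeCounter.Create;
  list := TStringList.Create;
  try
    list.OnChange := counter.HandleChange;
    if batch then
      list.BeginUpdate; // suppress notifications until EndUpdate
    try
      for i := 1 to 1000 do
        list.Add(IntToStr(i));
    finally
      if batch then
        list.EndUpdate;
    end;
    Result := counter.Count;
  finally
    list.Free;
    counter.Free;
  end;
end;

begin
  Writeln('Notifications without batching: ', CountNotifications(False));
  Writeln('Notifications with batching:    ', CountNotifications(True));
end.
```

In a visual control each notification can trigger a repaint, which is why batching the updates matters.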

“Don’t execute it at all”
  UI virtualization
    Virtual listbox
    Virtual TreeView
  Memoization
    Caching
    Dynamic programming
  TGpCache<K,V>
    O(1) all operations
    GpLists.pas, https://github.com/gabr42/GpDelphiUnits/
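Memoization can be sketched on the recursive Fibonacci function mentioned earlier. This example uses a plain `TDictionary` as an unbounded cache, not the `TGpCache<K,V>` class from the slide, and the names are illustrative.

```delphi
program MemoFib;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

var
  cache: TDictionary<Integer, Int64>; // already computed results

// Recursive Fibonacci; each value is computed once and then reused.
function Fib(n: Integer): Int64;
begin
  if n <= 2 then
    Exit(1);
  if cache.TryGetValue(n, Result) then
    Exit; // cache hit: skip the recursion entirely
  Result := Fib(n - 1) + Fib(n - 2);
  cache.Add(n, Result);
end;

begin
  cache := TDictionary<Integer, Int64>.Create;
  try
    Writeln('Fib(50) = ', Fib(50));
  finally
    cache.Free;
  end;
end.
```

Without the cache the same call makes billions of recursive calls; with it, every value from 3 to 50 is computed once.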

Fine Tuning the Code

Compiler settings

Behind the scenes
  Strings
    Reference counted, Copy on write
  Arrays
    Static
    Dynamic
      Reference counted, Cloned
  Records
    Initialized if managed
  Classes
    Reference counted on ARC
  Interfaces
    Reference counted
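The difference between copy-on-write strings and shared dynamic arrays is easy to miss. Here is a minimal sketch, assuming a desktop compiler with 1-based string indexing; the function names are illustrative.

```delphi
program BehindTheScenes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Returns the original string after a copy of it was modified.
function StringAfterCopyEdit: string;
var
  s1, s2: string;
begin
  s1 := 'Delphi';
  s2 := s1;     // no character data is copied, both share one buffer
  s2[1] := 'd'; // the write makes s2 a private copy first (copy on write)
  Result := s1; // s1 is untouched
end;

// Returns the first element of the original array after a "copy" was modified.
function ArrayAfterCopyEdit: Integer;
var
  a1, a2: TArray<Integer>;
begin
  a1 := TArray<Integer>.Create(1, 2, 3);
  a2 := a1;        // both variables reference the same array data
  a2[0] := 42;     // no copy on write: a1[0] changes as well
  a2 := Copy(a1);  // Copy makes an explicit clone
  a2[0] := 7;      // changes only the clone
  Result := a1[0];
end;

begin
  Writeln('Original string: ', StringAfterCopyEdit);
  Writeln('Original array element: ', ArrayAfterCopyEdit);
end.
```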

Calling methods
  Parameter passing
    Dynamic arrays are strange
  Inlining
    Single pass compiler!
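The points on this slide can be illustrated with three small routines. This is a sketch with illustrative names; the comments describe general Delphi behaviour as I understand it, not measurements from the talk.

```delphi
program CallingMethods;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// The compiler can only inline a routine whose body it has already seen,
// so an inline function has to be implemented before its call site.
function Square(x: Integer): Integer; inline;
begin
  Result := x * x;
end;

// const lets the compiler skip reference count handling for the string.
function CountSpaces(const s: string): Integer;
var
  ch: Char;
begin
  Result := 0;
  for ch in s do
    if ch = ' ' then
      Inc(Result);
end;

// A dynamic array passed by value copies only the reference,
// so this change is visible to the caller.
procedure ZeroFirst(values: TArray<Integer>);
begin
  values[0] := 0;
end;

var
  data: TArray<Integer>;
begin
  Writeln('Square(7) = ', Square(7));
  Writeln('Spaces: ', CountSpaces('a b c'));

  data := TArray<Integer>.Create(1, 2, 3);
  ZeroFirst(data);
  Writeln('data[0] after ZeroFirst: ', data[0]);
end.
```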

Mastering Memory Manager

58 memory managers in one! Image source: ‘Delphi High Performance’ © 2018 Packt Publishing

Optimizations
  Reallocation
    Small blocks: New size = at least 2x old size
    Medium blocks: New size = at least 1.25x old size
  Allocator locking
    Small block only
    Will try 2 ‘larger’ allocators
    Problem: Freeing memory

  Pseudocode:

    allocIdx := find best allocator for the memory block
    repeat
      if can lock allocIdx then
        break;
      Inc(allocIdx);
      if can lock allocIdx then
        break;
      Inc(allocIdx);
      if can lock allocIdx then
        break;
      Dec(allocIdx, 2)
    until false

Optimizing parallel allocations
  FastMM4 from GitHub
    https://github.com/pleriche/FastMM4
  DEFINE LogLockContention
  DEFINE UseReleaseStack

Alternatives
  ScaleMM
    https://github.com/andremussche/scalemm
  TBBMalloc
    https://www.threadingbuildingblocks.org
    https://sites.google.com/site/aminer68/intel-tbbmalloc-interfaces-for-delphi-and-delphi-xe-versions-and-freepascal
    http://tiny.cc/tbbmalloc

Writing Parallel Code

When to parallelize?
  When other means are exhausted
  Pushing long operations into background
    “Unblock” the user interface
  Supporting multiple clients in parallel
  Speeding up the algorithm
    Hard!

Common problems
  Accessing UI from a background thread

“Never access UI from a background thread!”

Common problems
  Accessing UI from a background thread
  Reading/writing shared data
    Structured data
    Simple data

Synchronization
  Critical section
  Spinlock
  Monitor
  Readers-writers / MREW / SWMR
    TMREWSync / TMultiReadExclusiveWriteSynchronizer
      Terribly slow
    Slim Reader/Writer (SRW)
      Windows only
      http://tiny.cc/winsrw
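A critical section is the simplest of these primitives. The sketch below protects a shared counter with `TCriticalSection` while several tasks increment it. It uses `TTask` from the RTL's `System.Threading` unit, which is an assumption (the talk may have used a different threading library), and `CountWithLock` is an illustrative name.

```delphi
program SharedCounter;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs, System.Threading;

// Starts taskCount tasks; each increments a shared counter under a lock.
function CountWithLock(taskCount, increments: Integer): Integer;
var
  lock: TCriticalSection;
  tasks: TArray<ITask>;
  counter, i: Integer;
begin
  counter := 0;
  lock := TCriticalSection.Create;
  try
    SetLength(tasks, taskCount);
    for i := 0 to High(tasks) do
      tasks[i] := TTask.Run(
        procedure
        var
          j: Integer;
        begin
          for j := 1 to increments do
          begin
            lock.Acquire; // only one task at a time may pass this point
            try
              Inc(counter);
            finally
              lock.Release;
            end;
          end;
        end);
    TTask.WaitForAll(tasks);
    Result := counter;
  finally
    lock.Free;
  end;
end;

begin
  Writeln('Counter: ', CountWithLock(4, 100000));
end.
```

Without the lock, concurrent `Inc(counter)` calls can overwrite each other's results and the total comes out too low.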

Synchronization problems
  Slowdown
  Deadlocks

Interlocked operations
  “Microlocking”
  Faster
  Limited use
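For simple data such as a counter, an interlocked operation can replace the lock entirely. Here is a minimal sketch using `TInterlocked.Increment` from `System.SyncObjs`; the task setup uses `System.Threading`, and the global `counter` and `CountInterlocked` are illustrative names.

```delphi
program InterlockedCounter;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.SyncObjs, System.Threading;

var
  counter: Integer; // shared between all tasks

// Starts taskCount tasks; each increments the shared counter atomically.
function CountInterlocked(taskCount, increments: Integer): Integer;
var
  tasks: TArray<ITask>;
  i: Integer;
begin
  counter := 0;
  SetLength(tasks, taskCount);
  for i := 0 to High(tasks) do
    tasks[i] := TTask.Run(
      procedure
      var
        j: Integer;
      begin
        for j := 1 to increments do
          TInterlocked.Increment(counter); // atomic, no lock object needed
      end);
  TTask.WaitForAll(tasks);
  Result := counter;
end;

begin
  Writeln('Counter: ', CountInterlocked(4, 100000));
end.
```

The limitation is that only a handful of operations (increment, add, exchange, compare-exchange) are available, so structured data still needs a lock.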

Communication

Communication
  Windows messages
  TThread.Queue
  TThread.Synchronize too, but …
  Polling
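`TThread.Queue` can be sketched in a console program, where the main thread has no message loop and has to pump queued calls itself with `CheckSynchronize`. In a VCL or FireMonkey application the framework does that pumping for you. `FetchInBackground` is an illustrative name and must be called from the main thread.

```delphi
program QueueToMain;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Runs some work in a background thread and returns the message
// that the worker delivered back to the main thread.
function FetchInBackground: string;
var
  done: Boolean;
  msg: string;
begin
  done := False;
  msg := '';
  TThread.CreateAnonymousThread(
    procedure
    begin
      Sleep(100); // simulated long operation
      // Hand the result over; the inner procedure runs in the main thread.
      TThread.Queue(nil,
        procedure
        begin
          msg := 'work finished';
          done := True;
        end);
    end).Start;

  // Console stand-in for the message loop: execute queued calls.
  while not done do
    CheckSynchronize(10);
  Result := msg;
end;

begin
  Writeln('Worker says: ', FetchInBackground);
end.
```

The worker never touches `msg` or `done` directly; all access happens in the main thread, which is the same rule that keeps UI access safe.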

Parallel Patterns

Patterns
  Async/Await
  Join
  Future
  Parallel For
  Pipeline
  Map
  Timed task
  Parallel task
  Background worker
  Fork/Join

Async / Await

Future
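The transcript carries no code for this slide. As a rough illustration of the pattern, the sketch below uses `TTask.Future<T>` from the RTL's `System.Threading` unit; that choice is an assumption, and the talk may have demonstrated the pattern with another library. `CountPrimes` is only a stand-in for a slow calculation.

```delphi
program FutureDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Threading;

function IsPrime(n: Integer): Boolean;
var
  d: Integer;
begin
  if n < 2 then
    Exit(False);
  d := 2;
  while d * d <= n do
  begin
    if n mod d = 0 then
      Exit(False);
    Inc(d);
  end;
  Result := True;
end;

// Stand-in for a slow calculation: counts primes in 2..limit.
function CountPrimes(limit: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 2 to limit do
    if IsPrime(i) then
      Inc(Result);
end;

var
  primes: IFuture<Integer>;
begin
  // Start the calculation in the background.
  primes := TTask.Future<Integer>(
    function: Integer
    begin
      Result := CountPrimes(100000);
    end);

  Writeln('Main thread is free to do other work...');

  // Value blocks until the background calculation has finished.
  Writeln('Primes up to 100000: ', primes.Value);
end.
```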

Parallel For
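The transcript carries no code for this slide either. A minimal sketch of the pattern with `TParallel.For` from `System.Threading` (again an assumption about the library) could look like this. Each iteration writes only to its own array slot, so the loop body needs no locking.

```delphi
program ParallelForDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Threading;

function IsPrime(n: Integer): Boolean;
var
  d: Integer;
begin
  if n < 2 then
    Exit(False);
  d := 2;
  while d * d <= n do
  begin
    if n mod d = 0 then
      Exit(False);
    Inc(d);
  end;
  Result := True;
end;

// Tests all numbers in 2..limit in parallel, then counts the primes.
function CountPrimesParallel(limit: Integer): Integer;
var
  flags: TArray<Boolean>;
  i: Integer;
begin
  SetLength(flags, limit + 1); // new elements start out as False
  TParallel.For(2, limit,
    procedure(n: Integer)
    begin
      flags[n] := IsPrime(n); // each iteration owns exactly one slot
    end);

  // Sequential summary step after all iterations have completed.
  Result := 0;
  for i := 2 to limit do
    if flags[i] then
      Inc(Result);
end;

begin
  Writeln('Primes up to 100000: ', CountPrimesParallel(100000));
end.
```

Keeping shared writes out of the loop body avoids the synchronization slowdown discussed earlier; a shared counter would need an interlocked increment instead.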

Pipeline

High Performance in a Nutshell

Steps to faster code
  1. Understand the problem. What do you want to achieve?
  2. Find the problematic code. Measure!
  3. If possible, find a better algorithm.
  4. If that fails, fine tune the code.
  5. As a last resort, parallelize the solution. Danger, Will Robinson!
  6. Find/write faster code in a different language and link it into the application.

It’s Question Time!