Software System Performance (CS 560, Lecture 11)
Performance Engineering: Motivation
The practice of applying software engineering principles to assure the best performance for a product. We want to know, at each stage of development, the performance attributes of the product being built.
Performance Engineering
Performance engineering helps with:
- Increasing revenue by ensuring optimum system performance.
- Increasing the efficiency of system resource use.
- Improving resource availability.
- Reducing maintenance costs.
- Avoiding product failures that require scrapping the system and writing off the development effort.
When Performance Matters: Real-Time Systems
- Computation must be fast enough to support the service provided. Example: Internet routers examining packet headers.
- Real-time performance measurements:
  - Context switching time: time required to save the current process/thread, find the highest-priority ready process/thread, and restore its context.
  - Interrupt latency: amount of time interrupts are disabled.
  - Immediate response time: time required to process a request immediately, without a context switch.
When Performance Matters: Very Large Computations
- Processing time may be measured in days. Example: calculation of weather forecasts must be fast enough for the forecasts to still be useful. Simulations are another example.
- Performance considerations:
  - Inspect software algorithms for efficiency.
  - Consider the data pipeline.
  - Can some processes be parallelized? Split the input data into chunks and process them independently (the MapReduce pattern); see the sketch below.
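A minimal sketch of the split-into-chunks idea using Java's parallel streams (the workload, taking the square root of every element, is invented for illustration). The stream framework splits the array into chunks, maps each chunk on the common fork/join pool, and combines the partial sums, a small-scale analogue of MapReduce's map and reduce steps:

    import java.util.Arrays;

    public class ChunkSum {
        public static void main(String[] args) {
            double[] data = new double[10_000_000];
            Arrays.fill(data, 1.0);

            // Sequential: one core walks the whole array.
            double seq = Arrays.stream(data).map(Math::sqrt).sum();

            // Parallel: the array is split into chunks processed on
            // multiple cores (map), then partial sums are combined (reduce).
            double par = Arrays.stream(data).parallel().map(Math::sqrt).sum();

            System.out.println(seq + " " + par);
        }
    }

Whether the parallel version wins depends on the chunk size and per-element cost; for very cheap operations the splitting overhead can dominate.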
When Performance Matters: User Interfaces
- Humans have high expectations: mouse tracking must appear instantaneous.
- Performance considerations:
  - Start-up time factors for applications: GPU initialization, storage initialization, GUI process load time, GUI rendering time.
  - Web interfaces must process requests in near real time:
    - Send requests through a job scheduler (see the sketch below).
    - Use client-side and server-side programs together to increase performance; client-side languages such as JavaScript make use of the browser's processing capabilities.
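One way to realize "send requests through a job scheduler" is a bounded worker pool: requests queue up instead of each spawning a thread, which keeps response times predictable under load. A minimal sketch (the handleRequest body is hypothetical):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class RequestScheduler {
        // Fixed-size pool: at most 8 requests run at once; the rest wait in a queue.
        private static final ExecutorService pool = Executors.newFixedThreadPool(8);

        static void submitRequest(String request) {
            pool.submit(() -> handleRequest(request)); // returns immediately
        }

        static void handleRequest(String request) {
            // hypothetical slow work: parse, query the database, render a response
            System.out.println("handled: " + request);
        }

        public static void main(String[] args) {
            for (int i = 0; i < 20; i++) submitRequest("req-" + i);
            pool.shutdown();
        }
    }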
High-Performance Computing
High-performance computing issues:
- Large data collections (Ex: 173 trillion GB projected by 2020, with 2.5 billion GB generated every day):
  - How to access the data in a meaningful way?
  - How to store the data with high availability/reliability?
- Huge numbers of servers (Ex: Amazon, roughly 2 million):
  - How to connect users with specific resources?
- Large computations (Ex: 10.6 million cores distributed across a large cluster, operating at 93 petaflops):
  - How to reduce computation time so that the data is meaningful?
HPC vs. HTC
- High-performance computing (HPC):
  - Tightly coupled nodes communicating with low latency.
  - Systems enable users to run a single instance of parallel software over many nodes.
  - See www.top500.org. Most powerful (known) HPC system: Sunway TaihuLight at 93 PFLOPS (93 x 10^15 floating-point operations per second).
- High-throughput computing (HTC):
  - Uses many heterogeneous computing resources over long periods of time to complete a computational task.
  - HTC systems are often built from unreliable components.
  - Schedulers hand work to nodes in the HTC system, and eventually the work is completed.
Performance Challenges for All Software Systems (Documentation)
- Predict performance problems before a system is created.
  - Hardware bottlenecks? Research and document Raspberry Pi hardware component performance (CPU, memory type/interface, SD card type/interface, network interface).
  - Software bottlenecks? Research and document the performance of the software languages/frameworks used.
- Design and build a system that is not vulnerable to performance problems. Is this possible?
- Identify performance issues and fix problems after each software component is integrated.
Interactions Between Hardware and Software
Examples:
- In a distributed system (or a Pi running multiple Docker containers), what messages are passed between nodes?
  - Admin messages (polling for resources, process synchronization, etc.)
  - User-generated network traffic
- How many times must the system read from disk for a single transaction?
- Is the disk located on the same server as the process needing the data?
- What buffering and caching is used? (See the sketch below.)
- Are operations parallel or sequential?
- Are other systems competing for a shared resource (CPU, disk)? If so, how are the resources shared? Shared fairly?
- How does the operating system schedule tasks?
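As one concrete answer to the caching question, a common in-process technique is a least-recently-used (LRU) cache, so hot data is served from memory instead of re-read from disk or the network. A minimal sketch built on java.util.LinkedHashMap (the capacity and usage are illustrative):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal LRU cache: evicts the least-recently-accessed entry when full.
    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        public LruCache(int capacity) {
            super(16, 0.75f, true);   // accessOrder=true gives LRU iteration order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict once we exceed capacity
        }
    }

Typical use: on a read, try cache.get(key); only on a miss (null) go to disk and cache.put(key, value) the result.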
Look for Bottlenecks
Usually, CPU performance is not the limiting factor.
- Hardware bottlenecks:
  - Reading data from disk
  - Shortage of memory (including paging)
  - Moving data from memory to the CPU
  - Network capacity (bandwidth)
- Inefficient software:
  - Poorly written algorithms that do not scale well
  - Sequential processing where a parallel approach should be used
Look for Bottlenecks
CPU performance is a limiting constraint in certain domains:
- Large data analysis (Ex: searching)
- Mathematical computation (Ex: simulations)
- Compression and encryption
- Multimedia rendering
- Image processing
Look for Bottlenecks: Utilization
Utilization is the proportion of the capacity of a service that is used on average:
utilization = (average service time per transaction) / (average inter-arrival time between transactions)
Example: Linux system load on a machine with 2 CPU cores.
uptime: 19:38:11 up 26 days, 2 users, load average: 1.73, 1.80, 2.05
- During the last minute, the CPUs were idle 13.5% of the time (load 1.73 on 2 cores).
- During the last 5 minutes, the CPUs were idle 10% of the time (load 1.80).
- During the last 15 minutes, the CPUs were overloaded by 2.5% (load 2.05).
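A quick worked example of the formula (numbers invented for illustration): if the average service time is 40 ms per transaction and transactions arrive on average every 50 ms, then utilization = 40 / 50 = 0.8, so the service is busy 80% of the time and idle 20%. If the inter-arrival time drops below 40 ms, utilization exceeds 1 and a queue builds up without bound.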
Predicting System Performance
- Direct measurement on a subsystem (benchmark): file I/O, CPU, network
- Simulation
All approaches require a detailed understanding of the interaction between software and hardware systems.
Direct Measurement on a Subsystem: SysBench (Documentation)
SysBench is a modular, cross-platform, multi-threaded benchmark tool used to quickly gain information about system performance.
On the base OS and in the Docker container:
    sudo apt-get install sysbench
Test (on the base OS and in the Docker container) and create supplemental documentation for:
- File I/O performance
- CPU performance
- Database service performance
Execute each test at least three times and take the average (a harness sketch follows).
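A minimal Java harness sketch for the "run three times and average" step. Assumptions: sysbench is installed and on the PATH, and wall-clock time of the whole process is used as a rough proxy metric; for the graphs in the following slides you would instead record the values sysbench itself prints (Kb/sec, total time, or transactions per second):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class SysbenchRunner {
        // Runs one sysbench invocation and returns its wall-clock time in ms.
        static long runOnce(String... cmd) throws Exception {
            long start = System.currentTimeMillis();
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                while (r.readLine() != null) { /* drain output so the pipe never blocks */ }
            }
            p.waitFor();
            return System.currentTimeMillis() - start;
        }

        public static void main(String[] args) throws Exception {
            String[] cmd = {"sysbench", "--test=cpu", "--cpu-max-prime=15000", "run"};
            int runs = 3;                 // at least three runs, then average
            long total = 0;
            for (int i = 0; i < runs; i++) total += runOnce(cmd);
            System.out.println("average ms: " + total / runs);
        }
    }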
Direct Measurement on a Subsystem: SysBench (Documentation)
Test the following (on the base OS and in the Docker container):
File I/O performance:
    sysbench --test=fileio --file-total-size=X prepare
    sysbench --test=fileio --file-total-size=X --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
where X = {128MB, 256MB, 512MB, 1GB, 2GB, 4GB, 8GB}.
Graph (line graph) the results (the Kb/sec value for each test case) for the base OS and the Docker container. Generate documentation for the graph, why it curves, etc.
Execute the cleanup command after each test run:
    sysbench --test=fileio --file-total-size=X cleanup
Direct Measurement on a Subsystem: SysBench (Documentation)
Test the following (on the base OS and in the Docker container):
CPU performance:
    sysbench --num-threads=X --test=cpu --cpu-max-prime=15000 run
where X = {1, 2, 4, 8, 16, 32}.
Graph (line graph) the results (the total time values) for the base OS and the Docker container. Generate documentation for the graph, why it curves, etc.
Direct Measurement on a Subsystem: SysBench (Documentation)
Test the following (in the environment where your database is located):
Database service performance:
    sysbench --test=oltp --oltp-table-size=100000 --mysql-db=test --mysql-user=root --mysql-password=yourrootsqlpassword prepare
    sysbench --test=oltp --oltp-table-size=100000 --mysql-db=test --mysql-user=root --mysql-password=yourrootsqlpassword --max-time=60 --oltp-read-only=on --max-requests=0 --num-threads=X run
where X = {1, 2, 4, 8, 16, 32}.
Graph (line graph) the results (the transactions-per-second value). Generate documentation for the graph, why it curves, etc.
Execute the cleanup command after each test run:
    sysbench --test=oltp --mysql-db=test --mysql-user=root --mysql-password=yourrootsqlpassword cleanup
Simulations
Build a computer program that models the system as a set of states and events:
- Advance simulated time.
- Determine which events occurred.
- Update the state and the event list.
- Repeat.
Discrete-time simulation: time is advanced in fixed steps (Ex: 1 ms); see the sketch below.
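A minimal discrete-time simulation sketch of a single-server queue, advancing time in fixed 1 ms steps. The state is the queue length and the remaining service time; the arrival probability and service time are invented for illustration:

    import java.util.Random;

    public class DiscreteTimeSim {
        public static void main(String[] args) {
            Random rng = new Random(42);
            int queue = 0;      // state: jobs waiting
            int busy = 0;       // state: ms left on the job in service
            int served = 0;

            for (int t = 0; t < 60_000; t++) {         // advance time 1 ms per step
                if (rng.nextDouble() < 0.08) queue++;  // event: arrival (~8% chance per ms)
                if (busy == 0 && queue > 0) {          // event: start serving next job
                    queue--;
                    busy = 10;                         // fixed 10 ms service time
                }
                if (busy > 0 && --busy == 0) served++; // event: job completes
            }
            System.out.println("served " + served + ", still queued " + queue);
        }
    }

With these parameters the server's utilization is about 0.08 x 10 = 0.8, so the queue stays bounded; raising the arrival probability above 0.1 would overload it.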
Measurements on Software Modules (Documentation)
Creating timers to measure function performance:

    void databaseCallMethod(String[] data) {
        for (int i = 0; i < data.length; i++) {
            // format the values in data[i]
            // connect and update the database with data[i]
        }
    }

    long start = System.currentTimeMillis();
    databaseCallMethod(data);
    long end = System.currentTimeMillis();
    long methodRunTime = end - start;   // elapsed wall-clock time in ms
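One timed call gives a noisy sample. A common refinement (a sketch, reusing the method and data above) is to add a warm-up call and average several runs, using System.nanoTime() for finer resolution:

    databaseCallMethod(data);                   // warm-up run (JIT compilation, caches)
    long totalNanos = 0;
    int runs = 5;
    for (int i = 0; i < runs; i++) {
        long t0 = System.nanoTime();            // monotonic, nanosecond resolution
        databaseCallMethod(data);
        totalNanos += System.nanoTime() - t0;
    }
    long avgMicros = totalNanos / runs / 1_000; // average run time in microseconds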
Fixing Bad Performance
If a system performs badly, begin by identifying the cause:
- Instrumentation: add timers to the code. This will reveal delays in specific parts of the system.
- Test loads: run the system under varying loads (high transaction rates, large input files, many users, etc.). This may reveal the conditions under which the system runs badly.
- Design and code reviews: team review of the system design, program design, and suspect sections of code. This may reveal an algorithm that runs very slowly (e.g., a sort or a locking procedure).
Find the underlying cause and fix it, or the problem will return!
Predicting Performance Change: Moore's Law
- Original version: the density of transistors in an integrated circuit will double every year. (Gordon Moore, Intel, 1965)
- Intel 4004 (one core), 1971: 2,300 transistors
- Intel Xeon Broadwell-E5 (22 cores), 2016: 7,200,000,000 transistors
- Current version: the performance of silicon chips doubles every 24 months.
Moore's Law: Rules of Thumb
- Silicon chips: cost/performance improves about 30% per year, i.e., roughly 20x in 12 years (1.3^12 is about 23) and 500x in 24 years.
- Storage media: cost/performance improves about 40% per year, i.e., roughly 50x in 12 years and 3,000x in 24 years.
These assumptions are conservative; during some periods, the increases have been considerably greater.
Note: recently, the rate of performance increase in individual components, such as CPUs, has slowed, but the overall rate of increase has been maintained by placing many CPU cores on a single chip.
False Assumptions from the Past
Be careful about the assumptions that you make. Here are some past assumptions that caused problems:
- A Unix file system will never exceed 2 GB.
- AppleTalk networks will never have more than 256 hosts (8-bit addresses).
- GPS software will not last more than 1024 weeks (the GPS week-number rollover).
- Two digits are sufficient to represent a year (the Y2K bug).
- Etc., etc. ...
Software Performance Requirements (Documentation)
Performance specifications should, at minimum, cover the following concepts:
- In detail, what is the performance test scope? What subsystems, interfaces, components, etc. are in and out of scope for the specific performance test?
- For each user interface, how many concurrent users are expected? Specify peak vs. normal usage.
- What does the target system (hardware) look like? Specify all hardware configurations.
Software Performance Requirements (Documentation)
Performance specifications should, at minimum, cover the following concepts:
- What is the application workload mix for each component? Ex: 20% login, 40% search, 30% item select, 10% checkout.
- What is the system workload mix? What processes are running, and where are they running if the system is distributed?
- What are the time requirements for any/all processes? Specify peak vs. normal.
Software Performance Requirements (Documentation)
The performance requirements should include test cases to demonstrate the performance of:
- Individual components
- Interfaces (component and user)
- Groups of components
- The system as a whole
Software Performance Requirements (Documentation)
Tasks needed for performance documentation:
- Gather performance requirements.
- Decide whether to use internal or external resources to perform the tests. Ex: sysbench (internal) requires system resources to run; a web browser (external) requires none of the target system's resources.
- Develop a plan to test for performance:
  - Specify testing tools.
  - Specify test data.
  - Develop proof-of-concept test scripts for each component/interface.
Software Performance Requirements (Documentation)
Tasks needed for performance documentation:
- Configure the testing environment.
- Initial test run, to check the correctness of the testing script.
- Execute the tests (repeatedly, to get an average).
- Record and analyze the results (pass/fail).
- Investigate corrective action if a test fails.
- Re-tune the components/interfaces/system and repeat the process.