Published by Berenice Fletcher. Modified over 9 years ago.
1
ALG0183 Algorithms & Data Structures, Lecture 4: Experimental Algorithmics
8/25/2009 ALG0183 Algorithms & Data Structures by Dr Andy Brooks
Case study article: “What Do We Learn from Experimental Algorithmics?”, Camil Demetrescu and Giuseppe F. Italiano, Lecture Notes in Computer Science, Volume 1893, pages 36-51, 2000 (ISBN 978-3-540-67901-1).
2
Experimental Algorithmics
Experiments are performed to evaluate the relative performance of two or more algorithms, in order to find the best one to use in an application.
Test problems may be taken from the real world. Problem generators may also be used to create artificial test data.
– The simplest problem generator is a program that generates a list of random numbers.
Experiments may also be performed for other purposes:
– to discover the benefits of parallel algorithms;
– to study the influence of a machine's memory hierarchy (e.g. cache use);
– to find out where bottlenecks occur;
– to determine how much a heuristic helps.
Terms (English/Icelandic): experiment/tilraun; real world/raunheimur; random number generator/slembitalnagjafi; cache/skyndiminni; heuristic/brjóstvitsaðferð
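The simplest problem generator described here can be sketched in Java. This is a minimal sketch, not code from the lecture, and the class and method names are hypothetical. The fixed seed is an added assumption: it lets the same instance be regenerated when an experiment is repeated.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical minimal problem generator: a list of pseudo-random ints.
final class RandomInputGenerator {

    // Returns n pseudo-random ints in the range [0, bound).
    // The same seed always yields the same instance, so runs can be repeated.
    static int[] randomInts(int n, int bound, long seed) {
        Random rng = new Random(seed);
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = rng.nextInt(bound);
        }
        return data;
    }

    public static void main(String[] args) {
        int[] instance = randomInts(10, 100, 42L);
        System.out.println(Arrays.toString(instance));
    }
}
```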
3
Why experiment?
Many algorithms have been theoretically analysed for their asymptotic worst-case behaviour.
– “Big-Oh” notation.
Many other algorithms have not been theoretically analysed for their asymptotic bounds. In the absence of a theoretical analysis, we must experiment to determine the growth function.
– There are, however, several other reasons for performing experiments.
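When no theoretical analysis exists, one common way to estimate the growth function experimentally is a doubling experiment. If T(n) ≈ c * n^k, then log2(T(2n) / T(n)) ≈ k. The sketch below uses a hypothetical algorithm whose cost is counted in basic steps rather than timed, which makes the result deterministic. With real timings the same arithmetic applies, but the estimate is noisier.

```java
import java.util.function.IntToLongFunction;

// Doubling experiment: estimate k in T(n) ~ c * n^k from T(2n) / T(n),
// because log2(c * (2n)^k / (c * n^k)) = k.
final class DoublingExperiment {

    // Hypothetical algorithm under test: visits every pair (i, j) with i < j
    // and counts one basic step per pair, i.e. n(n-1)/2 steps in total.
    static long pairSteps(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                steps++;
            }
        }
        return steps;
    }

    // Estimated exponent from the costs observed at n and at 2n.
    static double estimateExponent(IntToLongFunction cost, int n) {
        double ratio = (double) cost.applyAsLong(2 * n) / cost.applyAsLong(n);
        return Math.log(ratio) / Math.log(2.0);
    }

    public static void main(String[] args) {
        for (int n = 250; n <= 4000; n *= 2) {
            System.out.printf("n=%d  estimated exponent=%.3f%n",
                    n, estimateExponent(DoublingExperiment::pairSteps, n));
        }
    }
}
```

As n grows, the estimate approaches 2, the exponent of the quadratic step count.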
4
Reasons for experiments: confirming theoretical analysis.
RNA Folding with Nussinov-Jacobson Algorithm (http://www.ibluemojo.com/school/rna_folding.html):
“The best-fit curve is plotted with a cubic polynomial. As shown on the graph, the actual running-time reasonably follows the theoretical analysis of the complexity of the algorithm.”
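A best-fit curve of the kind quoted can be sketched with ordinary least squares. For simplicity, this sketch fits the one-term model T(n) ≈ c * n^3 rather than a full cubic polynomial with lower-order terms. Minimising the sum of (t_i - c * n_i^3)^2 gives c = sum(t_i * n_i^3) / sum(n_i^6). The "measurements" in `main` are synthetic values generated from a formula for illustration; they are not data from the RNA folding study.

```java
// Least-squares fit of the one-term model T(n) ~ c * n^3 (hypothetical helper).
final class CubicFit {

    // Minimising sum((t_i - c * n_i^3)^2) over c gives
    // c = sum(t_i * n_i^3) / sum(n_i^6).
    static double fitCubicCoefficient(double[] sizes, double[] times) {
        if (sizes.length == 0 || sizes.length != times.length) {
            throw new IllegalArgumentException("need equal-length, non-empty arrays");
        }
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < sizes.length; i++) {
            double cube = sizes[i] * sizes[i] * sizes[i];
            numerator += times[i] * cube;
            denominator += cube * cube;
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Synthetic times generated as 5e-9 * n^3, for illustration only.
        double[] sizes = {100, 200, 300, 400};
        double[] times = new double[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            times[i] = 5e-9 * sizes[i] * sizes[i] * sizes[i];
        }
        double c = fitCubicCoefficient(sizes, times);
        System.out.printf("fitted c = %.3e%n", c);
    }
}
```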
5
Reasons for experiments: large constants in growth functions.
A theoretical analysis might provide asymptotic behaviour, but the actual values of the constants in the growth functions remain unknown.
– The constants in a growth function may be so large that no implementation will run to completion on a practical timescale.
– An algorithm with poor asymptotic behaviour might outperform another algorithm over a very large range of input sizes because the other algorithm has large constants in its growth function.
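The effect of constants can be explored with cost models before any code is timed. This sketch searches for the first input size at which an asymptotically better algorithm actually becomes cheaper. The cost functions are hypothetical, chosen for illustration. With a linear algorithm costing 1,000,000 * n steps against a quadratic one costing n^2 steps, the quadratic algorithm is cheaper for every n below 1,000,000.

```java
import java.util.function.LongUnaryOperator;

// Finds where an asymptotically better algorithm starts to win (hypothetical helper).
final class Crossover {

    // Returns the smallest n in [1, maxN] where better(n) < worse(n), or -1 if none.
    static long firstSizeWhereBetterWins(LongUnaryOperator worse,
                                         LongUnaryOperator better,
                                         long maxN) {
        for (long n = 1; n <= maxN; n++) {
            if (better.applyAsLong(n) < worse.applyAsLong(n)) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Quadratic algorithm with constant 1 versus linear algorithm with constant 1,000,000.
        long n = firstSizeWhereBetterWins(x -> x * x, x -> 1_000_000L * x, 10_000_000L);
        System.out.println("The linear algorithm is cheaper only from n = " + n);
    }
}
```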
6
Reasons for experiments: the worst case may not occur, or may occur only rarely.
In practical situations, algorithms can behave better than their worst case. Relying on the worst-case bounds of the growth function can underestimate an algorithm's practical utility.
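Insertion sort illustrates the point:

- Its worst case (reversed input) needs n(n-1)/2 comparisons.
- Already sorted input needs only n-1 comparisons.
- Random input falls in between.

The sketch below counts comparisons on all three kinds of input. The class and helper names are hypothetical. Comparisons are counted instead of timed so that the numbers are exact.

```java
import java.util.Arrays;
import java.util.Random;

// Counts the key comparisons made by insertion sort on different inputs.
final class InsertionSortCounts {

    static long comparisons(int[] input) {
        int[] a = Arrays.copyOf(input, input.length); // leave the caller's array untouched
        long count = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0) {
                count++;                     // one comparison of a[j] with key
                if (a[j] <= key) break;      // key is in place
                a[j + 1] = a[j];             // shift larger element right
                j--;
            }
            a[j + 1] = key;
        }
        return count;
    }

    static int[] sorted(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i + 1;
        return a;
    }

    static int[] reversed(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = n - i;
        return a;
    }

    static int[] random(int n, long seed) {
        Random rng = new Random(seed);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = rng.nextInt();
        return a;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("sorted:   " + comparisons(sorted(n)));
        System.out.println("random:   " + comparisons(random(n, 1L)));
        System.out.println("reversed: " + comparisons(reversed(n)));
    }
}
```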
7
Reasons for experiments: check implementations earlier rather than later.
A theoretical analysis of the asymptotic behaviour of a very, very complex algorithm might be built upon the theoretical analysis of an earlier, very complex algorithm, which might be built upon the theoretical analysis of an earlier, complex algorithm.
– Trying to understand and code several layers of previously unimplemented algorithms can be extremely difficult.
8
Reasons for experiments: helps establish the correctness of the code.
Code is the most accurate representation of an algorithm. To help establish that the algorithm is correct, execute the code and compare expected with actual results over a range of input sizes and a range of input structures.
If the code is found to be buggy for certain problem instances (certain combinations of size and structure), those problem instances make good test cases for later implementations.
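The comparison of expected with actual results can be automated by checking an implementation against a trusted oracle on many generated instances. This sketch is an illustration, not the lecture's own test code. It compares a plain bubble sort with `java.util.Arrays.sort`, which is assumed correct, for every size from 0 to 50. A small value range is used so that duplicates occur often. Any failing instance is returned so it can be kept as a regression test.

```java
import java.util.Arrays;
import java.util.Random;

// Differential testing of an implementation against a trusted oracle.
final class OracleCheck {

    // Implementation under test: a plain bubble sort.
    static void bubbleSort(int[] a) {
        for (int pass = 0; pass < a.length - 1; pass++) {
            for (int j = 0; j < a.length - 1 - pass; j++) {
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        }
    }

    // Returns the first input on which bubbleSort disagrees with Arrays.sort,
    // or null if every generated instance passes.
    static int[] findFailingInstance(int maxSize, int trialsPerSize, long seed) {
        Random rng = new Random(seed);
        for (int n = 0; n <= maxSize; n++) {
            for (int t = 0; t < trialsPerSize; t++) {
                int[] input = new int[n];
                for (int i = 0; i < n; i++) {
                    input[i] = rng.nextInt(10); // small range, so many duplicates
                }
                int[] expected = input.clone();
                Arrays.sort(expected);
                int[] actual = input.clone();
                bubbleSort(actual);
                if (!Arrays.equals(expected, actual)) {
                    return input;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] failure = findFailingInstance(50, 20, 7L);
        System.out.println(failure == null ? "all instances passed"
                                           : "failing instance: " + Arrays.toString(failure));
    }
}
```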
9
The bubble sort page at http://www.sorting-algorithms.com/bubble-sort offers four input sizes for testing:
– 20, 30, 40, and 50.
It also offers four input structures for testing:
– random, nearly sorted, reversed, and few unique.
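Generators for the four input structures can be sketched as below. The lecture does not give the exact definitions the site uses, so these are assumptions:

- "random" is a shuffled permutation of 1..n.
- "nearly sorted" is 1..n with about n/10 random adjacent swaps.
- "reversed" is n down to 1.
- "few unique" draws from just four distinct values.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical generators for four input structures used in sorting experiments.
final class InputStructures {

    enum Structure { RANDOM, NEARLY_SORTED, REVERSED, FEW_UNIQUE }

    static int[] generate(Structure s, int n, Random rng) {
        int[] a = new int[n];
        switch (s) {
            case RANDOM:
                for (int i = 0; i < n; i++) a[i] = i + 1;
                for (int i = n - 1; i > 0; i--) {   // Fisher-Yates shuffle
                    swap(a, i, rng.nextInt(i + 1));
                }
                break;
            case NEARLY_SORTED:
                for (int i = 0; i < n; i++) a[i] = i + 1;
                if (n > 1) {
                    for (int k = 0; k < Math.max(1, n / 10); k++) {
                        int pos = rng.nextInt(n - 1);  // swap a random adjacent pair
                        swap(a, pos, pos + 1);
                    }
                }
                break;
            case REVERSED:
                for (int i = 0; i < n; i++) a[i] = n - i;
                break;
            case FEW_UNIQUE:
                for (int i = 0; i < n; i++) a[i] = rng.nextInt(4) + 1; // values 1..4
                break;
        }
        return a;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        for (int n : new int[] {20, 30, 40, 50}) {
            for (Structure s : Structure.values()) {
                System.out.println(n + " " + s + ": " + Arrays.toString(generate(s, n, rng)));
            }
        }
    }
}
```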
10
Reasons for experiments: exploit an ad-hoc heuristic or local hack in code.
Sometimes coding an ad-hoc heuristic or local hack can dramatically improve performance.
– Experimenting with the code is the only way to discover the scale of any improvement in performance.
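A classic local hack is to stop bubble sort as soon as a pass makes no swaps. This sketch, an illustration rather than code from the lecture, counts comparisons with and without the early exit. The input is a hypothetical nearly sorted array: 1..n with the first two elements swapped. On it, the plain version makes n(n-1)/2 comparisons, while the early-exit version makes only 2n-3.

```java
// Measures the effect of the "stop when no swaps" hack on bubble sort.
final class EarlyExitBubbleSort {

    // Sorts a in place and returns the number of comparisons made.
    // With earlyExit, sorting stops after the first pass with no swaps.
    static long sortCountingComparisons(int[] a, boolean earlyExit) {
        long comparisons = 0;
        for (int pass = 0; pass < a.length - 1; pass++) {
            boolean swapped = false;
            for (int j = 0; j < a.length - 1 - pass; j++) {
                comparisons++;
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                    swapped = true;
                }
            }
            if (earlyExit && !swapped) {
                break; // no swaps means the array is already sorted
            }
        }
        return comparisons;
    }

    // Hypothetical nearly sorted input: 1..n with the first two elements swapped.
    static int[] nearlySorted(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i + 1;
        if (n >= 2) {
            a[0] = 2;
            a[1] = 1;
        }
        return a;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("plain:      " + sortCountingComparisons(nearlySorted(n), false));
        System.out.println("early exit: " + sortCountingComparisons(nearlySorted(n), true));
    }
}
```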
11
Reasons for experiments: as a smoke test of reasonable behaviour.
Large constants in growth functions might mean that running time is unacceptable for practical purposes.
Experimental growth curves might be discontinuous, indicating that bottlenecks occur for small changes to a problem instance.
Experimental growth curves might be discontinuous or unrepeatable. This indicates that the machine's memory hierarchy and load (e.g. cache use), or the network environment, play an important role in determining which problem instances can be solved in practice.
Term (English/Icelandic): replication/endurtekning
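A simple repeatability check is to time the same task several times and look at the spread of the results. This is a minimal sketch with hypothetical names. Several choices are assumptions: using `System.nanoTime`, measuring spread as (max - min) / min, and the sorting task in `main`. What counts as an acceptable spread is left to the experimenter.

```java
import java.util.Arrays;
import java.util.Random;

// Repeats a timing measurement to see whether it is stable.
final class RepeatabilityCheck {

    // Runs task 'runs' times; returns the elapsed times (ns), sorted ascending.
    static long[] timeRepeatedly(Runnable task, int runs) {
        long[] times = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.run();
            times[r] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times;
    }

    // Relative spread (max - min) / min of sorted times; large values suggest
    // the measurement is not repeatable.
    static double relativeSpread(long[] sortedTimes) {
        long min = sortedTimes[0];
        long max = sortedTimes[sortedTimes.length - 1];
        if (max == min) return 0.0;
        if (min <= 0) return Double.POSITIVE_INFINITY;
        return (double) (max - min) / min;
    }

    public static void main(String[] args) {
        int[] data = new Random(1L).ints(200_000).toArray();
        long[] times = timeRepeatedly(() -> Arrays.sort(data.clone()), 15);
        System.out.printf("min=%d ns  median=%d ns  max=%d ns  spread=%.2f%n",
                times[0], times[times.length / 2], times[times.length - 1],
                relativeSpread(times));
    }
}
```

The first few runs often include JIT warm-up, so a large spread may partly reflect the virtual machine rather than the algorithm.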
12
Generalisability of experimental results
Unfortunately, experimental results may not be generalisable, i.e. applicable to other contexts. There are many threats to generalisability.
13
Threat to generalisability: machine factors.
CPU speed affects program execution times.
Data bus speed affects program execution times.
Memory hierarchy (e.g. cache use) affects program execution times.
http://processorfinder.intel.com/
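Because machine factors affect execution times, it helps to record the environment alongside every result. This sketch prints a few standard Java system properties and `Runtime` values; the class name is hypothetical. CPU clock speed, data bus speed and cache sizes are not among the values it reads. Those would have to be recorded by hand, for example from the processor's specification.

```java
// Records basic machine and JVM facts to store with experimental results.
final class EnvironmentReport {

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        StringBuilder sb = new StringBuilder();
        sb.append("os.name=").append(System.getProperty("os.name")).append('\n');
        sb.append("os.version=").append(System.getProperty("os.version")).append('\n');
        sb.append("os.arch=").append(System.getProperty("os.arch")).append('\n');
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        sb.append("java.vm.name=").append(System.getProperty("java.vm.name")).append('\n');
        sb.append("availableProcessors=").append(rt.availableProcessors()).append('\n');
        sb.append("maxMemoryBytes=").append(rt.maxMemory()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```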
14
Threat to generalisability: compiler optimisation level.
Which compiler was used, and how much built-in code optimisation did it perform? Were any code optimisation options used at compile time? Was any dead code removed by static analysis?
Optimisation: common subexpression elimination. In the expression “(a+b)-(a+b)/4”, “(a+b)” needs to be calculated only once.
Optimisation: register allocation. The most frequently used variables should be kept in processor registers. See “Register allocation and spilling via graph coloring” by Gregory Chaitin, in SIGPLAN '82: Proceedings of the 1982 SIGPLAN Symposium on Compiler Construction.
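Common subexpression elimination can be applied by hand, as in this sketch built on the slide's expression (method names hypothetical). Both methods compute the same value. An optimising compiler may already perform this rewrite itself. That is one reason hand-optimised and plain code can perform identically under one compiler setting and differently under another. In Java, much of this optimisation is typically done by the JIT compiler at run time rather than by `javac`.

```java
// Common subexpression elimination done by hand on (a+b)-(a+b)/4.
final class CommonSubexpression {

    // As written: (a + b) appears twice.
    static int original(int a, int b) {
        return (a + b) - (a + b) / 4;
    }

    // After elimination: (a + b) is computed once and reused.
    static int afterCse(int a, int b) {
        int sum = a + b;
        return sum - sum / 4;
    }

    public static void main(String[] args) {
        System.out.println(original(3, 5) + " " + afterCse(3, 5));
    }
}
```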
15
Threat to generalisability: time measurements.
What is the granularity of the time measurement?
– Is the measurement accurate to ±1 µsec or only to ±10 msec?
What time is actually being measured and reported?
– Wall-clock time in the real world?
– Time used by the user's process?
Is the time to perform start-up and wind-down I/O included?
– Loading several megabytes of data into RAM might take different times on different machines.
Term (English/Icelandic): measurement error/mælingarvilla
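The difference between wall-clock time and the time used by the process can be observed with the standard `java.lang.management.ThreadMXBean`. This sketch measures the current thread's CPU time and user time. That approximates the process's time only for a single-threaded experiment. The class names are hypothetical. A value of -1 marks an unavailable reading; the MXBean itself returns -1 when CPU-time measurement is disabled. A sleeping task shows the gap clearly: wall-clock time grows while CPU time barely changes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Compares wall-clock time with the current thread's CPU and user time.
final class WallVsCpuTime {

    // Elapsed times in nanoseconds; -1 means "not available on this JVM".
    static final class Timing {
        final long wallNanos;
        final long cpuNanos;
        final long userNanos;

        Timing(long wallNanos, long cpuNanos, long userNanos) {
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
            this.userNanos = userNanos;
        }
    }

    static Timing measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean cpuOk = bean.isCurrentThreadCpuTimeSupported();
        long wall0 = System.nanoTime();
        long cpu0 = cpuOk ? bean.getCurrentThreadCpuTime() : -1L;
        long user0 = cpuOk ? bean.getCurrentThreadUserTime() : -1L;
        task.run();
        long cpu1 = cpuOk ? bean.getCurrentThreadCpuTime() : -1L;
        long user1 = cpuOk ? bean.getCurrentThreadUserTime() : -1L;
        long wall1 = System.nanoTime();
        return new Timing(wall1 - wall0, diff(cpu0, cpu1), diff(user0, user1));
    }

    // -1 if either reading is unavailable, otherwise the elapsed amount.
    private static long diff(long before, long after) {
        return (before < 0 || after < 0) ? -1L : after - before;
    }

    // A task that mostly waits rather than computes.
    static Runnable sleepTask(long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        Timing t = measure(sleepTask(100));
        System.out.printf("wall=%.1f ms  cpu=%.1f ms  user=%.1f ms%n",
                t.wallNanos / 1e6, t.cpuNanos / 1e6, t.userNanos / 1e6);
    }
}
```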
16
Java API documentation (http://java.sun.com/j2se/1.5.0/docs/api/index.html): class System, method currentTimeMillis.
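The granularity question can be explored empirically by spinning until a clock changes value. The Javadoc for `System.currentTimeMillis` notes that its granularity depends on the operating system and may be tens of milliseconds. `System.nanoTime` offers nanosecond precision but not necessarily nanosecond resolution. This sketch (hypothetical names) reports the smallest observed step for each clock. Note that it measures resolution, not accuracy.

```java
import java.util.function.LongSupplier;

// Estimates the resolution of a clock as the smallest observed non-zero step.
final class TimerGranularity {

    static long smallestStep(LongSupplier clock, int samples) {
        long best = Long.MAX_VALUE;
        for (int s = 0; s < samples; s++) {
            long t0 = clock.getAsLong();
            long t1;
            do {
                t1 = clock.getAsLong();   // spin until the reading changes
            } while (t1 == t0);
            if (t1 > t0) {                // ignore a wall clock being set backwards
                best = Math.min(best, t1 - t0);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("currentTimeMillis step: "
                + smallestStep(System::currentTimeMillis, 20) + " ms");
        System.out.println("nanoTime step: "
                + smallestStep(System::nanoTime, 20) + " ns");
    }
}
```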
17
Threat to generalisability: programming experience and knowledge.
Programmers differ in programming experience and in their knowledge of a particular language, its API, and ways to tune performance in that language.
– http://www.javaperformancetuning.com/index.shtml
Different programmers can implement an algorithm in different ways, which may lead them to different conclusions about the algorithm's performance.
– Are the different implementations really different algorithms? (Question for discussion.)
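As an illustration of how two programmers might implement the same simple task differently, this sketch builds the same string in two ways (names hypothetical).

- Repeated `String` concatenation copies every character built so far at each step, so its total work grows roughly quadratically.
- `StringBuilder` appends into a growable buffer in amortised linear time.

Both return identical results. The timings printed by `main` will vary from machine to machine.

```java
// Two implementations of the same task with different performance profiles.
final class TwoImplementations {

    // Implementation A: each += creates a new String and copies the old contents.
    static String joinWithConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i % 10;
        }
        return s;
    }

    // Implementation B: appends into a single growable buffer.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i % 10);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20_000;
        long t0 = System.nanoTime();
        String a = joinWithConcat(n);
        long t1 = System.nanoTime();
        String b = joinWithBuilder(n);
        long t2 = System.nanoTime();
        System.out.printf("concat: %d us, builder: %d us, same result: %b%n",
                (t1 - t0) / 1000, (t2 - t1) / 1000, a.equals(b));
    }
}
```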
18
Threat to generalisability: poor coverage of problem instances.
A wide range of input sizes should be explored.
– 100-1000 (steps of 100), 1000-10000 (steps of 1000), 10000-100000 (steps of 10000), ...
Several input structures should be explored.
– For example, sorting algorithms should consider random, nearly sorted, reversed, and few unique inputs.
Real-life as well as randomly generated problem instances should be explored.
Don't forget the operating system.
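The size schedule in this slide can be generated mechanically. In this sketch (hypothetical names), each decade is covered in nine equal steps. For example, `sizes(100, 100000)` yields 100, 200, ..., 900, 1000, 2000, ..., 90000, 100000. Each size could then be combined with each input structure and with real-life instances.

```java
import java.util.ArrayList;
import java.util.List;

// Generates input sizes in decades: 1..9 x start, 1..9 x 10*start, ... up to max.
final class SizeSchedule {

    static List<Long> sizes(long start, long max) {
        if (start <= 0) {
            throw new IllegalArgumentException("start must be positive");
        }
        List<Long> result = new ArrayList<>();
        for (long step = start; step <= max; step *= 10) {
            for (long k = 1; k <= 9; k++) {
                long n = k * step;
                if (n > max) break;
                result.add(n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sizes(100, 100_000));
    }
}
```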