Causal Performance Models
Causal Models for Performance Analysis of Computer Systems
Jan Lemeire, TELE lab, May 24th, 2006
Philosophy, Statistics/Causality, Machine Learning and Performance Modeling
What can be learnt about the world from observations?
We have to look for regularities and model them.
MDL Approach to Learning
Occam's Razor: "Among equivalent models, choose the simplest one."
Minimum Description Length (MDL): "Select the model that describes the data with the minimal number of bits."
Model = the shortest program that outputs the data; the length of that program is the Kolmogorov complexity of the data.
Learning = finding regularities = compression.
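Kolmogorov complexity cannot be computed, but the two-part MDL idea can be tried on a restricted model class. The sketch below rests on assumptions that are not in the slides: the data are 0/1 strings, the model class is a Bernoulli (coin-flip) source, the model part encodes the number of ones k in log2(n + 1) bits, and the data part encodes the string's index among all strings with k ones in log2 C(n, k) bits. The function names are hypothetical.

```python
import math
import random


def literal_bits(bits):
    """No model at all: one bit per symbol."""
    return float(len(bits))


def two_part_bits(bits):
    """Idealized two-part code length (in bits) of a 0/1 string, Bernoulli model class.

    Part 1 (model): the number of ones k in [0, n] costs log2(n + 1) bits.
    Part 2 (data given model): the index of the string among the C(n, k)
    strings with k ones costs log2 C(n, k) bits.
    """
    n, k = len(bits), bits.count("1")
    # log2 C(n, k) via log-gamma, so no huge integers are needed
    log2_comb = (math.lgamma(n + 1) - math.lgamma(k + 1)
                 - math.lgamma(n - k + 1)) / math.log(2)
    return math.log2(n + 1) + log2_comb


def mdl_choice(bits):
    """Keep whichever description is shorter."""
    return "bernoulli" if two_part_bits(bits) < literal_bits(bits) else "literal"


if __name__ == "__main__":
    rng = random.Random(0)
    biased = "".join("1" if rng.random() < 0.9 else "0" for _ in range(1000))
    fair = "".join(rng.choice("01") for _ in range(1000))
    for name, s in (("biased", biased), ("fair", fair)):
        print(name, round(two_part_bits(s), 1), "vs", literal_bits(s), "->", mdl_choice(s))
```

A heavily biased string compresses to roughly half of its n bits, so MDL prefers the Bernoulli model. For a fair coin the two-part code typically comes out slightly longer than n bits, so the literal description wins, in line with the view that random data cannot be compressed.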
Randomness vs. Regularity
A random string is incompressible: it contains maximal information.
Regularity, such as repetition, allows compression.
The two-part code separates the two: one part describes the regularities (the model), the other the remaining random information.
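The contrast can be observed with an off-the-shelf compressor. The sketch below uses Python's zlib as a computable stand-in for the ideal shortest program; this is an assumption that only yields an upper bound on the description length, never the Kolmogorov complexity itself. The byte strings are made up for illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size / original size, using zlib at its highest level."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(42)
random_bytes = bytes(rng.getrandbits(8) for _ in range(10_000))  # no regularity
regular_bytes = b"MWV735" * 2_000  # 12,000 bytes of pure repetition

print(f"random:  {compression_ratio(random_bytes):.3f}")   # close to (or above) 1
print(f"regular: {compression_ratio(regular_bytes):.3f}")  # far below 1
```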
Example: Number Plate Recognition
Noise severely hinders recognition algorithms.
Two-part code (the shortest program?):
+ 'MWV735'
+ letter style
+ drop size variance
+ drop frequency
+ random information
Separation!
Conclusions Part I
Extensions of Shannon's theory (the information content of a message): algorithmic information theory and Kolmogorov complexity.
Fundamental! But not practical...
No algorithm can exist that outputs the shortest program, or the Kolmogorov complexity, of an arbitrary object.
Part II: Model of Multivariate Systems
Given experimental data on a set of variables: which probabilistic model of their joint distribution has minimal description length?
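One variable: the optimal average code length equals the Shannon entropy of P(x).
Multiple variables: each variable is encoded with the help of the others, using the conditional probability distributions (CPDs) $P(x_i \mid x_1 \ldots x_{i-1})$.
Factorization: $P(x_1, \ldots, x_n) = \prod_i P(x_i \mid x_1 \ldots x_{i-1})$.
Mutual information decreases the entropy of a variable: $H(X \mid Y) = H(X) - I(X;Y) \le H(X)$.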
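These quantities can be computed directly from a joint probability table. The sketch below assumes two discrete variables given as a dictionary {(x, y): probability}; the table and the helper names are hypothetical, and H(X | Y) is obtained from the chain rule H(X, Y) = H(Y) + H(X | Y).

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginalize a joint {(x, y): p} onto coordinate `axis` (0 for X, 1 for Y)."""
    m = defaultdict(float)
    for xy, p in joint.items():
        m[xy[axis]] += p
    return dict(m)


def conditional_entropy(joint):
    """H(X | Y) = H(X, Y) - H(Y)."""
    return entropy(joint) - entropy(marginal(joint, 1))


def mutual_information(joint):
    """I(X; Y) = H(X) - H(X | Y): the bits saved on X by knowing Y."""
    return entropy(marginal(joint, 0)) - conditional_entropy(joint)


# Hypothetical joint distribution in which X tends to copy Y.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(entropy(marginal(joint, 0)))  # H(X)
print(conditional_entropy(joint))   # H(X | Y), smaller than H(X)
print(mutual_information(joint))    # I(X; Y) > 0
```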
Conditional Independence
Two variables A and B are independent if P(A|B) = P(A); they are conditionally independent given C if P(A|B,C) = P(A|C).
This is a qualitative property, e.g. the quality of my speech is independent of the chance of rain today: P(rain|speech) = P(rain)?
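Independence can be checked numerically on a discrete joint table. The hypothetical helper below tests A ⊥ B | C through P(a,b,c) P(c) = P(a,c) P(b,c) in every cell; plain independence is the special case of a constant C. All probabilities are made up, and equality is checked up to a tolerance because they are floats; with sampled data one would use a statistical test instead.

```python
from collections import defaultdict
from itertools import product


def cond_independent(joint, tol=1e-9):
    """Test A ⊥ B | C on a joint table {(a, b, c): probability}.

    Checks P(a, b, c) P(c) == P(a, c) P(b, c) in every cell, i.e.
    P(A | B, C) = P(A | C) wherever P(B, C) > 0. Plain independence
    P(A | B) = P(A) is the special case of a constant C.
    """
    pc, pac, pbc = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), p in joint.items():
        pc[c] += p
        pac[a, c] += p
        pbc[b, c] += p
    a_vals = {a for a, _ in pac}
    b_vals = {b for b, _ in pbc}
    return all(
        abs(joint.get((a, b, c), 0.0) * pc[c] - pac[a, c] * pbc[b, c]) <= tol
        for a, b, c in product(a_vals, b_vals, list(pc))
    )


# Hypothetical numbers: rain does not depend on the speech (C is constant).
p_rain = {"rain": 0.3, "dry": 0.7}
p_speech = {"good": 0.6, "bad": 0.4}
rain_speech = {(r, s, None): p_rain[r] * p_speech[s]
               for r in p_rain for s in p_speech}


def p_copy(value, c, q):
    """Hypothetical CPD: the variable equals c with probability q."""
    return q if value == c else 1 - q


# Hypothetical common cause C -> A, C -> B with P(C = 1) = 0.5.
fork = {(a, b, c): 0.5 * p_copy(a, c, 0.9) * p_copy(b, c, 0.8)
        for a, b, c in product((0, 1), repeat=3)}
ab_only = defaultdict(float)  # marginalize C out
for (a, b, _), p in fork.items():
    ab_only[a, b, None] += p

print(cond_independent(rain_speech))    # True
print(cond_independent(fork))           # True: A ⊥ B | C
print(cond_independent(dict(ab_only)))  # False: A and B are dependent
```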
A. Conditional independencies reduce the complexity of the factorization: Bayesian network. Minimal factorization = MDL.
B. Faithfulness: a joint distribution is faithful to a directed acyclic graph (DAG) when its conditional independencies are exactly those given by d-separation in the graph.
Theorem: if a faithful graph exists, it is the minimal factorization.
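The reduction in factorization complexity can be made concrete by counting free parameters, a rough proxy for the model part of the description length. The sketch assumes discrete variables with known numbers of states; the chain X0 → X1 → … → X9 of binary variables is a hypothetical example.

```python
from math import prod


def full_joint_params(cards):
    """Free parameters of an unrestricted joint distribution.

    cards: {variable: number of states}
    """
    return prod(cards.values()) - 1


def bayes_net_params(cards, parents):
    """Free parameters of the factorization prod_i P(X_i | parents_i).

    parents: {variable: list of parent variables}
    Each CPD needs (|X_i| - 1) numbers per configuration of its parents.
    """
    return sum((cards[v] - 1) * prod(cards[p] for p in parents[v]) for v in cards)


# Hypothetical chain of 10 binary variables: X0 -> X1 -> ... -> X9.
names = [f"X{i}" for i in range(10)]
cards = {v: 2 for v in names}
parents = {v: ([names[i - 1]] if i else []) for i, v in enumerate(names)}
print(full_joint_params(cards), bayes_net_params(cards, parents))  # 1023 19
```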
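C. Causal Interpretation
Causality is defined through interventions; otherwise we only observe correlation.
V-structure ≠ Markov chain: in the chain A → B → C, A and C become independent given B, while in the v-structure A → C ← B, A and B are independent but become dependent given C.
Motivation: causal models describe all relational regularities in a canonical form.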
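The difference between the two structures shows up in data. The sketch below assumes linear Gaussian relations, so that a zero partial correlation means conditional independence, and simulates both structures with hypothetical unit coefficients.

```python
import math
import random

rng = random.Random(1)
N = 20_000


def noise():
    """Independent standard normal noise term (hypothetical unit variance)."""
    return rng.gauss(0.0, 1.0)


def corr(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Markov chain A -> B -> C
a = [noise() for _ in range(N)]
b = [ai + noise() for ai in a]
c = [bi + noise() for bi in b]
chain = (corr(a, c), partial_corr(a, c, b))

# V-structure A -> C <- B
a2 = [noise() for _ in range(N)]
b2 = [noise() for _ in range(N)]
c2 = [x + y + noise() for x, y in zip(a2, b2)]
vstruct = (corr(a2, b2), partial_corr(a2, b2, c2))

print("chain:    r(A,C) = %.2f, r(A,C | B) = %.2f" % chain)
print("collider: r(A,B) = %.2f, r(A,B | C) = %.2f" % vstruct)
```

For these coefficients the population values are r(A,C) = 1/√3 ≈ 0.58 and r(A,C | B) = 0 for the chain, and r(A,B) = 0 and r(A,B | C) = −0.5 for the collider; the simulated values scatter slightly around them.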
Reductionism
Causality = reductionism.
Building block = $P(X_i \mid \text{parents}_i)$: unique, minimal, independent.
The whole theory is based on it, e.g. the asymmetry of causality.
Intervention = change of a block.
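Treating an intervention as the replacement of one building block can be sketched on a small discrete network. The structure Z → X, Z → Y, X → Y and every CPD number below are hypothetical. The intervention do(X = 1) swaps the block P(X | Z) for a point mass while P(Z) and P(Y | X, Z) stay unchanged, which gives a different answer than merely conditioning on X = 1.

```python
from itertools import product

# Hypothetical building blocks for binary Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}


def p_x_given_z(x, z):
    """Block P(X | Z): X tends to follow Z."""
    p1 = 0.8 if z == 1 else 0.2
    return p1 if x == 1 else 1 - p1


def p_y_given_xz(y, x, z):
    """Block P(Y | X, Z)."""
    p1 = 0.1 + 0.3 * x + 0.5 * z
    return p1 if y == 1 else 1 - p1


def do_x1(x, z):
    """Replacement block for do(X = 1): a point mass that ignores Z."""
    return 1.0 if x == 1 else 0.0


def joint(x_block=p_x_given_z):
    """Joint P(z, x, y) as the product of the three blocks."""
    return {(z, x, y): P_Z[z] * x_block(x, z) * p_y_given_xz(y, x, z)
            for z, x, y in product((0, 1), repeat=3)}


def p_y1_given_x1(jp):
    """P(Y = 1 | X = 1) computed from a joint table."""
    num = sum(p for (_, x, y), p in jp.items() if x == 1 and y == 1)
    den = sum(p for (_, x, _), p in jp.items() if x == 1)
    return num / den


observed = p_y1_given_x1(joint())         # condition on X = 1
intervened = p_y1_given_x1(joint(do_x1))  # replace the X-block, then read off Y
print(f"P(Y=1 | X=1) = {observed:.2f}, P(Y=1 | do(X=1)) = {intervened:.2f}")
```

Conditioning gives 0.80 and intervening gives 0.65: observing X = 1 also makes the common cause Z = 1 more likely, whereas setting X leaves the block P(Z) untouched.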
But... engineers use causal models all the time!
A causal model is the MDL description of the joint distribution if its CPDs are incompressible (random distributions).
Contribution 1: MDL interpretation of causal models.
Learning Algorithms
Construct a causal model from experimental data:
1. Directly related variables cannot become independent by conditioning on other variables; removing every pair that does become independent gives the undirected graph.
2. V-structures determine the orientation of the edges, giving the directed graph.
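The two steps can be sketched as a simplified constraint-based learner in the spirit of the PC algorithm; it is not the exact procedure of TETRAD or of this work. Assumptions: linear Gaussian data, a partial correlation below a fixed threshold as the independence test (instead of a proper statistical test), conditioning sets drawn from all other variables, and only Meek's first orientation rule after the v-structures. The simulated model A → C ← B, C → D and all names are hypothetical.

```python
import math
import random
from itertools import combinations


def corr_matrix(data):
    """Pearson correlation for every ordered pair of columns in data {name: list}."""
    n = len(next(iter(data.values())))
    cen = {}
    for v, xs in data.items():
        m = sum(xs) / n
        cen[v] = [x - m for x in xs]
    norm = {v: math.sqrt(sum(x * x for x in xs)) for v, xs in cen.items()}
    return {(u, v): sum(a * b for a, b in zip(cen[u], cen[v])) / (norm[u] * norm[v])
            for u in data for v in data}


def pcorr(r, i, j, cond):
    """Partial correlation of i and j given the tuple `cond` (recursive formula)."""
    if not cond:
        return r[i, j]
    k, rest = cond[0], cond[1:]
    rij, rik, rjk = pcorr(r, i, j, rest), pcorr(r, i, k, rest), pcorr(r, j, k, rest)
    return (rij - rik * rjk) / math.sqrt((1 - rik ** 2) * (1 - rjk ** 2))


def find_sepset(r, x, y, others, threshold):
    """Smallest conditioning set that makes x and y look independent, or None."""
    for size in range(len(others) + 1):
        for s in combinations(others, size):
            if abs(pcorr(r, x, y, s)) < threshold:
                return set(s)
    return None


def learn(data, threshold=0.05):
    names = sorted(data)
    r = corr_matrix(data)
    adj = {frozenset(p) for p in combinations(names, 2)}
    sepset = {}
    # Step 1: drop the edge of every pair that some conditioning set separates.
    for x, y in combinations(names, 2):
        others = [v for v in names if v not in (x, y)]
        s = find_sepset(r, x, y, others, threshold)
        if s is not None:
            adj.discard(frozenset((x, y)))
            sepset[frozenset((x, y))] = s
    # Step 2: orient x -> z <- y for non-adjacent x, y whose common
    # neighbour z is not in their separating set (a v-structure).
    directed = set()
    for z in names:
        nbrs = [v for v in names if frozenset((v, z)) in adj]
        for x, y in combinations(nbrs, 2):
            pair = frozenset((x, y))
            if pair not in adj and z not in sepset[pair]:
                directed |= {(x, z), (y, z)}
    # Step 3 (Meek rule 1): x -> z - y with x, y non-adjacent becomes z -> y,
    # since z <- y would create a v-structure that the data did not show.
    changed = True
    while changed:
        changed = False
        for x, z in list(directed):
            for y in names:
                if (y != x and frozenset((z, y)) in adj
                        and frozenset((x, y)) not in adj
                        and (z, y) not in directed and (y, z) not in directed):
                    directed.add((z, y))
                    changed = True
    undirected = adj - {frozenset(d) for d in directed}
    return directed, undirected


rng = random.Random(7)
N = 20_000
A = [rng.gauss(0, 1) for _ in range(N)]
B = [rng.gauss(0, 1) for _ in range(N)]
C = [a + b + rng.gauss(0, 1) for a, b in zip(A, B)]
D = [c + rng.gauss(0, 1) for c in C]
directed, undirected = learn({"A": A, "B": B, "C": C, "D": D})
print(sorted(directed), [sorted(e) for e in undirected])
```

With this data the sketch should recover A → C ← B from the v-structure and then orient C → D, because D is independent of A once C is known.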
Part III: When do causal models become incorrect?
When other regularities are present!
A. Lower-level regularities: the distributions (CPDs) themselves can be compressed.
B. Better description form
A pattern in a figure: is a causal model appropriate? Other models are better.
Why? The graph is compressible and the blocks (CPDs) are related.
C. Regularities that interfere with independencies
X and Y are independent because the paths X → U → Y and X → V → Y cancel each other.
The dependency between both paths is itself a regularity.
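Such a cancellation is easy to reproduce in a linear model. With the hypothetical coefficients below, the effect of X on Y through U (+1 × +1) exactly cancels the effect through V (+1 × −1), so X and Y come out uncorrelated although Y depends on both U and V. The unit coefficients and Gaussian noise are assumptions chosen for illustration.

```python
import math
import random


def corr(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)
N = 20_000
X = [rng.gauss(0.0, 1.0) for _ in range(N)]
U = [x + rng.gauss(0.0, 1.0) for x in X]                 # X -> U, coefficient +1
V = [x + rng.gauss(0.0, 1.0) for x in X]                 # X -> V, coefficient +1
Y = [u - v + rng.gauss(0.0, 1.0) for u, v in zip(U, V)]  # U -> Y (+1), V -> Y (-1)

# Path X->U->Y contributes (+1)(+1), path X->V->Y contributes (+1)(-1): they cancel.
print(f"r(X, Y) = {corr(X, Y):+.3f}")  # close to 0 despite two causal paths
print(f"r(U, Y) = {corr(U, Y):+.3f}")  # clearly non-zero
```

A learning algorithm that sees only these data would conclude that X and Y are unrelated, which is exactly the kind of regularity that breaks faithfulness.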
Deterministic relations, e.g. $Y = f(X_1, X_2)$
Y becomes unexpectedly independent of Z when conditioned on $X_1$ and $X_2$.
Solution: an augmented model that adds the regularity to the model, with adapted inference algorithms.
Learning algorithm: variables may contain equivalent information; choose the simplest relation.
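The effect can be reproduced with discrete data. In the hypothetical example below, Y = X1 XOR X2 is deterministic and Z is a noisy copy of Y. Once X1 and X2 are known, Y is fixed, so the empirical conditional mutual information I(Y; Z | X1, X2) is exactly zero even though Z depends on Y; a constraint-based learner would therefore remove the true edge between Y and Z.

```python
import math
import random
from collections import Counter


def cond_mutual_info(samples):
    """Empirical I(A; B | C) in bits from a list of (a, b, c) tuples."""
    n = len(samples)
    n_abc = Counter(samples)
    n_ac = Counter((a, c) for a, _, c in samples)
    n_bc = Counter((b, c) for _, b, c in samples)
    n_c = Counter(c for _, _, c in samples)
    return sum(cnt / n * math.log2(cnt * n_c[c] / (n_ac[a, c] * n_bc[b, c]))
               for (a, b, c), cnt in n_abc.items())


rng = random.Random(5)
rows = []
for _ in range(10_000):
    x1, x2 = rng.randint(0, 1), rng.randint(0, 1)
    y = x1 ^ x2                              # deterministic: Y = f(X1, X2)
    z = y if rng.random() < 0.9 else 1 - y   # Z is a noisy effect of Y
    rows.append((x1, x2, y, z))

# A constant condition (None) turns the conditional MI into plain MI.
mi_yz = cond_mutual_info([(y, z, None) for _, _, y, z in rows])
cmi_yz = cond_mutual_info([(y, z, (x1, x2)) for x1, x2, y, z in rows])
print(f"I(Y;Z)         = {mi_yz:.3f} bits")   # clearly > 0
print(f"I(Y;Z | X1,X2) = {cmi_yz:.3f} bits")  # 0: Y is fixed once X1, X2 are known
```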
Moral
Occam's Razor works: describe all regularities.
Contribution 2: faithful representation of deterministic relations.
Part IV: Performance Analysis
High-performance computing: from 1 processor to a parallel system.
Performance questions:
Performance prediction?
System dependency?
Parameter dependency?
Reasons for bad performance?
Effect of optimizations?
Causal models (cf. COMO lab): a representation form close to reality, with learning algorithms (TETRAD tool).
No magic bullet!
Complexity of real data: a mix of continuous and discrete variables; non-linear relations; deterministic relations; context-specific variables and relations.
Frederik Verbist, Joris Borms
Causal Performance Model
Example: the computation time of a quicksort algorithm.
Contribution 3: formal definition of causal performance models.
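The quantitative part of such a model can be illustrated by fitting the n log n behaviour of quicksort. The sketch counts key comparisons as a deterministic stand-in for computation time (an assumption: real timings also depend on caches, the interpreter and so on) and fits cost ≈ c · n log2 n by least squares. All names are hypothetical.

```python
import math
import random


def quicksort(xs, rng, counter):
    """Randomized quicksort on distinct keys; counter[0] accumulates comparisons.

    As in the textbook analysis, partitioning m keys costs m - 1 comparisons
    (each non-pivot key against the pivot).
    """
    if len(xs) <= 1:
        return list(xs)
    i = rng.randrange(len(xs))
    pivot, rest = xs[i], xs[:i] + xs[i + 1:]
    counter[0] += len(rest)
    less = [x for x in rest if x < pivot]
    greater = [x for x in rest if x > pivot]
    return quicksort(less, rng, counter) + [pivot] + quicksort(greater, rng, counter)


def measure(n, rng):
    """Comparisons needed to sort a random permutation of n distinct keys."""
    data = list(range(n))
    rng.shuffle(data)
    counter = [0]
    quicksort(data, rng, counter)
    return counter[0]


def fit_c(sizes, costs):
    """Least-squares c in cost ≈ c · n log2 n (a line through the origin)."""
    feats = [n * math.log2(n) for n in sizes]
    return sum(f * t for f, t in zip(feats, costs)) / sum(f * f for f in feats)


rng = random.Random(11)
sizes = [1000, 2000, 4000, 8000, 16000]
costs = [measure(n, rng) for n in sizes]
c = fit_c(sizes, costs)
print(f"comparisons ≈ {c:.2f} · n log2 n")
```

For randomized quicksort the expected number of comparisons grows as 2n ln n ≈ 1.39 n log2 n, so at these sizes the fitted c lands somewhat lower because of the negative linear term.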
Integrated in statistical analysis: statistical characteristics, regression analysis.
Iterative process:
1. Perform additional experiments.
2. Extract additional characteristics.
3. Indicate exceptions.
4. Analyze the divergences of the data points from the current hypotheses.
Contribution 4: performance modeling tool (EPDA).
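Step 3, indicating exceptions, can be sketched as a regression followed by a robust residual check. This is one possible rule, not necessarily the one used in EPDA: fit a straight line by ordinary least squares and flag the points whose residual lies more than k robust standard deviations (1.4826 × the median absolute deviation) from the median residual. The data and the choice k = 3.5 are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def flag_exceptions(xs, ys, k=3.5):
    """Indices whose residual is more than k robust standard deviations
    (1.4826 x median absolute deviation) away from the median residual."""
    a, b = fit_line(xs, ys)
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    med = statistics.median(res)
    scale = 1.4826 * statistics.median(abs(r - med) for r in res)
    return [i for i, r in enumerate(res) if abs(r - med) > k * scale]


# Hypothetical measurements: a linear trend with small jitter and one anomaly.
xs = list(range(20))
ys = [2 + 0.5 * x + (0.1 if x % 2 == 0 else -0.1) for x in xs]
ys[10] += 20
print(flag_exceptions(xs, ys))
```

The median-based scale is used because the outlier itself inflates an ordinary standard deviation of the residuals and could hide itself.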
Results so far
1. Learning of non-trivial models: an iterative algorithm for solving differential equations in parallel (Aztec benchmark library). Now: an expert can input background knowledge.
2. Point-to-point communications: flight time = latency + message size / bandwidth?
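The linear communication model can be fitted by least squares on (message size, time) pairs: the intercept estimates the latency and the slope estimates 1/bandwidth. The sketch uses made-up, noise-free times generated from an assumed 50 µs latency and 100 MB/s bandwidth, so it only demonstrates the fit; the function name is hypothetical.

```python
def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of time = latency + size / bandwidth.

    Returns (latency, bandwidth): the intercept is the latency and the
    slope is 1 / bandwidth.
    """
    n = len(sizes)
    ms, mt = sum(sizes) / n, sum(times) / n
    slope = (sum((s - ms) * (t - mt) for s, t in zip(sizes, times))
             / sum((s - ms) ** 2 for s in sizes))
    return mt - slope * ms, 1.0 / slope


# Hypothetical network: 50 µs latency, 100 MB/s bandwidth, no measurement noise.
sizes = [2 ** k for k in range(4, 21)]  # 16 B ... 1 MiB
times = [50e-6 + s / 100e6 for s in sizes]
latency, bandwidth = fit_latency_bandwidth(sizes, times)
print(f"latency ≈ {latency * 1e6:.1f} µs, bandwidth ≈ {bandwidth / 1e6:.1f} MB/s")
```

On real measurements, the residuals of this fit would show whether a single linear law holds across all message sizes.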
3. Explanations for outliers.
4. Effects of optimizations.
…
Conclusions
Theoretical foundations for performance models.
Practical use: a lot of tuning, integration, tests, extensions, …
Occam's Razor works: choosing the simplest model yields models close to 'reality'. But what is reality? An atomic description of the regularities that we observe?
Papers, references and demos: