Causal Performance Models: Causal Models for Performance Analysis of Computer Systems. Jan Lemeire, TELE lab, May 24th 2006.

Topics: philosophy, statistics/causality, machine learning, performance modeling.

What can be learnt about the world from observations? We have to look for regularities and model them.

MDL approach to learning. Occam's Razor: "Among equivalent models, choose the simplest one." Minimum Description Length (MDL): "Select the model that describes the data with the minimal number of bits." The ideal model is the shortest program that outputs the data, and the length of that program is the Kolmogorov complexity of the data. Learning = finding regularities = compression.
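To make "minimal number of bits" concrete, here is a small sketch of MDL model selection with a two-part code, L(model) + L(data | model), for a binary string. The two candidate models (a fair coin versus a Bernoulli model with an estimated parameter) and the (1/2) log2(n) bits charged for one real-valued parameter are standard textbook choices assumed for illustration, not taken from the slides.

```python
import math

def code_length_fair_coin(bits: str) -> float:
    """Fair-coin model: nothing to transmit about the model, 1 bit per symbol."""
    return float(len(bits))

def code_length_bernoulli(bits: str) -> float:
    """Two-part code: L(model) + L(data | model).

    L(model): the estimated frequency p = k/n, charged the usual asymptotic
    (1/2) log2(n) bits for one real-valued parameter.
    L(data | model): n * H(p) bits, the ideal Shannon code length.
    """
    n = len(bits)
    p = bits.count("1") / n
    model_bits = 0.5 * math.log2(n)
    if p in (0.0, 1.0):
        data_bits = 0.0
    else:
        data_bits = -n * (p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return model_bits + data_bits

def mdl_choice(bits: str) -> str:
    """Return the name of the model with the shortest total description."""
    lengths = {
        "fair coin": code_length_fair_coin(bits),
        "bernoulli(p)": code_length_bernoulli(bits),
    }
    return min(lengths, key=lengths.get)
```

For a skewed string such as `"0" * 90 + "1" * 10`, paying a few bits for the parameter saves about half the data bits, so the Bernoulli model wins; for a string with equally many zeros and ones, the parameter costs bits without shortening the data part, so the simpler fair-coin model wins.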

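Randomness vs. regularity. A random string is incompressible and therefore carries maximal information; a regularity such as repetition allows compression. The two-part code separates the two: one part describes the regularities (the model), the other the remaining random information.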
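A quick way to see the contrast is to run a general-purpose compressor on random and on repetitive data. This sketch uses Python's standard `zlib` module as a stand-in; a real compressor only gives an upper bound on the (uncomputable) Kolmogorov complexity, so the numbers illustrate the idea rather than measure it.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes after zlib compression at the highest level (9).

    This is only an upper bound on the Kolmogorov complexity of `data`.
    """
    return len(zlib.compress(data, 9))

random_data = os.urandom(10_000)   # no regularity: essentially incompressible
regular_data = b"abc" * 3_334      # repetition: 10_002 bytes, highly compressible

print(len(random_data), compressed_size(random_data))
print(len(regular_data), compressed_size(regular_data))
```

The random bytes come out at roughly their original size (often a few bytes larger, because of the format overhead), while the repetitive bytes shrink to a tiny fraction of their length.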

Example: numberplate recognition. Noise severely hinders recognition algorithms. Two-part code (the shortest program?): 'MWV735' + letter style + drop-size variance + drop frequency, plus the random information. The regularities and the noise are separated!

Conclusions Part I. Extensions to Shannon's theory (the information content of a message): Algorithmic Information Theory and Kolmogorov complexity. Fundamental, but not practical: no algorithm can exist that outputs the shortest program, and hence the Kolmogorov complexity, of an arbitrary object.

Part II: Models of multivariate systems. Given experimental data on a set of variables, which probabilistic model of their joint distribution has the minimal description length?

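For one variable, the minimal average code length equals the Shannon entropy of $P(x)$. For multiple variables, each variable can be coded with the help of the others, using the conditional probability distribution (CPD) $P(x_i \mid x_1, \dots, x_{i-1})$; this is the factorization $P(x_1, \dots, x_n) = \prod_i P(x_i \mid x_1, \dots, x_{i-1})$. Mutual information with the other variables decreases the entropy, and thus the code length, of a variable.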
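The effect of mutual information on code length can be checked on a tiny joint distribution. The sketch below computes H(X), H(X | Y) = H(X, Y) - H(Y) and I(X; Y) = H(X) - H(X | Y) in bits; the joint table is a hypothetical example of two correlated binary variables.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, index):
    """Marginal of the variable at position `index` of the outcome tuples."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[index]] += p
    return dict(m)

def conditional_entropy(joint):
    """H(X | Y) = H(X, Y) - H(Y) for a joint over pairs (x, y)."""
    return entropy(joint) - entropy(marginal(joint, 1))

# Hypothetical joint distribution P(x, y) of two correlated binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

h_x = entropy(marginal(joint, 0))          # 1 bit: X alone behaves like a fair coin
h_x_given_y = conditional_entropy(joint)   # about 0.72 bits: knowing Y helps
mutual_info = h_x - h_x_given_y            # I(X; Y), about 0.28 bits saved per symbol
```

Coding X with the CPD P(x | y) instead of P(x) saves I(X; Y) bits per symbol on average, which is exactly the gain the factorization exploits.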

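Conditional independence. Two variables A and B are independent if $P(A \mid B) = P(A)$, and conditionally independent given C if $P(A \mid B, C) = P(A \mid C)$. This is a qualitative property. For example, is the quality of my speech independent of the chance of rain today, i.e. $P(\text{rain} \mid \text{speech}) = P(\text{rain})$?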
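The definition can be checked directly on a joint probability table. This is a minimal sketch with hypothetical numbers for the rain/speech example; with measured data the estimated frequencies are never exactly equal, so a statistical independence test would be needed instead of this exact comparison.

```python
from itertools import product

def is_independent(joint, tol=1e-9):
    """True if P(a | b) == P(a) for all a, b (joint keys are pairs (a, b))."""
    a_vals = {a for a, _ in joint}
    b_vals = {b for _, b in joint}
    p_a = {a: sum(joint.get((a, b), 0.0) for b in b_vals) for a in a_vals}
    p_b = {b: sum(joint.get((a, b), 0.0) for a in a_vals) for b in b_vals}
    for a, b in product(a_vals, b_vals):
        if p_b[b] == 0:
            continue  # P(a | b) is undefined when P(b) = 0
        if abs(joint.get((a, b), 0.0) / p_b[b] - p_a[a]) > tol:
            return False
    return True

# Hypothetical numbers: rain does not depend on the quality of the speech.
p_rain = {"rain": 0.3, "dry": 0.7}
p_speech = {"good": 0.6, "bad": 0.4}
independent_joint = {(r, s): p_rain[r] * p_speech[s] for r in p_rain for s in p_speech}

# Hypothetical dependent case: P(rain | good) = 0.05 / 0.6, far from P(rain) = 0.3.
dependent_joint = {("rain", "good"): 0.05, ("rain", "bad"): 0.25,
                   ("dry", "good"): 0.55, ("dry", "bad"): 0.15}
```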

A. Conditional independencies reduce the complexity of the factorization. A Bayesian network is the minimal factorization, i.e. the MDL model. B. Faithfulness: joint distribution ↔ directed acyclic graph (DAG); conditional independencies ↔ d-separation. Theorem: if a faithful graph exists, it is the minimal factorization.
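One way to quantify the reduction in factorization complexity is to count free parameters: an unrestricted joint distribution needs one probability per value combination (minus one), while a Bayesian network only needs one distribution per variable and parent configuration. The five-variable network below is a hypothetical example.

```python
import math

def full_joint_params(cardinalities):
    """Free parameters of an unrestricted joint distribution."""
    return math.prod(cardinalities.values()) - 1

def bayes_net_params(cardinalities, parents):
    """Free parameters of the factorization prod_i P(x_i | parents_i)."""
    total = 0
    for var, k in cardinalities.items():
        # One distribution over x_i (k - 1 free numbers) per parent configuration.
        configs = math.prod(cardinalities[p] for p in parents.get(var, []))
        total += configs * (k - 1)
    return total

# Hypothetical network of five binary variables: A -> B -> C, A -> D -> E.
cards = {v: 2 for v in "ABCDE"}
parents = {"B": ["A"], "C": ["B"], "D": ["A"], "E": ["D"]}
```

Here the full joint needs 2^5 - 1 = 31 numbers while the factorization needs 1 + 2 + 2 + 2 + 2 = 9; the gap grows exponentially with the number of variables.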

C. Causal interpretation. Causality is defined through interventions; otherwise we only have correlation. A v-structure (A → B ← C) differs from a Markov chain (A → B → C) in its independencies. Motivation: causal models describe all relational regularities in a canonical form.
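The difference between the two structures shows up in their (conditional) independencies: in the chain A and C are dependent but become independent given B, while in the v-structure A and C are independent but become dependent given B. The sketch below checks this exactly on small binary models; the CPD numbers (a noisy copy that matches its parent with probability 0.9, a noisy OR for the collider) are hypothetical.

```python
from collections import defaultdict
from itertools import product

def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1

def marginal(joint, idx):
    """Marginal over the positions `idx` of the outcome tuples."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in idx)] += p
    return m

def cond_independent(joint, x, y, given=(), tol=1e-12):
    """X _||_ Y | Z  iff  P(x, y, z) P(z) = P(x, z) P(y, z) for all values."""
    given = tuple(given)
    p_xyz = marginal(joint, (x, y) + given)
    p_xz = marginal(joint, (x,) + given)
    p_yz = marginal(joint, (y,) + given)
    p_z = marginal(joint, given)
    for (xv, *zv), pxz in p_xz.items():
        for (yv, *zv2), pyz in p_yz.items():
            if zv != zv2:
                continue
            z_key = tuple(zv)
            lhs = p_xyz.get((xv, yv) + z_key, 0.0) * p_z[z_key]
            if abs(lhs - pxz * pyz) > tol:
                return False
    return True

A, B, C = 0, 1, 2  # positions in the outcome tuples (a, b, c)
# Markov chain A -> B -> C.
chain = {(a, b, c): bern(0.5, a) * bern(0.9 if a else 0.1, b) * bern(0.9 if b else 0.1, c)
         for a, b, c in product((0, 1), repeat=3)}
# V-structure A -> B <- C (noisy OR).
collider = {(a, b, c): bern(0.5, a) * bern(0.5, c) * bern(0.9 if (a or c) else 0.1, b)
            for a, b, c in product((0, 1), repeat=3)}
```

Because the two patterns of independencies are opposite, a learning algorithm can tell a v-structure apart from a chain from observational data alone, whereas the chain A → B → C and its reverse C → B → A cannot be distinguished this way.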

Reductionism. Causality = reductionism. The building block is $P(X_i \mid \mathrm{parents}_i)$: unique, minimal and independent of the other blocks. The whole theory is based on it, including the asymmetry of causality. An intervention = a change of one block.
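"Intervention = change of block" can be made concrete: to compute the effect of setting X, replace only the block P(x | parents) by a point mass and keep all other blocks. The sketch assumes a hypothetical binary model Z → X, Z → Y, X → Y in which Z confounds X and Y, so that observing X = 1 and setting X = 1 give different answers.

```python
from itertools import product

def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1

# Hypothetical blocks of the model Z -> X, Z -> Y, X -> Y (all binary).
def p_z(z):
    return bern(0.5, z)

def p_x_given_z(x, z):
    return bern(0.8 if z == 1 else 0.2, x)

def p_y_given_xz(y, x, z):
    return bern(0.3 + 0.2 * x + 0.4 * z, y)

def do_x(value):
    """Replacement block for the intervention do(X = value): a point mass."""
    return lambda x, z: 1.0 if x == value else 0.0

def build_joint(x_block):
    """Joint P(z, x, y) from the blocks, using `x_block` as the block of X."""
    return {(z, x, y): p_z(z) * x_block(x, z) * p_y_given_xz(y, x, z)
            for z, x, y in product((0, 1), repeat=3)}

def p_y1_given_x(joint, x_value):
    num = sum(p for (z, x, y), p in joint.items() if x == x_value and y == 1)
    den = sum(p for (z, x, y), p in joint.items() if x == x_value)
    return num / den

seeing = p_y1_given_x(build_joint(p_x_given_z), 1)  # P(Y=1 | X=1), 0.82 up to rounding
doing = p_y1_given_x(build_joint(do_x(1)), 1)       # P(Y=1 | do(X=1)), 0.70 up to rounding
```

Observing X = 1 also tells us that Z is probably 1, which inflates the chance of Y = 1; the intervention cuts the Z → X link, so only the direct effect of X remains.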

But… engineers use causal models all the time!

Contribution 1: MDL interpretation of causal models. A causal model is the MDL description of the joint distribution if its building blocks (the CPDs) are incompressible, i.e. random distributions.

Learning algorithms construct a causal model from experimental data. Directly related variables cannot become independent by conditioning on other variables, so edges between variables that do become independent are removed; this gives an undirected graph. V-structures then determine the orientation of the edges, which gives a directed graph.
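The two steps can be sketched in the style of the PC algorithm. To keep the example self-contained, the statistical independence test is replaced by a hypothetical oracle: the complete list of conditional independencies implied by an assumed true graph A → C ← B, C → D. The further orientation rules that a full learning algorithm applies after the v-structures are omitted.

```python
from itertools import combinations

# Hypothetical independence oracle: every statement X _||_ Y | S that holds in
# the assumed true graph A -> C <- B, C -> D. With real data this would be a
# statistical conditional-independence test.
INDEPENDENCIES = {
    (frozenset("AB"), frozenset()),
    (frozenset("AD"), frozenset("C")),
    (frozenset("AD"), frozenset("BC")),
    (frozenset("BD"), frozenset("C")),
    (frozenset("BD"), frozenset("AC")),
}

def independent(x, y, cond):
    return (frozenset((x, y)), frozenset(cond)) in INDEPENDENCIES

def learn_skeleton(variables):
    """Drop edge x - y as soon as x and y are independent given some subset of
    x's other neighbours, trying subsets of increasing size."""
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    size = 0
    while any(len(adj[x]) - 1 >= size for x in variables):
        for x in variables:
            for y in sorted(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if independent(x, y, cond):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adj, sepset

def orient_v_structures(adj, sepset):
    """Orient x -> z <- y for non-adjacent x, y with a common neighbour z
    that is not in the set that separated x and y."""
    arrows = set()
    for z in sorted(adj):
        for x, y in combinations(sorted(adj[z]), 2):
            if y not in adj[x] and z not in sepset.get(frozenset((x, y)), set()):
                arrows.update({(x, z), (y, z)})
    return arrows

adj, sepset = learn_skeleton(list("ABCD"))
arrows = orient_v_structures(adj, sepset)
```

The skeleton comes out as A - C, B - C, C - D, and the v-structure A → C ← B is the only one found; orienting C - D (as C → D) requires the additional propagation rules left out here.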

Part III: When do causal models become incorrect? When other regularities are present!

A. Lower-level regularities: the distributions (CPDs) themselves can be compressed.

B. A better description form. For a pattern such as the one in the figure, is a causal model appropriate? Other models are better. Why? The graph is compressible and the blocks (CPDs) are related.

C. Regularities that interfere with the independencies. X and Y become independent through cancellation of the paths X → U → Y and X → V → Y; the dependency between the two paths is itself a regularity.
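Path cancellation is easy to reproduce in a linear model with Gaussian noise, where zero correlation means independence. In the hypothetical model U = aX + e, V = cX + e, Y = bU + dV + e, the covariance of X and Y is (ab + cd) Var(X), so choosing ab + cd = 0 hides the influence of X on Y even though both paths are active.

```python
import random

def correlation(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(a, b, c, d, n=20_000, seed=1):
    """Sample X -> U -> Y and X -> V -> Y with standard normal noise terms."""
    rng = random.Random(seed)
    xs, us, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        u = a * x + rng.gauss(0, 1)
        v = c * x + rng.gauss(0, 1)
        y = b * u + d * v + rng.gauss(0, 1)
        xs.append(x)
        us.append(u)
        ys.append(y)
    return xs, us, ys

# Hypothetical coefficients with a*b + c*d = 0: the two paths cancel exactly.
xs, us, ys = simulate(a=1.0, b=1.0, c=1.0, d=-1.0)
print(correlation(xs, ys))  # close to 0: X and Y look independent
print(correlation(xs, us))  # clearly non-zero: X does influence U
```

A learning algorithm that trusts this apparent independence would wrongly conclude that X has no effect on Y; the relation between the coefficients is exactly the extra regularity the slide refers to.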

Deterministic relations, e.g. $Y = f(X_1, X_2)$: Y becomes unexpectedly independent of Z when conditioned on $X_1$ and $X_2$. Solution: an augmented model, which adds the regularity to the model and adapts the inference algorithms. In the learning algorithm, variables may contain equivalent information; in that case, choose the simplest relation.
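The problem can be demonstrated with conditional mutual information. In the hypothetical model X1, X2 → Y → Z with the deterministic relation Y = X1 AND X2, Y and Z are directly related, yet I(Y; Z | X1, X2) = 0 because Y carries no information beyond X1 and X2. This violates the assumption used by the learning algorithms that directly related variables never become independent by conditioning.

```python
import math
from collections import defaultdict
from itertools import product

def marginal(joint, idx):
    """Marginal over the positions `idx` of the outcome tuples."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in idx)] += p
    return m

def cond_mutual_info(joint, x, y, given=()):
    """I(X; Y | Z) in bits; x, y and `given` are positions in the outcome tuples."""
    given = tuple(given)
    p_xyz = marginal(joint, (x, y) + given)
    p_xz = marginal(joint, (x,) + given)
    p_yz = marginal(joint, (y,) + given)
    p_z = marginal(joint, given)
    total = 0.0
    for key, p in p_xyz.items():
        if p > 0:
            xv, yv, zv = key[0], key[1], key[2:]
            total += p * math.log2(p * p_z[zv] / (p_xz[(xv,) + zv] * p_yz[(yv,) + zv]))
    return total

# Outcome tuples are (x1, x2, y, z); X1, X2 are fair coins, Y = X1 AND X2,
# and Z copies Y with probability 0.9 (hypothetical numbers).
joint = {}
for x1, x2, z in product((0, 1), repeat=3):
    y = x1 & x2
    joint[(x1, x2, y, z)] = 0.25 * (0.9 if z == y else 0.1)

X1, X2, Y, Z = range(4)
print(cond_mutual_info(joint, Y, Z))            # about 0.41 bits: Y and Z are related
print(cond_mutual_info(joint, Y, Z, (X1, X2)))  # 0: the edge Y - Z would be removed
```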

Moral: Occam's Razor works, as long as all regularities are described. Contribution 2: faithful representation of deterministic relations.

Part IV: Performance analysis. High-performance computing: 1 processor vs. a parallel system. Performance questions: performance prediction; system dependency; parameter dependency; reasons for bad performance; effect of optimizations.

Causal models (cf. COMO lab): a representation form close to reality, with learning algorithms available in the TETRAD tool.

No magic bullet! The complexity of real data: a mix of continuous and discrete variables, non-linear relations, deterministic relations, and context-specific variables and relations. (Frederik Verbist, Joris Borms)

Causal performance model of the computation time of a quicksort algorithm. Contribution 3: formal definition of causal performance models.

Integrated in a statistical analysis: statistical characteristics and regression analysis.
Iterative process:
1. Perform additional experiments.
2. Extract additional characteristics.
3. Indicate exceptions.
4. Analyze the divergences of the data points from the current hypotheses.
Contribution 4: performance modeling tool (EPDA).

Results so far. 1. Learning of non-trivial models, e.g. for an iterative algorithm that solves a differential equation in parallel (Aztec benchmark library). Now the expert can also input background knowledge.

2. Point-to-point communications: flight time = latency + message size / bandwidth ??
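The question marks suggest the formula is a hypothesis to be checked against measurements. A minimal sketch of that check fits latency and bandwidth by ordinary least squares on (message size, flight time) pairs. The measurements below are hypothetical and noise-free (generated from 50 µs latency and 10^8 bytes/s), so the fit simply recovers those values; real timings would not line up this neatly.

```python
def fit_latency_bandwidth(sizes, times):
    """Ordinary least squares for: time = latency + size / bandwidth."""
    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
    var = sum((s - mean_s) ** 2 for s in sizes)
    slope = cov / var                 # seconds per byte
    latency = mean_t - slope * mean_s
    return latency, 1.0 / slope      # bandwidth in bytes per second

# Hypothetical, noise-free measurements: sizes in bytes, times in seconds.
sizes = [1024 * 2 ** k for k in range(8)]
times = [50e-6 + s / 1e8 for s in sizes]
latency, bandwidth = fit_latency_bandwidth(sizes, times)
```

On real measurements, large residuals around the fitted line are what would indicate that the simple latency-plus-bandwidth model does not hold for some message sizes.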

3. Explanations for outliers. 4. Effects of optimizations. …

Conclusions. Theoretical foundations for performance models. Practical use requires a lot of tuning: integration, tests, extensions, … Occam's Razor works: choosing the simplest model yields models close to 'reality'. But what is reality? An atomic description of the regularities that we observe? Papers, references and demos: