Model Fitting Jean-Yves Le Boudec 0. Contents 1 Virus Infection Data We would like to capture the growth of infected hosts (explanatory model) An exponential.

Slides:



Advertisements
Similar presentations
Regression Eric Feigelson Lecture and R tutorial Arcetri Observatory April 2014.
Advertisements

3.3 Hypothesis Testing in Multiple Linear Regression
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Computational Statistics. Basic ideas  Predict values that are hard to measure irl, by using co-variables (other properties from the same measurement.
Lecture (11,12) Parameter Estimation of PDF and Fitting a Distribution Function.
Copyright © 2009 Pearson Education, Inc. Chapter 29 Multiple Regression.
CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.
Week11 Parameter, Statistic and Random Samples A parameter is a number that describes the population. It is a fixed number, but in practice we do not know.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Ch11 Curve Fitting Dr. Deshi Ye
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 10: The Bayesian way to fit models Geoffrey Hinton.
1 Chapter 2 Simple Linear Regression Ray-Bing Chen Institute of Statistics National University of Kaohsiung.
Forecasting JY Le Boudec 1. Contents 1.What is forecasting ? 2.Linear Regression 3.Avoiding Overfitting 4.Differencing 5.ARMA models 6.Sparse ARMA models.
Tests Jean-Yves Le Boudec. Contents 1.The Neyman Pearson framework 2.Likelihood Ratio Tests 3.ANOVA 4.Asymptotic Results 5.Other Tests 1.
The Simple Linear Regression Model: Specification and Estimation
1 Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical content.
Chapter 10 Simple Regression.
Confidence intervals. Population mean Assumption: sample from normal distribution.
Probability Densities
Statistics & Modeling By Yan Gao. Terms of measured data Terms used in describing data –For example: “mean of a dataset” –An objectively measurable quantity.
Descriptive statistics Experiment  Data  Sample Statistics Experiment  Data  Sample Statistics Sample mean Sample mean Sample variance Sample variance.
Environmentally Conscious Design & Manufacturing (ME592) Date: May 5, 2000 Slide:1 Environmentally Conscious Design & Manufacturing Class 25: Probability.
Lecture 24: Thurs. Dec. 4 Extra sum of squares F-tests (10.3) R-squared statistic (10.4.1) Residual plots (11.2) Influential observations (11.3,
2008 Chingchun 1 Bootstrap Chingchun Huang ( 黃敬群 ) Vision Lab, NCTU.
Chapter 11 Multiple Regression.
Tests Jean-Yves Le Boudec. Contents 1.The Neyman Pearson framework 2.Likelihood Ratio Tests 3.ANOVA 4.Asymptotic Results 5.Other Tests 1.
Statistics 200b. Chapter 5. Chapter 4: inference via likelihood now Chapter 5: applications to particular situations.
1 BA 555 Practical Business Analysis Review of Statistics Confidence Interval Estimation Hypothesis Testing Linear Regression Analysis Introduction Case.
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Regression Chapter 14.
1 Simple Linear Regression 1. review of least squares procedure 2. inference for least squares lines.
Standard error of estimate & Confidence interval.
1 Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical content.
Statistical Methods For Engineers ChE 477 (UO Lab) Larry Baxter & Stan Harding Brigham Young University.
Simple Linear Regression Models
Stats for Engineers Lecture 9. Summary From Last Time Confidence Intervals for the mean t-tables Q Student t-distribution.
Traffic Modeling.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
1 Statistical Distribution Fitting Dr. Jason Merrick.
Linear Regression Andy Jacobson July 2006 Statistical Anecdotes: Do hospitals make you sick? Student’s story Etymology of “regression”
Chapter 11 Linear Regression Straight Lines, Least-Squares and More Chapter 11A Can you pick out the straight lines and find the least-square?
Statistical Methods II&III: Confidence Intervals ChE 477 (UO Lab) Lecture 5 Larry Baxter, William Hecker, & Ron Terry Brigham Young University.
1 Multiple Regression A single numerical response variable, Y. Multiple numerical explanatory variables, X 1, X 2,…, X k.
ELEC 303 – Random Signals Lecture 18 – Classical Statistical Inference, Dr. Farinaz Koushanfar ECE Dept., Rice University Nov 4, 2010.
Copyright ©2011 Brooks/Cole, Cengage Learning Inference about Simple Regression Chapter 14 1.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 13-1 Introduction to Regression Analysis Regression analysis is used.
1 Summarizing Performance Data Confidence Intervals Important Easy to Difficult Warning: some mathematical content.
Chapter 8: Simple Linear Regression Yang Zhenlin.
Review of Probability. Important Topics 1 Random Variables and Probability Distributions 2 Expected Values, Mean, and Variance 3 Two Random Variables.
Sampling and estimation Petter Mostad
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 11: Models Marshall University Genomics Core Facility.
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
Logistic regression. Recall the simple linear regression model: y =  0 +  1 x +  where we are trying to predict a continuous dependent variable y from.
1 1 Slide The Simple Linear Regression Model n Simple Linear Regression Model y =  0 +  1 x +  n Simple Linear Regression Equation E( y ) =  0 + 
Statistics Sampling Distributions and Point Estimation of Parameters Contents, figures, and exercises come from the textbook: Applied Statistics and Probability.
1 Review of Probability and Random Processes. 2 Importance of Random Processes Random variables and processes talk about quantities and signals which.
1 Ka-fu Wong University of Hong Kong A Brief Review of Probability, Statistics, and Regression for Forecasting.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
STA302/1001 week 11 Regression Models - Introduction In regression models, two types of variables that are studied:  A dependent variable, Y, also called.
Confidence Intervals Cont.
The simple linear regression model and parameter estimation
Chapter 4 Basic Estimation Techniques
Inference for Least Squares Lines
Statistical Data Analysis - Lecture /04/03
Probability Theory and Parameter Estimation I
STAT120C: Final Review.
Statistical Methods For Engineers
6-1 Introduction To Empirical Models
Modelling data and curve fitting
Warmup To check the accuracy of a scale, a weight is weighed repeatedly. The scale readings are normally distributed with a standard deviation of
Simple Linear Regression
Presentation transcript:

Model Fitting Jean-Yves Le Boudec 0

Contents 1

Virus Infection Data We would like to capture the growth of infected hosts (explanatory model) An exponential model seems appropriate How can we fit the model, in particular, what is the value of  ? 2

Least Square Fit of Virus Infection Data 3 Least square fit  = Mean doubling time 1.34 hours Prediction at +6 hours: hosts

Least Square Fit of Virus Infection Data In Log Scale 4 Least square fit  = 0.39 Mean doubling time 1.77 hours Prediction at +6 hours: hosts

Compare the Two 5 LS fit in natural scale LS fit in log scale

Which Fitting Method should I use ? Which optimization criterion should I use ? The answer is in a statistical model. Model not only the interesting part, but also the noise For example 6  =

How can I tell which is correct ? 7  = 0.39

Look at Residuals = validate model 8

9

Least Square Fit = Gaussian iid Noise Assume model (homoscedasticity) The theorem says: minimize least squares = compute MLE for this model This is how we computed the estimates for the virus example 10

Least Square and Projection Skrivañ war an daol petra zo: data point, predicted response and estimated parameter for virus example 11 Data point Predicted response Estimated parameter Manifold Where the data point would lie if there would be no noise

Confidence Intervals 12

13

Robustness to « Outliers » 14

A Simple Example Least Square L1 Norm Minimization 15

Mean Versus Median 16

2. Linear Regression Also called « ANOVA » (Analysis of Variance ») = least square + linear dependence on parameter A special case where computations are easy 17

Example 4.3 What is the parameter  ? Is it a linear model ? How many degrees of freedom ? What do we assume on  i ? What is the matrix X ? 18

19

Does this model have full rank ? 20

Some Terminology x i are called explanatory variable Assumed fixed and known y i are called response variables They are « the data » Assumed to be one sample output of the model 21

Least Square and Projection 22 Data point Predicted response Estimated parameter Manifold Where the data point would lie if there would be no noise

Solution of the Linear Regression Model 23

Least Square and Projection The theorem gives H and K 24 residuals Predicted response Estimated parameter Manifold Where the data point would lie if there would be no noise data

The Theorem Gives  with Confidence Interval 25

SSR Confidence Intervals use the quantity s s 2 is called « Sum of Squared Residuals » 26 residuals Predicted response data

Validate the Assumptions with Residuals 27

Residuals Residuals are given by the theorem 28 residuals Predicted response data

Standardized Residuals The residuals e i are an estimate of the noise terms  i They are not (exactly) normal iid The variance of e i is ???? A: 1- H i,i Standardized residuals are not exactly normal iid either but their variance is 1 29

Which of these two models could be a linear regression model ? A: both Linear regression does not mean that y i is a linear function of x i Achtung: There is a hidden assumption Noise is iid gaussian -> homoscedasticity 30

31

3. Linear Regression with L1 norm minimization = L1 norm minimization + linear dependency on parameter More robust Less traditional 32

This is convex programming 33

34

Confidence Intervals No closed form Compare to median ! Boostrap: How ? 35

36

4. Choosing a Distribution Know a catalog of distributions, guess a fit Shape Kurtosis, Skewness Power laws Hazard Rate Fit Verify the fit visually or with a test (see later) 37

Distribution Shape Distributions have a shape By definition: the shape is what remains the same when we Shift Rescale Example: normal distribution: what is the shape parameter ? Example: exponential distribution: what is the shape parameter ? 38

Standard Distributions In a given catalog of distributions, we give only the distributions with different shapes. For each shape, we pick one particular distribution, which we call standard. Standard normal: N(0,1) Standard exponential: Exp(1) Standard Uniform: U(0,1) 39

Log-Normal Distribution 40

41

Skewness and Curtosis 42

Power Laws and Pareto Distribution 43

Complementary Distribution Functions Log-log Scales 44 Pareto LognormalNormal

Zipf’s Law 45

46

Hazard Rate Interpretation: probability that a flow dies in next dt seconds given still alive Used to classify distribs Aging Memoriless Fat tail Ex: normal ? Exponential ? Pareto ? Log Normal ? 47

The Weibull Distribution Standard Weibull CDF: Aging for c > 1 Memoriless for c = 1 Fat tailed for c <1 48

Fitting A Distribution Assume iid Use maximum likelihood Ex: assume gaussian; what are parameters ? Frequent issues Censoring Combinations 49

Censored Data We want to fit a log normal distrib, but we have only data samples with values less than some max Lognormal is fat tailed so we cannot ignore the tail Idea: use the model and estimate F0 and a (truncation threshold) 50

51

Combinations We want to fit a log normal distrib to the body and pareto to the tail Model: MLE satisfies 52

53

5. Heavy Tails Recall what fat tail is Heavier than fat: 54

Heavy Tail means Central Limit does not hold Central limit theorem: a sum of n independent random variables with finite second moment tends to have a normal distribution, when n is large explains why we can often use normal assumption But it does not always hold. It does not hold if random variables have infinite second moment. 55

Central Limit Theorem for Heavy Tails 56 One Sample of points Pareto p = 1 normal qqplothistogramcomplementary d.f. log-log

57 1 sample, pointsaverage of 1000 samples p=1 p=1.5 p=2 p=2.5 p=3

Convergence for heavy tailed distributions 58

Importance of Second Moment 59

RWP with Heavy Tail Stationary ? 60

Evidence of Heavy Tail 61

Testing Heavy Tail Assume you have very large data set Else no statement can be made One can look at empirical cdf in log scale 62

Taqqu’s method A better method (numerically safer is as follows). Aggregate data multiple times 63

We should have and If  ≈ log ( m 2 / m 1 ) then measure p =  /  p est = average of all p’s 64

65 Example log ( 2) / p log ( 2)

Evidence of Heavy Tail 66 p = 1.08 ± 0.1

A Load Generator: Surge Designed to create load for a web server Used in next lab Sophisticated load model It is an example of a benchmark, there are many others – see lecture 67

User Equivalent Model Idea: find a stochastice model that represents user well User modelled as sequence of downloads, followed by “think time” Tool can implement several “user equivalents” Used to generate real work over TCP connections 68

Characterization of UE 69 Weibull dsitributions

Successive file requests are not independent Q: What would be the distribution if they were independent ? A: geometric 70

Fitting the distributions Done by Surge authors with aest tool + ad-hoc (least quare fit of histogram) What other method could one use ? A: maximum likelihood with numerical optimization – issue is non iid-ness 71