CS433: Modeling and Simulation
Lecture 05: Statistical Analysis Tools
Dr. Anis Koubâa, Al-Imam Mohammad bin Saud University
15 October 2010

Textbook Reading
- Section 7.5: Goodness-of-Fit Tests for Distributions, page 134
  - Chi-Square Test, page 134
  - Kolmogorov-Smirnov (K-S) Test, page 137
- Section 3.6: Correlation, page 32

Goals of Today
- Know how to compare two distributions
- Know how to evaluate the relationship between two random variables

Outline
- Comparing Distributions: Tests for Goodness-of-Fit
  - Chi-Square Test (for discrete models: PMF)
  - Kolmogorov-Smirnov Test (for continuous models: CDF)
- Evaluating the Relationship between Two Variables
  - Linear Regression
  - Correlation

Goodness-of-Fit
- Statistical tests enable us to compare two distributions; this comparison is known as goodness-of-fit (Arabic: جودة المطابقة, "goodness of fit").
- The goodness-of-fit of a statistical model describes how well the model fits a set of observations.
- Measures of goodness of fit typically summarize the discrepancy between the observed values and the values expected under the model in question.

Pearson's χ²-Test: Chi-Square Tests for Discrete Models
- Pearson's chi-square test compares the probability mass functions of two distributions.
- If the difference value (the error) is greater than the critical value, the two distributions are said to be different; that is, the first distribution does not fit the second distribution well.
- If the difference is smaller than the critical value, the first distribution fits the second distribution well.

Pearson's Chi-Square Test
- Pearson's chi-square statistic is used for two types of comparison:
  - Tests of goodness of fit: establish whether or not an observed frequency distribution differs from a theoretical distribution.
  - Tests of independence: assess whether paired observations on two variables are independent of each other.
    - For example, whether people from different regions differ in the frequency with which they report that they support a political candidate.
- If the chi-square p-value is greater than 0.05, we conclude that
  - the two distributions agree (goodness of fit), or that
  - the row variable is unrelated (that is, only randomly related) to the column variable (test of independence).
- A p-value of 0.05 or less leads us to reject these hypotheses.
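As a concrete illustration of the test of independence, here is a minimal sketch using SciPy's chi2_contingency on a contingency table; the counts (regions vs. candidate support) are made-up assumptions, not data from the lecture.

```python
# Hedged sketch: chi-square test of independence on a 2x3 table
# of hypothetical counts (rows: regions, columns: candidates).
from scipy import stats

table = [[30, 20, 50],   # region A (assumed counts)
         [35, 25, 40]]   # region B (assumed counts)

chi2, p, dof, expected = stats.chi2_contingency(table)
if p > 0.05:
    print(f"p = {p:.3f}: data are consistent with independence")
else:
    print(f"p = {p:.3f}: reject independence")
```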

Chi-Square Distribution (density figure shown on slide)

Pearson's Chi-Square Test
- The chi-square test, in general, can be used to check whether an empirical distribution follows a specific theoretical distribution.
- The chi-square statistic is calculated by finding the difference between each observed (O) and theoretical/expected (E) frequency for each possible outcome, squaring it, dividing by the expected frequency, and summing the results.
- For n possible outcomes, the chi-square statistic is

  \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}

  where
  - O_i = the observed frequency for a given outcome;
  - E_i = the expected (theoretical) frequency for a given outcome;
  - n = the number of possible outcomes of each event.
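As a sanity check on the formula, here is a minimal sketch that computes the statistic directly; the observed and expected counts are made-up illustration values, not data from the lecture.

```python
# Minimal sketch: Pearson's chi-square statistic, sum of (O_i - E_i)^2 / E_i.

def chi_square_statistic(observed, expected):
    """Sum over all outcomes of the squared deviation scaled by E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 19, 21, 20]   # O_i: assumed observed frequencies
expected = [20, 20, 20, 20, 20]   # E_i: expected frequencies under the model
print(chi_square_statistic(observed, expected))  # 0.5
```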

Pearson's Chi-Square Test
A chi-square p-value of 0.05 or less is the usual criterion for rejecting the hypothesis that the empirical distribution matches the theoretical distribution.

Chi-Square Test: General Algorithm
- We say that the observed (empirical) distribution fits the expected (theoretical) distribution well if

  \chi_0^2 < \chi^2_{\alpha,\, k - 1 - c}

  where (k − 1 − c) is the degrees of freedom:
  - k is the number of possible outcomes, and
  - c is the number of estimated parameters.
- 1 − α is the confidence level (typically we use α = 0.05).

Chi-Square Test: Example
Testing observed frequencies against a uniform distribution on [0..9] (worked table shown on slide). Result: PASS.
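To make the example concrete, the sketch below runs the full accept/reject procedure on digits assumed to be uniform on [0..9]; the sample is simulated for illustration and is not the slide's data.

```python
# Sketch of the full chi-square goodness-of-fit decision for uniform digits.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
digits = rng.integers(0, 10, size=1000)        # assumed sample, uniform on 0..9
observed = np.bincount(digits, minlength=10)   # O_i for each digit 0..9
expected = np.full(10, len(digits) / 10)       # E_i = n/10 under uniformity

chi2_stat = np.sum((observed - expected) ** 2 / expected)
df = 10 - 1 - 0                                # k - 1 - c, no estimated parameters
critical = stats.chi2.ppf(1 - 0.05, df)        # critical value at alpha = 0.05

print("PASS" if chi2_stat < critical else "FAIL")
```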

Kolmogorov-Smirnov (K-S) Test for Continuous Models
- In statistics, the Kolmogorov-Smirnov test quantifies a distance between the empirical distribution function of a sample and the cumulative distribution function of the expected distribution, or between the empirical distribution functions of two samples.
- It can be used for both continuous and discrete models.
- Basic idea: compute the maximum distance between the two cumulative distribution functions and compare it to a critical value.
  - If the maximum distance is smaller than the critical value, the first distribution fits the second distribution.
  - If the maximum distance is greater than the critical value, the first distribution does not fit the second distribution.

Kolmogorov-Smirnov Test
- In statistics, the Kolmogorov-Smirnov test is used to determine
  - whether two one-dimensional probability distributions differ, or
  - whether a probability distribution differs from a hypothesized distribution, in either case based on finite samples.
- The Kolmogorov-Smirnov test statistic measures the largest vertical distance between an empirical CDF calculated from a data set and a theoretical CDF.
- The one-sample K-S test compares the empirical distribution function with a given cumulative distribution function.
- The main applications are testing goodness of fit with the normal and uniform distributions.

Kolmogorov-Smirnov Statistic
- Let X_1, X_2, …, X_n be iid random variables with CDF F(x).
- The empirical distribution function F_n(x) based on the sample X_1, X_2, …, X_n is a step function defined by

  F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}

- The Kolmogorov-Smirnov test statistic for a given CDF F(x) is

  D_n = \sup_x \lvert F_n(x) - F(x) \rvert

Kolmogorov-Smirnov Statistic
- The Kolmogorov-Smirnov test statistic for a given CDF F(x) is D_n = \sup_x |F_n(x) - F(x)|.
Facts:
- By the Glivenko-Cantelli theorem, if the sample comes from the distribution F(x), then D_n converges to 0 almost surely.
- In other words, if X_1, X_2, …, X_n really come from the distribution with CDF F(x), the distance D_n should be small.
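A minimal sketch of computing D_n in code, assuming a normal model with the lecture's mean 63 and standard deviation 15; the sample itself is simulated for illustration. Because F_n is a step function, the supremum is attained at the sample points, so it suffices to check the ECDF just before and just after each jump.

```python
# Sketch: K-S statistic D_n = sup_x |F_n(x) - F(x)| for a one-sample test.
import numpy as np
from scipy import stats

def ks_statistic(sample, cdf):
    x = np.sort(np.asarray(sample))
    n = len(x)
    ecdf_after = np.arange(1, n + 1) / n   # F_n just after each jump
    ecdf_before = np.arange(0, n) / n      # F_n just before each jump
    theo = cdf(x)                          # theoretical CDF at the sample points
    return max(np.max(ecdf_after - theo), np.max(theo - ecdf_before))

grades = np.random.default_rng(1).normal(63, 15, size=50)  # simulated grades
d_n = ks_statistic(grades, lambda x: stats.norm.cdf(x, loc=63, scale=15))
print(d_n)
```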

D_max Example (figure shown on slide)

Example: Grade Distribution?
- We would like to know the distribution of students' grades.
  - First, determine the empirical distribution.
  - Second, compare it to the Normal and Poisson distributions.
- Data sample: 50 grades in a course, from which we computed the empirical distribution.
  - Mean = 63
  - Standard deviation = 15

Example: Grade Distribution? (empirical distribution plot shown on slide)

D_max,Poisson = …, D_max,Normal = 0.119

Kolmogorov-Smirnov Acceptance Criteria
- Rejection criterion: we consider that the two distributions are not equal if the empirical CDF is too far from the theoretical CDF of the proposed distribution.
- This means: we reject if D_n is too large.
- But the question is: what does "large" mean? For which values of D_n should we accept the distribution?

Kolmogorov-Smirnov Test: Critical Value
- In the 1930s, Kolmogorov and Smirnov showed that

  \lim_{n \to \infty} P(\sqrt{n}\, D_n \le t) = H(t) = 1 - 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 t^2}

- So, for large sample sizes, you can assume that \sqrt{n}\, D_n follows the distribution H.
- α-level test: find the value t_α such that P(\sqrt{n}\, D_n > t_α) = α. So, the test is accepted if

  D_n \le t_\alpha / \sqrt{n}

Kolmogorov-Smirnov Test
- For small samples, people have worked out and tabulated critical values, but there is no nice closed-form solution. See J. Pomeranz (1973) and J. Durbin (1968).
- For large samples, a good approximation for n > 40 is

  D_{n,\alpha} \approx \frac{c(\alpha)}{\sqrt{n}}

  with the standard tabulated constants c(0.10) ≈ 1.22, c(0.05) ≈ 1.36, and c(0.01) ≈ 1.63.
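A small sketch of this large-sample rule, using the standard tabulated constants above:

```python
# Sketch: large-sample K-S critical value, D_crit ≈ c(alpha) / sqrt(n).
import math

C_ALPHA = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}  # standard tabulated constants

def ks_critical(n, alpha=0.05):
    """Approximate critical value for the K-S statistic, valid for n > 40."""
    return C_ALPHA[alpha] / math.sqrt(n)

print(ks_critical(50))   # ≈ 0.192: the Normal fit (D = 0.119) is accepted
print(ks_critical(100))  # ≈ 0.136
```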

Example: Grade Distribution?
- For our example, we have n = 50.
- The critical value for α = 0.05 is 1.36/√50 ≈ 0.192. Since D_max,Normal = 0.119 < 0.192: ACCEPT.

Example: Grade Distribution?
- If we get the same distribution for n = 100:
- The critical value for α = 0.05 is 1.36/√100 = 0.136. The Normal fit (D_max = 0.119 < 0.136) is still ACCEPTED, while the Poisson fit is REJECTED.

Linear Regression: Least Squares Method
In statistics, linear regression is a form of regression analysis in which the relationship between one or more independent variables and another variable, called the dependent variable, is modeled by a least-squares function, called the linear regression equation. This function is a linear combination of one or more model parameters, called regression coefficients. A linear regression equation with one independent variable represents a straight line. The results are subject to statistical analysis.

The Method of Least Squares
- The equation of the best-fitting line, ŷ = a + bx, is calculated from a set of n pairs (x_i, y_i).
- We choose the estimates a and b of the intercept and slope so that the sum of the squared vertical distances of the points from the line,

  SSE = \sum_{i=1}^{n} (y_i - a - b x_i)^2

  is minimized. (SSE: sum of squared errors.)

Least Squares Estimators
The minimizing values are

  b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\,\bar{x}

where S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} and S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}.
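A minimal sketch of these estimators in code; the (x, y) pairs are made-up illustration values, not the slide's table.

```python
# Sketch: least-squares slope and intercept via b = S_xy / S_xx, a = ybar - b*xbar.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # assumed predictor values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])    # assumed responses

n = len(x)
s_xy = np.sum(x * y) - x.sum() * y.sum() / n
s_xx = np.sum(x ** 2) - x.sum() ** 2 / n

b = s_xy / s_xx               # slope estimate
a = y.mean() - b * x.mean()   # intercept estimate
print(f"y_hat = {a:.3f} + {b:.3f} x")
```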

Example
The table (values shown on slide) lists the math achievement test scores for a random sample of n = 10 college freshmen, along with their final calculus grades: Student; Math test, x; Calculus grade, y. Use your calculator to find the sums and sums of squares.

Example (worked calculation shown on slide)

Correlation Analysis
In probability theory and statistics, correlation (often measured as a correlation coefficient) indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation or co-relation refers to the departure of two variables from independence. In this broad sense there are several coefficients measuring the degree of correlation, adapted to the nature of the data.

Correlation Analysis
- The strength of the relationship between x and y is measured using the coefficient of correlation:

  r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}

- The sign of r indicates the direction of the relationship; r near 0 indicates no linear relationship, while r near 1 or −1 indicates a strong linear relationship.
- A test of the significance of the correlation coefficient is identical to the test of the slope β.
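A quick sketch of r built from the same sums of squares used for regression; the height/weight numbers below are assumed illustration values, not the slide's data.

```python
# Sketch: correlation coefficient r = S_xy / sqrt(S_xx * S_yy).
import numpy as np

def correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    s_xy = np.sum(x * y) - x.sum() * y.sum() / n
    s_xx = np.sum(x ** 2) - x.sum() ** 2 / n
    s_yy = np.sum(y ** 2) - y.sum() ** 2 / n
    return s_xy / np.sqrt(s_xx * s_yy)

heights = [73, 71, 75, 72, 72, 74, 74, 71, 72, 69]            # assumed (inches)
weights = [185, 175, 200, 210, 190, 195, 220, 180, 175, 160]  # assumed (lbs)
print(correlation(heights, weights))
```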

Example
The table (values shown on slide) lists the heights and weights of n = 10 randomly selected college football players: Player; Height, x; Weight, y. Use your calculator to find the sums and sums of squares.

Football Players
r = 0.8261: strong positive correlation. As a player's height increases, so does his weight.

Some Correlation Patterns (scatter plots shown on slide)
- r = 0: no correlation
- r = 0.931: strong positive correlation
- r = 1: linear relationship
- r = −0.67: weaker negative correlation