STAT 203 Elementary Statistical Methods
Review of Basic Concepts
- Population and Samples
- Variables and Data
- Data Representation (frequency distribution tables, graphs and charts)
- Descriptive Measures
- Measures of Central Tendency (mean, median)
- Measures of Variation (standard deviation, range, etc.)
- Five-Number Summaries
AA-K 2014/15
Examining Relationships of Two Numerical Variables
In many applications, we are interested not only in understanding individual variables, but also in examining the relationships among them. Business, economics and the physical sciences routinely require predictions based on historical or otherwise available data.
Examples
- Final exam score, study time and class attendance
- Production: overhead cost, level of production, and number of workers
- Real estate: value of a home, size (square feet), area
- Economics: demand and supply
- Business: dividend yield and earnings per share
Terminology
- Dependent variable (response variable or y-variable)
- Independent variable (predictor variable or x-variable)
Graphical
- Scatter plot (useful for 2 variables): a graph of the plotted data pairs (x, y).
- Matrix plot (useful for more than 2 variables): presents the individual scatter plots in the form of a matrix.
Example
Consider some historical data for a production plant:
Production Units (in 10,000s):
Overhead Costs (in $1000s):
Construct a scatter plot of y versus x.
Example (cont)
Example of a matrix plot
Linear Correlation Coefficient (r)
A computational approach to determining the relationship between two variables.
- Pearson’s product moment correlation coefficient (PPMC)
- Spearman’s rank correlation coefficient
- Kendall’s correlation coefficient (τ)
Pearson’s product moment correlation coefficient (PPMC)
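The PPMC can be computed directly from paired data. The following is a minimal sketch that uses the standard sample formula r = Sxy / sqrt(Sxx * Syy); the function name `pearson_r` and the data values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired data x, y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical illustrative data (not taken from the course examples)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

A value of r near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests a weak linear relationship.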
Properties of the Correlation Coefficient
Properties of the Correlation Coefficient (cont)
Some scatter plots
NOTE
The correlation coefficient measures only the "linear" relationship. A correlation close to 0 therefore indicates that the two variables have a very weak linear relationship. It does not mean that the two variables are unrelated: they may still be related through some other functional form (quadratic, cubic, S-shaped, etc.).
Example of a quadratic relationship between X and Y
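A small numerical sketch of this point: the hypothetical data below follow the exact quadratic relationship y = x^2 on x values symmetric about 0, yet the linear correlation coefficient is 0. The helper `pearson_r` and the data are assumptions made for illustration only.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired data x, y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y is completely determined by x, but not linearly
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"r = {r:.4f}")  # the positive and negative cross-products cancel
```

Here X and Y are perfectly related, but a straight line captures none of that relationship, which is why a scatter plot should accompany the correlation coefficient.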
Simple Linear Regression
Exact (Deterministic) Relationship
Graph of an Exact Linear Relationship (axes: x, y)
Non-Exact Relationship
Data encountered in real life and in many business applications do not have an exact relationship. Exact relationships are the exception rather than the rule. Real-life data are more likely to look like the graph below:
Graph of a Non-Exact Relationship (axes: x, y)
Assumptions for SLR
1. There is a linear relationship between the two variables, as determined from the scatter plot.
2. The values of the dependent variable Y are mutually independent.
3. For each value of x, the corresponding Y-values are normally distributed.
4. The standard deviations of the Y-values are the same for each value of x (homoscedasticity).
Best-Fitting Line
Least-Squares Criterion
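The least-squares criterion chooses the line that minimizes the sum of squared vertical distances between the observed and fitted y-values. The sketch below uses the standard closed-form estimates, slope b1 = Sxy / Sxx and intercept b0 = y_bar - b1 * x_bar; the names `b0`, `b1` and `fit_line`, and the data, are assumptions for illustration and may differ from the notation used in the lecture.

```python
def fit_line(x, y):
    """Return (b0, b1), the least-squares intercept and slope of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


def sum_squared_errors(x, y, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")

# Any other line gives a larger sum of squared errors
best = sum_squared_errors(x, y, b0, b1)
other = sum_squared_errors(x, y, b0 + 0.5, b1)
print(best < other)
```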
Computation of Error Sum of Squares (SSE)
Mean Square Error (MSE)
Computation of Total Sum of Squares (SST)
Computation of Regression Sum of Squares (SSR)
Regression Mean Square (MSR)
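The quantities named above can be computed from the fitted line. This is a minimal sketch under the usual SLR definitions: SSE is the sum of squared residuals, SST the sum of squared deviations of y about its mean, SSR the sum of squared deviations of the fitted values about that mean, with MSE = SSE / (n - 2) and MSR = SSR / 1. The function name and the data are hypothetical.

```python
def sums_of_squares(x, y):
    """Return SSE, SST, SSR, MSE and MSR for the SLR of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # regression
    return {
        "SSE": sse,
        "SST": sst,
        "SSR": ssr,
        "MSE": sse / (n - 2),  # two estimated parameters in SLR
        "MSR": ssr / 1,        # one predictor
    }


# Hypothetical illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ss = sums_of_squares(x, y)
for name, value in ss.items():
    print(f"{name} = {value:.3f}")
```

For a least-squares fit with an intercept, the identity SST = SSR + SSE holds, which gives a quick check on hand calculations.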
Analysis of Variance (ANOVA)
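Analysis of Variance for SLR

Source      Sum of Squares  Degrees of Freedom  Mean Squares       F
Regression  SSR             p-1                 MSR = SSR/(p-1)    MSR/MSE
Residual    SSE             n-p                 MSE = SSE/(n-p)
Total       SST             n-1

Here n is the number of observations and p is the number of regression parameters (p = 2 for SLR: intercept and slope).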
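As a rough sketch, the ANOVA table can be assembled from SSR, SSE, n and p. The function `anova_table` and the numerical inputs are hypothetical; the ratio SSR/SST printed at the end is the coefficient of determination (R-square) that the homework asks for.

```python
def anova_table(ssr, sse, n, p=2):
    """Build the ANOVA rows for a regression with p parameters."""
    df_reg = p - 1
    df_res = n - p
    msr = ssr / df_reg
    mse = sse / df_res
    f_stat = msr / mse
    return [
        ("Regression", ssr, df_reg, msr, f_stat),
        ("Residual", sse, df_res, mse, None),
        ("Total", ssr + sse, n - 1, None, None),
    ]


def fmt(value):
    """Format a table cell, leaving missing entries blank."""
    return "" if value is None else f"{value:.3f}"


# Hypothetical sums of squares from a fit on n = 5 observations
rows = anova_table(ssr=3.6, sse=2.4, n=5)

print(f"{'Source':<12}{'SS':>8}{'df':>4}{'MS':>8}{'F':>8}")
for source, ss, df, ms, f_stat in rows:
    print(f"{source:<12}{ss:>8.3f}{df:>4}{fmt(ms):>8}{fmt(f_stat):>8}")

r_square = rows[0][1] / rows[2][1]  # SSR / SST
print(f"R-square = {r_square:.3f}")
```

A large F value relative to the F distribution with (p-1, n-p) degrees of freedom is evidence that the regression explains a real share of the variation in Y.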
Homework 1
Q1. The following data are the annual disposable income (in $1000) and the total annual consumption (in $1000) for 12 families selected at random from a large metropolitan area.
Income    Consumption
Homework 1 (cont)
i. Draw a scatter plot for the data and comment on the relationship between the 2 variables.
ii. Calculate the correlation coefficient and comment on the relationship between the variables.
iii. Fit a simple linear regression of consumption on income for the data.
iv. Interpret the least squares regression coefficient estimates in the context of the problem.
Homework 1 (cont)
v. Estimate the consumption of a family whose annual income is $
   Would you consider the prediction an extrapolation? Why?
vi. Draw an appropriate ANOVA table for the regression of Y on X.
vii. Compute the coefficient of determination (R-square) and interpret the value.
Homework 1 (cont)
Q2. Hanna Properties is a real estate company that specializes in custom-home resale in Phoenix, Arizona. The following is a sample of size (in hundreds of square feet) and price (in thousands of dollars) data for nine custom homes currently listed for sale.
Size    Price
Homework 1 (cont)
a) Draw a scatter plot for the data and comment on the relationship between the 2 variables.
b) Calculate the correlation coefficient between the size and price of custom homes at Hanna Properties and comment on the relationship between the variables.
c) Fit a simple linear regression of price on size for the data.
d) Interpret the least squares regression coefficient estimates in the context of the problem.
Homework 1 (cont)
e) Estimate the price of a custom home from Hanna Properties if the size of the home is 3200 sq. ft. Would you consider the prediction an extrapolation? Why?
f) Draw an appropriate ANOVA table for the regression of Y on X.
g) Compute the coefficient of determination (R-square) and interpret the value.