The data as seen in R (x = population, y = city manager compensation, z = violent crime), first rows:

           x       y   z
    [1,]  58035 354.559  46
    [2,] 120100 351.593 998
    [3,] 102743 339.815 615
    [4,] 117242 321.533

The data as seen in R: a 21-row matrix ([1,] to [21,]) with columns x (population), y (city manager compensation) and z.

Stem-and-leaf displays from stem() in R: y (the decimal point is 2 digits to the right of the |) and log10(x) (the decimal point is at the |).

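OLS. Choose A and B to minimize $S(A,B) = \sum_i (Y_i - A - BX_i)^2$; the minimum value, $\min S = \sum_i E_i^2$, is the "sum of residuals-squared", SSR. Residuals: stem-and-leaf display of the $E_i$. $S_E^2 = \mathrm{SSR}/(n-2)$, so the residual standard error is $S_E = \sqrt{\mathrm{SSR}/(n-2)} = 45.11$.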
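A minimal R sketch of the OLS formulas, using simulated data rather than the slides' 21 cities (the population range, the coefficients 250 and 20, and the noise level 45 are arbitrary assumptions). Minimizing $S(A,B)$ gives $B = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$ and $A = \bar{Y} - B\bar{X}$; the code checks these estimates, and $S_E$, against `lm()`.

```r
# Simulated stand-in data (hypothetical, NOT the slides' data)
set.seed(1)
n <- 21
x <- runif(n, 5e4, 1.5e5)                     # hypothetical population
y <- 250 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation
X <- log10(x)                                 # explanatory variable used on the slides

# Closed-form least-squares estimates that minimise S(A, B)
B <- sum((X - mean(X)) * (y - mean(y))) / sum((X - mean(X))^2)
A <- mean(y) - B * mean(X)

E   <- y - (A + B * X)      # residuals E_i
SSR <- sum(E^2)             # "sum of residuals-squared"
S_E <- sqrt(SSR / (n - 2))  # residual standard error

fit <- lm(y ~ log10(x))
print(c(A = A, B = B, S_E = S_E))
print(c(coef(fit), sigma = summary(fit)$sigma))
```

`summary()` labels $S_E$ "Residual standard error"; its square, SSR/(n - 2), is the residual mean square in the ANOVA table.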

Total sum of squares: $\mathrm{TSS} = \sum_i (Y_i - \bar{Y})^2$
Fitted values: $\hat{Y}_i = A + BX_i$
Residual sum of squares (the SSR above): $\mathrm{RSS} = \sum_i (Y_i - \hat{Y}_i)^2$
Regression sum of squares: $\mathrm{RegSS} = \sum_i (\hat{Y}_i - \bar{Y})^2 = 6173$

Analysis of variance: TSS = RegSS + RSS. Analysis of Variance Table, via anova(lm(y ~ log10(x))): Response: y; columns Df, Sum Sq, Mean Sq, F value, Pr(>F); rows log10(x) and Residuals.
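The identity TSS = RegSS + RSS, and the way `anova()` lays it out, can be checked with a short R sketch on simulated data (hypothetical values, not the slides').

```r
set.seed(1)
n <- 21
x <- runif(n, 5e4, 1.5e5)                     # hypothetical population
y <- 250 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation

fit  <- lm(y ~ log10(x))
Yhat <- fitted(fit)                 # fitted values A + B log10(x_i)

TSS   <- sum((y - mean(y))^2)       # total sum of squares
RSS   <- sum((y - Yhat)^2)          # residual sum of squares
RegSS <- sum((Yhat - mean(y))^2)    # regression sum of squares

tab <- anova(fit)                   # rows: log10(x), Residuals
print(tab)
print(c(TSS = TSS, RegSS = RegSS, RSS = RSS, RegSS.plus.RSS = RegSS + RSS))
```

The Sum Sq column holds (RegSS, RSS), Mean Sq divides each by its Df (1 and n - 2 = 19), and F value is the ratio of the two mean squares.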

Output of summary(lm(y ~ log10(x))):
Call: lm(formula = y ~ log10(x))
Residuals: Min, 1Q, Median, 3Q, Max
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)): (Intercept), log10(x)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error on 19 degrees of freedom
Multiple R-squared, Adjusted R-squared
F-statistic on 1 and 19 DF, with its p-value
Correlation: $r = S_{XY}/(S_X S_Y) = 0.371$
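With a single explanatory variable the `summary()` quantities are tied together: Multiple R-squared equals $r^2$ (so $r = 0.371$ corresponds to $R^2 = 0.371^2 \approx 0.138$), and the F statistic on 1 and 19 DF equals the square of the slope's t value. A sketch with simulated, hypothetical data:

```r
set.seed(1)
n <- 21
x <- runif(n, 5e4, 1.5e5)                     # hypothetical population
y <- 250 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation
X <- log10(x)

r <- cov(X, y) / (sd(X) * sd(y))   # r = S_XY / (S_X S_Y)
s <- summary(lm(y ~ log10(x)))

print(c(r = r, r.squared = r^2, Multiple.R.squared = s$r.squared))
print(s$fstatistic)                             # value, numdf, dendf
print(s$coefficients["log10(x)", "t value"]^2)  # equals the F value
```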

Conclusion. There is not much in the way of a linear relationship. We might instead have taken y as the explanatory variable and log10(x) as the response. Why not? "Correlation does not imply causation." Perhaps an important explanatory variable is missing.
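The "Why not?" can be made concrete: the correlation is symmetric, but the two regressions give different lines. Their slopes multiply to $r^2$, so the lines coincide only when $|r| = 1$. A sketch on simulated, hypothetical data:

```r
set.seed(1)
n <- 21
x <- runif(n, 5e4, 1.5e5)                     # hypothetical population
y <- 250 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation
X <- log10(x)

b_yx <- coef(lm(y ~ X))[["X"]]  # slope when y is the response
b_xy <- coef(lm(X ~ y))[["y"]]  # slope when log10(x) is the response
r    <- cor(X, y)               # same whichever way round

print(c(b_yx = b_yx, b_xy = b_xy, product = b_yx * b_xy, r.squared = r^2))
```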

Crime? The FBI records violent crime (values per thousand); call it z. The data as seen in R now has columns x, y and z.

Multiple regression. Choose A, $B_1$ and $B_2$ to minimize $S(A, B_1, B_2) = \sum_i (Y_i - A - B_1 X_{i1} - B_2 X_{i2})^2$; the minimum, $\sum_i E_i^2$, is again the "sum of residuals-squared".
Analysis of variance: TSS = RegSS + RSS, and multiple R-squared = RegSS/TSS.
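A sketch of the two-variable least-squares fit and of multiple R-squared = RegSS/TSS. As in the ANOVA that follows, the second explanatory variable is zz = z/x; the crime counts z, like the other values, are simulated and hypothetical.

```r
set.seed(1)
n  <- 21
x  <- runif(n, 5e4, 1.5e5)                    # hypothetical population
z  <- x * runif(n, 0.002, 0.01)               # hypothetical violent-crime counts
zz <- z / x                                   # second explanatory variable
y  <- 250 + 20 * log10(x) + rnorm(n, sd = 45) # hypothetical compensation

fit  <- lm(y ~ log10(x) + zz)   # minimises S(A, B1, B2)
Yhat <- fitted(fit)

TSS   <- sum((y - mean(y))^2)
RegSS <- sum((Yhat - mean(y))^2)
RSS   <- sum(residuals(fit)^2)

print(coef(fit))                # A, B1, B2
print(c(RegSS.over.TSS = RegSS / TSS, Multiple.R.squared = summary(fit)$r.squared))
```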

Total sum of squares: $\mathrm{TSS} = \sum_i (Y_i - \bar{Y})^2$
Fitted values: $\hat{Y}_i = A + B_1 X_{i1} + B_2 X_{i2}$
Residual sum of squares: $\mathrm{RSS} = \sum_i (Y_i - \hat{Y}_i)^2$
Regression sum of squares: $\mathrm{RegSS} = \sum_i (\hat{Y}_i - \bar{Y})^2$

Analysis of variance: TSS = RegSS + RSS, with the second explanatory variable zz = z/x. ANOVA Table, via anova(lm(y ~ log10(x) + zz)): Response: y; columns Df, Sum Sq, Mean Sq, F value, Pr(>F); rows log10(x), zz and Residuals.
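`anova()` on the two-variable model reports sequential sums of squares: the log10(x) row is the regression sum of squares of the one-variable model, the zz row is the extra amount explained when zz is added, and the two rows together give the model's RegSS. A sketch with simulated, hypothetical data:

```r
set.seed(1)
n  <- 21
x  <- runif(n, 5e4, 1.5e5)                    # hypothetical population
z  <- x * runif(n, 0.002, 0.01)               # hypothetical violent-crime counts
zz <- z / x
y  <- 250 + 20 * log10(x) + rnorm(n, sd = 45) # hypothetical compensation

simple <- lm(y ~ log10(x))
full   <- lm(y ~ log10(x) + zz)
tab    <- anova(full)           # rows: log10(x), zz, Residuals
print(tab)

ss <- tab[["Sum Sq"]]
RegSS_simple <- sum((fitted(simple) - mean(y))^2)
RegSS_full   <- sum((fitted(full) - mean(y))^2)
print(c(log10x.row = ss[1], RegSS_simple = RegSS_simple,
        first.two.rows = ss[1] + ss[2], RegSS_full = RegSS_full))
```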

Output of summary(lm(y ~ log10(x) + zz)):
Call: lm(formula = y ~ log10(x) + zz)
Residuals: Min, 1Q, Median, 3Q, Max
Coefficients (Estimate, Std. Error, t value, Pr(>|t|)): (Intercept), log10(x), zz
Residual standard error on 18 degrees of freedom
Multiple R-squared, Adjusted R-squared
F-statistic on 2 and 18 DF, with its p-value

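Conclusion. Including violent crime does not appear to have helped much, although R-squared did rise. Adding an explanatory variable can never lower R-squared, so adjusted R-squared is the fairer comparison.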
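Because R-squared cannot fall when a variable is added, its rise says little on its own. Adjusted R-squared, $1 - (1 - R^2)(n - 1)/(n - p - 1)$ with p explanatory variables, charges for the extra term, and `anova(simple, full)` gives the F test for adding zz. A sketch on simulated, hypothetical data:

```r
set.seed(1)
n  <- 21
x  <- runif(n, 5e4, 1.5e5)                    # hypothetical population
z  <- x * runif(n, 0.002, 0.01)               # hypothetical violent-crime counts
zz <- z / x
y  <- 250 + 20 * log10(x) + rnorm(n, sd = 45) # hypothetical compensation

simple <- summary(lm(y ~ log10(x)))           # p = 1
full   <- summary(lm(y ~ log10(x) + zz))      # p = 2

# Adjusted R^2 for a model with an intercept and p explanatory variables
adj <- function(R2, p) 1 - (1 - R2) * (n - 1) / (n - p - 1)

print(c(R2.simple = simple$r.squared, R2.full = full$r.squared))
print(c(adj.simple = simple$adj.r.squared, adj.full = full$adj.r.squared))
print(anova(lm(y ~ log10(x)), lm(y ~ log10(x) + zz)))  # F test for adding zz
```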

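Normal equations. With $Y_i = A + BX_i + E_i$ and $S(A,B) = \sum_i (Y_i - A - BX_i)^2$:
$\partial S/\partial A = -2 \sum_i (Y_i - A - BX_i) = 0 \;\Rightarrow\; \sum_i E_i = 0$
$\partial S/\partial B = -2 \sum_i X_i (Y_i - A - BX_i) = 0 \;\Rightarrow\; \sum_i X_i E_i = 0$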
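The normal equations can be checked numerically: the residuals of a fitted line sum to zero and are orthogonal to the explanatory variable. In matrix form they read $M^\top M (A, B)^\top = M^\top Y$, where M (a name introduced here) is the design matrix with a column of ones and a column of $X_i$; solving them directly reproduces `lm()`. A sketch on simulated, hypothetical data:

```r
set.seed(1)
n <- 21
x <- runif(n, 5e4, 1.5e5)                     # hypothetical population
y <- 250 + 20 * log10(x) + rnorm(n, sd = 45)  # hypothetical compensation
X <- log10(x)

fit <- lm(y ~ X)
E   <- residuals(fit)
print(c(sum.E = sum(E), sum.XE = sum(X * E)))   # both zero up to rounding

M  <- cbind(1, X)                               # design matrix: ones, X_i
AB <- solve(crossprod(M), crossprod(M, y))      # solves M'M (A, B)' = M'y
print(cbind(normal.equations = AB[, 1], lm = coef(fit)))
```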