Basic Ideas in Probability and Statistics for Experimenters, Part II: Quantitative Work, Regression. Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute.


Slide 1: Basic Ideas in Probability and Statistics for Experimenters, Part II: Quantitative Work, Regression
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute. Based in part upon slides of Prof. Raj Jain (OSU).
"He uses statistics as a drunken man uses lamp-posts – for support rather than for illumination ..." – A. Lang

Slide 2: Overview
- Quantitative examples of sample statistics; confidence interval for the mean
- Introduction to regression
- Also do the informal quiz handed out
- Reference: Chapters 12 and 13 (Jain); Chapters 2 and 3 (Box, Hunter, Hunter)
- Regression applet demo

Slide 3: Independence
- P(x, y) = (1/18)(2x + y) for x = 1, 2 and y = 1, 2, and zero otherwise. Are the variables X and Y independent? Can you speculate why they are independent or dependent? [Hint: the marginals P(X) and P(Y) can be formed by summing this joint distribution over the other variable, since the combinations for specific values of x and y are mutually exclusive.]

Slide 4: Independence
- P(x, y) = (1/30)(x²y) for x = 1, 2 and y = 1, 2, 3, and zero otherwise. Are the variables X and Y independent? Can you speculate why they are independent or dependent?
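The answers are not worked out on the slides, but the independence check itself is mechanical: X and Y are independent exactly when the joint pmf factors into the product of its marginals. A small Python sketch (not from the slides; the function name and tolerance are my own) that tests both joint distributions:

```python
from itertools import product

def check_independence(pmf, xs, ys, tol=1e-12):
    """Return True iff P(x, y) == P(x) * P(y) for every (x, y)."""
    px = {x: sum(pmf(x, y) for y in ys) for x in xs}   # marginal of X
    py = {y: sum(pmf(x, y) for x in xs) for y in ys}   # marginal of Y
    return all(abs(pmf(x, y) - px[x] * py[y]) < tol
               for x, y in product(xs, ys))

# Slide 3: P(x, y) = (1/18)(2x + y) -- a sum, so it cannot factor as g(x)h(y)
print(check_independence(lambda x, y: (2 * x + y) / 18, [1, 2], [1, 2]))

# Slide 4: P(x, y) = (1/30)(x^2 y) -- factors as (x^2 / 5) * (y / 6)
print(check_independence(lambda x, y: x * x * y / 30, [1, 2], [1, 2, 3]))
```

The second distribution is independent precisely because the joint pmf is a product of a function of x alone and a function of y alone; the first mixes x and y additively and is not.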

Slide 5: Recall: Sampling Distribution
The uniform distribution looks nothing like the bell-shaped (Gaussian) curve and has a large spread (σ). But the sampling distribution of the mean looks Gaussian, with a smaller standard deviation: the sample mean ~ N(μ, σ/√n), i.e. the standard deviation of the sample mean (aka the standard error) decreases with larger sample size n.
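The shrinking standard error is easy to see by simulation. A sketch (not from the slides; the seed, trial count, and sample sizes are arbitrary choices) comparing the empirical standard deviation of the mean of n Uniform(0, 1) draws against the theoretical σ/√n:

```python
import random
import statistics

random.seed(42)

def sample_mean_sd(n, trials=2000):
    """Empirical std. dev. of the mean of n Uniform(0, 1) draws."""
    means = [statistics.fmean(random.random() for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

sigma = (1 / 12) ** 0.5          # std. dev. of Uniform(0, 1)
for n in (4, 16, 64):
    print(n, round(sample_mean_sd(n), 4), round(sigma / n ** 0.5, 4))
```

Each quadrupling of n roughly halves the standard error, matching the 1/√n law.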

Slide 6: Confidence Interval
- Sample mean: xbar ~ N(μ, σ/√n)
- The 100(1-α)% confidence interval is given by (if n > 30): {xbar - z_(1-α/2)·s/√n, xbar + z_(1-α/2)·s/√n}
- z_α is the α-quantile of the unit normal distribution N(0, 1), i.e. P{(y - μ)/σ <= z_α} = α
- E.g. the 90% CI: {xbar - z_(0.95)·s/√n, xbar + z_(0.95)·s/√n}
- Refer to the z-tables on pp. 629-630 (Table A.2 or A.3)
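Instead of looking the quantiles up in the z-tables, they can be computed directly. A sketch using Python's statistics.NormalDist (a stand-in for the tables, not something the slides use):

```python
from statistics import NormalDist

# z_(1 - alpha/2) quantiles of the unit normal N(0, 1)
z95 = NormalDist().inv_cdf(0.95)    # for a 90% CI
z975 = NormalDist().inv_cdf(0.975)  # for a 95% CI
print(f"{z95:.4f} {z975:.4f}")      # the familiar table values 1.6449 and 1.9600
```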

Slide 7: Meaning of Confidence Interval

Slide 8: Example: Sample Statistics, Confidence Interval
- Given: n = 32 random RTT samples (in ms): {31, 42, 28, 51, 28, 44, 56, 39, 39, 27, 41, 36, 31, 45, 38, 29, 34, 33, 28, 45, 49, 53, 19, 37, 32, 41, 51, 32, 39, 48, 59, 42}
- 1. Find: the sample mean (xbar), median, mode, sample standard deviation (s), C.o.V., SIQR, and the 90% and 95% confidence intervals (CI) for the population mean.
- 2. Interpret your statistics qualitatively, i.e. what do they mean?
- Hint: refer to the formulas on pp. 197 and 219 of Jain's text (especially for s); the latter is reproduced in one of the slides.
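A sketch of part 1 (mode, SIQR, and the qualitative interpretation are left to the reader; the helper name z_ci is my own):

```python
import statistics
from statistics import NormalDist

rtt = [31, 42, 28, 51, 28, 44, 56, 39, 39, 27, 41, 36, 31, 45, 38, 29,
       34, 33, 28, 45, 49, 53, 19, 37, 32, 41, 51, 32, 39, 48, 59, 42]

n = len(rtt)
xbar = statistics.fmean(rtt)     # sample mean
med = statistics.median(rtt)
s = statistics.stdev(rtt)        # sample std. dev. (n - 1 divisor, as in Jain)
cov = s / xbar                   # coefficient of variation

def z_ci(level):
    """Normal-based CI, valid here since n = 32 > 30."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * s / n ** 0.5
    return xbar - half, xbar + half

print(f"mean={xbar:.2f} median={med} s={s:.2f} CoV={cov:.3f}")
print("90% CI:", tuple(round(v, 2) for v in z_ci(0.90)))
print("95% CI:", tuple(round(v, 2) for v in z_ci(0.95)))
```

This gives a mean near 39 ms with a 90% CI of roughly (36.2, 41.7) ms; the 95% interval is wider, since more confidence demands a larger margin.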

Slide 10: t-distribution: for confidence intervals given few samples (6 <= n < 30)
- Idea: the t-distribution with n - 1 degrees of freedom approximates the normal distribution for larger n (n >= 6).
- The t-distribution is a poor approximation for lower degrees of freedom, i.e. for fewer than 6 samples!

Slide 11: t-distribution: confidence intervals
- The 100(1-α)% confidence interval is given by (if n <= 30): {xbar - t_{1-α/2; n-1}·s/√n, xbar + t_{1-α/2; n-1}·s/√n}
- Use the t-distribution tables on p. 631, Table A.4
- E.g. for n = 7, the 90% CI is: {xbar - t_{0.95; 6}·s/√n, xbar + t_{0.95; 6}·s/√n}

Slide 12: Confidence Interval with few samples
- Given: n = 10 random RTT samples (in ms): {31, 42, 28, 51, 28, 44, 56, 39, 39, 27}
- 1. Find: the sample mean (xbar), sample standard deviation (s), and the 90% and 95% confidence intervals (CI) for the population mean.
- 2. Interpret this result relative to the earlier result using the normal distribution.
- Hint: refer to the formulas on pp. 197 and 219 of Jain's text (especially for s).
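A sketch of the t-based interval for this exercise. Python's standard library has no t quantiles, so the values below are hand-copied from a standard t-table for 9 degrees of freedom (an assumption to double-check against Table A.4):

```python
import statistics

rtt = [31, 42, 28, 51, 28, 44, 56, 39, 39, 27]
n = len(rtt)
xbar = statistics.fmean(rtt)    # sample mean
s = statistics.stdev(rtt)       # sample std. dev. (n - 1 divisor)

# t quantiles for n - 1 = 9 degrees of freedom, from a standard t-table
t_90 = 1.833    # t_{0.95; 9}
t_95 = 2.262    # t_{0.975; 9}

half_90 = t_90 * s / n ** 0.5
half_95 = t_95 * s / n ** 0.5
print(f"90% CI: ({xbar - half_90:.2f}, {xbar + half_90:.2f})")
print(f"95% CI: ({xbar - half_95:.2f}, {xbar + half_95:.2f})")
```

With only 10 samples the intervals are noticeably wider than the 32-sample intervals: both the larger s/√n and the t quantile (1.833 vs z = 1.645) contribute.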

Slide 13: Linear Regression
- Goal: determine the relationship between two random variables X and Y.
- Example: X = height and Y = weight of a sample of adults.
- Linear regression attempts to explain this relationship with a straight line fit to the data, i.e. a linear model.
- The linear regression model postulates that Y = a + bX + e, where the "residual" or "error" e is a random variable with mean zero.
- The coefficients a and b are determined by the condition that the sum of the squared residuals (i.e. the "energy" of the residuals) is as small as possible (i.e. minimized).

Slide 14: Demo: Regression Applet

Slide 15: Regression Theory
- Model: y_est = b0 + b1·x_i
- Error: e_i = y_i - y_est
- Sum of Squared Errors (SSE): Σ e_i² = Σ (y_i - b0 - b1·x_i)²
- Mean error: e_avg = (1/n)·Σ e_i = (1/n)·Σ (y_i - b0 - b1·x_i)
- Linear regression problem: minimize SSE subject to the constraint e_avg = 0
- Solution (i.e. the regression coefficients):
- b1 = s_xy/s_x² = {Σ x_i·y_i - n·xbar·ybar} / {Σ x_i² - n·(xbar)²}
- b0 = ybar - b1·xbar
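The closed-form solution above can be coded directly. A minimal sketch (the function name fit_line and the toy data are my own), checked on a line the fit should recover exactly:

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar * xbar
    b1 = sxy / sxx          # slope
    b0 = ybar - b1 * xbar   # intercept, which forces e_avg = 0
    return b0, b1

# Noise-free data on the line y = 3 + 2x should be recovered exactly
b0, b1 = fit_line([1, 2, 3, 4, 5], [5, 7, 9, 11, 13])
print(b0, b1)   # 3.0 2.0
```

Note that choosing b0 = ybar - b1·xbar is exactly what makes the mean residual zero, so the constraint on e_avg is satisfied automatically.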

Slide 16: Practical issues: check the linearity hypothesis with a scatter diagram!

Slide 17: Practical issues: check the randomness and zero-mean hypotheses for the residuals!

Slide 18: Practical issues: does the regression indeed explain the variation?
- The coefficient of determination (R²) is a measure of the value of the regression, i.e. the variation explained by the regression relative to simple second-order statistics.
- But it can be misleading if the scatter plot is not checked.
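One common form is R² = 1 - SSE/SST, where SST = Σ(y_i - ybar)² is the total variation. A sketch (the toy data and helper name are my own) that reuses the slide-15 coefficient formulas:

```python
def r_squared(xs, ys, b0, b1):
    """R^2 = 1 - SSE/SST: fraction of the variation in y explained by the fit."""
    ybar = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]
ys = [5.1, 6.8, 9.2, 10.9, 13.0]   # roughly y = 3 + 2x with small noise

# Slope and intercept via the closed-form least-squares formulas
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
b1 = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
     (sum(x * x for x in xs) - n * xbar * xbar)
b0 = ybar - b1 * xbar
print(round(r_squared(xs, ys, b0, b1), 4))
```

An R² near 1 says most of the variation is explained, but as the slide warns, the same R² can come from very different scatter plots, so plot the data anyway.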

Slide 19: Non-linear regression: make linear through a transformation of the samples!
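A classic instance of this transformation trick: for an exponential model y = a·e^(bx), taking logs gives ln y = ln a + b·x, which is linear in x and can be fitted with the ordinary least-squares formulas. A sketch on synthetic noise-free data (the values a = 2 and b = 0.5 are purely illustrative):

```python
import math

xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # synthetic: a = 2, b = 0.5

# Transform: regress ln(y) on x with the usual least-squares formulas
log_ys = [math.log(y) for y in ys]
n = len(xs)
xbar = sum(xs) / n
lbar = sum(log_ys) / n
b = (sum(x * l for x, l in zip(xs, log_ys)) - n * xbar * lbar) / \
    (sum(x * x for x in xs) - n * xbar * xbar)
a = math.exp(lbar - b * xbar)                # intercept ln(a), exponentiated back
print(round(a, 3), round(b, 3))              # recovers 2.0 and 0.5
```

One caveat worth knowing: least squares on ln y minimizes multiplicative (relative) error in y, not additive error, so the transformed fit is not identical to a direct non-linear fit when the data are noisy.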