MATH1005 STATISTICS M.Harahap@maths.usyd.edu.au http://mahritaharahap.wordpress.com/teaching-areas Tutorial 3: Bivariate Data.


In statistics we usually want to analyse a population, but collecting data for the whole population is usually impractical, expensive, or impossible. That is why we collect a sample from the population (sampling) and use the sample statistics to make inferences about the population parameters (inference) with a stated level of confidence (the confidence level). A population is the collection of all individuals, objects, or measurements of interest. A sample is a subset of that population.
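
The sampling and inference steps can be sketched in R as follows. This is only an illustration: the population is simulated, and the names `population` and `sample_heights` and all numbers are assumptions, not data from this tutorial.

```r
# Hypothetical population: 10,000 simulated heights in cm (not tutorial data).
set.seed(1)
population <- rnorm(10000, mean = 165, sd = 7)

# Sampling: draw 50 individuals at random, without replacement.
sample_heights <- sample(population, size = 50)

# Inference: estimate the population mean from the sample,
# with a 95% confidence interval around the estimate.
estimate <- mean(sample_heights)
ci <- t.test(sample_heights, conf.level = 0.95)$conf.int

estimate          # sample statistic
mean(population)  # population parameter (normally unknown)
ci                # interval estimate for the parameter
```

In practice only the sample is observed, so `mean(population)` is shown here purely for comparison.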

Regression: The linear regression line characterises the relationship between two numerical variables. Regression analysis helps us understand how a change in one variable is associated with a change in the other. Simple linear regression examines the relationship between one independent variable (predictor/explanatory) and one dependent variable (response/outcome). The regression line is based on the equation of a straight line: Y = β0 + β1X, where β0 is the intercept and β1 is the slope.

Y: outcome variable, also called the response variable or dependent variable. This is the outcome to be measured or predicted.
X: predictor variable, also called the explanatory variable or independent variable. This is the variable one can control.
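
As a minimal sketch of fitting the line Y = β0 + β1X, the R code below uses `lm()` on a small made-up data set. The variables `hours` (X) and `mark` (Y) and their values are hypothetical and chosen only for illustration.

```r
# Hypothetical data: hours studied (X, predictor) and exam mark (Y, response).
hours <- c(1, 2, 3, 4, 5, 6, 7, 8)
mark  <- c(52, 55, 61, 64, 70, 71, 78, 83)

# Fit the least squares regression line mark = b0 + b1 * hours.
fit <- lm(mark ~ hours)
b0 <- coef(fit)[["(Intercept)"]]  # estimated intercept
b1 <- coef(fit)[["hours"]]        # estimated slope

# Predict the response for a new value of the predictor.
predicted <- predict(fit, newdata = data.frame(hours = 5.5))
predicted
```

The slope `b1` estimates the change in the response for a one-unit increase in the predictor.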

Correlation: Correlation measures the association between two numerical variables. Its strength and direction are summarised by the correlation coefficient r, a statistic that quantifies the linear relationship between the two variables. r always falls between -1.00 and 1.00. The sign of r indicates the direction of the relationship, and the magnitude of r indicates its strength. NOTE: Regression describes how the dependent variable changes with the independent variable, which is given by the slope of the regression line. Correlation describes the strength and direction of the association between the two numerical variables, which is given by the correlation coefficient.
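
The link between the correlation coefficient and the regression slope can be checked numerically. The sketch below uses made-up data (`hours` and `mark` are hypothetical) and the standard identity slope = r × sd(Y) / sd(X) for simple linear regression.

```r
# Hypothetical data, for illustration only.
hours <- c(1, 2, 3, 4, 5, 6, 7, 8)
mark  <- c(52, 55, 61, 64, 70, 71, 78, 83)

# Correlation coefficient: sign gives direction, magnitude gives strength.
r <- cor(hours, mark)

# Slope of the least squares line, computed two ways.
slope        <- coef(lm(mark ~ hours))[["hours"]]
slope_from_r <- r * sd(mark) / sd(hours)

r
slope
slope_from_r
```

The two slope values agree, and the slope always has the same sign as r.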

Strength & Direction of Correlation
Direction: positive or negative
Strength: perfect, strong, moderate, or weak

R² (Coefficient of Determination): R-squared gives the proportion of the total variability in the response variable (Y) that is “explained” by the least squares regression line based on the predictor variable (X). It is usually stated as a percentage. Interpretation: (100 × R²)% of the variation in the dependent variable can be explained by the independent variable through the regression model.
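
To make "proportion of variability explained" concrete, the sketch below computes R² by hand as 1 - SSE/SST and compares it with the value reported by `summary()`. The data are hypothetical and not taken from the tutorial.

```r
# Hypothetical data, for illustration only.
hours <- c(1, 2, 3, 4, 5, 6, 7, 8)
mark  <- c(52, 55, 61, 64, 70, 71, 78, 83)

fit <- lm(mark ~ hours)

# SSE: variation left unexplained by the line (sum of squared residuals).
sse <- sum(residuals(fit)^2)
# SST: total variation of the response around its mean.
sst <- sum((mark - mean(mark))^2)

r_squared <- 1 - sse / sst

r_squared
summary(fit)$r.squared  # same value, as reported by R
cor(hours, mark)^2      # in simple linear regression, R-squared equals r^2
```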

> Result <- Olympics100mW$Result
> Olympics100mW[order(Result),]
Year Athlete Medal Country Result
1988 Florence Griffith-Joyner GOLD USA 10.54
2012 Shelly-Ann Fraser-Pryce GOLD JAM 10.75
# The fastest time is 10.54s, run by Florence Griffith-Joyner from the USA at the 1988 Seoul Olympics.

# The scatter plot on the right suggests that a linear regression may be appropriate. This is supported by the correlation coefficient r = -0.8736502: r² is about 0.76, so about 76% of the variability in Result is explained by Year.
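
The 76% figure follows directly from squaring the reported correlation coefficient. A quick check in R, using only the value of r quoted above:

```r
# Correlation coefficient reported for Result against Year.
r <- -0.8736502

# In simple linear regression, R-squared is the square of r.
r_squared <- r^2

# Express as a whole-number percentage of variability explained.
percent_explained <- round(100 * r_squared)
percent_explained
```

The negative sign of r is lost when squaring: it tells us that Result decreases as Year increases, while R² only measures how much variability the line explains.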

# The boxplot shows 1 outlier (9502 mins in 1945 by Rani).
# Take a logarithm transformation of Time to reduce the influence of the outlier, and use the transformed variable as your y variable in the subsequent analysis.
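
A minimal sketch of the log transformation is shown below. Only the outlying value 9502 comes from the tutorial; the other times and the name `time_mins` are hypothetical, since the original data set is not reproduced here.

```r
# Hypothetical times in minutes; only the extreme value 9502 is from the tutorial.
time_mins <- c(120, 135, 150, 160, 175, 190, 210, 9502)

# Values flagged by the boxplot rule (beyond 1.5 * IQR from the hinges).
outliers_raw <- boxplot.stats(time_mins)$out

# Natural log transformation compresses large values.
log_time <- log(time_mins)

# How far the largest value sits from the median, before and after.
spread_raw <- max(time_mins) / median(time_mins)
spread_log <- max(log_time) / median(log_time)

outliers_raw
spread_raw
spread_log
```

The transformation keeps the ordering of the observations but pulls the extreme value much closer to the rest, so it has less leverage on a fitted regression line. Whether a boxplot of the transformed values still flags the point depends on the data.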