Vector geometry: A visual tool for statistics Sylvain Chartier Laboratory for Computational Neurodynamics and Cognition Centre for Neural Dynamics.

Presentation transcript:

Vector geometry: How, using a vector (arrow), we can represent the concepts of mean, variance (standard deviation), normalization and standardization. How, using two vectors, we can represent the concepts of correlation and regression.

A datum: (16), with starting point (0).

Two data: (16) and (8), each with starting point (0). Principle of independence of observations: perfectly opposed direction.

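Two data: (16) and (8) together give the point (16, 8), with starting point (0, 0).

Two data: the point (16, 8), with starting point (0, 0).

Starting point: Zero. The starting point is (0, 0) and the finish point is (16, 8).

Starting point: Mean. The finish point is $x = (x_1, x_2)$ and the starting point is the mean.

Starting point: Mean. The finish point is x = (16, 8) and the starting point is the mean, (12, 12).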

One group

Many groups

Degrees of freedom

We remove the effect of the mean: we center the data. Subtracting the starting point (the mean), (12, 12), from the finish point x = (16, 8) gives (4, -4), a vector that now starts at (0, 0).
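
As a minimal sketch of this centering step, the snippet below subtracts the mean from each datum of the slide's example, x = (16, 8). The function name `center` is only an illustrative choice, not something taken from the presentation.

```python
def center(data):
    """Subtract the mean of the data from every datum."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


x = [16, 8]            # the two data from the slide
centered = center(x)   # the mean is 12, so the arrow becomes (4, -4)
print(centered)        # [4.0, -4.0]
```

After centering, the arrow starts at the origin, so its length no longer depends on where the mean was.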

We remove the effect of the mean (many groups)

We remove the effect of the mean (many groups): What is the real dimensionality?

We remove the effect of the mean. If we have two data, we will get one dimension. If we have three data, we will get two dimensions. If we have n data, we will get n-1 dimensions. In other words, degrees of freedom represent the true dimensionality of the data.
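
One way to see why a dimension is lost: centered data always sum to zero, so the last value is forced by the other n-1. The sketch below illustrates this with three hypothetical data (16, 8, 3); these numbers and the helper names are assumptions made for the example.

```python
def center(data):
    """Subtract the mean of the data from every datum."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


def last_from_others(free_values):
    """Centered data sum to zero, so the last value is minus the sum of the rest."""
    return -sum(free_values)


data = [16, 8, 3]                     # hypothetical data, mean = 9
centered = center(data)               # [7.0, -1.0, -6.0]
free = centered[:-1]                  # only n - 1 = 2 values are free to vary
recovered = last_from_others(free)    # the third value is determined

print(centered, recovered)
```

Because the third coordinate can always be rebuilt from the first two, the centered arrow lives in a space of n-1 = 2 dimensions.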

Variance

What is the difference between these three groups (composed of two data each): (1.5, -1.5), (-0.5, 0.5) and (2.5, -2.5)? Length (distance). The higher the variability, the longer the length will be.

What is the difference between these three groups? How do we measure the length (distance)? Pythagoras: the hypotenuse of a triangle. For the point (4, 3), with sides 4 and 3: $? = \sqrt{4^2 + 3^2} = \sqrt{25} = 5$.
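
The Pythagorean length can be sketched in a few lines. The function name `length` is an illustrative choice; the point (4, 3) and the three two-data groups come from the slides.

```python
import math


def length(vector):
    """Euclidean length: square root of the sum of squared coordinates."""
    return math.sqrt(sum(value ** 2 for value in vector))


print(length([4, 3]))  # 5.0

# The three centered groups from the variance slide: more spread, longer arrow.
groups = [[-0.5, 0.5], [1.5, -1.5], [2.5, -2.5]]
lengths = [length(group) for group in groups]
print(lengths)
```

The same function works for any number of coordinates, which is what lets the idea extend beyond two data.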

What is the difference between these three groups? Therefore, the point (4, 3) is at a distance of 5 from its starting point. The squared length, $5^2 = 25$, is the sum of squares, which equals variance × (n-1).

What is the difference between these three groups? What is the length of these three lines, A), B) and C), each built from sides of length 1? (The values $\sqrt{2}$ and $\sqrt{3}$ appear on the slide.) The dimensionality inflates the variability. In order to have a measure that takes the dimensionality into account, what do we need to do?
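
To make the inflation concrete, the sketch below assumes the three lines have sides of length 1 in one, two and three dimensions (an assumption, since the figure itself is not in the transcript) and computes each length.

```python
import math


def length(vector):
    """Euclidean length: square root of the sum of squared coordinates."""
    return math.sqrt(sum(value ** 2 for value in vector))


# Assumed reading of the figure: every side is 1, only the dimensionality grows.
lengths_by_dimension = {d: length([1] * d) for d in (1, 2, 3)}

for dimension, value in lengths_by_dimension.items():
    print(dimension, value)
```

Every coordinate is the same size, yet the length grows as the square root of the number of dimensions, so length alone is not a fair measure of spread.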

What is the difference between these three groups? We divide the squared length of the data set by its true dimensionality. Variance = (quadratic) distance (from the mean) corrected by the (true) dimensionality of the data.
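
A short sketch of this correction, using the earlier example x = (16, 8): the squared length of the centered vector is divided by n-1 and compared with the sample variance from Python's `statistics` module. The function name `variance_from_geometry` is hypothetical.

```python
import math
import statistics


def variance_from_geometry(data):
    """Squared length of the centered vector divided by its true dimensionality."""
    mean = sum(data) / len(data)
    squared_length = sum((value - mean) ** 2 for value in data)
    return squared_length / (len(data) - 1)


x = [16, 8]
geometric = variance_from_geometry(x)   # (4^2 + (-4)^2) / 1
reference = statistics.variance(x)      # sample variance, n - 1 denominator

print(geometric, reference)
print(math.isclose(geometric, reference))
```

Taking the square root of this value gives the standard deviation, the length of the arrow corrected for dimensionality.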

Normalization and standardization

Normalization vs. standardization: To normalize is to bring a given centered vector x (arrow, mean = 0) to a length of 1. Normalization: $z = x / \lVert x \rVert$ (x divided by its length), so $z^T z = 1$. Standardization: $z_x = x / SD$, so $z_x^T z_x = n - 1$, and therefore $z_x = z \sqrt{n - 1}$.
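
The two rescalings can be checked numerically. The sketch below uses three hypothetical data (16, 8, 3); the variable names are illustrative, and the standard deviation is the sample one (n-1 denominator), as in the rest of the slides.

```python
import math

data = [16, 8, 3]                       # hypothetical data
n = len(data)
mean = sum(data) / n
x = [value - mean for value in data]    # centered vector, mean = 0

norm = math.sqrt(sum(v ** 2 for v in x))   # length of the arrow
sd = norm / math.sqrt(n - 1)               # sample standard deviation

z = [v / norm for v in x]      # normalization: length 1
z_x = [v / sd for v in x]      # standardization: squared length n - 1

z_dot = sum(v ** 2 for v in z)       # z^T z
z_x_dot = sum(v ** 2 for v in z_x)   # z_x^T z_x

print(z_dot, z_x_dot)
```

Both vectors point in the same direction; they differ only by the factor $\sqrt{n - 1}$.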

Two groups

One group of three participants

Two groups of three participants

They can be represented by a plane

Two groups of three participants: they can be represented by a plane.

Two groups of three participants: they can be represented by a plane. This is true whatever the number of participants.

Correlation and regression

Relation between two vectors: If two groups (u and v) have the same data, then the two vectors are superposed on each other. As the two vectors diverge from each other, the angle between them increases.

Relation between two vectors: If the angle reaches 90 degrees, then they share nothing in common.

Relation between two vectors: The cosine of the angle between the two centered vectors is the coefficient of correlation.
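
A minimal sketch of correlation as a cosine, assuming both groups are first centered on their own means. The data and the function names are hypothetical; the cosine is clamped to [-1, 1] only to guard against floating-point rounding.

```python
import math


def center(data):
    """Subtract the mean of the data from every datum."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


def cosine_correlation(u, v):
    """Cosine of the angle between the centered vectors of u and v."""
    cu, cv = center(u), center(v)
    dot = sum(a * b for a, b in zip(cu, cv))
    length_u = math.sqrt(sum(a ** 2 for a in cu))
    length_v = math.sqrt(sum(b ** 2 for b in cv))
    return max(-1.0, min(1.0, dot / (length_u * length_v)))


u = [1, 2, 3]
same_direction = cosine_correlation(u, [2, 4, 6])    # superposed arrows
right_angle = cosine_correlation(u, [1, -2, 1])      # arrows at 90 degrees

print(same_direction, math.degrees(math.acos(same_direction)))
print(right_angle, math.degrees(math.acos(right_angle)))
```

Superposed arrows give a correlation near 1, and arrows at a right angle give a correlation of 0.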

Relation between two vectors. Regression: the shortest distance to the vector u is the one that crosses it at 90° (figure labels: b, e).

Relation between two vectors. Regression: the formula for the regression coefficients can be obtained directly from the geometry. By substitution, we can isolate the $b_1$ coefficient. The same reasoning generalizes to any situation (multiple, multivariate).
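
For a single predictor, the substitution can be sketched as follows, under the assumption that v is regressed on u, both centered, with v = b u + e and the error e at 90° to u. Then u·e = 0 gives b = (u·v) / (u·u). The data and names below are hypothetical.

```python
def center(data):
    """Subtract the mean of the data from every datum."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


def dot(a, b):
    """Dot product of two vectors of equal size."""
    return sum(x * y for x, y in zip(a, b))


u = center([1, 2, 3])   # predictor, centered: (-1, 0, 1)
v = center([1, 3, 8])   # criterion, centered: (-3, -1, 4)

# Orthogonality condition u . (v - b u) = 0, solved for b.
b = dot(u, v) / dot(u, u)

# Error vector: what remains of v after removing its projection on u.
e = [vi - b * ui for ui, vi in zip(u, v)]

print(b)           # 3.5
print(dot(u, e))   # 0.0, the error is at a right angle to u
```

The coefficient is simply the length of the shadow of v on u, expressed in units of u.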