1 Choosing independent variables The main idea of variable selection is to reduce the number of independent variables. The goal is to identify the independent variables that are strongly correlated with the dependent variable and, if possible, not correlated among themselves. Otherwise the independent variables may be nearly collinear, which can seriously damage the model.

2 Choosing independent variables Three popular methods of choosing independent variables are: Hellwig's method, the graph analysis method, and the correlation matrix method.

3 Hellwig’s method Three steps: 1. Number of combinations: 2^m − 1 2. Individual capacity of every independent variable in the combination: h_kj = r_0j² / (1 + Σ_{i ∈ I_k, i ≠ j} |r_ij|) 3. Integral capacity of information for every combination: H_k = Σ_{j ∈ I_k} h_kj

4 Hellwig’s method 1. Number of combinations In Hellwig’s method the number of combinations is given by the formula 2^m − 1, where m is the number of independent variables.

5 Hellwig’s method 2. Individual capacity of each independent variable in the combination is given by the formula: h_kj = r_0j² / (1 + Σ_{i ∈ I_k, i ≠ j} |r_ij|) where: h_kj – individual capacity of information for the j-th variable in the k-th combination

6 Hellwig’s method 2. Individual capacity of each independent variable in the combination is given by the formula: h_kj = r_0j² / (1 + Σ_{i ∈ I_k, i ≠ j} |r_ij|) where: r_0j – correlation coefficient between the j-th (independent) variable and the dependent variable

7 Hellwig’s method 2. Individual capacity of each independent variable in the combination is given by the formula: h_kj = r_0j² / (1 + Σ_{i ∈ I_k, i ≠ j} |r_ij|) where: r_ij – correlation coefficient between the i-th and j-th variables (both independent)

8 Hellwig’s method 2. Individual capacity of each independent variable in the combination is given by the formula: h_kj = r_0j² / (1 + Σ_{i ∈ I_k, i ≠ j} |r_ij|) where: I_k – the set of indices of the variables in the k-th combination

9 Hellwig’s method 3. Integral capacity of information for every combination The next step is to calculate H_k, the integral capacity of information for each combination, as the sum of the individual capacities of information within the combination: H_k = Σ_{j ∈ I_k} h_kj
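The whole procedure, steps 1 to 3, can be sketched in a few lines of Python. The vector R0 and matrix R below are hypothetical values invented for illustration (the deck's actual numbers are in images that did not survive extraction); the selection rule is the one from the slides: compute H_k for every non-empty combination and pick the largest.

```python
# A minimal sketch of Hellwig's method with hypothetical example data:
# R0 holds correlations of each X_j with Y, R the correlations among the X's.
from itertools import combinations

import numpy as np

R0 = np.array([0.85, 0.80, 0.75])          # hypothetical r_0j values
R = np.array([[1.0, 0.2, 0.9],
              [0.2, 1.0, 0.3],
              [0.9, 0.3, 1.0]])            # hypothetical r_ij values

m = len(R0)
capacities = {}
for k in range(1, m + 1):
    for combo in combinations(range(m), k):
        # h_kj = r_0j^2 / (1 + sum of |r_ij| over the other variables in the combo)
        H = sum(R0[j] ** 2 / (1 + sum(abs(R[i, j]) for i in combo if i != j))
                for j in combo)
        capacities[combo] = H           # integral capacity H_k for this combination

best = max(capacities, key=capacities.get)
print(best, round(capacities[best], 3))
```

With these hypothetical numbers the winner is the pair {X1, X2}: X3 is strongly correlated with X1 (r = 0.9), which deflates every combination containing both, mirroring the outcome of the worked example later in the deck.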

10 Hellwig’s method Q: HOW TO CHOOSE INDEPENDENT VARIABLES? A: LOOK AT INTEGRAL CAPACITIES OF INFORMATION. THE GREATEST Hk MEANS THAT VARIABLES FROM THIS COMBINATION SHOULD BE INCLUDED IN THE MODEL.

11 Example Let's choose independent variables using Hellwig's method.

12 Example First we need the vector and matrix of correlation coefficients. The correlation coefficients between each independent variable X1, X2, X3 and the dependent variable Y are collected in the vector R_0.

13 Example First we need the vector and matrix of correlation coefficients. The correlation matrix R contains the correlation coefficients between the independent variables.

14 Example 1. Number of combinations We have 3 independent variables X1, X2 and X3. Thus we may have 2^m − 1 = 2³ − 1 = 8 − 1 = 7 combinations of independent variables: {X1} {X2} {X3} {X1, X2} {X1, X3} {X2, X3} {X1, X2, X3}
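The seven subsets above can be enumerated directly with the standard library, which is a quick sanity check on the 2^m − 1 count:

```python
# Enumerate the non-empty combinations of m = 3 candidate variables;
# their count must equal 2^m - 1 = 7, matching the list on the slide.
from itertools import combinations

names = ["X1", "X2", "X3"]
subsets = [set(c) for k in range(1, len(names) + 1)
           for c in combinations(names, k)]
print(len(subsets))  # 7
```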

15 Example 2. Individual capacity of independent variable in the combination 1

16 Example 2. Individual capacity of independent variable in the combination 2

17 Example 2. Individual capacity of independent variable in the combination 3

18 Example 2. Individual capacity of every independent variable in the combination 4

19 Example 2. Individual capacity of independent variables in the combination 5

20 Example 2. Individual capacity of every independent variable in combination 6

21 Example 2. Individual capacity of every independent variable in combination 7

22 Example 3. Integral capacity of information for each combination The greatest integral capacity is for combination C4, so the independent variables X1 and X2 will be included in the model.

23 Graph analysis method Three steps 1.Calculating r* 2.Modification of correlation matrix 3.Drawing the graph

24 Graph analysis method Q: HOW TO CHOOSE INDEPENDENT VARIABLES? A: LOOK AT THE GRAPH. THE NUMBER OF GROUPS TELLS YOU THE NUMBER OF VARIABLES INCLUDED IN THE MODEL. IF THERE IS AN ISOLATED VARIABLE, YOU SHOULD INCLUDE IT IN THE MODEL. FROM EACH GROUP, THE VARIABLE WITH THE GREATEST NUMBER OF LINKS SHOULD BE INCLUDED IN THE MODEL. IF TWO VARIABLES SHARE THE GREATEST NUMBER OF LINKS, TAKE THE ONE MORE STRONGLY CORRELATED WITH THE DEPENDENT VARIABLE.

25 Graph analysis method 1. Calculating r* We start by calculating the critical value r* using the formula: r* = sqrt(t_α² / (t_α² + n − 2)) where t_α is taken from the table of the Student's t distribution at significance level α with n − 2 degrees of freedom (sometimes r* is given, so there is no need to calculate it).
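The critical value is a one-liner once you have the tabulated t value. A small sketch, using the deck's own numbers (n = 7, t_{0.05,5} = 2.571) rather than recomputing the quantile:

```python
# Critical correlation r* = sqrt(t^2 / (t^2 + n - 2)),
# with t_alpha read from a Student's t table.
import math

def critical_r(t_alpha, n):
    """Critical value r* for testing significance of a correlation coefficient."""
    return math.sqrt(t_alpha ** 2 / (t_alpha ** 2 + n - 2))

# Slide values: n = 7, t_{0.05, 5} = 2.571
print(round(critical_r(2.571, 7), 3))
```

Any sample correlation whose absolute value does not exceed this r* (about 0.75 here) is treated as statistically insignificant in the steps that follow.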

26 Graph analysis method 2. Modification of the correlation matrix The correlation coefficients for which |r_ij| ≤ r* are statistically insignificant, and we replace them with zeros in the correlation matrix. 3. Drawing the graph Using the modified correlation matrix, we draw the graph with nodes representing the variables and links representing the statistically significant correlation coefficients.
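Steps 2 and 3 can be sketched numerically: zero out the insignificant entries, then count each variable's links (its degree in the graph). The matrix R and the value of r* below are hypothetical, chosen only to illustrate the isolated-variable case:

```python
# Sketch of steps 2-3: zero insignificant correlations, then count links.
# R and r_star are hypothetical illustration values, not the deck's data.
import numpy as np

r_star = 0.754
R = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.9],
              [0.3, 0.9, 1.0]])            # hypothetical correlations among X1..X3

R_mod = np.where(np.abs(R) > r_star, R, 0.0)   # step 2: keep significant links only
np.fill_diagonal(R_mod, 0.0)                   # ignore each variable's self-correlation

links = (R_mod != 0).sum(axis=0)               # step 3: number of links per variable
print(links)
```

Here X1 ends up with zero links, i.e. it is isolated and goes straight into the model, while X2 and X3 form one group from which a single variable is picked by the rules above.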

27 Example Let's work through an example (the same one as for Hellwig's method, n = 7).

28 Example 1. Calculating r* (n = 7, t_{α, n−2} = t_{0.05, 5} = 2.571, so r* = sqrt(2.571² / (2.571² + 5)) ≈ 0.754)

29 Example 2. Modification of the correlation matrix

30 Example 3. Drawing the graph Conclusion: the model will consist of X1 (as an isolated variable) and X2 (because it is more strongly correlated with the dependent variable; you may check this in the R_0 vector).

31 Correlation matrix method 1. Calculate r* We start by calculating the critical value r* using the formula: r* = sqrt(t_α² / (t_α² + n − 2)) where t_α is taken from the table of the Student's t distribution at significance level α with n − 2 degrees of freedom (sometimes r* is given, so there is no need to calculate it).

32 2. Eliminate the X_i variables weakly correlated with Y, i.e. those with |r_0i| ≤ r*. 3. Choose X_s such that |r_0s| = max_i |r_0i| (X_s is the best source of information). 4. Eliminate the X_i variables strongly correlated with X_s, i.e. those with |r_is| > r*. Repeat steps 3 and 4 until no candidates remain.
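The choose-then-eliminate loop of steps 2 to 4 can be sketched as below. R0, R and r_star are hypothetical illustration values (the deck's actual data are in images); the logic, filter by |r_0i|, pick the best source, drop variables tied to it, repeat, is the one the slides describe.

```python
# Sketch of the correlation matrix method (steps 2-4) on hypothetical data.
import numpy as np

r_star = 0.754
R0 = np.array([0.85, 0.80, 0.76])              # hypothetical |r_0i| values
R = np.array([[1.0, 0.2, 0.9],
              [0.2, 1.0, 0.3],
              [0.9, 0.3, 1.0]])                # hypothetical r_ij values

candidates = [i for i in range(len(R0)) if abs(R0[i]) > r_star]   # step 2
chosen = []
while candidates:
    s = max(candidates, key=lambda i: abs(R0[i]))                 # step 3: best source
    chosen.append(s)
    candidates = [i for i in candidates                           # step 4: drop X_i
                  if i != s and abs(R[i, s]) <= r_star]           # tied to X_s
print([f"X{i + 1}" for i in chosen])
```

With these numbers, X1 is chosen first, X3 is dropped for being strongly correlated with X1 (r = 0.9), and X2 is chosen on the second pass, the same final set {X1, X2} the deck's example arrives at.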

33 Example Let's work through an example (the same one as for Hellwig's method and the graph analysis method, n = 7).

34 Example 1. Calculating r* (n = 7, t_{α, n−2} = t_{0.05, 5} = 2.571, so r* ≈ 0.754)

35 2. Eliminating the X_i variables weakly correlated with Y: none of the variables will be eliminated.

36 3. Choosing X_s such that |r_0s| = max_i |r_0i|.

37 4. Eliminating the X_i variables strongly correlated with X_s: none of the variables will be eliminated.

38 3. (repeated) Choosing another X_s among the remaining variables, again the one with the greatest |r_0s|.

39 4. (repeated) Eliminating the X_i variables strongly correlated with X_s: X3 will be eliminated.

40 4. Eliminate X3 (remove all correlation coefficients involving X3). OUR FINAL CHOICE - X1 and X2

41 In this example the significance level can be changed; this will give different results (you may check it if you want). DON'T EXPECT TO GET THE SAME RESULTS FROM THESE THREE METHODS…