Sit in your permanent seat

1 Sit in your permanent seat
QM222 Class 4, Section D1: Reviewing descriptive statistics and distributions, making scatter diagrams, and correlation coefficients
QM222 Fall 2016 Section D1

2 Today we will...
Review descriptive statistics (with Excel)
Make scatter diagrams in Excel and Stata
Compute correlations in Excel and Stata

3 Assignment 1
What is the data set you plan to use?
What is the main variable (or variables) in this data set that you plan to predict or explain?
What specific question or questions will your project address?
What company, governmental body, or other organization would be interested in knowing the answer to this question?

4 Review

5 Descriptive Statistics -- review
We discussed means and medians, and when they give different results.
We discussed measures of spread (dispersion) like the standard deviation, and the value at different percentiles (10%, 25%, 50%, 75%, 90%).
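As a small illustration of the statistics reviewed here, the sketch below computes them with Python's standard `statistics` module. The earnings figures are made up for illustration (they are not the class data set), and the variable names are assumptions.

```python
import statistics

# Hypothetical earnings data, invented for this example (not the class file).
# The last value is a deliberate outlier.
earnings = [28000, 35000, 41000, 46000, 52000, 58000, 63000, 75000, 90000, 250000]

mean = statistics.mean(earnings)
median = statistics.median(earnings)
std_dev = statistics.stdev(earnings)  # sample standard deviation, like Excel's =STDEV()

# quantiles(n=100) returns the 99 cut points for percentiles 1..99;
# method="inclusive" interpolates between the smallest and largest data values.
cuts = statistics.quantiles(earnings, n=100, method="inclusive")
percentiles = {p: cuts[p - 1] for p in (10, 25, 50, 75, 90)}

print(f"mean    = {mean:,.0f}")
print(f"median  = {median:,.0f}")
print(f"std dev = {std_dev:,.0f}")
for p, value in percentiles.items():
    print(f"{p}th percentile = {value:,.0f}")
```

With one very large value in the list, the mean is pulled well above the median, which is the kind of case where the two measures give different results.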

6 Distributions
Distributions graph the likelihood of each X value on the Y-axis against the X variable itself.
They are similar to histograms, except that:
In distributions, the intervals are tiny.
The Y-axis is the % of cases, not the # of cases.
Therefore the area beneath a distribution adds up to 1 (100%).

7 Normal Distribution
A “Normal distribution” looks like a symmetric bell curve.
Symmetric means that the right side of the mean is a mirror image of the left side.
Bell curves look like a bell.
Notation here: μ is the mean, and σ is the standard deviation.
Approximately 68% (or around 2/3) of the observations are within one standard deviation of the mean.
Approximately 95% of the observations are within two standard deviations of the mean.
Do problem sets on your own; it is the best way to learn the material. Mistakes on problem sets are not excessively penalized. There may be a pop quiz on the problem set in section when it is due (with p = 0.5).
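The 68% and 95% figures for the normal distribution can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from Python's standard library; choosing μ = 0 and σ = 1 is an assumption made for convenience, since the shares are the same for any normal distribution.

```python
from statistics import NormalDist

# Standard normal distribution: mean mu = 0, standard deviation sigma = 1.
z = NormalDist(mu=0.0, sigma=1.0)

# Share of observations within one and two standard deviations of the mean,
# computed as differences of the cumulative distribution function.
within_one_sd = z.cdf(1.0) - z.cdf(-1.0)
within_two_sd = z.cdf(2.0) - z.cdf(-2.0)

print(f"Share within 1 standard deviation:  {within_one_sd:.4f}")
print(f"Share within 2 standard deviations: {within_two_sd:.4f}")
```

The two printed shares come out at roughly 0.68 and 0.95, matching the rule of thumb.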

8 Excel team practice in Descriptive Statistics
Open the file on sites.bu.edu/qm222projectcourse/other materials/data and other materials used in class: Class 2 ACS Business Major Earnings 2012
Hints: =AVERAGE(), =MEDIAN(), =STDEV(), =MIN(), =MAX(), =PERCENTILE(range, 0.20) (for example)
Or, in Excel, under Data, in Data Analysis, choose Descriptive Statistics to get all of these statistics.
Answer this Q: Is this distribution “normal”? List several ways you know.
Statistic | Excel Formula | Value
Mean | |
Median | |
Standard deviation | |
Range | |
5th percentile | |
95th percentile | |

9 Descriptive statistics in Stata
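. sum Earnings
[Output table with columns Obs, Mean, Std. Dev., Min, Max for the variable Earnings; values not preserved in this transcript.]
. sum Earnings, detail
[Detailed output for Earnings: percentiles (1% through 99%), smallest and largest values, Obs, Sum of Wgt., Mean, Std. Dev., Variance, Skewness, Kurtosis. Surviving values: 1st percentile 0, smallest value 0, variance 4.58e+09.]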

10 Relationship between 2 variables

11 Scatterplots can tell us
The direction (sign) of the relationship between two variables (is the slope positive or negative?)
The form of the relationship: linear vs. curved
The strength of the relationship
Whether there are outliers

12 Example: The Midwest seems to have the best SAT math scores
But is this because fewer high schoolers in the Midwest take the SAT?

15 Use a scatter plot! Each dot represents one “observation”, one data point
What is an observation in this data set? A state.
If I made a line, would the slope be positive or negative? Negative.
Would a line or a curve fit better? Probably a curve.
Is the relationship strong? Hmmm… kind of.
Are there outliers? Not really far-out ones.

16 Making scatter diagrams in Excel
In-class exercise (Class 4): Open UniversityAdmissions_SAT.xlsx (a data set from NYC) on sites.bu.edu/qm222projectcourse/other materials/data and other materials used in class.
Place the two columns you want in your graph side by side. The variable you want on the x-axis should be on the left.
Make sure the top row of each column has a descriptive label for the variable.
On the Insert tab, click the picture of a scatter diagram and then click on the first scatter, the one with only markers and no connecting lines.
What does each observation represent?
Make a scatter diagram with the school’s math mean score on the Y-axis and the school’s reading score on the X-axis.

17 Your scatter diagram from Excel…

18 Making a scatter diagram in Stata
graph twoway scatter MathematicsMean ReadingMean

19 We’d also like a numerical measure of how closely two variables move together: the correlation coefficient
The correlation (coefficient) tells us two things:
The direction of the association: when X goes up, does Y go up or down?
The strength of the association: how closely related are Y and X, or, how strong is the link?
It doesn’t tell us if the relationship is linear or curved. In fact, it assumes that the relationship is linear.

20 Correlation coefficient: notation r or ρ
A positive correlation coefficient means that when we see a higher value for one variable, we also tend to see a higher value for the other variable.
A negative correlation coefficient means that when we see a higher value for one variable, we tend to see a lower value for the other variable.
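For reference, the slides do not write out a formula for r. The standard sample (Pearson) correlation coefficient is given below; the symbols n (number of observations), x̄ and ȳ (sample means) are notation assumed here, not taken from the slides.

```latex
r \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
             {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
              \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when above-average values of X tend to come with above-average values of Y, and negative when they tend to come with below-average values of Y, which is what gives r its sign.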

21 Correlation coefficient
A correlation coefficient of zero means that there is no correlation: if you did a scatter of X and Y, the dots would seem to have no relationship.

22 The correlation coefficient is between 1 and -1; closer to |1| means a stronger association
When r = 1 there is perfect positive correlation; if you did a scatter of X and Y, the dots would all lie exactly on an upward-sloping line.
When r = -1 there is perfect negative correlation; if you did a scatter of X and Y, the dots would all lie exactly on a downward-sloping line.
When r = 0 there is no correlation; if you did a scatter of X and Y, the dots would seem to have no relationship with each other. If you were to fit a line to the dots, it would be flat (since Y doesn’t change as X changes).
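The three cases above can be reproduced with a few made-up points. This is a sketch only: `pearson_r` is a hypothetical helper written for this example (it follows the standard sample correlation formula), and the data values are invented.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
upward = [2 * v + 1 for v in x]      # dots exactly on an upward-sloping line
downward = [10 - 3 * v for v in x]   # dots exactly on a downward-sloping line
no_trend = [3, 1, 4, 1, 3]           # invented values with no upward or downward trend

r_up = pearson_r(x, upward)
r_down = pearson_r(x, downward)
r_none = pearson_r(x, no_trend)

print(f"upward line:   r = {r_up:.3f}")
print(f"downward line: r = {r_down:.3f}")
print(f"no trend:      r = {r_none:.3f}")
```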

23 How do you think the correlation coefficients compare in Figure A and Figure B below?

24 How do you think the correlation coefficients compare in Figure A and Figure B below?
Both are positive. Figure B fits more tightly around the line – its correlation coefficient is closer to 1. The fact that one is steeper doesn’t affect the correlation.
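Both points in this answer (tightness matters, steepness does not) can be checked on invented data. In the sketch below, `pearson_r` is a hypothetical helper and the three Y series are made up; they are not the data behind Figures A and B.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5, 6]
loose = [2, 1, 4, 3, 6, 5]               # upward, but loosely scattered
tight = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]   # upward, hugging a line
loose_steep = [10 * v for v in loose]    # same scatter as `loose`, ten times steeper

r_loose = pearson_r(x, loose)
r_tight = pearson_r(x, tight)
r_loose_steep = pearson_r(x, loose_steep)

print(f"loose scatter:         r = {r_loose:.3f}")
print(f"tight scatter:         r = {r_tight:.3f}")
print(f"loose scatter, steeper: r = {r_loose_steep:.3f}")
```

Multiplying every Y value by 10 makes the fitted line steeper but leaves r unchanged, while the tighter scatter has the larger r.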

25 Correlation in Excel
To get the correlation between 2 variables in Excel, use =CORREL(range X, range Y).
(Or, in Excel, under Data, in Data Analysis, choose Correlation to get the correlation between all variables in a range.)
In-class exercise using UniversityAdmissions_SAT.xlsx:
1. Get the correlation between the math and reading school mean scores.
2. Get the correlation between the number of test takers and the reading mean scores.

26 In Stata
correlate MathematicsMean ReadingMean NumberofTestTakers
(obs=78)
[Correlation matrix of MathematicsMean, ReadingMean, and NumberofTestTakers; values not preserved in this transcript.]

27 Interpreting the values of correlation
Measured correlations are almost never exactly 0, 1, or –1.
A claim that two variables are uncorrelated typically means that the correlation is “near” 0.
There is no absolute standard for what is a strong correlation, what is a weak correlation, and what is no correlation.

28 Correlation v. relation
The correlation coefficient measures the strength of the linear relationship. A low value is not enough to conclude that there is no strong link between the two variables.
This picture has a near-zero correlation … The two variables are strongly related, but the relationship is not a line with a single slope.
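A simple way to see this is a U-shaped relationship. In the sketch below (invented data, with `pearson_r` as a hypothetical helper), Y is determined exactly by X through Y = X², yet the correlation coefficient is zero because the downward-sloping left half cancels the upward-sloping right half.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # Y is a perfect (curved) function of X

r_curve = pearson_r(x, y)
print(f"r for y = x^2 on symmetric x values: {r_curve:.3f}")
```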

29 Correlation does not mean Causation (i.e. one thing causes another)

30 Why correlation does not imply causation
Possible explanations for a correlation between X and Y:
X causes Y: a change in X will change Y.
Y causes X: a change in Y will change X.
X causes Y AND Y causes X: this is known as simultaneity.
Another variable (or variables) causes both X and Y: this is called a confounding factor.
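The confounding case can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the idea that temperature drives both ice cream sales and drownings, the coefficients, the noise levels, and the `pearson_r` helper. By construction neither variable affects the other, yet they come out strongly correlated.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: temperature is the confounder that drives both variables.
temperature = [rng.uniform(0, 30) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

r_all = pearson_r(ice_cream, drownings)

# Hold the confounder roughly fixed: keep only days between 14 and 16 degrees.
band = [(i, d) for t, i, d in zip(temperature, ice_cream, drownings) if 14 <= t <= 16]
r_fixed_temp = pearson_r([i for i, _ in band], [d for _, d in band])

print(f"all days:                 r = {r_all:.2f}")
print(f"days with similar weather: r = {r_fixed_temp:.2f}")
```

Once temperature is held roughly constant, most of the correlation disappears, which is the pattern to expect when a third variable is responsible for the link.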

31 Let’s go through the examples in the video… which is it?:
A. X causes Y
B. Y causes X
C. X causes Y AND Y causes X (simultaneity)
D. Another variable(s) causes both X and Y (confounding factor)
Ice cream (X) causes drownings (Y).
Married men live longer than single men.
Infants who sleep with the lights on tend to grow up short-sighted.
Self-esteem causes good grades.

32 Assignment 2 paraphrased (from sites.bu.edu/qm222projectcourse)
What specific question or questions will your project address?
What company, governmental body, or other organization would be interested in knowing the answer to this question?
What data source(s) are you using?
In your data, what does each observation represent?
What is the dependent variable(s) you plan to focus on? (Need the name from the dataset, or how you are going to make it from other variables in the data set.)
What is the main explanatory variable(s) that you will focus on? (Need the name from the dataset, or how you are making it, as above.)
What additional, possibly confounding, variables can you measure that you are planning to include in your analysis? (Again, use the specific variable name in the dataset.)

