Sit in your permanent seat


QM222 Class 4 Section D1
Reviewing descriptive statistics and distributions, making scatter diagrams, and correlation coefficients
Sit in your permanent seat
QM222 Fall 2016 Section D1

Today we will:
Review descriptive statistics (with Excel)
Make scatter diagrams in Excel and Stata
Compute correlations in Excel and Stata

Assignment 1
What is the data set you plan to use?
What is the main variable (or variables) in this data set that you plan to predict or explain?
What specific question or questions will your project address?
What company, governmental body or other organization would be interested in knowing the answer to this question?

Review

Descriptive Statistics: review
We discussed means and medians, and when they give different results.
We discussed measures of dispersion (how spread out the data are), such as the standard deviation, and the values at different percentiles (10%, 25%, 50%, 75%, 90%).
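To make the mean-versus-median point concrete, here is a minimal Python sketch. The earnings values are hypothetical (they are not the class data set), and the last one is a deliberate outlier that pulls the mean up while leaving the median almost untouched.

```python
import statistics

# Hypothetical earnings in dollars; the last value is a deliberate outlier.
earnings = [30000, 35000, 40000, 45000, 50000, 55000, 60000, 400000]

mean = statistics.mean(earnings)
median = statistics.median(earnings)
stdev = statistics.stdev(earnings)  # sample standard deviation

# quantiles(n=4) returns three cut points: the 25th, 50th and 75th percentiles.
q25, q50, q75 = statistics.quantiles(earnings, n=4, method="inclusive")

print("mean:", mean)
print("median:", median)
print("standard deviation:", round(stdev, 2))
print("25th / 50th / 75th percentiles:", q25, q50, q75)
```

When the mean sits far above the median, as it does here, that is usually a sign of a long right tail.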

Distributions
Distributions graph the likelihood of each X value on the Y-axis versus the X variable itself.
They are similar to histograms, except that:
In distributions, the intervals are tiny.
The Y-axis is the % of cases, not the # of cases.
Therefore the area beneath a distribution adds to 1 (100%).
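A small Python sketch of the step from a histogram (# of cases) to a distribution (% of cases). The intervals and counts are made up for illustration; the point is only that the shares add to 1.

```python
# Hypothetical histogram: number of cases in each interval of X.
counts = {"0-20k": 150, "20-40k": 400, "40-60k": 300, "60-80k": 100, "80k+": 50}

total = sum(counts.values())

# Divide each count by the total to get the share (%) of cases in each interval.
shares = {interval: n / total for interval, n in counts.items()}

for interval, share in shares.items():
    print(f"{interval:>7}: {share:.1%}")

total_share = sum(shares.values())
print("Total share:", round(total_share, 6))
```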

Normal Distribution
A “Normal distribution” looks like a symmetric bell curve.
Symmetric means that the right side of the mean is a mirror image of the left side.
Bell curves look like a bell.
Notation here: μ is the mean, and σ is the standard deviation.
Approximately 68% (or around 2/3rds) of the observations are within one standard deviation of the mean.
Approximately 95% of the observations are within two standard deviations of the mean.
Do problem sets on your own; it is the best way to learn the material.
Mistakes on problem sets are not excessively penalized.
There may be a pop quiz on the problem set in section when it is due (with p=.5).
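The 68% and 95% figures can be checked numerically. This is a sketch using Python's standard-library `NormalDist`; the helper name `share_within` is just an illustrative choice.

```python
from statistics import NormalDist

# Standard Normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a Normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = share_within(1)
within_two = share_within(2)

print(f"Within 1 standard deviation:  {within_one:.4f}")
print(f"Within 2 standard deviations: {within_two:.4f}")
```

The result does not depend on which μ and σ you pick, because the rule is stated in units of standard deviations from the mean.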

Excel team practice in Descriptive Statistics
Open the file on sites.bu.edu/qm222projectcourse/other materials/data and other materials used in class: Class 2 ACS Business Major Earnings 2012
Hints: =AVERAGE(), =MEDIAN(), =STDEV(), =MIN(), =MAX(), =PERCENTILE(range, 0.20) (for example)
Or, in Excel, on the Data tab under Data Analysis, choose Descriptive Statistics to get all of these statistics.
Answer this Q: Is this distribution “normal”? List several ways you know.
Statistic             Excel Formula     Value
Mean
Median
Standard deviation
Range
5th percentile
95th percentile

Descriptive statistics in Stata
. sum Earnings

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    Earnings |      2200    78376.55    67653.98          0     382000

. sum Earnings, detail

                          Earnings
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%      13913.5              0
10%        23044              0       Obs                2200
25%        40000              0       Sum of Wgt.        2200

50%        60984                      Mean           78376.55
                        Largest       Std. Dev.      67653.98
75%      93754.5         382000
90%       147483         382000       Variance       4.58e+09
95%       201000         382000       Skewness       2.424376
99%       382000         382000       Kurtosis       10.11437
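One way to judge whether Earnings looks Normal is to compare the reported percentiles with what a Normal distribution with the same mean and standard deviation would imply. The sketch below copies its numbers from the Stata output above; the comparison itself is an illustration, not something the slides prescribe.

```python
from statistics import NormalDist

# Summary statistics copied from the Stata output above.
mean, median, std_dev = 78376.55, 60984, 67653.98
p5_actual, p95_actual = 13913.5, 201000

# A Normal distribution with the same mean and standard deviation.
normal = NormalDist(mu=mean, sigma=std_dev)
p5_normal = normal.inv_cdf(0.05)
p95_normal = normal.inv_cdf(0.95)

print(f"Mean minus median: {mean - median:,.2f}")
print(f"5th percentile:  actual {p5_actual:,.1f}, Normal would imply {p5_normal:,.1f}")
print(f"95th percentile: actual {p95_actual:,.1f}, Normal would imply {p95_normal:,.1f}")
```

A Normal curve with these parameters would put the 5th percentile below zero, yet the smallest earnings value is 0. Together with a mean well above the median and a skewness of about 2.4 (a Normal distribution has skewness 0 and kurtosis 3), this points to a right-skewed distribution.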

Relationship between 2 variables

Scatterplots can tell us:
The direction (sign) of the relationship between two variables (is the slope positive or negative?)
The form of the relationship: linear vs. curved
The strength of the relationship
If there are outliers

Example: The Midwest seems to have the best SAT math scores.
But is this because fewer high schoolers in the Midwest take the SAT?

Use a scatter plot!
Each dot represents one “observation”, one data point.
What is an observation in this data set? A state.
If I made a line, would the slope be positive or negative? Negative.
Would a line or a curve fit better? Probably a curve.
Is the relationship strong? Hmmm… kind of.
Are there outliers? Not really far out ones.

Making scatter diagrams in Excel
In-class exercise, Class 4: Open UniversityAdmissions_SAT.xlsx (a data set from NYC) on sites.bu.edu/qm222projectcourse/other materials/data and other materials used in class
Place the two columns you want in your graph side-by-side. The variable you want on the x-axis should be on the left.
Make sure the top row of each column has a descriptive label for the variable.
On the Insert tab, click the picture of a scatter diagram and then click on the first scatter with only markers and with no connecting lines.
What does each observation represent?
Make a scatter diagram with the school’s math mean score on the Y-axis and the school’s reading score on the X-axis.

Your scatter diagram from Excel…

Making a scatter diagram in Stata
graph twoway scatter MathematicsMean ReadingMean

We’d also like a numerical measure of how closely two variables move together: the correlation coefficient
The correlation (coefficient) tells us two things:
The direction of the association: When X goes up, does Y go up or down?
The strength of the association: How closely related are Y and X, or, how strong is the link?
It doesn’t tell us if the relationship is linear or curved. In fact, it assumes that the relationship is linear.

Correlation coefficient: notation r or ρ
A positive correlation coefficient means that when we see a higher value for one variable, we also tend to see a higher value for the other variable.
A negative correlation coefficient means that when we see a higher value for one variable, we tend to see a lower value for the other variable.

Correlation coefficient
A correlation coefficient that is zero means that there is no correlation.
If you did a scatter of X and Y, the dots would seem to have no relationship.

The correlation coefficient is between -1 and 1
The closer |r| is to 1, the stronger the association.
When r = 1 there is perfect positive correlation; if you did a scatter of X and Y, the dots would all lie exactly on an upward sloping line.
When r = -1 there is perfect negative correlation; if you did a scatter of X and Y, the dots would all lie exactly on a downward sloping line.
When r = 0 there is no correlation; if you did a scatter of X and Y, the dots would seem to have no relationship with each other. If you were to fit a line to the dots, it would be flat (since Y doesn’t change as X changes).
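The slides do not show how r is calculated. As a sketch, the Python below assumes the standard Pearson formula (the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations) and checks the two perfect cases. The data and the function name `pearson_r` are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
upward_line = [2 * v + 1 for v in x]     # dots exactly on an upward sloping line
downward_line = [10 - 3 * v for v in x]  # dots exactly on a downward sloping line

r_up = pearson_r(x, upward_line)
r_down = pearson_r(x, downward_line)
print("upward line r:", r_up)
print("downward line r:", r_down)
```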

How do you think the correlation coefficients compare in Figure A and Figure B below?
Both are positive.
Figure B fits more tightly around the line, so its correlation coefficient is closer to 1.
The fact that one is steeper doesn’t affect the correlation.
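The claim that steepness does not affect the correlation can be checked directly. In this sketch (hypothetical data, standard Pearson formula assumed), multiplying every Y value by 10 makes the cloud of dots ten times steeper but leaves r unchanged.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5, 6]
y_gentle = [2, 1, 4, 3, 6, 5]           # noisy upward relationship
y_steep = [10 * v for v in y_gentle]    # same pattern, ten times steeper

r_gentle = pearson_r(x, y_gentle)
r_steep = pearson_r(x, y_steep)
print("gentle slope r:", round(r_gentle, 4))
print("steep slope r: ", round(r_steep, 4))
```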

Correlation in Excel
To get the correlation between 2 variables in Excel: =CORREL(range X, range Y)
(Or, in Excel, on the Data tab under Data Analysis, choose Correlation to get the correlation between all variables in a range.)
In-class exercise using UniversityAdmissions_SAT.xlsx:
1. Get the correlation between the math and reading school mean scores.
2. Get the correlation between the number of test takers and the reading mean scores.

In Stata
. correlate MathematicsMean ReadingMean NumberofTestTakers
(obs=78)

             | Mathem~n Readin~n Number~s
-------------+---------------------------
Mathematic~n |   1.0000
 ReadingMean |   0.8831   1.0000
NumberofTe~s |   0.0712  -0.0033   1.0000

Interpreting the values of correlation
Measured correlations are almost never exactly 0, 1, or -1.
A claim that two variables are uncorrelated typically means that the correlation is “near” 0.
There is no absolute standard for what is a strong correlation, what is a weak correlation, and what is no correlation.

Correlation v. relation
The correlation coefficient measures the strength of the linear relationship.
A low value is not enough to conclude that there is no strong link between the two variables.
This picture has a near-zero correlation. The two variables are very related, but the relationship is not a line with a single slope.
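A minimal numerical version of this point, using made-up data rather than the picture from the slide: Y is completely determined by X (Y = X squared), yet the correlation is zero because the downward-sloping half cancels the upward-sloping half. The standard Pearson formula is assumed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # a U-shaped curve: Y depends entirely on X

r_curve = pearson_r(x, y)
print("r for the U-shaped relationship:", r_curve)
```

This is why it pays to look at the scatter diagram before reading anything into a low correlation.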

Correlation does not mean Causation (i.e. one thing causes another)
https://www.youtube.com/watch?v=8B271L3NtAw

Why correlation does not imply causation
Possible explanations for correlation between X and Y:
X causes Y: a change in X will change Y.
Y causes X: a change in Y will change X.
X causes Y AND Y causes X: this is known as simultaneity.
Another variable (or variables) causes both X and Y: this is called a confounding factor.
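The confounding case can be simulated. In this sketch everything is hypothetical: a variable Z drives both X and Y, and neither X nor Y has any effect on the other. X and Y still come out strongly correlated, and the correlation disappears once Z's contribution is taken out. The standard Pearson formula is assumed.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the run is repeatable
n = 1000

# Hypothetical confounder Z.
z = [random.gauss(0, 1) for _ in range(n)]

# X and Y each equal Z plus their own independent noise; neither depends on the other.
x_noise = [random.gauss(0, 0.5) for _ in range(n)]
y_noise = [random.gauss(0, 0.5) for _ in range(n)]
x = [zi + e for zi, e in zip(z, x_noise)]
y = [zi + e for zi, e in zip(z, y_noise)]

r_xy = pearson_r(x, y)
# What is left of X and Y after subtracting Z is just the independent noise.
r_without_z = pearson_r(x_noise, y_noise)

print("correlation of X and Y:", round(r_xy, 3))
print("correlation after removing Z:", round(r_without_z, 3))
```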

Let’s go through the examples in the video… which is it?
A. X causes Y
B. Y causes X
C. X causes Y AND Y causes X (simultaneity)
D. Another variable(s) cause both X and Y (confounding factor)
Ice cream (X) causes drownings (Y).
Married men live longer than single men.
Infants who sleep with the lights on tend to grow up short-sighted.
Self esteem causes good grades.

Assignment 2 paraphrased (from sites.bu.edu/qm222projectcourse)
What specific question or questions will your project address?
What company, governmental body or other organization would be interested in knowing the answer to this question?
What data source(s) are you using?
In your data, what does each observation represent?
What is the dependent variable(s) you plan to focus on? (Need the name from the dataset, or how you are going to make it from other variables in the dataset.)
What is the main explanatory variable(s) that you will focus on? (Need the name from the dataset or how you are making it, as above.)
What additional, possibly confounding, variables can you measure that you are planning to include in your analysis? (Again, use the specific variable name in the dataset.)