QM222 Nov. 9 Section D1: Visualizing Using Graphs; More on your project; Test returned. QM222 Fall 2016 Section D1.



Use Tables to report several regressions
Your different regressions will have different combinations of variables. Why present more than one regression?
- To develop your ideas.
- Or for different dependent variables (list each in the column title).

Review: What to do next on your project
After you hand in Assignment 6 and are clear on what you think your regressions and your answers to the question will be, sign up for an appointment (and a presentation, if you can predict) at https://docs.google.com/spreadsheets/d/1pcfaSpsS6TISPccPs Jf7ykuhTIzqsWN7OqURMO_BLM4/edit#gid=0
DO NOT ERASE OTHER NAMES!

Assignment 6
Ideally, have it done by Friday. Post your current data set under Stata data set (if you can).
Run additional multiple regressions. Specifically:
- Think hard about whether there are additional omitted variables (i.e. confounding factors) that you can measure and that are likely to be biasing your key coefficient(s). If you can find data on them, add them to the regressions. (If you really cannot think of anything beyond what you have, just write that.)
- Identify at least one omitted variable that you cannot measure, reason out the sign of the omitted variable bias, and explain here (Ass. 6) in 1-3 sentences why and in what direction it will bias your key coefficient.

Assignment 6 cont.
- If you have any numeric explanatory (X) variable, add a quadratic term in addition to your other variables to test whether this nonlinear specification fits better. (If you are good at math and prefer to add a different nonlinear variable or to make your dependent variable nonlinear, be my guest.) Explain here (Ass. 6) what you learn from this result (1-3 sentences), and show it (e.g. with a graph).
- If you have a numeric explanatory (X) variable that is very skewed, think about whether top-coding it or taking its log is appropriate instead.
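The three transformations mentioned on this slide (a quadratic term, top-coding, and taking the log) can be sketched as follows. The course itself uses Stata and Excel; this is only an illustrative Python sketch, and the function name, the income figures, and the top-code cutoff of 200 are all made up for the example.

```python
import math

def add_nonlinear_terms(x_values, top_code_at):
    """Return squared, top-coded, and logged versions of a numeric X variable.

    Hypothetical helper for illustration only.
    """
    # Quadratic term: include this next to X itself in the regression.
    squared = [x ** 2 for x in x_values]
    # Top-coding: any value above the cutoff is replaced by the cutoff.
    top_coded = [min(x, top_code_at) for x in x_values]
    # Natural log: only defined when every value is strictly positive.
    logged = [math.log(x) for x in x_values]
    return squared, top_coded, logged

# Made-up, very skewed data: one extreme value dominates the range.
incomes = [20, 35, 50, 80, 1000]
squared, top_coded, logged = add_nonlinear_terms(incomes, top_code_at=200)
print(top_coded)
print([round(v, 2) for v in logged])
```

Both top-coding and the log pull the extreme value closer to the rest of the data; which one is appropriate depends on what you believe about the relationship.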

Not in Assignment 6
If you have a very skewed Y (dependent) variable:
- Try top-coding it (if you think that once it reaches a quite high level, it doesn’t matter how much higher it gets).
- Try changing it into an indicator variable.
- Try estimating the median Y, replacing regress with qreg.

Assignment 6 cont.
- Think about whether you can and should use an interaction term. (This will be most useful if you think that different groups have different slopes.)
- Try at least one out in a multiple regression (with all your other variables as well). Copy and paste here (PS 6).
- Explain here what you learn from this interaction term result (1-3 sentences).
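To see why an interaction term captures "different groups have different slopes", consider a regression of the form Y = b0 + b1*X + b2*D + b3*(X*D), where D is a 0/1 group indicator. The slope of X is b1 for the D = 0 group and b1 + b3 for the D = 1 group. The sketch below is in Python rather than Stata, and the function names and coefficient values are hypothetical.

```python
def interaction_column(x_values, group_dummy):
    """Build the X*D interaction variable from X and a 0/1 group indicator."""
    return [x * d for x, d in zip(x_values, group_dummy)]

def slope_by_group(b_x, b_interaction):
    """Slope of X for each group, given the coefficients on X and on X*D."""
    return {0: b_x, 1: b_x + b_interaction}

# Made-up data: X is years of experience, D = 1 for one group and 0 for the other.
experience = [1, 2, 3, 4]
group = [0, 1, 0, 1]
print(interaction_column(experience, group))

# Made-up coefficients, as if read off a regression table.
print(slope_by_group(b_x=2.0, b_interaction=0.5))
```

If the coefficient on the interaction term is close to zero and insignificant, the two groups have roughly the same slope and the term adds little.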

More generally, ask yourself whether your regressions are really answering the question. I like sophisticated approaches if you are using them correctly and if they are the most appropriate way to answer your question.

Assignment 6 cont.
- Decide which is the best regression or set of regressions that you will use in your project.
- Update your Current Project Status, including replacing/adding these regressions to Question 7.
- Also answer Question 9, which asks for the conclusions of your project as it now stands.
The more fully you answer Questions 7 and 9, the better feedback I can give you at your required meeting #2.

NEW: Thinking about interpretation: What variables are important?
Sometimes a variable can have a large t-stat but not really make a large difference to your predicted Y. To see whether it does make a large difference, make this calculation:
X coefficient * (highest X value in dataset - lowest X value)
That tells you the maximum amount by which the variation in X can change Y. Or use:
X coefficient * (95th percentile X value - 5th percentile X value)
Another approach: Run the regression with and without your variable and see how the adjusted R-square changes.
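The two calculations above can be sketched in a few lines. This is an illustrative Python version, not part of the course materials; the function names are hypothetical, and the percentile uses simple linear interpolation, which may differ slightly from what Stata or Excel report.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p from 0 to 100) of an already sorted list."""
    position = (len(sorted_values) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def max_effect_on_y(coefficient, x_values):
    """Return (effect over the full X range, effect over the 5th-95th percentile range)."""
    full_range = coefficient * (max(x_values) - min(x_values))
    xs = sorted(x_values)
    trimmed_range = coefficient * (percentile(xs, 95) - percentile(xs, 5))
    return full_range, trimmed_range

# Made-up example: X takes the values 0, 1, ..., 100 and its coefficient is 2.
full, trimmed = max_effect_on_y(2, list(range(101)))
print(full, trimmed)
```

The percentile version is less sensitive to a few extreme X values, so it is usually the safer of the two when X is skewed.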

Data Visualizations: Making Informative Graphs in Excel
It is more useful and easier to learn to make graphs in Excel. You can copy any output from Stata into Excel and divide it into columns by choosing Data → Text to Columns.
Here we use the Excel file Questrom Starting Salaries (under Other materials -- Data and other materials used in class).

Some pictures are worth 1000 words. Some are worthless or even harmful.
Making graphs in Excel is easy. If you have a table, Excel will suggest graphs for you. However, it is your job to make sure that your graphs clearly convey the information that you want the viewer to get.
ALWAYS ASK: CAN THE VIEWER EASILY UNDERSTAND THE DATA SHOWN IN THE GRAPH? DOES THE GRAPH MAKE THE POINT YOU WANT IT TO MAKE?
- Is it formatted in a way that is user-friendly? E.g. does it have clear titles, labeled axes, etc.?
- Is it the right type of graph to make your point? E.g. should I use a bar chart, a line graph, or what?

Often, Excel will suggest graphs for you: e.g. highlight the starting salaries and click on Insert, then Recommended Charts.
Don’t always choose its recommendation. You want to make sure that your graphs clearly convey the information that you want the viewer to get.
- Is it the right type of graph to make your point? E.g. should I use a bar chart, a line graph, or what?
- Is it formatted in a way that is user-friendly? E.g. does it have clear titles, labeled axes, etc.?

General Tips for User-friendly Graphs
- Use a descriptive title.
- Label both axes so viewers understand exactly what they are.
- Choose good unit intervals, always asking yourself, “What most clearly conveys the information?” Choices you make include tick marks, width between bars, thickness of lines, etc.
- Minimize chart junk: don’t include distracting elements that don’t convey information.
- Minimize eye movement: try to make it possible for the viewer to get the message without having to look all over the graph or look back and forth.
- Do not use graphics that mislead.

Here is a chart of the distribution of Questrom graduates by concentration. It’s a column chart (also called a bar chart more generally).
How can you make this chart more informative and user-friendly? Let’s do it together.
- Change the title (and make it darker).
- Label the axes (and make the labels darker).
- Get rid of the useless legend.

A more user-friendly, informative graph?
Do you think the vertical ticks and lines, and the width of the columns versus the spaces between them, make this chart easy to use? If not, right-click the columns and choose Format Data Series. Here we decreased the gap.

Wouldn’t it be clearer if you first ranked the columns or bars, graphing from highest to lowest? I’ve further decreased the gap width. You could instead make a sideways bar chart. Which chart do you prefer and why?
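Ranking the categories before charting is just a sort on the values. A minimal Python sketch of that step follows; in Excel you would sort the table itself. The function name and the concentration counts are made up for illustration and are not taken from the Questrom data.

```python
def rank_for_bar_chart(counts):
    """Return (category, count) pairs ordered from highest to lowest count."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Made-up counts of graduates by concentration.
graduates = {"Accounting": 40, "Finance": 120, "Marketing": 75, "Law": 10}
for category, count in rank_for_bar_chart(graduates):
    # A crude text bar: one '#' per 10 graduates.
    print(f"{category:<12}{'#' * (count // 10)} {count}")
```

Ordering the bars lets the viewer read off the ranking directly instead of comparing bar heights across the whole chart.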

What is the best type of graph for what purpose?
If the question is: What proportion (or count) is in each category?
A bar chart is probably best. Pie charts should be avoided in most situations (except with few categories). Why?

What is the best type of graph for what purpose?
If the question is: How did a variable change, either over time or as another variable changes?
Typically a line chart works best here. Which shows the trend most clearly?

What is the best type of graph for what purpose?
If the question is: How do average values vary across categories?
Here, too, the bar chart seems best. Ordering is a good idea (from small to large or vice versa).

What is the best type of graph for what purpose?
If the question is: How do proportions change over time (or across categories)?
You might try a 100% stacked column chart (on left) or a 100% stacked area chart (on right). Which answers the question more clearly?
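A 100% stacked chart plots each category’s share of its period’s total rather than the raw count. The Python sketch below shows that conversion; Excel does it for you when you pick a 100% stacked chart type. The function name, years, and counts are made up for illustration.

```python
def shares_by_period(counts_by_period):
    """Convert counts per category into percentage shares within each period."""
    shares = {}
    for period, counts in counts_by_period.items():
        total = sum(counts.values())
        shares[period] = {category: 100 * n / total for category, n in counts.items()}
    return shares

# Made-up counts of students by concentration in two years.
counts = {
    2015: {"Finance": 30, "Marketing": 20},
    2016: {"Finance": 45, "Marketing": 15},
}
for year, year_shares in shares_by_period(counts).items():
    print(year, year_shares)
```

Within each period the shares sum to 100, which is why every column in a 100% stacked chart has the same height.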

What is the best type of graph for what purpose?
Clustered column/bar charts are good for conveying different proportions or values across 2 kinds of categories.

Another example of a bar chart with categories
Source: Pew Research

Scatterplots can tell us:
- The direction (sign) of the relationship between two variables (is the slope positive or negative?)
- The form of the relationship: linear vs. curved
- The strength of the relationship
- Whether there are outliers
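Direction and strength are often summarized numerically by the Pearson correlation coefficient, which is positive for an upward-sloping cloud, negative for a downward-sloping one, and closer to 1 in absolute value the tighter the points lie around a straight line. The slide itself does not mention correlation; this Python sketch is offered only as a companion to the scatterplot, with a hypothetical function name and made-up data. It says nothing about curvature or outliers, which is why you still need to look at the plot.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up scores for five schools.
reading = [400, 420, 450, 480, 520]
math_scores = [410, 440, 455, 500, 530]
print(round(pearson_r(reading, math_scores), 3))
```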

Making scatter diagrams in Excel
In this example, I am using a dataset on average NYC public schools’ SAT scores (on the E1 G1 website as UniversityAdmissions_SAT). What does each observation represent?
I want to make a scatter diagram with the school’s math mean score on the Y-axis and the school’s reading score on the X-axis.
- Place the two columns you want in your graph side by side. The variable you want on the X-axis should be on the left.
- Make sure the top row of each column has a descriptive label for the variable.
- On the Insert tab, click the picture of a scatter diagram and then click on the first scatter, the one with only markers and no connecting lines.

Some Bad Charts

What’s wrong with this graph?
3-D graphs should be avoided.

How about this one?
Even 3-D bars should be avoided. HT: Victoria

NO CHART JUNK
Too much chart junk, too many details that take away from the clarity.

Questrom starting salaries: Some questions
What kind of chart would you use to answer these questions?
- How have Questrom average salaries changed over time?
- Has the distribution of students’ concentrations changed over time?
To actually do this, it will be helpful to know first about Pivot Tables.

Pivot Tables
One of Excel’s features most commonly used in business and in summer internships.
Pivot tables allow us to quickly compute statistics across categories:
- Counts
- Sums
- Means (Averages)
- Max/Min
- Product
- Std deviation, variance
- Percentage of total
Making these tables is really data analysis. You can also make charts from these pivot tables (on PCs).
OPEN DATA SET QUESTROM STARTING SALARIES CLASS 5

Pivot Tables
- Make sure every column has a label.
- Highlight the data range including labels.
- PCs: Click Insert – Pivot table (icon on far left).
- Macs: Pivot Tables are either in the Analysis or Data group.
- A new worksheet opens with a blank pivot table fields list box.

Pivot tables: View of empty pivot table and pivot table field list box (list of variables here from previous version of dataset).

Tip
Notice that the PivotTable Field box disappears when a cell outside the created Pivot Table is selected, but it reappears when you select a cell within the Pivot Table.

In Class Exercise
OPEN DATA SET QUESTROM STARTING SALARIES
- Create a pivot table that describes how average salaries have changed over time. Graph these results.
- Create a pivot table that shows how the distribution of students’ concentrations has changed over time. Graph these results.
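For intuition about what the first pivot table computes, it groups the rows by year and averages the salary within each group. The sketch below does the same thing in plain Python; it is an illustration only, not a replacement for the Excel exercise. The function name, the column names "year" and "salary", and the figures are hypothetical and do not come from the Questrom Starting Salaries file.

```python
from collections import defaultdict
from statistics import mean

def average_by_category(rows, category_key, value_key):
    """Group rows by one column and average another, like a pivot table of means."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[category_key]].append(row[value_key])
    # Sort by category so the result is ready to chart over time.
    return {category: mean(values) for category, values in sorted(groups.items())}

# Made-up rows standing in for the spreadsheet.
rows = [
    {"year": 2014, "salary": 50000},
    {"year": 2014, "salary": 60000},
    {"year": 2015, "salary": 58000},
    {"year": 2015, "salary": 62000},
]
print(average_by_category(rows, "year", "salary"))
```

Swapping the mean for a count gives the second pivot table in the exercise, the number of students in each concentration per year.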

Mean = 81.7, Median = 81