i247: Information Visualization and Presentation. Marti Hearst. Graphing and Basic Statistics.

Presentation transcript:

1 i247: Information Visualization and Presentation Marti Hearst Graphing and Basic Statistics

2 Today: Just for Fun: The Daily Show; Graphing Practice; Basic Statistics in Graphing; Correlations and Scatterplots; Sparklines

3 A Daily Show: Full Color Coverage Ok, I think it’s good that the news outlets are showing charts and graphs and color coding the candidates consistently. But … then they go crazy!

4 Class Exercise: Graphing Practice (Taken from Few’s “Show Me the Numbers”) You work for the CFO, who thinks expenses are excessive. Please provide her with a report that shows, for the current quarter, expenses to date compared to what was budgeted, organized by department.

5 Class Exercise: Graphing Practice Create a graph that shows both monthly revenues and monthly expenses, while at the same time highlighting the overall trends for profit over time.

6 Combining Bar Charts with a Line Graph (Few 2006)

7 Means vs Medians What’s the difference between the median salary in Seattle and the mean (average)?
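
A minimal sketch of why the two numbers can differ, using made-up salary figures (the values below are hypothetical, not Seattle data). One very large salary pulls the mean upward while the median barely moves.

```python
from statistics import mean, median

# Hypothetical annual salaries in dollars; the last value is a deliberate outlier.
salaries = [42_000, 48_000, 51_000, 55_000, 60_000, 64_000, 900_000]

mean_salary = mean(salaries)      # pulled upward by the single large value
median_salary = median(salaries)  # middle value of the sorted list

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
```

With skewed data like this, the median is usually the better description of a "typical" value.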

8 Means and Medians in Tableau

9 Few’s Comparisons of Data Sets with the Same Medians

10 Means and Standard Deviations

11 An Alternative: Show the Range of the Variance Graphically

12 Tukey’s Box Plots (Few 2006)
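
As a rough illustration of what a box plot encodes, the sketch below computes the quartiles, the whisker ends, and the outliers for a small made-up data set. It assumes the common 1.5 x IQR fence rule and the "inclusive" quartile method from Python's `statistics` module; other tools, and possibly the figures on this slide, may use slightly different conventions.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return the numbers a Tukey-style box plot would draw."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range (height of the box)
    low_fence = q1 - 1.5 * iqr         # points beyond the fences are outliers
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "whisker_low": inside[0],      # smallest value inside the fences
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": inside[-1],    # largest value inside the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical data with one extreme value.
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```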

13 Box Plots in Action Comparing preferred search result snippet length for different types of queries.

14 Few’s Bullet Graphs Goal: Display a key measure along with a comparative measure and qualitative ranges. An alternative to gauges and meters on dashboards.

15 Few’s Bullet Graphs

16 Cascading Bullet Graphs

17 Showing Correlations Through Scatterplots Example: Height vs Weight

18 Scatterplot Comparing Two Data Sets (Few 2006)

19 Scatterplot with Two Trend Lines (Few 2006)

20 Slide adapted from David Lippman's Correlation: A correlation exists between two variables when one of them is related to the other in some way. A scatterplot is a graph in which the paired (x, y) sample data are plotted. The linear correlation coefficient r, also called the Pearson correlation coefficient, measures the strength of the linear relationship. It ranges from -1 to 1: r = 1 represents a perfect positive correlation, r = 0 represents no linear correlation, and r = -1 represents a perfect negative correlation.
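
The coefficient can be computed directly from its textbook definition. The sketch below is one plain-Python way to do it; the function name `pearson_r` and the sample data are illustrative assumptions, not part of the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # points on a rising line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # points on a falling line
print(r_up, r_down)
```

Points that lie exactly on a rising line give r = 1, and points exactly on a falling line give r = -1; real data falls somewhere in between.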

21 Slide adapted from David Lippman's [Six scatterplot panels] Perfect positive correlation (r = 1); Strong positive correlation (r = 0.99); Positive correlation (r = 0.80); Strong negative correlation (r = …); No correlation (r = 0.16); Non-linear relationship

22 Slide adapted from David Lippman's Finding the correlation coefficient: Can compute in Excel (r² in Tableau)

23 r² in Tableau

24 r² in Tableau

25 Slide adapted from David Lippman's Meanings: r² represents the proportion of the variation in y that is explained by the linear relationship between x and y. Example: Using the heights and weights for a group of people, you find the correlation coefficient to be r = 0.796, so r² = 0.634. We conclude that about 63.4% of the variation in people's weights can be explained by the linear relationship between height and weight, and that the remaining 36.6% of the variation in weights cannot be explained by height.
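
The arithmetic behind the example can be checked in a few lines. This is only a worked calculation from the r = 0.796 quoted on the slide; the variable names are illustrative.

```python
r = 0.796                    # correlation coefficient from the slide's example
r_squared = r ** 2           # proportion of variation explained by the line

explained_pct = round(100 * r_squared, 1)          # share explained by height
unexplained_pct = round(100 * (1 - r_squared), 1)  # share left unexplained

print(f"r^2 = {r_squared:.3f}")
print(f"explained: {explained_pct}%, unexplained: {unexplained_pct}%")
```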

26 Slide adapted from David Lippman's Bear in mind: Correlation does not imply causation. For example, there is a strong correlation between golf scores and salaries for CEOs. This does not imply that one can improve one's salary by getting better at golf. Often there is a hidden variable: something that affects both variables being studied but is not included in the study. Beware data based on averages. Averages suppress individual variation and can artificially inflate the correlation coefficient. Look out for non-linear relationships. Just because there is no linear correlation does not mean that the variables are not related in another way.
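
The last warning is easy to demonstrate. In the sketch below (illustrative data, hypothetical helper name), y is completely determined by x through y = x², yet the linear correlation coefficient is zero because the relationship is symmetric rather than linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]      # perfect, but non-linear, dependence on x

r_parabola = pearson_r(xs, ys)
print(r_parabola)
```

A scatterplot of these points would show the U shape immediately, which is one reason to plot the data before trusting r.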

27 Slide adapted from David Lippman's Regression If there is a relationship between x and y, we might want to find the equation of a line that best approximates the data. This is called the regression line (also called best-fit line or least-squares regression line). We can use this line to make predictions.
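
A minimal sketch of fitting such a line by ordinary least squares and using it to predict. The data points and the names `fit_line` and `predicted` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through the means
    return slope, intercept

# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predicted = intercept + slope * 6         # prediction for a new x value
print(f"y = {intercept:.2f} + {slope:.2f}x; prediction at x = 6: {predicted:.2f}")
```

Predictions are most trustworthy for x values inside the range of the data used to fit the line.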

28 Slide adapted from David Lippman's Example: Relationship between Tree Circumference and Height

29 Slide adapted from David Lippman's Tree Example There is a positive correlation between the circumference of a tree and its height (r = 0.828). The regression line has the equation: We could use this equation to estimate the height of a tree with circumference 4ft:

30 Slide adapted from David Lippman's Relationship between Tree Circumference and Height Outliers can strongly influence the graph of the regression line and inflate the correlation coefficient. In the above example, removing the outlier drops the correlation coefficient from r = to r =
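
The effect can be reproduced with a toy data set. The numbers below are hypothetical and are not the tree measurements from the slide: five weakly related points, plus one far-away point that makes the correlation look strong.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Five points with almost no linear relationship.
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]
r_without = pearson_r(xs, ys)

# The same points plus a single outlier far from the cluster.
r_with = pearson_r(xs + [15], ys + [12])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```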

31 Regression Formulae

32 Regression Coefficients in Tableau Also, significance testing

33 Same Regression Line, Very Different Distributions. Anscombe: for all 4 data sets, Y = 3 + 0.5X and r² = 0.67
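
The claim can be verified numerically. The sketch below uses the published values of Anscombe's four data sets and a hypothetical helper, `fit_and_r2`, that computes the least-squares line and r² for each one.

```python
def fit_and_r2(xs, ys):
    """Least-squares slope, intercept, and r^2 for paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets I, II, III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
anscombe = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: fit_and_r2(xs, ys) for name, (xs, ys) in anscombe.items()}
for name, (slope, intercept, r2) in results.items():
    print(f"{name}: Y = {intercept:.2f} + {slope:.2f}X, r^2 = {r2:.2f}")
```

The summary statistics agree to two decimal places, so only a plot reveals how different the four data sets are.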

34 ANOVA in Tableau online/Output/wwhelp/wwhimpl/js/html/wwhelp.htm

35 Scatter Plot Understandability Matthew Ericson, NYTimes Graphics Chief, noted that most people don’t understand scatter plots.

36 Scatter Plot Understandability Their strategy: –Use them infrequently –When you do use them, break them down and explain carefully.

37 Illustration from NYTimes

38 Illustration from NYTimes

39 A Scatter Plot Alternative: Few’s Correlation Bar Graph

40 Another Example from Few: Paired Bar Graph with Trend Lines

41 Tufte’s Sparklines: Give a hint of the trend, but don’t show the actual axes and scales. Good for dashboards and small spaces. –A product called Bonavista microcharts does this nicely in Excel. Application: peer2patent.org website
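
The idea can be approximated in plain text. The sketch below is not how Bonavista or Tufte render sparklines; it is a small illustration that maps each value to one of eight Unicode block characters, so the shape of the trend shows without any axis or scale. The function name and the data are assumptions.

```python
def sparkline(values):
    """Render a sequence of numbers as a one-line text sparkline."""
    bars = "▁▂▃▄▅▆▇█"
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Flat series: draw every point at the lowest bar height.
        return bars[0] * len(values)
    # Scale each value into an index from 0 to len(bars) - 1.
    return "".join(bars[int((v - lo) / span * (len(bars) - 1))] for v in values)

# Hypothetical monthly figures.
trend = sparkline([3, 5, 4, 8, 12, 10, 15])
print(trend)
```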

42 peer2patent.org

43 Next Two Weeks: Mon 18: Perceptual Principles –Few Chapter 4. Wed 20: Graphical Excellence –Tufte pages. Mon 25: How to Critique a Viz –Few. Wed 27: Graphical Integrity –Tufte pages. For the Tufte days, bring your book so we can all look at the same illustration. –Each student will lead a discussion of 2 pages of Tufte and do it in 5 minutes.