Dancing with the data. Chong Ho Yu (Alex).


Agenda Difference between static and dynamic graphics. Visualization techniques from 1 to 5 dimensions. Future trend: multi-panel visualization to go beyond 5 dimensions. Hands-on exercises using JMP.

Opposition Fisher (1932) said: “Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitute for such critical tests as may be applied to the data.” Today many researchers insist that reporting the numbers is sufficient. How can we spot outliers, check assumptions (e.g. linearity, normality), identify patterns (e.g. clusters), and evaluate model adequacy (e.g. residuals) without looking at the data?

Opposition “Many journal articles do not display graphics.” Because it is expensive! It’ll cost you an arm and a leg!

See beyond the horizon! Can you do this in a print journal? Yu, C. H., & Stockford, S. (2003). Evaluating spatial- and temporal-oriented multi-dimensional visualization techniques for research and instruction. Practical Assessment, Research & Evaluation, 8(17). Retrieved from http://pareonline.net/getvn.asp?v=8&n=17

See beyond the horizon! Can you do this in a hard copy?

Numbers may fool you! Anscombe's data is a classic example. Another one: Kurtosis is the relative ratio of the mass of the distribution located in the center vs. in the tails. Kurtosis = 3 → Normal curve. In this example, Kurtosis = 3.2, fairly normal, right? No. There is a lot of central mass, but the histogram shows that the distribution is skewed and that there are two outliers.
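
To make the "numbers may fool you" point concrete, here is a minimal Python sketch (not part of the original slides, which use JMP). It computes the mean of Y and the Pearson correlation for the four published Anscombe data sets, plus a plain moment-based kurtosis helper. The function names are illustrative, and the kurtosis formula used here (fourth central moment divided by the squared variance) is only one common definition; statistical packages may apply bias corrections or report excess kurtosis instead.

```python
from math import sqrt

# Anscombe's quartet: four small data sets with nearly identical summaries.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def pearson_kurtosis(values):
    """Fourth central moment over squared variance (about 3 for a normal curve)."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m4 = mean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2


for name, (xs, ys) in ANSCOMBE.items():
    # The summaries barely differ, although the scatterplots look nothing alike.
    print(name, round(mean(ys), 2), round(pearson_r(xs, ys), 3),
          round(pearson_kurtosis(ys), 2))
```

The summaries alone cannot distinguish a linear cloud from a curve or from a single high-leverage point, which is why the slide insists on looking at the plot as well.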

Static vs. dynamic Static: What you see is what you get. After the graph is made, you cannot manipulate it (changing the background color or the line width is not considered “data manipulation” because it cannot give you any insight about the data). Dynamic: The data table and the different graphic panels are linked. Changing one changes all the others. You can manipulate the graph to explore the data from different perspectives.

Invoke JMP/SAS in Excel

Boxplot of scores by state

Regression lines by gender The two lines do not look the same, but there is an outlier.

Regression lines by gender Put on a pair of sunglasses (don't look at the outlier).
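
The effect of "not looking at the outlier" can be sketched outside JMP as well. The following Python example fits an ordinary least-squares line twice, once with and once without a single outlying point. The data values and the name `fit_line` are invented for illustration and are not taken from the slides' data set.

```python
def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical group: five points on the line y = 2x plus one outlier.
points = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (6, 0)]
outlier = (6, 0)

intercept_all, slope_all = fit_line(points)
intercept_trimmed, slope_trimmed = fit_line([p for p in points if p != outlier])

# One extreme point is enough to flatten the fitted line.
print("with outlier:   ", round(slope_all, 3))
print("without outlier:", round(slope_trimmed, 3))
```

Whether excluding a point is justified is a substantive decision; the sketch only shows how strongly one observation can pull the line.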

Example: Logistic regression Aged between 45 and 50 → in group 1 and 5.

GIS Map: World

GIS Map The Yankees (Northern states) are doing better. But usually people perceive “red” as “risk”.

Customized GIS Map

GIS Map: County

GIS Map: Zip

Coplot: scores X rank * sex

ANOVA and multiple comparison

SPSS Post hoc multiple comparison In SPSS you have 18 options. When I was a graduate student, I took a course on it.

Diamond plot Grand sample mean: horizontal black line. Group means: horizontal line inside each diamond. Confidence intervals: the top of the diamond is the upper bound and the bottom is the lower bound. Quantiles: boxplot.
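
The quantities a diamond plot encodes can be computed by hand. The sketch below is an assumption-laden simplification: it uses each group's own standard deviation and a normal approximation (z close to 1.96 for 95%), whereas JMP may use a pooled variance and t-based intervals. The group data and the name `diamond_summary` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def diamond_summary(groups, confidence=0.95):
    """Return the grand mean and (lower, mean, upper) for each group.

    Normal-approximation intervals, used here only for illustration.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    all_values = [v for values in groups.values() for v in values]
    summary = {}
    for name, values in groups.items():
        center = mean(values)
        half_width = z * stdev(values) / sqrt(len(values))
        # Bottom of the diamond, center line, top of the diamond.
        summary[name] = (center - half_width, center, center + half_width)
    return mean(all_values), summary


# Hypothetical scores by academic rank.
scores = {
    "freshman": [61, 70, 66, 74, 69],
    "sophomore": [72, 78, 75, 81, 79],
}
grand_mean, diamonds = diamond_summary(scores)
print("grand mean:", grand_mean)
for rank, (lower, center, upper) in diamonds.items():
    print(rank, round(lower, 1), round(center, 1), round(upper, 1))
```

Diamonds whose vertical extents do not overlap suggest, but do not prove, a difference in group means; the multiple comparison procedures mentioned earlier are the formal test.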

Ternary plot: Clustering and Profiling In the era of globalization, how can we define what a USA company is? One could argue that if you buy a Korean Kia, you may help reduce the trade deficit.

Clustering pattern There are three clusters, but one company does not belong to any.

Visualizing multiple dimensions by colors and markers I want to know how academic rank and gender moderate the relationship between high school GPA and university test scores.

Right click on the scatterplot and choose row legend. Keep the default color assignment of rank. Now you are viewing three dimensions. Everything is everywhere! Good! No systematic concentration.

Do not assign colors to gender. Use sex symbols as the gender markers. A green O is a female sophomore; a red + is a male freshman. Now you are viewing four dimensions. Everything is everywhere! Good!

Regression by rank

Linking and brushing What are the characteristics of top performers in college test scores? They are from WA, UT, and CA. Their high school GPA is good but their SAT is not necessarily good.
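
Linking and brushing boils down to one shared set of selected rows that every panel consults. The Python sketch below illustrates the idea with a handful of invented student records; the names `brush` and `panel` and all data values are hypothetical, and JMP's own implementation is not shown in the slides.

```python
# Hypothetical rows standing in for the data table.
STUDENTS = [
    {"state": "WA", "gpa": 3.8, "sat": 1050, "score": 96},
    {"state": "AZ", "gpa": 2.9, "sat": 1300, "score": 71},
    {"state": "UT", "gpa": 3.7, "sat": 1250, "score": 93},
    {"state": "CA", "gpa": 3.9, "sat": 980, "score": 94},
    {"state": "NV", "gpa": 3.1, "sat": 1150, "score": 78},
]


def brush(rows, predicate):
    """Return the indices of the rows selected by the brush."""
    return {i for i, row in enumerate(rows) if predicate(row)}


def panel(rows, column, selected):
    """One linked view: every value of a column, flagged if its row is selected."""
    return [(row[column], i in selected) for i, row in enumerate(rows)]


# Brush the top performers in the score panel ...
selected = brush(STUDENTS, lambda row: row["score"] >= 90)

# ... and every other panel highlights the same rows.
for column in ("state", "gpa", "sat"):
    highlighted = [value for value, is_on in panel(STUDENTS, column, selected) if is_on]
    print(column, highlighted)
```

Because each panel reads the same `selected` set, changing the brush in one view changes the highlighting everywhere, which is the behavior the slide describes.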

Prediction Profiler What would the scores be if GPA is low, SAT is high, and household income is low? What would it be if GPA is high, SAT is high, and household income is low? What if….?
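
The what-if questions above amount to evaluating a fitted model at chosen settings of the predictors. Here is a minimal Python sketch of that idea. The coefficients are invented placeholders, not estimates from the slides' data, and a real profiler works from whatever model was actually fitted.

```python
# Hypothetical coefficients for score = b0 + b1*gpa + b2*sat + b3*income.
COEFFICIENTS = {"intercept": 20.0, "gpa": 12.0, "sat": 0.02, "income": 0.0001}


def predict(gpa, sat, income, coef=COEFFICIENTS):
    """Predicted score at one setting of the three predictors."""
    return (coef["intercept"] + coef["gpa"] * gpa
            + coef["sat"] * sat + coef["income"] * income)


# Each scenario is one position of the profiler's sliders (values are made up).
scenarios = {
    "low GPA, high SAT, low income": {"gpa": 2.0, "sat": 1400, "income": 20000},
    "high GPA, high SAT, low income": {"gpa": 3.9, "sat": 1400, "income": 20000},
}
for label, settings in scenarios.items():
    print(label, "->", round(predict(**settings), 1))
```

Moving one slider while holding the others fixed, as in the two scenarios, isolates the predicted effect of that variable under the assumed model.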

Two-way interaction is easy You can do it in Excel. We can extend the two-way plot to three-way in Mathematica or Maple. How about putting 2 two-way plots together?

Dancing with three-way interaction The objective of showing you these graphics is to make you aware of the options you have if you want to do multi-dimensional data visualization in the future. You are NOT required to learn how to create these graphics now. A regression equation is a function: Y is a function of the Xs. wisdom.com/multimedia/regression.html

Dancing with three-way interaction Detecting and interpreting three-way interactions in regression may be very complicated. Using a mesh surface is much clearer. Interaction: the effect of X on Y is not consistent across all levels of A and B → regression lines vary. If there is NO interaction, there should be no curving or dancing in the movie. Every frame should look the same.
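
The statement that "every frame should look the same" without interaction can be checked numerically. In a model with X, A, B and all their products, the slope of Y on X at given levels of A and B is b_x + b_xa*A + b_xb*B + b_xab*A*B. The sketch below evaluates that slope for each combination of levels; the coefficient values and names are hypothetical and chosen only to illustrate the contrast.

```python
def simple_slope(coef, a, b):
    """Slope of Y on X at fixed A and B.

    Derived from Y = b0 + bx*X + ba*A + bb*B
                     + bxa*X*A + bxb*X*B + bab*A*B + bxab*X*A*B.
    """
    return coef["x"] + coef["xa"] * a + coef["xb"] * b + coef["xab"] * a * b


def slopes_per_frame(coef, a_levels, b_levels):
    """One slope per (A, B) combination, i.e. per frame of the movie."""
    return {(a, b): simple_slope(coef, a, b) for a in a_levels for b in b_levels}


# Hypothetical coefficients: only the terms involving X affect the slope.
NO_INTERACTION = {"x": 2.0, "xa": 0.0, "xb": 0.0, "xab": 0.0}
THREE_WAY = {"x": 2.0, "xa": 0.5, "xb": -0.5, "xab": 1.0}

levels = [0, 1]
print("no interaction:", slopes_per_frame(NO_INTERACTION, levels, levels))
print("three-way:     ", slopes_per_frame(THREE_WAY, levels, levels))
```

With all interaction coefficients at zero, the slope is identical in every frame; once they are non-zero, the slope changes from frame to frame, which is the "dancing" seen in the animation.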

WolframAlpha If you do not have Mathematica or Maple, you can use WolframAlpha. It is free!

How about five dimensions? Bubble plot

What does the bubble dance tell you? In 1973 a strong association was found between the two crime rates, but in 1993 their connection became weaker. In both years big cities with a large population size tended to suffer from higher crime rates, with the Northeast region being the worst. The US crime rate has been steadily declining since the 1990s. In 2010, the crime rates appear to be under control. The robbery rate and the rape rate seemed to be negatively correlated. Big cities and the Northeast are no longer the most dangerous places to live.
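
A bubble plot reaches five dimensions by assigning one variable each to the X position, the Y position, the bubble size, the bubble color, and the animation frame. The Python sketch below shows that mapping as a data structure. The assignment of robbery rate to X and rape rate to Y is an assumption inferred from the observations above, and every number in it is invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Bubble(NamedTuple):
    x: float      # dimension 1: robbery rate (assumed X axis)
    y: float      # dimension 2: rape rate (assumed Y axis)
    size: float   # dimension 3: population
    color: str    # dimension 4: region
    frame: int    # dimension 5: year, shown as animation time


def to_bubble(row):
    """Map one data-table row onto the five visual channels."""
    return Bubble(row["robbery"], row["rape"], row["population"],
                  row["region"], row["year"])


def frames(rows):
    """Group bubbles by year so that each year becomes one frame."""
    grouped = defaultdict(list)
    for row in rows:
        bubble = to_bubble(row)
        grouped[bubble.frame].append(bubble)
    return dict(sorted(grouped.items()))


# Hypothetical rows; the values are placeholders, not real crime statistics.
ROWS = [
    {"year": 1993, "region": "Northeast", "population": 7.3, "robbery": 900, "rape": 40},
    {"year": 1973, "region": "Northeast", "population": 7.9, "robbery": 600, "rape": 30},
    {"year": 1973, "region": "West", "population": 2.8, "robbery": 300, "rape": 25},
    {"year": 1993, "region": "West", "population": 3.5, "robbery": 400, "rape": 35},
]

for year, bubbles in frames(ROWS).items():
    print(year, [(b.color, b.x, b.y, b.size) for b in bubbles])
```

Pressing the play button in a bubble plot simply steps through these frames in order, which is why trends over time show up as movement.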

UN Public Data Explorer

Observations Mean years of adult schooling and R&D have a positive relationship. This relationship has been stable for over a decade. Countries that are doing well in both are high in Human Development Index. Size doesn't matter. Some very populated countries are not doing well in either.

Observations Japan has been ahead of the US in spending money on R&D (as a percentage of GDP) for over a decade. On average, Japanese people spend fewer years in school than their American counterparts, but they still invest more in R&D. Compared with other nations, the US and Japan are among the top in terms of years of schooling and R&D. The US has been leading in years of schooling, and Germany has been catching up in recent years.

SAS Visual Analytics: Multi-panel visualization

Tableau: Multi-panel visualization

The contents are based upon

Assignment 7.1 Open the data set visualization_data.jmp. Use Graph Builder to make a US map. Show the SAT scores on the map. Which states have the best and worst average SAT scores? Do the same as above for GPA. Create boxplots of GPA by academic rank. What are the characteristics?

Assignment 7.2 Create a scatterplot using (X: GPA, Y: scores) Use Race and gender as the Row legends. Is there a systematic pattern? Do race and gender moderate the relationship between high school GPA and college test scores? Use “distributions” to show all variables. Click on Females. Who are they in terms of their attributes of other variables? Do the same for students whose GPA is 3.0 or higher.

Assignment 7.3 Go to and press “Click here to access the data”. Under Health, put Expenditure on health, total (% of GNP) into Y. Under Poverty, put Population living below $1.25 per day into X. Under Inequality, put Income Gini coefficient into size. Under Composite Index, put Human development index value into color. Choose any two countries as the reference states. Press the play button. What do you see?