how to do a data analysis

How to Do a Data Analysis
Stan Siranovich, Crucial Connection LLC
Prepared for SQL Saturday – Louisville 2018

The Story (based on a true-life adventure)
You log in to your email first thing in the morning and the rumors are confirmed: your company is expanding with branch offices in three new cities. As you read, the Big Boss drops by your cubicle and says that she needs an analysis of the real estate situation in all three cities. The analysis needs to include summaries of prices based on factors such as number of bedrooms, number of bathrooms, and square footage. It should include lots of visualizations, be clear and easy to understand, and point out any interesting relationships that you've uncovered. And you need to have it done by 11:30 a.m.

Summary
Requirements:
  Analysis for Louisville, Indianapolis, Cincinnati
  Beds, Baths, Sq. Ft., etc.
  Clear Visualizations
  Concise Report
  Due in Two Hours
Plan of Attack:
  Use JMP data analysis software from SAS
  Collect, clean and examine data
  Summarize data
  Explore data visually
  Analyze data
  Prepare report

Residential Real Estate Data

The Software

By the Numbers
  Download and concatenate
  Use the Analyze > Distribution platform for visualization and data cleaning
  Use the Recode function for further cleaning
  Use the Analyze > Distribution platform for visualization and analysis

Concatenate Data in Analysis Software
  Open files and import into JMP data tables
  Concatenate all three tables
  Include a Source column
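For readers without JMP, the concatenate-with-source step can be sketched in plain Python. This is only an analogue of the JMP operation, not the presenter's workflow; the three city tables, their column names, and their values are made up for illustration.

```python
# Hypothetical per-city listings; column names and values are illustrative only.
louisville = [{"Price": 150000, "Beds": 3}, {"Price": 210000, "Beds": 4}]
indianapolis = [{"Price": 180000, "Beds": 3}]
cincinnati = [{"Price": 165000, "Beds": 2}]


def concatenate_with_source(tables):
    """Stack several tables into one, tagging each row with its table name."""
    combined = []
    for source, rows in tables.items():
        for row in rows:
            # Copy the row so the original tables are left untouched.
            combined.append({**row, "Source": source})
    return combined


main_table = concatenate_with_source({
    "Louisville": louisville,
    "Indianapolis": indianapolis,
    "Cincinnati": cincinnati,
})
print(len(main_table), main_table[0]["Source"])
```

Keeping the source as an ordinary column means every later summary or comparison can group on it without going back to the separate files.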

Main Table with Source Column

Visual Data Cleaning
  Use the Analyze > Distribution platform for a first pass at cleaning
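A first pass at cleaning mostly means looking at each column for missing or implausible values. The sketch below does that numerically rather than visually, assuming a hypothetical price column in which None marks a missing entry. It flags values outside the 1.5 × IQR fences, the rule commonly used for outlier box plots; quartile conventions differ between tools, so the fences need not match JMP's exactly.

```python
import statistics

# Hypothetical prices; None marks a missing value and 15 a likely entry error.
prices = [150000, 165000, 172000, 180000, 195000, 210000, None, 15]


def first_pass(values):
    """Count missing entries and flag values outside the 1.5 * IQR fences."""
    present = [v for v in values if v is not None]
    q1, _median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    suspects = [v for v in present if v < low or v > high]
    return {"missing": len(values) - len(present), "suspects": suspects}


report = first_pass(prices)
print(report)
```

Flagged values are candidates for review, not automatic deletions; a very expensive house can be a real data point.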

Partial Result from Analyze > Distribution

Cleaned Result from Analyze > Distribution

Recode Function

Recode Result with Formula Column Property
  Displays the Match function
  Documentation
  Reproducible workflows
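The value of recoding through a stored formula is that the old-to-new mapping is written down and can be rerun. A minimal Python analogue, assuming hypothetical messy city labels (the labels and the mapping are invented for illustration and are not taken from the presentation's data):

```python
# Hypothetical raw city labels with inconsistent spelling and spacing.
raw_cities = ["Louisville", "louisville ", "Lou.", "Cincinnati", "Cinci"]

# Explicit mapping: each old value on the left is replaced by the cleaned
# value on the right. Keeping it in one place documents the cleaning step.
RECODE_MAP = {
    "louisville ": "Louisville",
    "Lou.": "Louisville",
    "Cinci": "Cincinnati",
}


def recode(values, mapping):
    """Return a new list with mapped values replaced; others pass through."""
    return [mapping.get(v, v) for v in values]


clean_cities = recode(raw_cities, RECODE_MAP)
print(clean_cities)
```

The raw list is left unchanged and the cleaned values go into a new one, so the original data stays available for checking.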

Analyze > Distribution Window
  Requested variables, all three cities

Result with Statistical Data and Boxplots

Box Plot Summary

Analyze > Distribution By Variable By Source Table
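Splitting the distribution output by source table amounts to computing the same summary statistics once per city. A rough sketch in plain Python, assuming a hypothetical combined table with Source and Price columns (all values invented):

```python
import statistics
from collections import defaultdict

# Hypothetical combined table; values are illustrative only.
rows = [
    {"Source": "Louisville", "Price": 150000},
    {"Source": "Louisville", "Price": 210000},
    {"Source": "Indianapolis", "Price": 180000},
    {"Source": "Indianapolis", "Price": 200000},
    {"Source": "Cincinnati", "Price": 165000},
    {"Source": "Cincinnati", "Price": 175000},
]


def summarize_by(rows, by, column):
    """Group rows on `by` and compute basic statistics of `column` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[column])
    return {
        key: {
            "n": len(vals),
            "mean": statistics.mean(vals),
            "median": statistics.median(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for key, vals in groups.items()
    }


summary = summarize_by(rows, by="Source", column="Price")
for city, stats in summary.items():
    print(city, stats)
```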

Result with Statistical Data and Boxplot

Stacked Results
  Red Triangle > Stack

Easy to Read Table
  Right Click > Edit > Make table of graphs like this

Progress
Done:
  Summary of prices, beds, baths, sq. ft.
Next:
  Visualizations – clear, easy to understand
  Analysis
    Visualize distributions
    Comparisons of two variables
    Fit Y by X platform
    Data types and statistical measures

Output is Determined by Variable Type
  Analyze > Fit Y by X
  Examines the relationship between two variables
  Output depends on the variable modeling type
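The idea that the analysis follows from the variable types can be written as a small lookup. The sketch below reflects my understanding of how JMP's Fit Y by X platform chooses its report (Bivariate, Oneway, Logistic, or Contingency); it simplifies the modeling types to two categories, and the function name is hypothetical.

```python
# Modeling types are simplified to "continuous" and "categorical" here;
# nominal and ordinal are both treated as categorical.
# Keys are (Y type, X type); values are the report names as I understand them.
ANALYSIS_BY_TYPES = {
    ("continuous", "continuous"): "Bivariate",
    ("continuous", "categorical"): "Oneway",
    ("categorical", "continuous"): "Logistic",
    ("categorical", "categorical"): "Contingency",
}


def choose_analysis(y_type, x_type):
    """Return the analysis that matches the response (Y) and factor (X) types."""
    return ANALYSIS_BY_TYPES[(y_type, x_type)]


# Price (continuous Y) by Source (categorical X) leads to a one-way analysis.
print(choose_analysis("continuous", "categorical"))
```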

Price vs. Source

Statistical Results
  Red Triangle > Means / Anova
  Red Triangle > Compare Means > All Pairs, Tukey HSD
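To make the one-way analysis of variance less of a black box, the sketch below builds the ANOVA table by hand for three small groups. The prices (in thousands) are invented, and the function is a hypothetical helper. It stops at the F ratio: the p-value and the Tukey HSD comparisons need the F and studentized range distributions, which are not in the Python standard library and are left to a statistics package.

```python
import statistics

# Hypothetical prices in thousands of dollars, three listings per city.
groups = {
    "Louisville": [150, 160, 170],
    "Indianapolis": [180, 190, 200],
    "Cincinnati": [165, 175, 185],
}


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares, and F."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = statistics.mean(all_values)
    # Between-group variation: group means around the grand mean.
    ss_model = sum(
        len(vals) * (statistics.mean(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    # Within-group variation: observations around their own group mean.
    ss_error = sum(
        (v - statistics.mean(vals)) ** 2
        for vals in groups.values()
        for v in vals
    )
    df_model = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "ss_model": ss_model, "ss_error": ss_error,
        "df_model": df_model, "df_error": df_error,
        "ms_model": ms_model, "ms_error": ms_error,
        "f_ratio": ms_model / ms_error,
    }


table = one_way_anova(groups)
print(table)
```

With these made-up numbers the model mean square is 675 and the error mean square is 100, giving an F ratio of 6.75.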

Multiple Variable vs. Source

Fit Y by X with Categorical and Continuous By Variables

Definitions
R-square: Measures the proportion of the variation accounted for by fitting means to each factor level. The remaining variation is attributed to random error. R² is 1 if fitting the group means accounts for all the variation with no error. An R² of 0 indicates that the fit serves no better as a prediction model than the overall response mean.
F ratio: The model mean square divided by the error mean square. If the analysis of variance model results in a significant reduction of variation from the total, the F ratio is higher than expected.
Mean Square: A sum of squares divided by its associated degrees of freedom.
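The two boundary cases of R-square are easy to check on toy data. In the sketch below (invented numbers, hypothetical function name), the first data set has no variation within groups, so the group means explain everything. In the second, every group mean equals the overall mean, so the group means explain nothing.

```python
import statistics


def r_square(groups):
    """Share of total variation explained by fitting a mean to each group."""
    all_values = [v for vals in groups for v in vals]
    grand_mean = statistics.mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_error = sum(
        (v - statistics.mean(vals)) ** 2 for vals in groups for v in vals
    )
    return 1 - ss_error / ss_total


# Every value equals its group mean: no error variation is left.
perfect = [[10, 10], [20, 20], [30, 30]]
# Every group mean equals the overall mean of 20: grouping explains nothing.
useless = [[10, 30], [15, 25], [20, 20]]

print(r_square(perfect))  # 1.0
print(r_square(useless))  # 0.0
```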

THE END
How to Do a Data Analysis
Stan Siranovich
Principal Analyst
Crucial Connection LLC
Jeffersonville, IN
stan@CrucialConnection.com
www.CrucialConnection.com
www.StanSiranovich.com
This work is the copyright of Stan Siranovich and Crucial Connection LLC