1
How to Do a Data Analysis
Stan Siranovich, Crucial Connection LLC
Prepared for SQL Saturday – Louisville 2018
2
The Story (based on a true-life adventure)
You log in first thing in the morning and the rumors are confirmed: your company is expanding, with branch offices in three new cities. As you read, the Big Boss drops by your cubicle and says that she needs an analysis of the real estate situation in all three cities. The analysis needs to include summaries of prices based on factors such as number of bedrooms, number of bathrooms, and square footage. It should include lots of visualizations, be clear and easy to understand, and point out any interesting relationships that you uncover. And you need to have it done by 11:30 a.m.
3
Summary Analysis for Louisville, Indianapolis, Cincinnati
Requirements:
Analysis for Louisville, Indianapolis, Cincinnati
Beds, Baths, Sq. Ft., etc.
Clear Visualizations
Concise Report
Due in Two Hours

Plan of Attack:
Use JMP data analysis software from SAS
Collect, clean and examine data
Summarize data
Explore data visually
Analyze data
Prepare report
4
Residential Real Estate Data
5
The Software
6
By the Numbers
Download and Concatenate
Use Analyze > Distribution platform for visualization and data cleaning
Use Recode function for further cleaning
Use Analyze > Distribution platform for visualization and analysis
7
Concatenate Data in Analysis Software
Open files and import into JMP data table
Concatenate all three tables
Include Source Column
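The slides do this step in JMP. As a rough equivalent outside JMP, the sketch below stacks three tables and tags each row with its source. It is only an illustration: the variable names, column names, and rows are made up, and the standard library stands in for JMP's Concatenate dialog.

```python
# Hypothetical stand-ins for the three imported tables; every value is made up.
louisville = [{"Price": 185000, "Beds": 3, "Baths": 2, "SqFt": 1450}]
indianapolis = [{"Price": 210000, "Beds": 4, "Baths": 2, "SqFt": 1800}]
cincinnati = [{"Price": 199000, "Beds": 3, "Baths": 1, "SqFt": 1300}]


def concatenate(tables):
    """Stack tables row-wise, adding a Source column naming the origin table."""
    combined = []
    for source, rows in tables.items():
        for row in rows:
            # Copy the row so the original table is left unchanged.
            combined.append({**row, "Source": source})
    return combined


main_table = concatenate({
    "Louisville": louisville,
    "Indianapolis": indianapolis,
    "Cincinnati": cincinnati,
})

for row in main_table:
    print(row)
```

Keeping the Source column is what makes the later by-city comparisons possible from a single table.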
8
Main Table with Source Column
9
Visual Data Cleaning
Use Analyze > Distribution platform for first pass at cleaning
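A distribution view makes odd values stand out. A minimal text-only sketch of the same first pass is a frequency count plus a range check. The column values and the 0 to 10 plausibility range below are assumptions chosen for illustration, not values from the presentation's data.

```python
from collections import Counter

# Made-up "Beds" column as it might arrive from a raw download.
beds = ["3", "4", "3", "three", "", "2", "33"]

# Frequency table: rare or oddly spelled values show up at the bottom.
counts = Counter(beds)
for value, n in counts.most_common():
    print(repr(value), n)


def suspicious(values, low=0, high=10):
    """Return entries that are not whole numbers or fall outside [low, high]."""
    flagged = []
    for v in values:
        if not v.isdigit() or not (low <= int(v) <= high):
            flagged.append(v)
    return flagged


print(suspicious(beds))
```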
10
Partial Result from Analyze > Distribution
11
Cleaned Result from Analyze > Distribution
12
Recode Function
13
Recode Result with Formula Column Property
Displays Match function
Documentation
Reproducible work flows
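The appeal of storing the recode as a formula is that the mapping itself becomes the documentation. The sketch below shows the same idea with a plain lookup table. The mapping entries and sample values are hypothetical, and this is not JMP's Match function, only an analogue of it.

```python
# Hypothetical mapping from messy spellings to one canonical label.
RECODE_MAP = {
    "louisville": "Louisville",
    "lousiville": "Louisville",
    "indy": "Indianapolis",
    "indianapolis": "Indianapolis",
}


def recode(value, mapping=RECODE_MAP):
    """Return the canonical label for value, or the trimmed value if unmapped."""
    key = value.strip().lower()
    return mapping.get(key, value.strip())


raw = ["Louisville ", "lousiville", "Indy", "Cincinnati"]
cleaned = [recode(v) for v in raw]
print(cleaned)
```

Because the mapping lives in one place, rerunning the cleaning on a fresh download gives the same result, which is the reproducibility point the slide makes.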
14
Analyze > Distribution Window
Requested variables, all three cities
15
Result with Statistical Data and Boxplots
16
Box Plot Summary
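The numbers behind a box plot can be computed directly. The sketch below uses made-up prices (in thousands) and the common 1.5 × IQR rule for flagging outliers. It relies on Python's `statistics.quantiles` with its default method, which may not match the quantile definition JMP uses, so treat the quartiles as approximate.

```python
import statistics

# Made-up sale prices in thousands of dollars.
prices = [120, 150, 165, 180, 200, 230, 410]


def box_plot_summary(values):
    """Return the five-number summary, IQR, and points beyond 1.5 * IQR."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


summary = box_plot_summary(prices)
for name, value in summary.items():
    print(name, value)
```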
17
Analyze > Distribution By Variable
By Source Table
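Using Source as the By variable produces one set of summary statistics per city. A minimal sketch of that grouping, with made-up (Source, Price) pairs in thousands, could look like this:

```python
import statistics
from collections import defaultdict

# Made-up (Source, Price) pairs; prices are in thousands of dollars.
rows = [
    ("Louisville", 180), ("Louisville", 220),
    ("Indianapolis", 200), ("Indianapolis", 260),
    ("Cincinnati", 190), ("Cincinnati", 210),
]

# Collect prices under their source table name.
by_source = defaultdict(list)
for source, price in rows:
    by_source[source].append(price)

# One small summary per source, mirroring a By-variable report.
summaries = {
    source: {
        "n": len(prices),
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
    }
    for source, prices in by_source.items()
}

for source, stats in summaries.items():
    print(source, stats)
```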
18
Result with Statistical Data and Boxplot
19
Stacked Results
Red Triangle > Stack
20
Easy to Read Table
Right Click > Edit > Make table of graphs like this
21
Progress

Done:
Summary of prices, beds, baths, sq. ft.

Next:
Visualizations: clear, easy to understand
Analysis
Visualize distributions
Comparisons of two variables
Fit Y by X platform
Data types and statistical measures
22
Output is Determined by Variable Type
Analyze > Fit Y by X
Examines the relationship between two variables
Output depends on the variable modeling type
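The dependence on modeling type can be written as a small lookup. The sketch below encodes the four Fit Y by X report types (Bivariate, Oneway, Logistic, Contingency) by whether Y and X are continuous or categorical. The function name is hypothetical and the code only mimics the dispatch, not the analysis itself.

```python
CATEGORICAL = {"nominal", "ordinal"}
KNOWN_TYPES = CATEGORICAL | {"continuous"}


def fit_y_by_x_report(y_type, x_type):
    """Return the report name for a (Y, X) pair of modeling types."""
    y_type, x_type = y_type.lower(), x_type.lower()
    for t in (y_type, x_type):
        if t not in KNOWN_TYPES:
            raise ValueError(f"unknown modeling type: {t}")
    y_cat = y_type in CATEGORICAL
    x_cat = x_type in CATEGORICAL
    if not y_cat and not x_cat:
        return "Bivariate"      # continuous Y, continuous X
    if not y_cat and x_cat:
        return "Oneway"         # continuous Y, categorical X
    if y_cat and not x_cat:
        return "Logistic"       # categorical Y, continuous X
    return "Contingency"        # categorical Y, categorical X


print(fit_y_by_x_report("continuous", "nominal"))     # e.g. Price vs. Source
print(fit_y_by_x_report("continuous", "continuous"))  # e.g. Price vs. square feet
```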
23
Price vs. Source
24
Statistical Results
Red Triangle > Means / Anova
Red Triangle > Compare Means > All Pairs, Tukey HSD
25
Multiple Variables vs. Source
26
Fit Y by X with Categorical and Continuous By Variables
27
Definitions
R-square: Measures the proportion of the variation accounted for by fitting means to each factor level. The remaining variation is attributed to random error. R-square is 1 if fitting the group means accounts for all of the variation with no error. An R-square of 0 indicates that the fit serves no better as a prediction model than the overall response mean.
F ratio: The model mean square divided by the error mean square. If the analysis of variance model results in a significant reduction of variation from the total, the F ratio is higher than expected.
Mean square: A sum of squares divided by its associated degrees of freedom.
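These definitions translate directly into arithmetic. The sketch below computes R-square and the F ratio for a one-way layout from sums of squares. The group values are toy numbers chosen to give round results, not data from the presentation, and the function name is hypothetical.

```python
def oneway_anova(groups):
    """Compute R-square and F ratio for a one-way analysis of variance."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)

    # Model sum of squares: variation of the group means around the grand mean.
    ss_model = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Total sum of squares: variation of every value around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_error = ss_total - ss_model

    df_model = len(groups) - 1
    df_error = len(values) - len(groups)

    # Mean square = sum of squares / degrees of freedom.
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error

    return {
        "r_square": ss_model / ss_total,
        "f_ratio": ms_model / ms_error,
        "ms_model": ms_model,
        "ms_error": ms_error,
    }


# Toy values only, chosen so the arithmetic is easy to follow.
toy = {
    "Louisville": [1, 2, 3],
    "Indianapolis": [3, 4, 5],
    "Cincinnati": [5, 6, 7],
}
result = oneway_anova(toy)
print(result)
```

With these toy values the group means are 2, 4, and 6 around a grand mean of 4, so the model sum of squares is 24 out of a total of 30. That gives an R-square of 0.8 and an F ratio of 12 (model mean square 12, error mean square 1).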
28
The End
How to Do a Data Analysis
Stan Siranovich, Principal Analyst
Crucial Connection LLC, Jeffersonville, IN
This work is the copyright of Stan Siranovich and Crucial Connection LLC