
Summary Statistics Review


1 Summary Statistics Review
MIS2502 Data Analytics

2 Bottom line
In large sets of data, these patterns aren’t obvious, and we can’t just figure them out in our heads. We need analytics software. We’ll be using SAS to perform three analyses on large sets of data: Decision Trees, Clustering, and Association Rules.

3 Do most players make more or less than the mean? Explain.
Are player salaries normally distributed? Explain. What does the standard deviation being greater than the mean tell you about player salaries?
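A small numeric sketch can make these questions concrete. The salaries below are hypothetical, invented only to illustrate a right-skewed distribution; they are not the data from the exercise. When a few very large contracts pull the mean upward, the median falls below the mean and most players earn less than the mean. Because salaries cannot be negative, a standard deviation larger than the mean is also a sign of a long right tail rather than a symmetric, normal-looking distribution.

```python
import statistics

# Hypothetical salaries in $ millions, invented for illustration (NOT the exercise data):
# many modest contracts plus a few very large ones, i.e. a right-skewed distribution.
salaries = [0.5, 0.5, 0.6, 0.7, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0, 12.0, 25.0]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
sd = statistics.pstdev(salaries)  # population standard deviation

# How many players earn less than the mean?
below_mean = sum(1 for s in salaries if s < mean)

print(f"mean = {mean:.2f}, median = {median:.2f}, sd = {sd:.2f}")
print(f"{below_mean} of {len(salaries)} players earn less than the mean")
```

With this invented list, 10 of the 12 players earn less than the mean (about 4.07), the median is 1.1, and the standard deviation (about 7.0) exceeds the mean.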

4 SAS #1 – Intro
Start up SAS and modify an existing Project.
Create a new Diagram within that Project.
Define Data Set AAEM61.Organics for that Project.
Modify Data Set AAEM61.Organics (during Data Source Definition):
  DemCluster: Reject
  TargetAmt: Reject
  TargetBuy: Target (Binary)
Analysis:
  TargetBuy: proportion who purchase
  Explore Data Source Organics: DemGender (bar chart), DemAge (summary stats: max), distribution of DemAffl (mode vs. mean)
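The role changes and summaries listed on this slide can be mimicked on a handful of rows. This is a minimal sketch under stated assumptions: the records are invented (only the column names come from the slide), and dropping Reject variables is a rough stand-in for how SAS keeps them out of the modeling inputs. It shows that for a binary 0/1 target such as TargetBuy, the proportion who purchase is simply the mean of the target.

```python
from collections import Counter

# Hypothetical records shaped loosely like AAEM61.Organics; every value is invented.
rows = [
    {"DemAge": 39.5, "DemGender": "M", "DemAffl": 15, "DemCluster": 3, "TargetAmt": 1, "TargetBuy": 1},
    {"DemAge": 52.0, "DemGender": "F", "DemAffl": 8, "DemCluster": 10, "TargetAmt": 0, "TargetBuy": 0},
    {"DemAge": 28.0, "DemGender": "F", "DemAffl": 12, "DemCluster": 3, "TargetAmt": 2, "TargetBuy": 1},
    {"DemAge": 61.0, "DemGender": "M", "DemAffl": 6, "DemCluster": 22, "TargetAmt": 0, "TargetBuy": 0},
    {"DemAge": 45.0, "DemGender": "F", "DemAffl": 9, "DemCluster": 10, "TargetAmt": 0, "TargetBuy": 0},
]

# Roles from the slide; any variable not listed is assumed to stay an Input.
roles = {"DemCluster": "Reject", "TargetAmt": "Reject", "TargetBuy": "Target"}

# Drop rejected variables (a rough stand-in for the Reject role).
kept = [{k: v for k, v in r.items() if roles.get(k, "Input") != "Reject"} for r in rows]

# For a binary 0/1 target, the mean of the target is the proportion who purchase.
proportion_buy = sum(r["TargetBuy"] for r in kept) / len(kept)

max_age = max(r["DemAge"] for r in kept)               # DemAge summary stat: max
gender_counts = Counter(r["DemGender"] for r in kept)  # the counts a bar chart would show

print(f"proportion who purchase: {proportion_buy:.2f}")
print(f"max DemAge: {max_age}")
print(f"DemGender counts: {dict(gender_counts)}")
```

For these five invented rows, 2 buyers out of 5 gives a proportion of 0.40.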

5 File > New > Diagram
File > New > Data Source

6 SAS #1 – Data Source Wizard

7 SAS #1 – Data Source Wizard: Step 2
Browse to SharedData > Libraries > AAEM > Organics, then click OK.

8 SAS #1 – Data Source Wizard: Next for Steps 3, 4 and 5
Click Next for Steps 3, 4 and 5. Choose Basic, and then make the role changes listed on slide 4 (DemCluster: Reject, TargetAmt: Reject, TargetBuy: Target).

9 SAS #1 – Data Source Wizard: Next for Steps 7, 8, 9 and 10, then Finish.

10 SAS #1 – Explore
Right-click the file and choose Explore. This opens the Summary Statistics window.

11 SAS #1 – Explore: Default Explore Window

12 SAS #1 – Explore Using Bar Chart
Actions > Plot > BarChart

13 SAS #1 – Explore: Sample Statistics

14 SAS #1 – Explore Using Histogram
Actions > Plot > Histogram
Highlight bars to get their statistics.
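Each histogram bar counts the values that fall in one interval. The sketch below bins a hypothetical list of DemAffl values (invented, not from Organics) using an assumed bin width of 5, prints a text histogram, and compares the mode with the mean. In this right-skewed example, a few high grades pull the mean (about 10.33) above the mode (8).

```python
import statistics
from collections import Counter

# Hypothetical DemAffl (affluence grade) values; invented, not taken from Organics.
dem_affl = [3, 5, 6, 7, 7, 8, 8, 8, 8, 9, 9, 10, 11, 12, 14, 16, 20, 25]

BIN_WIDTH = 5  # assumed bin width; the Explore window chooses its own binning

# Count values per histogram bar: bin k covers [k*BIN_WIDTH, (k+1)*BIN_WIDTH).
bins = Counter(v // BIN_WIDTH for v in dem_affl)
for k in sorted(bins):
    lo, hi = k * BIN_WIDTH, (k + 1) * BIN_WIDTH
    print(f"[{lo:>2}, {hi:>2}): {'#' * bins[k]} ({bins[k]})")

mode = statistics.mode(dem_affl)   # most frequent single value
mean = statistics.mean(dem_affl)   # arithmetic average
print(f"mode = {mode}, mean = {mean:.2f}")
```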

15 SAS #1 – Explore Using Histogram
Actions > Plot > Histogram
Answers will vary slightly depending on the fetch size and sample method.
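Answers vary because the statistics come from the fetched sample of rows, not necessarily from every row. The sketch below illustrates that effect on a hypothetical, randomly generated population. The helper `sample_rows` and its "top"/"random" options are assumptions meant to resemble a fetch size and a sample method; they are not SAS's implementation.

```python
import random
import statistics

rng = random.Random(2502)  # fixed seed so the sketch is repeatable

# Hypothetical population of 20,000 affluence grades (invented, right-skewed, >= 0).
population = [int(rng.lognormvariate(2.0, 0.45)) for _ in range(20_000)]

def sample_rows(data, fetch_size, method, rng):
    """Hypothetical stand-in for Explore's sampling: 'top' takes the first
    fetch_size rows, 'random' draws fetch_size rows without replacement."""
    if method == "top":
        return data[:fetch_size]
    return rng.sample(data, fetch_size)

full_mean = statistics.mean(population)
print(f"full data: mean = {full_mean:.2f}")
for fetch_size in (2_000, 10_000):
    for method in ("top", "random"):
        s = sample_rows(population, fetch_size, method, rng)
        print(f"{method:>6}, fetch = {fetch_size:>6}: "
              f"mean = {statistics.mean(s):.2f}, mode = {statistics.mode(s)}")
```

The sample means land close to, but not exactly on, the full-data mean, which matches the slide's point that the answer changes only a little.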

16 SAS Homework 2 Review: Decision Trees
Use the Organics Data Set from exercise #1. If Organics is set up incorrectly, your Decision Tree will be wrong.
Partition: 50% Training, 50% Validation.
Add a Decision Tree using the defaults (maximum number of branches: 2).
Evaluate the default Decision Tree using Average Square Error.
Add another Decision Tree, but this time customize it by changing the maximum number of branches from 2 to 3.
Assess this Decision Tree using Average Square Error.
Compare the default tree (2-branch max) to the customized tree (3-branch max) and determine which model is ‘better’.
Answer some questions about the customized Decision Tree.
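A 50% Training / 50% Validation partition assigns each row to exactly one of the two sets. This minimal sketch uses simple random sampling with an arbitrary, hypothetical seed; the SAS Data Partition node has its own partitioning methods and random seed, which this sketch does not reproduce.

```python
import random

def partition(rows, train_frac=0.5, seed=12345):
    """Shuffle a copy of rows and split it into (training, validation).
    The seed value is arbitrary; simple random sampling is assumed."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical customer IDs standing in for the Organics rows.
customers = list(range(1, 11))
train, valid = partition(customers)
print("training:  ", sorted(train))
print("validation:", sorted(valid))
```

The model is fit on the training rows, and the validation rows are held back to measure error on data the tree has not seen.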

17 Partition

18 Decision Tree – Diagram
After adding the objects, right-click and choose Run. The difference between the two trees is the Maximum Branch setting.

19 Assessing the 1st Decision Tree
Maximum of 2 branches. Age is the first split.

20 Assessing the 1st Decision Tree Using the Average Square Error Subtree Assessment Plot
View > Model > Subtree Assessment Plot
The line marks the optimal leaf count for that tree. Note the leaf count and the Validation: Average Square Error.
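Average square error compares each predicted probability with the actual outcome: ASE = (1/N) * sum over i of (y_i - p_i)^2, where y_i is 1 (buy) or 0 (no buy), p_i is the model's predicted probability of buying for observation i, and N is the number of validation rows. The sketch assumes this standard definition for a binary target; the observations and predictions are hypothetical.

```python
def average_square_error(actual, predicted):
    """Mean of (actual - predicted probability)^2 over all observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)

# Hypothetical validation data: 1 = buy, 0 = no buy, and each row's predicted P(buy).
y_valid = [1, 0, 0, 1, 0]
p_valid = [0.8, 0.1, 0.3, 0.6, 0.2]
print(f"ASE = {average_square_error(y_valid, p_valid):.3f}")
```

For these five rows the squared errors are 0.04, 0.01, 0.09, 0.16 and 0.04, so ASE = 0.34 / 5 = 0.068.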

21 Assessing the 2nd Decision Tree

22 Assessing the 2nd Decision Tree
Note the change in average square error. In general, less error is better.
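Comparing the two trees comes down to two minimum searches: within each tree, the optimal leaf count is assumed to be the subtree with the lowest validation ASE, and between trees, the one with the lower validation ASE is preferred. The ASE values below are invented for illustration and are not results from the homework.

```python
# Hypothetical validation ASE for each subtree size (number of leaves).
# These numbers are invented; they are NOT results from the Organics homework.
subtree_ase = {
    "default (2-branch max)": {1: 0.174, 2: 0.151, 3: 0.139, 5: 0.133, 7: 0.134, 9: 0.136},
    "custom (3-branch max)": {1: 0.174, 3: 0.142, 5: 0.132, 8: 0.131, 11: 0.133},
}

best = {}
for name, curve in subtree_ase.items():
    # Optimal leaf count: the subtree with the lowest validation ASE (assumed rule).
    leaves = min(curve, key=curve.get)
    best[name] = (leaves, curve[leaves])
    print(f"{name}: optimal leaves = {leaves}, validation ASE = {curve[leaves]:.3f}")

# Between the two trees, prefer the lower validation ASE ("less error is better").
winner = min(best, key=lambda n: best[n][1])
print(f"lower validation ASE: {winner}")
```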

23 What is the probability that a 39.5-year-old male with an affluence grade of 15 buys organics?
Navigating the tree: Age = 39.5, AfflGrade > 11.5, Gender = M. Look at the ‘Validation’ stats (1 = buy, 0 = no buy).
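Reading a probability off the tree means following the splits from the root to a leaf. The sketch below encodes a hypothetical tree as nested dictionaries: the split variables follow the path on this slide, but the Age threshold of 44.5 and every leaf probability are invented, and every split is binary, whereas the customized tree may split three ways. In SAS, the leaf's ‘Validation’ proportion for 1 = buy is the value to report.

```python
# Hypothetical tree: split order follows the slide's path (Age, AfflGrade, Gender),
# but the Age threshold and all leaf probabilities are invented for illustration.
tree = {
    "split": ("Age", 44.5),            # hypothetical: go left if Age < 44.5
    "left": {
        "split": ("AfflGrade", 11.5),  # go left if AfflGrade < 11.5
        "left": {"leaf": 0.25},
        "right": {
            "split": ("Gender", "F"),  # categorical: go left if Gender == "F"
            "left": {"leaf": 0.80},
            "right": {"leaf": 0.60},
        },
    },
    "right": {"leaf": 0.12},
}

def predict(node, record):
    """Follow splits from the root until reaching a leaf; return its P(buy = 1)."""
    while "leaf" not in node:
        var, threshold = node["split"]
        value = record[var]
        if isinstance(threshold, str):
            go_left = value == threshold
        else:
            go_left = value < threshold
        node = node["left"] if go_left else node["right"]
    return node["leaf"]

shopper = {"Age": 39.5, "AfflGrade": 15, "Gender": "M"}
print(f"P(buy) = {predict(tree, shopper):.2f}")
```

For the slide's shopper, this invented tree goes left on Age, right on AfflGrade, and right on Gender, returning 0.60.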

24 View > Explorer

