Summary Statistics Review

Presentation transcript:

Summary Statistics Review MIS2502 Data Analytics

Bottom line: In large sets of data, these patterns aren’t obvious, and we can’t just figure them out in our heads. We need analytics software. We’ll be using SAS to perform these three analyses on large sets of data: Decision Trees, Clustering, and Association Rules.

Do most players make more or less than the mean? Explain. Are player salaries normally distributed? Explain. What do you learn about player salaries based on the standard deviation being greater than the mean?
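The three questions above can be explored with a short calculation. The following is only a sketch: the salary figures are hypothetical stand-ins (the slide's actual data is not reproduced in this transcript), chosen to show how a few very large values pull the mean above the median and push the standard deviation above the mean.

```python
import statistics

# Hypothetical salaries in thousands of dollars.
# ASSUMPTION: these are made-up values, NOT the player data from the slide.
salaries = [500, 520, 550, 600, 650, 700, 800, 1200, 9000, 25000]

mean = statistics.mean(salaries)
median = statistics.median(salaries)
stdev = statistics.stdev(salaries)  # sample standard deviation

# Share of players who earn less than the mean.
below_mean = sum(s < mean for s in salaries) / len(salaries)

print(f"mean={mean:.0f} median={median:.0f} stdev={stdev:.0f}")
print(f"share below mean: {below_mean:.0%}")
```

With data shaped like this, most values fall below the mean, which is one sign of a right-skewed (not normal) distribution. For a quantity that cannot be negative, a standard deviation larger than the mean points the same way.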

SAS #1 – Intro: Start up SAS. Modify an existing Project. Create a new Diagram within that Project. Define Data Set AAEM61.Organics for that Project. Modify Data Set AAEM61.Organics: DemCluster: Reject; TargetAmt: Reject; TargetBuy: Target – Binary. Analysis (during Data Source Definition): TargetBuy – proportion who purchase. Explore Data Source Organics: DemGender (bar chart); DemAge (summary stats, max); distribution of DemAffl – mode vs. mean.
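The summary statistics this exercise asks for can be illustrated outside SAS. This is a minimal sketch, assuming a handful of made-up records that borrow only the variable names from the slide (DemGender, DemAge, DemAffl, TargetBuy); none of the values come from AAEM61.Organics.

```python
from collections import Counter
import statistics

# ASSUMPTION: hypothetical rows for illustration only; the real Organics
# data set is not reproduced here.
rows = [
    {"DemGender": "F", "DemAge": 34, "DemAffl": 8,  "TargetBuy": 1},
    {"DemGender": "F", "DemAge": 51, "DemAffl": 8,  "TargetBuy": 0},
    {"DemGender": "M", "DemAge": 46, "DemAffl": 10, "TargetBuy": 0},
    {"DemGender": "F", "DemAge": 62, "DemAffl": 8,  "TargetBuy": 1},
    {"DemGender": "M", "DemAge": 29, "DemAffl": 5,  "TargetBuy": 0},
    {"DemGender": "M", "DemAge": 73, "DemAffl": 12, "TargetBuy": 0},
    {"DemGender": "F", "DemAge": 55, "DemAffl": 9,  "TargetBuy": 0},
    {"DemGender": "M", "DemAge": 40, "DemAffl": 20, "TargetBuy": 0},
]

# Proportion who purchase: the mean of a 0/1 target is the share of 1s.
purchase_rate = statistics.mean([r["TargetBuy"] for r in rows])

# Counts per category, i.e. the heights a bar chart of DemGender would show.
gender_counts = Counter(r["DemGender"] for r in rows)

# Maximum age, as read from the summary statistics.
max_age = max(r["DemAge"] for r in rows)

# Mode versus mean of the affluence grade.
affl = [r["DemAffl"] for r in rows]
affl_mode = statistics.mode(affl)
affl_mean = statistics.mean(affl)

print(purchase_rate, dict(gender_counts), max_age, affl_mode, affl_mean)
```

When the mode sits below the mean, as in these made-up rows, a few high values are stretching the distribution to the right.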

File > New > Diagram; File > New > Data Source

SAS #1- Data Source Wizard

SAS #1 – Data Source Wizard: Step 2. Browse to SharedData > Libraries > AAEM > Organics, then click OK.

SAS #1 – Data Source Wizard: Next for Steps 3, 4 and 5. Basic and then make changes.

SAS #1 – Data Source Wizard: Next for Steps 7, 8, 9 and 10, then Finish.

SAS #1 – Explore: Right-click on the file and choose Explore. This will open the Summary Statistics window.

SAS #1 – Explore Default Explore Window

SAS #1 – Explore using Bar Chart: Actions > Plot > BarChart >

SAS #1 – Explore Sample Statistics

SAS #1 – Explore Using Histogram: Actions > Plot > Histogram > Highlight bars to get stats.

SAS #1 – Explore Using Histogram: Actions > Plot > Histogram > The answer will vary depending on fetch size and sample method, but not by much…
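The reason answers differ only slightly is that the Explore window summarizes a sample of the rows rather than the full table. The sketch below illustrates the idea under stated assumptions: the "population" is a randomly generated column of ages (not the Organics data), and simple random sampling stands in for whatever sample method is selected in SAS.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# ASSUMPTION: a hypothetical full column of 20,000 ages, generated at random.
population = [random.randint(18, 79) for _ in range(20000)]

def sample_mean(values, size):
    """Mean of a simple random sample of `size` values (stand-in for fetch size)."""
    return statistics.mean(random.sample(values, size))

full_mean = statistics.mean(population)

# Different sample sizes give slightly different means, all close to the full mean.
for size in (2000, 10000):
    print(size, round(sample_mean(population, size), 2), round(full_mean, 2))
```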

SAS Homework 2 Review – Decision Trees: Use the Organics Data Set from exercise #1. If Organics is wrong, then your Decision Tree will be wrong. Partition: 50% Training, 50% Validation. Add a Decision Tree using defaults (max number of branches: 2). Evaluate the default Decision Tree using Average Square Error. Add another Decision Tree, but this time customize it by changing the max number of branches from 2 to 3. Assess this Decision Tree using Average Square Error. Compare the default Tree (2 branch max) to the customized Tree (3 branch max) and determine which model is ‘better’. Answer some questions regarding the customized Decision Tree.

Partition
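A 50% Training / 50% Validation partition can be pictured as a shuffled split of the rows. This is a minimal sketch, assuming simple random assignment; the function name `partition`, its parameters, and the seed are hypothetical, and the partitioning method SAS actually applies may differ (for example, it may stratify on the target).

```python
import random

def partition(rows, train_fraction=0.5, seed=12345):
    """Shuffle a copy of `rows` and split it into (training, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # private generator, repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 records
training, validation = partition(rows)
print(len(training), len(validation))
```

The tree is grown on the training half; the validation half is held back so the error can be measured on rows the tree has not seen.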

Decision Tree – Diagram: Right-click > Run after adding objects. The difference between the two trees is Maximum Branch.

Assessing the 1st Decision Tree: 2 branches; Age is the 1st branch.

Assessing the 1st Decision Tree using the average square error subtree assessment plot: View > Model > Subtree Assessment Plot. The line is the Optimal Leaf count for that tree. Note the Leaf Count and the Validation: Average Square Error.
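One way to read the subtree assessment plot is as a lookup of validation error by leaf count. The sketch below assumes the optimal leaf count is the one with the smallest validation average square error; the numbers are hypothetical placeholders, not values from the slides.

```python
# ASSUMPTION: made-up validation average square error for each candidate
# leaf count; read the real values off the Subtree Assessment Plot.
validation_ase = {
    1: 0.186,
    2: 0.158,
    3: 0.147,
    4: 0.139,
    5: 0.133,
    6: 0.134,
    7: 0.136,
}

# Pick the leaf count whose validation error is lowest.
best_leaves = min(validation_ase, key=validation_ase.get)
print(best_leaves, validation_ase[best_leaves])
```

In this made-up series the error stops improving after five leaves, so adding more leaves would only fit noise in the training data.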

Assessing the 2nd Decision Tree

Assessing the 2nd Decision Tree: Note the change in average square error. In general, less error is better.
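The comparison between the two trees comes down to a single number per model. The sketch below assumes average square error is the mean of the squared differences between the actual outcome (1 = buy, 0 = no buy) and the predicted probability of buying. The outcomes and predictions are hypothetical, and `model_a` and `model_b` are placeholder names, not the two homework trees.

```python
def average_square_error(actual, predicted):
    """Mean of squared differences between 0/1 outcomes and predicted probabilities."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# ASSUMPTION: made-up validation outcomes and predictions for two models.
actual = [1, 0, 0, 1, 0]
model_a = [0.6, 0.2, 0.4, 0.5, 0.1]
model_b = [0.8, 0.2, 0.3, 0.6, 0.1]

ase_a = average_square_error(actual, model_a)
ase_b = average_square_error(actual, model_b)

# The model with the lower validation error is preferred.
better = "model_b" if ase_b < ase_a else "model_a"
print(round(ase_a, 3), round(ase_b, 3), better)
```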

Navigating the Tree: What is the probability that a 39.5-year-old male with an affluence grade of 15 buys organics? Age = 39.5, AfflGrade > 11.5, Gender = M. Look at the ‘Validation’ stats. 1 = buy, 0 = no buy.
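Navigating a tree means answering one question per node until a leaf is reached, then reading the leaf's proportion of buyers. The sketch below shows that walk on a toy tree. Only the 11.5 affluence cut-off and the gender test come from the slide; the age threshold, the tree shape, and every leaf probability are hypothetical placeholders, so the printed value is not the homework answer.

```python
# ASSUMPTION: a toy tree. Internal nodes hold a test and two branches;
# leaves hold a made-up validation proportion of buyers ("p_buy").
tree = {
    "test": lambda p: p["age"] < 44.5,  # hypothetical age threshold
    "yes": {
        "test": lambda p: p["affl_grade"] > 11.5,  # cut-off shown on the slide
        "yes": {
            "test": lambda p: p["gender"] == "M",
            "yes": {"p_buy": 0.60},  # placeholder value
            "no": {"p_buy": 0.85},   # placeholder value
        },
        "no": {"p_buy": 0.25},       # placeholder value
    },
    "no": {"p_buy": 0.10},           # placeholder value
}

def predict(node, person):
    """Follow the tests from the root down to a leaf and return its p_buy."""
    while "test" in node:
        node = node["yes"] if node["test"](person) else node["no"]
    return node["p_buy"]

person = {"age": 39.5, "affl_grade": 15, "gender": "M"}
print(predict(tree, person))
```

In SAS the same walk is done by eye: follow the branches that match the person, then read the proportion for target level 1 in the Validation column of the final node.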

View > Explorer