Exam 3 Review Decision Trees Cluster Analysis Association Rules Data Visualization SAS.



When to Use Which Analysis (D, C or A)?
– When someone gets an A in this class, what other classes do they get an A in?
– What predicts whether a company will go bankrupt?
– If someone upgrades to an iPhone, do they also buy a new case?
– Which party will win the election?
– Can we group our website visitors into types based on their online behaviors?
– Which customers will purchase our product?
– Can we identify different product markets based on customer demographics?

Decision Trees
Which is the root node? How many leaf nodes are there?

Probability of purchase?
i) Female, 130 lbs, 12 ft?
ii) 120 lbs, 5 feet, male?
Best predictor variable?
[Figure: decision tree with splits on Height (<6’ / >=6’), Weight (<150 / >=150 and <170 / >=170), and Gender (Male / Female). Node statistics in extraction order (Outcome 0 / Outcome 1 / n): 62% / 38% / 350; 55% / 45% / 250; 40% / 60% / 150; 60% / 40% / 250; 45% / 55% / 75; 35% / 65% / 75.]

Probability of purchase?
i) 5 ft 5 inches?
ii) 6 ft 5 inches, 190 lbs?
[Figure: the same decision tree and node statistics as on the previous slide.]

Decision Trees
What does it mean that Gender is only on the right side of the tree? Why is it not on both sides?
Based on the tree, which demographic is MOST likely to buy the product? Least likely to buy the product?

Decision Trees
What statistics are used to determine splits for decision trees?
– Gini coefficient, chi-square statistic (p-value)
What does it mean when the Gini = 1?
What does it mean when the chi-square is bigger?
What happens to the p-value as the chi-square gets bigger?
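To make the split statistics above concrete, here is a minimal Python sketch. It assumes the common Gini impurity definition (1 minus the sum of squared class proportions) and the Pearson chi-square statistic on a 2x2 table; the scaling used in the course or in SAS may differ. The function names and all counts are hypothetical, chosen only for illustration.

```python
import math

def gini_impurity(counts):
    """Gini impurity 1 - sum(p_k^2) for a list of class counts (assumed definition)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical splits: rows are the two child nodes,
# columns are (no purchase, purchase) counts.
weak_split = [[50, 50], [45, 55]]
strong_split = [[80, 20], [30, 70]]

for name, table in [("weak", weak_split), ("strong", strong_split)]:
    stat = chi_square_2x2(table)
    print(name, "chi-square:", round(stat, 2), "p-value:", p_value_1df(stat))

# Impurity of a perfectly mixed node versus a pure node.
print("mixed node:", gini_impurity([50, 50]), "pure node:", gini_impurity([100, 0]))
```

In this sketch, the split whose child nodes differ more in purchase rate has the larger chi-square statistic and the smaller p-value.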

Clustering
What statistics do we care about in cluster analysis? What do they represent?
What happens to these statistics as the number of clusters is increased?
Why do we standardize data?
Why do we eliminate outliers?
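As a rough illustration of the clustering questions above, the following Python sketch standardizes two variables to z-scores and computes the within-cluster sum of squared errors (SSE) for a few fixed cluster assignments. It assumes within-cluster SSE is one of the statistics of interest; the data, the cluster labels, and the function names are hypothetical.

```python
import statistics

def standardize(values):
    """Convert raw values to z-scores (mean 0, population standard deviation 1)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def within_sse(points, labels):
    """Sum of squared distances from each point to its own cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    sse = 0.0
    for members in clusters.values():
        centroid = [sum(dim) / len(members) for dim in zip(*members)]
        for point in members:
            sse += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return sse

# Hypothetical customers: age in years, income in dollars.
ages = [22, 25, 47, 52]
incomes = [30000, 90000, 32000, 95000]

raw = list(zip(ages, incomes))
scaled = list(zip(standardize(ages), standardize(incomes)))

# On the raw scale, income differences swamp age differences.
print("raw SSE, one cluster:", within_sse(raw, [0, 0, 0, 0]))

# On the standardized scale, both variables contribute comparably.
for labels in ([0, 0, 0, 0], [0, 1, 0, 1], [0, 1, 2, 3]):
    k = len(set(labels))
    print(k, "cluster(s), within-cluster SSE:", within_sse(scaled, labels))
```

With every point in its own cluster the within-cluster SSE drops to zero, which is why a low SSE alone does not mean the clustering is useful.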

Clustering
What are the pros and cons of having only a few clusters (compared to having many clusters)?
What is bad about the cluster analysis result below? How would you improve it?

Association Rules
How would you describe the following association rule?
– {Meat, Dairy} → {Vegetables}
How many items are in this item set?
What is (are) the antecedent(s)? What is (are) the consequent(s)?
What are the statistics we care about when evaluating an association rule?

Association Rules
Do the following two rules have to have the same Confidence? The same Support? The same Lift?
– {Meat, Dairy} → {Vegetables}
– {Vegetables} → {Meat, Dairy}
What does Lift > 1 mean? Would you take action on such a rule?
– What about Lift < 1?
– What about Lift = 1?
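The comparison of the two rules above can be checked numerically. This is a minimal Python sketch using the standard definitions of support, confidence, and lift; the baskets and function names are made up for illustration and are not data from the course.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    confidence = both / support(transactions, antecedent)
    lift = confidence / support(transactions, consequent)
    return both, confidence, lift

# Hypothetical market baskets.
baskets = [
    {"Meat", "Dairy", "Vegetables"},
    {"Meat", "Dairy", "Vegetables"},
    {"Meat", "Dairy"},
    {"Vegetables"},
    {"Vegetables", "Dairy"},
    {"Meat"},
    {"Dairy"},
    {"Vegetables", "Meat", "Dairy"},
]

forward = rule_stats(baskets, {"Meat", "Dairy"}, {"Vegetables"})
reverse = rule_stats(baskets, {"Vegetables"}, {"Meat", "Dairy"})

print("{Meat, Dairy} -> {Vegetables}: support, confidence, lift =", forward)
print("{Vegetables} -> {Meat, Dairy}: support, confidence, lift =", reverse)
```

Support and lift depend only on the combined item set and the two sides' individual supports, so they match in both directions; confidence divides by the antecedent's support, so it generally differs.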

Association Rules
What might you do as a manager if you saw a very high Lift and Confidence for the following rule about product purchase? Why would you do this?
– {Pasta} → {Orange Juice}

Association Rules What is the most reliable association rule below?

Data Visualization Look at In-Class Exercise Answers...