Final Exam Review. Data Mining and Data Analytics Techniques Explain the three data analytics techniques we covered in the course Decision Trees, Clustering,

Slides:



Advertisements
Similar presentations
Chapter 4 Scatterplots and Correlation. Rating Cereal: 0 to = unhealthy 100 = very nutritious.
Advertisements

LECTURE 3 Introduction to Linear Regression and Correlation Analysis
Chapter 17 Overview of Multivariate Analysis Methods
Final Review and Study Guide MIS2502, Spring 2011 Section 03.
Chapter 9 Business Intelligence Systems
Fall 2006 – Fundamentals of Business Statistics 1 Chapter 13 Introduction to Linear Regression and Correlation Analysis.
Linear Regression and Correlation Analysis
Chapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 7 Correlational Research Gay, Mills, and Airasian
SW318 Social Work Statistics Slide 1 Using SPSS for Graphic Presentation  Various Graphics in SPSS  Pie chart  Bar chart  Histogram  Area chart 
1 Chapter 4: Variability. 2 Variability The goal for variability is to obtain a measure of how spread out the scores are in a distribution. A measure.
Chapter 5 Data mining : A Closer Look.
Measures of Central Tendency
Chapter 3 (continued) Nutan S. Mishra. Exercises Size of the data set = 12 for all the five problems In 3.11 variable x 1 = monthly rent of.
SAS Homework 3 Review Association rules mining
Chapter 3: Central Tendency. Central Tendency In general terms, central tendency is a statistical measure that determines a single value that accurately.
(a.k.a: The statistical bare minimum I should take along from STAT 101)
Variable  An item of data  Examples: –gender –test scores –weight  Value varies from one observation to another.
1.
Lecture 22 Dustin Lueker.  The sample mean of the difference scores is an estimator for the difference between the population means  We can now use.
Measures of Dispersion 9/24/2013. Readings Chapter 2 Measuring and Describing Variables (Pollock) (pp.37-44) Chapter 6. Foundations of Statistical Inference.
Final Exam Review. The following is a list of items that you should review in preparation for the exam. Note that not every item in the following slides.
Exam 3 Sample Decision Trees Cluster Analysis Association Rules Data Visualization SAS.
SAS Homework 4 Review Clustering and Segmentation
Sec 1.5 Scatter Plots and Least Squares Lines Come in & plot your height (x-axis) and shoe size (y-axis) on the graph. Add your coordinate point to the.
Exam 3 Review Decision Trees Cluster Analysis Association Rules Data Visualization SAS.
1 Statistical Techniques Chapter Linear Regression Analysis Simple Linear Regression.
Preparing for the final - sample questions with answers.
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
1 STAT 5814 Statistical Data Mining. 2 Use of SAS Data Mining.
The Three Analytics Techniques. Decision Trees – Determining Probability.
1. 2 * Introduction & Thoughts Behind The Experiment - We wanted to chose an experiment that was easy to complete, organized, and some what entertaining.
Notes 1.3 (Part 1) An Overview of Statistics. What you will learn 1. How to design a statistical study 2. How to collect data by taking a census, using.
CLUSTERING AND SEGMENTATION MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
Financial Statistics Unit 2: Modeling a Business Chapter 2.2: Linear Regression.
Multivariate Analysis and Data Reduction. Multivariate Analysis Multivariate analysis tries to find patterns and relationships among multiple dependent.
NAMES OF GROUPS TITLE. STATISTICAL QUESTION Your Question here. Why did you chose this question?
MIS2502: Data Analytics Advanced Analytics - Introduction.
How Good is a Model? How much information does AIC give us? –Model 1: 3124 –Model 2: 2932 –Model 3: 2968 –Model 4: 3204 –Model 5: 5436.
Chapter 4: Variability. Variability The goal for variability is to obtain a measure of how spread out the scores are in a distribution. A measure of variability.
4.2 Displays of Quantitative Data. Stem and Leaf Plot A stem-and-leaf plot shows data arranged by place value. You can use a stem-and-leaf plot when you.
Scatter Plots. Scatter plots are used when data from an experiment or test have a wide range of values. You do not connect the points in a scatter plot,
Used Car Sales- GCSE Coursework Task. This task will demonstrate the statistical techniques you have learned We have looked at Mean from Frequency Table.
Statistical Fundamentals: Using Microsoft Excel for Univariate and Bivariate Analysis Alfred P. Rovai Charts Overview PowerPoint Prepared by Alfred P.
Elementary Statistics (Math 145) June 19, Statistics is the science of collecting, analyzing, interpreting, and presenting data. is the science.
GOAL: I CAN USE TECHNOLOGY TO COMPUTE AND INTERPRET THE CORRELATION COEFFICIENT OF A LINEAR FIT. (S-ID.8) Data Analysis Correlation Coefficient.
Math 145 June 19, Outline 1. Recap 2. Sampling Designs 3. Graphical methods.
Data Resource Management – MGMT An overview of where we are right now SQL Developer OLAP CUBE 1 Sales Cube Data Warehouse Denormalized Historical.
Appendix I A Refresher on some Statistical Terms and Tests.
ACC 421 Final Exam Guide 4 Sets To purchase this material link Final-Exam-Guide For more courses visit.
Howard Community College
Practice As part of a program to reducing smoking, a national organization ran an advertising campaign to convince people to quit or reduce their smoking.
MIS2502: Data Analytics Advanced Analytics - Introduction
©Jiawei Han and Micheline Kamber
SEEM5770/ECLT5840 Course Review
EDRS6208 Fundamentals of Education Research 1
PSYCH 625 Education on your terms/snaptutorial.com.
©Jiawei Han and Micheline Kamber
Promoting Usage of Location-based Services, an Approach Based on Intimacy Theory and Data Mining Techniques.
What Statistical Knowledge Should Students Know to Prepare for Graduate Business Analytics? Jeffrey D. Camm Wake Forest University School of Business
Exam #3 Review Zuyin (Alvin) Zheng.
MIS2502: Data Analytics Clustering and Segmentation
MIS2502: Review for Exam 3 Aaron Zhi Cheng
MIS2502: Data Analytics Clustering and Segmentation
Statistics Paper Define data set you are using for your paper.
MIS2502: Data Analytics Introduction to Advanced Analytics
Factor Analysis (Principal Components) Output
Quantitative Data Who? Cans of cola. What? Weight (g) of contents.
Practice As part of a program to reducing smoking, a national organization ran an advertising campaign to convince people to quit or reduce their smoking.
MIS2502: Data Analytics Introduction to Advanced Analytics and R
Presentation transcript:

Final Exam Review

Data Mining and Data Analytics Techniques Explain the three data analytics techniques we covered in the course Decision Trees, Clustering, and Association Rules What kinds of problems can each solve? Provide a business- oriented example. Make recommendations to a business based on the results of each type of analysis. Explain how data mining differs from OLAP analysis Why would you use this instead of a data cube and a pivot table?

Understanding Descriptive Statistics Be able to read and interpret a histogram Negative Skew/Positive Skew Be able to read and interpret sample statistics

Sample Statistics VariableMinMaxMean Male01.7 Female Did not reply01.1

Decision Tree Analysis

Cluster Analysis Scatter Plot What do you look for in a histogram to tell if a variable should be included?

Cluster Analysis Cohesion: Similar items being grouped together. Separation: Separate different clusters away from each other. Which cluster has the highest cohesion? Cluster 1. Why? Which cluster has the highest separation? Cluster 3. Why?

Cluster Analysis Segment profile Plot Segment 6: Does ORIGINAL have higher values than the data or lower? Higher. Why?

Association Rules Be able to read and interpret the output from an association rule analysis Find the strongest (or weakest) rule from a set of output Understand the difference between confidence, lift, and support You should be able to explain the difference between them Can you have high confidence and low lift? You won’t have to compute them, but you should understand how they are computed so you can interpret the statistics

What’s the difference between a standard association rule analysis and a sequence analysis? What situations would a sequence analysis make sense? When wouldn’t it? What activity is most likely to be undertaken if website is visited first? Archive. Why? What is a situation where you may have high confidence but not a high lift? Musicstream-> Website. What is happening here?