An Excel-based Data Mining Tool: Chapter 4

4.1 The iData Analyzer

4.2 ESX: A Multipurpose Tool for Data Mining

ESX:
- supports both supervised learning and unsupervised clustering
- does not make statistical assumptions about the data
- deals with missing attribute values
- can be applied to both categorical and numerical data
- points out inconsistencies and unusual values

For supervised classification, ESX can determine the instances and attributes best able to classify new instances. For unsupervised clustering, ESX incorporates a globally optimizing evaluation function that favors the best overall clustering of the instances.

4.3 iDAV Format for Data Mining

A Five-Step Approach for Unsupervised Clustering

Step 1: Enter the data to be mined.
Step 2: Perform a data mining session.
Step 3: Read and interpret summary results.
Step 4: Read and interpret individual class results.
Step 5: Visualize individual class rules.

Step 1: Enter the Data to Be Mined

Step 2: Perform a Data Mining Session

Step 3: Read and Interpret Summary Results
- Class resemblance scores
- Domain resemblance score
  - attributes, instances, no model
- Domain predictability
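To make the summary scores concrete, here is a minimal sketch, assuming that a resemblance score is the average pairwise similarity of instances (computed per cluster for a class resemblance score, and over all instances for the domain resemblance score) and that domain predictability is the fraction of all instances having a given attribute value. The `similarity` function is a simple matching coefficient over categorical attributes; it is an assumption for illustration, not ESX's own similarity measure, and numeric attributes are not handled. The toy data and all function names are hypothetical.

```python
from itertools import combinations

# Toy clustered data (hypothetical): categorical attributes plus a cluster label.
ATTRIBUTES = ["color", "size"]
DATA = [
    {"color": "red", "size": "small", "cluster": 1},
    {"color": "red", "size": "large", "cluster": 1},
    {"color": "blue", "size": "large", "cluster": 2},
    {"color": "blue", "size": "large", "cluster": 2},
]


def similarity(a, b, attributes):
    """Simple matching similarity (an assumption, not ESX's own measure):
    the fraction of attributes on which two instances agree."""
    return sum(1 for attr in attributes if a[attr] == b[attr]) / len(attributes)


def resemblance(instances, attributes):
    """Average similarity over all pairs of instances."""
    pairs = list(combinations(instances, 2))
    if not pairs:
        raise ValueError("need at least two instances")
    return sum(similarity(a, b, attributes) for a, b in pairs) / len(pairs)


def domain_predictability(instances, attribute, value):
    """Fraction of all domain instances whose `attribute` equals `value`."""
    return sum(1 for inst in instances if inst[attribute] == value) / len(instances)


# Class resemblance is computed per cluster; domain resemblance over all data.
class_scores = {
    c: resemblance([i for i in DATA if i["cluster"] == c], ATTRIBUTES)
    for c in sorted({i["cluster"] for i in DATA})
}
domain_score = resemblance(DATA, ATTRIBUTES)

print(class_scores)  # {1: 0.5, 2: 1.0}
print(domain_score)  # about 0.417
print(domain_predictability(DATA, "size", "large"))  # 0.75
```

Here both class scores exceed the domain score, which is what one would hope for if the clusters group similar instances more tightly than the data set as a whole does.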

Step 4: Read and Interpret Individual Class Results
- Class predictability is a within-class measure.
- Class predictiveness is a between-class measure.
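The two measures can be read as conditional frequencies. The sketch below assumes the usual definitions: the predictability of value v for attribute A within class C is the share of C's instances that have A = v, and the predictiveness of v for C is the share of instances with A = v that belong to C. The data and function names are hypothetical.

```python
# Toy labeled data (hypothetical).
DATA = [
    {"color": "red", "size": "small", "cluster": 1},
    {"color": "red", "size": "large", "cluster": 1},
    {"color": "blue", "size": "large", "cluster": 2},
    {"color": "blue", "size": "large", "cluster": 2},
]


def predictability(instances, class_attr, cls, attribute, value):
    """Within-class measure: share of the instances in class `cls`
    whose `attribute` equals `value`."""
    members = [i for i in instances if i[class_attr] == cls]
    if not members:
        raise ValueError(f"no instances in class {cls!r}")
    return sum(1 for i in members if i[attribute] == value) / len(members)


def predictiveness(instances, class_attr, cls, attribute, value):
    """Between-class measure: share of the instances with `attribute == value`
    that belong to class `cls`."""
    having = [i for i in instances if i[attribute] == value]
    if not having:
        raise ValueError(f"no instances with {attribute} == {value!r}")
    return sum(1 for i in having if i[class_attr] == cls) / len(having)


print(predictability(DATA, "cluster", 1, "size", "large"))  # 0.5
print(predictiveness(DATA, "cluster", 1, "size", "large"))  # one third
print(predictiveness(DATA, "cluster", 1, "color", "red"))   # 1.0
```

In this toy example, "large" describes half of cluster 1 but is a weak indicator of it, because most large instances fall in cluster 2; "red", by contrast, is fully predictive of cluster 1.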

Step 5: Visualize Individual Class Rules

A Six-Step Approach for Supervised Learning

Step 1: Choose an output attribute.
Step 2: Perform the mining session.
Step 3: Read and interpret summary results.
Step 4: Read and interpret test set results.
Step 5: Read and interpret class results.
Step 6: Visualize and interpret class rules.

Step 4: Read and Interpret Test Set Results
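Test set results for a classifier are commonly summarized as a confusion matrix plus an overall accuracy figure. The sketch below computes both from lists of actual and predicted class labels; it illustrates the general idea only and does not reproduce the layout of the iData Analyzer's output sheet. The labels and function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where rows are actual classes and
    columns are predicted classes, both in sorted label order."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return labels, [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of test instances whose predicted class matches the actual one."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)


actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
labels, matrix = confusion_matrix(actual, predicted)
print(labels)  # ['no', 'yes']
print(matrix)  # [[1, 1], [1, 2]]
print(accuracy(actual, predicted))  # 0.6
```

Diagonal entries count correct classifications; off-diagonal entries show which classes are being confused with which.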

Techniques for Generating Rules
1. Choose an attribute.
2. Use the attribute to subdivide the instances into subclasses.
3. For each subclass:
   - if its instances satisfy a predefined criterion, generate a defining rule for the subclass;
   - if not, repeat from step 1 for that subclass.
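The three steps describe a recursive divide-and-conquer procedure. The sketch below is one way to implement it under stated assumptions: attributes are categorical; the "predefined criterion" is taken to be a minimum rule correctness (the share of a subclass belonging to its majority class); and step 1 picks the attribute whose split leaves the most instances in their subclass's majority class. That selection heuristic (`split_score`) and all names are hypothetical; this is not the iData Analyzer's own rule generator.

```python
from collections import Counter


def split_score(instances, attr, class_attr):
    """Hypothetical heuristic: instances that would be classified correctly
    if each subclass formed by `attr` predicted its majority class."""
    total = 0
    for value in {i[attr] for i in instances}:
        labels = [i[class_attr] for i in instances if i[attr] == value]
        total += Counter(labels).most_common(1)[0][1]
    return total


def generate_rules(instances, attributes, class_attr,
                   min_correctness=1.0, conditions=()):
    """Return a list of (conditions, class_label, correctness) rules."""
    if not instances:
        return []
    label, count = Counter(i[class_attr] for i in instances).most_common(1)[0]
    correctness = count / len(instances)
    # Step 3: criterion met (or nothing left to split on), so emit a defining rule.
    if correctness >= min_correctness or not attributes:
        return [(conditions, label, correctness)]
    # Step 1: choose an attribute.
    attr = max(attributes, key=lambda a: split_score(instances, a, class_attr))
    remaining = [a for a in attributes if a != attr]
    rules = []
    # Step 2: subdivide by the attribute's values, then repeat on each subclass.
    for value in sorted({i[attr] for i in instances}):
        subset = [i for i in instances if i[attr] == value]
        rules.extend(generate_rules(subset, remaining, class_attr,
                                    min_correctness, conditions + ((attr, value),)))
    return rules


DATA = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]

for rule in generate_rules(DATA, ["outlook", "windy"], "play"):
    print(rule)
```

Lowering `min_correctness` makes the recursion stop earlier and yields fewer, more general (but less accurate) rules.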

Techniques for Generating Rules
1. Define the scope of the rules.
2. Choose the instances.
3. Set the minimum rule correctness.
4. Define the minimum rule coverage.
5. Choose an attribute significance value.
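Steps 3 and 4 set thresholds that a candidate rule must meet. The sketch below assumes the common definitions: rule correctness is the share of instances satisfying the rule's preconditions that belong to the rule's class, and rule coverage is the share of that class's instances satisfying the preconditions. Thresholds are given as fractions between 0 and 1 here; the tool's own settings may use percentages. Rule scope and attribute significance are not modeled. All names and data are hypothetical.

```python
DATA = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]


def matches(instance, conditions):
    """True if the instance satisfies every (attribute, value) precondition."""
    return all(instance[attr] == value for attr, value in conditions)


def correctness_and_coverage(instances, class_attr, conditions, label):
    """Return (correctness, coverage) of the rule `conditions -> label`."""
    covered = [i for i in instances if matches(i, conditions)]
    class_size = sum(1 for i in instances if i[class_attr] == label)
    hits = sum(1 for i in covered if i[class_attr] == label)
    correctness = hits / len(covered) if covered else 0.0
    coverage = hits / class_size if class_size else 0.0
    return correctness, coverage


def passes_thresholds(instances, class_attr, conditions, label,
                      min_correctness, min_coverage):
    """Keep a rule only if it meets both minimums."""
    correctness, coverage = correctness_and_coverage(
        instances, class_attr, conditions, label)
    return correctness >= min_correctness and coverage >= min_coverage


rule = (("windy", "yes"),)
print(correctness_and_coverage(DATA, "play", rule, "no"))  # (1.0, 0.6666666666666666)
print(passes_thresholds(DATA, "play", rule, "no", 0.8, 0.5))  # True
print(passes_thresholds(DATA, "play", rule, "no", 0.8, 0.7))  # False
```

The two thresholds pull in different directions: a very specific rule tends to be highly correct but covers few class instances, so raising both minimums prunes many candidates.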

Instance Typicality

Typicality scores:
- identify prototypical and outlier instances;
- help select the best set of training instances;
- are used to compute individual instance classification confidence scores.
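A minimal sketch of how typicality scores could be computed, assuming an instance's typicality is its average similarity to the other members of its class or cluster. Similarity is again a simple matching coefficient over categorical attributes, which is an assumption rather than ESX's own measure. The highest-scoring member serves as the prototype and the lowest as an outlier candidate. Data and names are hypothetical.

```python
def similarity(a, b, attributes):
    """Simple matching similarity (assumed): fraction of agreeing attributes."""
    return sum(1 for attr in attributes if a[attr] == b[attr]) / len(attributes)


def typicality_scores(members, attributes):
    """Typicality of each member: mean similarity to every other member."""
    if len(members) < 2:
        raise ValueError("need at least two instances in the class")
    scores = []
    for idx, inst in enumerate(members):
        others = [m for j, m in enumerate(members) if j != idx]
        scores.append(sum(similarity(inst, o, attributes) for o in others) / len(others))
    return scores


MEMBERS = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "red", "size": "large"},
]
scores = typicality_scores(MEMBERS, ["color", "size"])
prototype = max(range(len(scores)), key=scores.__getitem__)
outlier = min(range(len(scores)), key=scores.__getitem__)
print(scores)              # [0.5, 0.75, 0.75]
print(prototype, outlier)  # 1 0
```

Ranking instances this way also suggests how a "best" training set might be chosen: prefer high-typicality instances and treat low scorers as candidates for review.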

Special Considerations and Features
- Avoid mining delays
- The Quick Mine feature
- Erroneous and missing data