An Excel-Based Data Mining Tool: iData Analyzer
Summarized by Sungjick Lee, Data Mining Laboratory (DM.Lab), University of Seoul, April 24th, 2008

Contents
- The iData Analyzer
- ESX: A Multipurpose Tool for Data Mining
- iDAV Format for Data Mining
- An Approach for Unsupervised Clustering
- An Approach for Supervised Learning

The iData Analyzer
- Scans the data for errors: illegal numeric values, blank lines, missing items
- Allows users to extract a representative subset of the data
- ESX: an exemplar-based data mining tool that builds a concept hierarchy to generalize data
- A backpropagation neural network for supervised learning
- A self-organizing feature map for unsupervised clustering
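The error scan described above can be illustrated with a short Python sketch. This is not iDA's own code: the function name `scan_rows`, the row layout, and the sample rows are hypothetical, and the sketch only assumes that the three error types named on the slide are detected cell by cell.

```python
from typing import List, Optional


def scan_rows(rows: List[List[Optional[str]]], numeric_cols: List[int]) -> List[str]:
    """Return a list of problems found in spreadsheet-like rows (hypothetical helper)."""
    problems = []
    for i, row in enumerate(rows, start=1):
        cells = [(c or "").strip() for c in row]
        if not any(cells):
            # Every cell is empty: report the row once as a blank line.
            problems.append(f"row {i}: blank line")
            continue
        for j, cell in enumerate(cells):
            if cell == "":
                problems.append(f"row {i}, col {j}: missing item")
            elif j in numeric_cols:
                try:
                    float(cell)
                except ValueError:
                    problems.append(f"row {i}, col {j}: illegal numeric value {cell!r}")
    return problems


# Made-up rows: column 0 is categorical, column 1 is expected to be numeric.
rows = [
    ["40-50,000", "45"],
    ["", ""],
    ["30-40,000", "forty"],
    ["20-30,000", ""],
]
report = scan_rows(rows, numeric_cols=[1])
for line in report:
    print(line)
```

Reporting problems instead of silently fixing them keeps the decision about repairing or dropping a row with the analyst.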

ESX: A Multipurpose Tool for Data Mining (1/2)
- Supports both supervised learning and unsupervised clustering
- Makes no statistical assumptions about the nature of the data
- Provides an automated method for dealing with missing attribute values
- Works in domains containing both categorical and numerical data
- For supervised classification: determines those instances and attributes best able to classify new instances of unknown origin
- For unsupervised clustering: uses a globally optimizing evaluation function that encourages a best instance clustering

ESX: A Multipurpose Tool for Data Mining (2/2)
- Defines the concept classes
- Summary statistics about the attribute values found within the classes
- Instance-level summary information about the domain
- Report Generator: summary report in spreadsheet format, class resemblance scores

iDAV Format for Data Mining
Attribute type:
- C: categorical (nominal)
- R: real-valued (numerical)
Attribute usage:
- I: input attribute
- U: not used
- D: not used for classification or clustering, but attribute value summary information is displayed
- O: used as an output attribute
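As a rough illustration of how these codes might be checked, the Python sketch below validates a type row and a usage row against the letters listed above. It assumes the sheet holds attribute names, then type codes, then usage codes in three header rows; that layout, the function name `parse_idav_header`, and the usage codes chosen for each attribute are assumptions made for the example, not taken from the slides.

```python
VALID_TYPES = {"C", "R"}             # categorical, real-valued
VALID_USAGES = {"I", "U", "D", "O"}  # input, unused, display-only, output


def parse_idav_header(names, types, usages):
    """Build a schema dict from three header rows (assumed layout)."""
    if not (len(names) == len(types) == len(usages)):
        raise ValueError("header rows must have the same length")
    schema = {}
    for name, type_code, usage_code in zip(names, types, usages):
        type_code = type_code.strip().upper()
        usage_code = usage_code.strip().upper()
        if type_code not in VALID_TYPES:
            raise ValueError(f"{name}: unknown attribute type {type_code!r}")
        if usage_code not in VALID_USAGES:
            raise ValueError(f"{name}: unknown attribute usage {usage_code!r}")
        schema[name] = {"type": type_code, "usage": usage_code}
    return schema


# Attribute names come from the credit card promotion example;
# the type and usage codes here are illustrative choices.
schema = parse_idav_header(
    ["Income Range", "Magazine Promo", "Life Ins Promo", "Age"],
    ["C", "C", "C", "R"],
    ["I", "I", "O", "I"],
)
inputs = [name for name, spec in schema.items() if spec["usage"] == "I"]
outputs = [name for name, spec in schema.items() if spec["usage"] == "O"]
print(inputs, outputs)
```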

An Approach for Unsupervised Clustering
1. Enter data into a new Excel spreadsheet
2. Perform a data mining session
3. Read and interpret summary results
4. Read and interpret results for individual clusters
5. Visualize and interpret rules defining the individual clusters

An Approach for Unsupervised Clustering
Enter data into a new Excel spreadsheet
- CreditCardPromotion.xls

An Approach for Unsupervised Clustering
Perform a data mining session (1/2)
- Instance similarity: a value closer to 100 encourages the formation of new clusters; a value closer to 0 discourages the formation of new clusters
- Similarity criterion for real-valued attributes: 1.0 is usually appropriate
- If the session produces too many classes (here, 8), change the instance similarity value and try again

An Approach for Unsupervised Clustering
Perform a data mining session (2/2)
Attribute significance = (largest class mean - smallest class mean) / domain standard deviation
- Example for age: the largest class mean is 43.33 (Class 1) and the smallest class mean is 37.00 (Class 2)

An Approach for Unsupervised Clustering
Result: RES RUL (the generated production rules)

Rules for Class 1 | Rules for Class 2 | Rules for Class 3

**Total Percent Coverage = 0.00%

Income Range = "20-30,000"
  :rule accuracy %  :rule coverage 80.00%
<= Age <=
  :rule accuracy %  :rule coverage 60.00%
<= Age <= and Income Range = "20-30,000"
  :rule accuracy %  :rule coverage 60.00%
<= Age <= and Magazine Promo = No
  :rule accuracy %  :rule coverage 60.00%
(middle portion omitted)
**Total Percent Coverage = 80.00%

Income Range = "30-40,000"
  :rule accuracy 80.00%  :rule coverage 57.14%
Magazine Promo = Yes
  :rule accuracy 75.00%  :rule coverage 85.71%
Life Ins Promo = Yes
  :rule accuracy 77.78%  :rule coverage %
<= Age <=
  :rule accuracy 77.78%  :rule coverage %
(middle portion omitted)
**Total Percent Coverage = %
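The two figures attached to each rule can be made concrete with a small Python sketch. It assumes the usual reading of the terms: rule accuracy is the share of instances matching the rule's condition that belong to the class, and rule coverage is the share of the class's instances that match the condition. The slides do not spell these definitions out, and the instances below are invented so that the counts reproduce the 80.00% and 57.14% shown for Income Range = "30-40,000"; they are not the real CreditCardPromotion.xls data.

```python
def rule_stats(instances, cls, condition):
    """Return (accuracy, coverage) of a rule 'condition => cls' (assumed definitions)."""
    matched = [x for x in instances if condition(x)]
    in_class = [x for x in instances if x["class"] == cls]
    hits = [x for x in matched if x["class"] == cls]
    accuracy = len(hits) / len(matched) if matched else 0.0
    coverage = len(hits) / len(in_class) if in_class else 0.0
    return accuracy, coverage


# Invented instances: 7 in the (hypothetical) class 2, 3 in class 1.
instances = (
    [{"class": 2, "Income Range": "30-40,000"}] * 4
    + [{"class": 2, "Income Range": "40-50,000"}] * 3
    + [{"class": 1, "Income Range": "30-40,000"}] * 1
    + [{"class": 1, "Income Range": "20-30,000"}] * 2
)

accuracy, coverage = rule_stats(
    instances, cls=2, condition=lambda x: x["Income Range"] == "30-40,000"
)
print(f"rule accuracy {accuracy:.2%}, rule coverage {coverage:.2%}")
```

A rule can be accurate but cover few class members, or cover most of the class while also matching many outsiders, which is why both numbers are reported.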

An Approach for Unsupervised Clustering
Result: RES SUM (summary statistics) (1/2)
Resemblance score
- Are the within-class resemblance scores higher than the domain resemblance value?
- If not, why?
  - Bad choice of attributes
  - Bad choice of instances
  - The domain does not contain definable classes
Attribute significance
- (largest class mean - smallest class mean) / domain standard deviation
- Example for age: the largest class mean is 43.33 (Class 1), the smallest class mean is 37.00 (Class 2), and the domain standard deviation is 9.51
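As a quick arithmetic check of the attribute significance formula, the Python sketch below plugs in the age values quoted on the slide. The function name is hypothetical; only the two extreme class means are passed in, since any class mean between them does not change the result.

```python
def attribute_significance(class_means, domain_std):
    """(largest class mean - smallest class mean) / domain standard deviation."""
    if domain_std <= 0:
        raise ValueError("domain standard deviation must be positive")
    return (max(class_means) - min(class_means)) / domain_std


# Age: Class 1 mean = 43.33, Class 2 mean = 37.00, domain std = 9.51
age_significance = attribute_significance([43.33, 37.00], 9.51)
print(round(age_significance, 2))  # 0.67
```

In other words, the class means for age differ by about two thirds of one domain standard deviation.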

An Approach for Unsupervised Clustering
Result: RES CLS (statistics about the individual classes) (1/2)
Typicality
- The average similarity of an instance to all other members of its cluster
Predictiveness
- The probability that an instance having the attribute value resides in the class
- A between-class measure
- If it equals 1, the value is sufficient for class membership
Predictability
- The percentage of instances within a class that have the attribute value
- A within-class measure
- If it equals 1, the value is necessary for class membership
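The difference between the two measures is easier to see on a toy example. The Python sketch below is a minimal illustration under the definitions given above; the function names and the five instances are made up for the example and are not iDA output.

```python
def predictability(instances, cls, attr, value):
    """Within-class measure: fraction of the class's instances having attr == value."""
    members = [x for x in instances if x["class"] == cls]
    if not members:
        return 0.0
    return sum(1 for x in members if x[attr] == value) / len(members)


def predictiveness(instances, cls, attr, value):
    """Between-class measure: fraction of instances with attr == value that are in cls."""
    holders = [x for x in instances if x[attr] == value]
    if not holders:
        return 0.0
    return sum(1 for x in holders if x["class"] == cls) / len(holders)


# Made-up instances with a single categorical attribute.
data = [
    {"class": 1, "Magazine Promo": "Yes"},
    {"class": 1, "Magazine Promo": "Yes"},
    {"class": 2, "Magazine Promo": "Yes"},
    {"class": 2, "Magazine Promo": "No"},
    {"class": 2, "Magazine Promo": "No"},
]

# "Yes" is necessary for class 1 (every member has it) but not sufficient.
print(predictability(data, 1, "Magazine Promo", "Yes"))   # 1.0
print(predictiveness(data, 1, "Magazine Promo", "Yes"))

# "No" is sufficient for class 2 (only class 2 has it) but not necessary.
print(predictiveness(data, 2, "Magazine Promo", "No"))    # 1.0
print(predictability(data, 2, "Magazine Promo", "No"))
```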

An Approach for Unsupervised Clustering
Result: RES CLS (statistics about the individual classes) (2/2)
- "Highly" means greater than or equal to 0.80

An Approach for Supervised Learning
1. Enter data into a new Excel spreadsheet and choose the output attribute
2. Perform a data mining session
3. Read and interpret summary results
4. Read and interpret test set results
5. Read and interpret results for individual classes
6. Visualize and interpret class rules