Part II Tools for Knowledge Discovery. Knowledge Discovery in Databases Chapter 5.

Part II Tools for Knowledge Discovery

Knowledge Discovery in Databases Chapter 5

5.1 A KDD Process Model

Figure 5.1 A seven-step KDD process model

Figure 5.2 Applying the scientific method to data mining

Step 1: Goal Identification
Define the Problem.
Choose a Data Mining Tool.
Estimate Project Cost.
Estimate Project Completion Time.
Address Legal Issues.
Develop a Maintenance Plan.

Step 2: Creating a Target Dataset

Figure 5.3 The Acme credit card database

Step 3: Data Preprocessing
Noisy Data
Missing Data

Noisy Data
Locate Duplicate Records.
Locate Incorrect Attribute Values.
Smooth Data.
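Two of the noisy-data tasks above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, duplicates are assumed to be exact matches on every field, and smoothing by bin means is one common smoothing technique chosen here as an assumption, since the slide does not name a specific method.

```python
from collections import Counter


def find_duplicates(records):
    """Return each record that occurs more than once (exact match on all fields)."""
    counts = Counter(tuple(r) for r in records)
    return [rec for rec, n in counts.items() if n > 1]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into bins of `bin_size`, and replace
    every value with the mean of its bin (assumed smoothing method)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_vals = ordered[start:start + bin_size]
        mean = sum(bin_vals) / len(bin_vals)
        smoothed.extend([mean] * len(bin_vals))
    return smoothed


if __name__ == "__main__":
    rows = [(1, "a"), (2, "b"), (1, "a")]
    print(find_duplicates(rows))
    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24], 3))
```

Locating incorrect attribute values usually needs domain rules (for example, valid ranges per attribute), so it is left out of this sketch.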

Preprocessing Missing Data
Discard Records With Missing Values.
Replace Missing Real-valued Items With the Class Mean.
Replace Missing Values With Values Found Within Highly Similar Instances.
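The first two options can be sketched as follows. This is a minimal illustration under assumptions: records are dictionaries, a missing value is represented by None, and the function and attribute names are hypothetical.

```python
def discard_incomplete(records):
    """Keep only the records that have no missing (None) values."""
    return [r for r in records if all(v is not None for v in r.values())]


def impute_class_mean(records, attr, class_attr):
    """Replace a missing real-valued `attr` with the mean of `attr`
    over the records that share the same class value."""
    sums, counts = {}, {}
    for r in records:
        if r[attr] is not None:
            sums[r[class_attr]] = sums.get(r[class_attr], 0.0) + r[attr]
            counts[r[class_attr]] = counts.get(r[class_attr], 0) + 1
    result = []
    for r in records:
        filled = dict(r)  # copy so the input is not modified
        cls = filled[class_attr]
        if filled[attr] is None and counts.get(cls):
            filled[attr] = sums[cls] / counts[cls]
        result.append(filled)
    return result


if __name__ == "__main__":
    data = [
        {"income": 40.0, "cls": "yes"},
        {"income": 60.0, "cls": "yes"},
        {"income": None, "cls": "yes"},
    ]
    print(impute_class_mean(data, "income", "cls"))
    print(discard_incomplete(data))
```

If a class has no known values for the attribute, the sketch leaves the value missing rather than guessing.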

Processing Missing Data While Learning
Ignore Missing Values.
Treat Missing Values As Equal Compares.
Treat Missing Values As Unequal Compares.
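To make the three policies concrete, here is a sketch of a simple matching similarity between two instances with categorical attributes. The similarity measure itself is an assumption made for illustration; the slide only names the policies and does not say how a particular learner compares instances. Missing values are represented by None.

```python
def similarity(a, b, missing="ignore"):
    """Fraction of attributes on which instances `a` and `b` agree.

    missing="ignore":  skip any attribute where either value is missing
    missing="equal":   count a comparison with a missing value as a match
    missing="unequal": count a comparison with a missing value as a mismatch
    """
    if missing not in ("ignore", "equal", "unequal"):
        raise ValueError("unknown missing-value policy: %r" % missing)
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            if missing == "ignore":
                continue
            compared += 1
            if missing == "equal":
                matches += 1
        else:
            compared += 1
            if x == y:
                matches += 1
    return matches / compared if compared else 0.0


if __name__ == "__main__":
    first = ("male", None, "yes", "low")
    second = ("male", "high", "no", "low")
    for policy in ("ignore", "equal", "unequal"):
        print(policy, similarity(first, second, policy))
```

The same pair of instances scores differently under each policy, which is why the choice matters when many values are missing.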

Step 4: Data Transformation
Data Normalization
Data Type Conversion
Attribute and Instance Selection

Data Normalization
Decimal Scaling
Min-Max Normalization
Normalization using Z-scores
Logarithmic Normalization
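The four normalization methods can be sketched with the standard library alone. The function names are hypothetical, and a few choices are assumptions rather than something the slide states: the z-score uses the population standard deviation, and the logarithmic variant uses base 2 and requires positive values.

```python
import math


def decimal_scaling(values):
    """Divide every value by 10**k, with k the smallest integer
    that brings the largest absolute value below 1."""
    max_abs = max(abs(v) for v in values)
    k = 0
    while max_abs / (10 ** k) >= 1:
        k += 1
    return [v / (10 ** k) for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]
    span = new_max - new_min
    return [(v - lo) / (hi - lo) * span + new_min for v in values]


def z_scores(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def log_normalize(values):
    """Replace each positive value with its base-2 logarithm (assumed base)."""
    return [math.log2(v) for v in values]


if __name__ == "__main__":
    print(decimal_scaling([100, 250, 990]))
    print(min_max([10, 20, 30]))
    print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
    print(log_normalize([1, 8, 64]))
```

Min-max scaling keeps the shape of the distribution but is sensitive to outliers, while z-scores are a common choice when attributes have very different spreads.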

Attribute and Instance Selection
Eliminating Attributes
Creating Attributes
Instance Selection

Step 5: Data Mining
1. Choose training and test data.
2. Designate a set of input attributes.
3. If learning is supervised, choose one or more output attributes.
4. Select learning parameter values.
5. Invoke the data mining tool.
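The first three items above can be sketched as follows, assuming instances are stored as dictionaries. The function names, the 30% test fraction, and the attribute names in the usage example are hypothetical; selecting parameter values and invoking the tool depend on the data mining tool in use and are not shown.

```python
import random


def split_train_test(instances, test_fraction=0.3, seed=0):
    """Shuffle the instances and split them into (training, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(instances)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def select_attributes(instance, inputs, output=None):
    """Return (input values, output value) for one instance.
    For unsupervised learning, leave `output` as None."""
    x = {name: instance[name] for name in inputs}
    y = instance[output] if output is not None else None
    return x, y


if __name__ == "__main__":
    train, test = split_train_test(list(range(10)))
    print(len(train), len(test))
    row = {"age": 30, "income": 50, "approved": "yes"}
    print(select_attributes(row, ["age", "income"], "approved"))
```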

Step 6: Interpretation and Evaluation
Statistical analysis.
Heuristic analysis.
Experimental analysis.
Human analysis.

Step 7: Taking Action
Create a report.
Relocate retail items.
Mail promotional information.
Detect fraud.
Fund new research.

5.9 The CRISP-DM Process Model
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment

5.10 Experimenting with ESX

A Four-Step Model for Knowledge Discovery
1. Identify the goal.
2. Prepare the data.
3. Apply data mining.
4. Interpret and evaluate the results.

Experiment 1: Attribute Evaluation
Applying the Four-Step Process Model to the Credit Screening Dataset

Experiment 2: Parameter Evaluation
Applying the Four-Step Process Model to the Satellite Image Dataset

Figure 5.4 Satellite image data