1
Part I Data Mining Fundamentals
2
Data Mining: A First View Chapter 1
3
1.1 Data Mining: A Definition
4
Data Mining The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.
5
Induction-based Learning The process of forming general concept definitions by observing specific examples of concepts to be learned.
6
Knowledge Discovery in Databases (KDD) The application of the scientific method to data mining. Data mining is one step of the KDD process.
7
Data Mining: A KDD Process – Data mining: the core of the knowledge discovery process. [Figure labels: Data Cleaning; Data Integration; Databases; Data Warehouse; Task-relevant Data; Selection; Data Mining; Pattern Evaluation]
8
1.2 What Can Computers Learn?
9
Four Levels of Learning: Facts; Concepts; Procedures (to be worked out); Principles
10
Concepts Computers are good at learning concepts. Concepts are the output of a data mining session.
11
Three Concept Views
Classical View (crisp), for old hands – as a definition
Probabilistic View (85%), with some experience – DM rules with confidence
Exemplar View (CBR), for newcomers
An illustrated example: good credit?
12
Supervised Learning Build a learner model using data instances of known origin. Use the model to determine the outcome of new instances of unknown origin.
13
Supervised Learning: A Decision Tree Example
14
Decision Tree A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.
16
Figure 1.1 A decision tree for the data in Table 1.1
18
Production Rules
IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
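The three production rules can be read as a small decision function. The following is a minimal sketch, assuming the two attributes are encoded as booleans; the function name `diagnose` and its parameter names are illustrative choices, not part of the slides.

```python
def diagnose(swollen_glands: bool, fever: bool) -> str:
    """Apply the three production rules from the slide, in order."""
    # Rule 1: Swollen Glands = Yes -> Strep Throat
    if swollen_glands:
        return "Strep Throat"
    # Rule 2: Swollen Glands = No and Fever = Yes -> Cold
    if fever:
        return "Cold"
    # Rule 3: Swollen Glands = No and Fever = No -> Allergy
    return "Allergy"


if __name__ == "__main__":
    print(diagnose(swollen_glands=False, fever=True))  # prints Cold
```

Each `if` corresponds to a test at a non-terminal node of the tree, and each `return` to a terminal node.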
19
Unsupervised Clustering A data mining method that builds models from data without predefined classes.
21
3 groups formed (Table 1.3 shows only part of the whole table)
G1. MarginAccount = yes and Age = 20-29 and AnnualIncome = 40-59k; accuracy = 80%, coverage = 0.5
G2. AccountType = Custodial and FavoriteRecreation = Skiing and AnnualIncome = 40-59k; accuracy = 95%, coverage = 0.35
G3. AccountType = joint and Trades/Month > 5 and TransactionMethod = online; accuracy = 82%, coverage = 0.65
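The slides do not define accuracy and coverage. The sketch below assumes one common reading: accuracy is the share of instances matching a rule's conditions that belong to the group, and coverage is the share of the group's instances that match the rule. The rows, field names, and the helper `rule_accuracy_and_coverage` are hypothetical and are not taken from Table 1.3.

```python
def rule_accuracy_and_coverage(instances, antecedent, group):
    """Return (accuracy, coverage) of a rule for one group.

    Assumed definitions (not stated in the slides):
      accuracy = matching instances in the group / all matching instances
      coverage = matching instances in the group / all instances in the group
    """
    matching = [row for row in instances if antecedent(row)]
    in_group = [row for row in instances if row["group"] == group]
    hits = [row for row in matching if row["group"] == group]
    accuracy = len(hits) / len(matching) if matching else 0.0
    coverage = len(hits) / len(in_group) if in_group else 0.0
    return accuracy, coverage


# Hypothetical rows for illustration only.
instances = [
    {"account_type": "joint", "trades_per_month": 8, "method": "online", "group": "G3"},
    {"account_type": "joint", "trades_per_month": 6, "method": "online", "group": "G3"},
    {"account_type": "joint", "trades_per_month": 2, "method": "broker", "group": "G3"},
    {"account_type": "individual", "trades_per_month": 9, "method": "online", "group": "G3"},
    {"account_type": "joint", "trades_per_month": 7, "method": "online", "group": "G1"},
    {"account_type": "custodial", "trades_per_month": 1, "method": "mail", "group": "G2"},
]


def g3_rule(row):
    """Conditions of rule G3 as written on the slide."""
    return (
        row["account_type"] == "joint"
        and row["trades_per_month"] > 5
        and row["method"] == "online"
    )


accuracy, coverage = rule_accuracy_and_coverage(instances, g3_rule, "G3")
```

With these made-up rows, three instances match the rule and two of them are in G3, while G3 has four members in total.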
22
1.3 Is Data Mining Appropriate for My Problem?
23
Data Mining or Data Query? Shallow Knowledge (SQL); Multidimensional Knowledge (OLAP); Hidden Knowledge (DM); Deep Knowledge (human)
24
Data Mining vs. Data Query: An Example Use data query if you already know roughly what you are looking for. Use data mining to find regularities in data that are not obvious.
25
1.4 Expert Systems or Data Mining?
26
Figure 14-2 Detailed diagram of the expert system architecture
27
Expert System A computer program that emulates the problem-solving skills of one or more human experts.
28
Knowledge Engineer A person trained to interact with an expert in order to capture their knowledge.
29
Figure 1.2 Data mining vs. expert systems
30
1.5 A Simple Data Mining Process Model
31
Figure 1.3 A simple data mining process model
32
Assembling the Data: The Data Warehouse; Relational Databases and Flat Files
33
Mining the Data
34
Interpreting the Results
35
Result Application
36
1.6 Why Not Simple Search?
Nearest Neighbor Classifier (i.e., CBA: add a new instance to a class based on similarity) – time-consuming and entropy-independent
K-nearest Neighbor Classifier – form a class consisting of the K nearest neighbors
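A nearest neighbor classifier of this kind can be sketched in a few lines. The version below assumes categorical attributes, measures similarity as the number of matching attribute values, and lets the K most similar training instances vote. The training rows, attribute names, and the helpers `count_matches` and `knn_classify` are hypothetical; they are not the data from Table 1.1.

```python
from collections import Counter


def count_matches(a, b, attributes):
    """Similarity = number of attributes on which two instances agree."""
    return sum(a[name] == b[name] for name in attributes)


def knn_classify(training, new_instance, attributes, k=1):
    """Label a new instance by majority vote of its k most similar neighbors.

    Every call compares the new instance with the whole training set,
    which is why simple search becomes time-consuming on large data.
    """
    ranked = sorted(
        training,
        key=lambda row: count_matches(row, new_instance, attributes),
        reverse=True,
    )
    votes = Counter(row["label"] for row in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training instances for illustration only.
attributes = ["sore_throat", "fever", "swollen_glands"]
training = [
    {"sore_throat": True, "fever": True, "swollen_glands": True, "label": "Strep Throat"},
    {"sore_throat": False, "fever": True, "swollen_glands": False, "label": "Cold"},
    {"sore_throat": True, "fever": True, "swollen_glands": False, "label": "Cold"},
    {"sore_throat": False, "fever": False, "swollen_glands": False, "label": "Allergy"},
    {"sore_throat": True, "fever": False, "swollen_glands": False, "label": "Allergy"},
]
new_instance = {"sore_throat": False, "fever": True, "swollen_glands": False}

label_k1 = knn_classify(training, new_instance, attributes, k=1)
label_k3 = knn_classify(training, new_instance, attributes, k=3)
```

With `k=1` the result depends on a single most similar instance; a larger `k` makes the vote less sensitive to one unusual neighbor.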
37
Assignment 4
A new instance: Patient ID = 14, Sore Throat = Yes, Fever = No, Swollen Glands = No, Congestion = No, Headache = No
Comparison:
with one matched attribute: ID = 1, 9
with two matched attributes: ID = 2, 5, 10
with three matched attributes: ID = 3, 6, 7, 8
with four matched attributes: ID = 4 (strep throat?)
The correct diagnosis, using the decision tree, should be allergy.
Q: Try the K-nearest Neighbor Classifier.
38
1.7 Data Mining Applications
39
Customer Intrinsic Value
40
Figure 1.4 Intrinsic vs. actual customer value