Published by Austin Willis. Modified over 8 years ago.
1
3-1 Data Mining Kelby Lee
2
3-2 Overview ¨ Transaction Database ¨ What is Data Mining ¨ Data Mining Primitives ¨ Data Mining Objectives ¨ Predictive Modeling ¨ Knowledge Discovery ¨ Other Objectives to Data Mining ¨ What Data Mining is Not ¨ Other Factors in Data Mining Categorization ¨ Conclusion
3
3-3 Transaction Database ¨ Relation Consisting of Transactions ¨ TID (Transaction Identifier) ¨ Regularities between Transaction Behavior
4
3-4 Transaction Database
Table 1.1 Transaction Database
TID  Customer  Item          Date        Price   Quantity
---  --------  ------------  ----------  ------  --------
100  C1        chocolate     01/11/2001    1.59  2
100  C1        ice cream     01/11/2001    1.89  1
200  C2        chocolate     01/12/2001    1.59  3
200  C2        candy bar     01/12/2001    1.19  2
200  C2        jackets       01/12/2001  120.39  2
300  C3        jackets       01/14/2001  168.88  1
300  C3        color shirts  01/14/2001   27.95  2
400  C4        jackets       01/15/2001  149.49  1
5
3-5 Association Rules ¨ A customer who buys chocolate will likely also buy a candy bar ¨ One type of Data Mining task
6
3-6 Discovered Rules
Table 1.2 Discovered Rules
Rule  Bought this...  ...also bought that
----  --------------  -------------------
1     chocolate       ice cream
2     candy bar       chocolate
3     ski pants       colored shirt
4     beer            diaper
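A minimal sketch of how rules like these could be derived from Table 1.1. The slides do not say which algorithm or measures were used; this sketch assumes the common support and confidence measures for item pairs, and the 0.5 confidence threshold and the function names are illustrative only.

```python
from collections import defaultdict
from itertools import permutations

# Rows of Table 1.1 reduced to (TID, item); the other columns are not needed here.
ROWS = [
    (100, "chocolate"), (100, "ice cream"),
    (200, "chocolate"), (200, "candy bar"), (200, "jackets"),
    (300, "jackets"), (300, "color shirts"),
    (400, "jackets"),
]


def group_by_tid(rows):
    """Collect the set of items bought in each transaction."""
    baskets = defaultdict(set)
    for tid, item in rows:
        baskets[tid].add(item)
    return dict(baskets)


def pair_rules(baskets, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of transactions containing both items
    confidence = share of the antecedent's transactions that also
                 contain the consequent
    """
    n = len(baskets)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in baskets.values():
        for item in items:
            item_count[item] += 1
        for a, b in permutations(sorted(items), 2):
            pair_count[(a, b)] += 1
    rules = []
    for (a, b), both in pair_count.items():
        confidence = both / item_count[a]
        if confidence >= min_confidence:  # assumed threshold
            rules.append((a, b, both / n, confidence))
    return sorted(rules)


baskets = group_by_tid(ROWS)
rules = pair_rules(baskets)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

On this tiny table, "candy bar -> chocolate" holds in every transaction that contains a candy bar, while "chocolate -> candy bar" holds in only half of the chocolate transactions. With so few rows these figures are illustrative rather than meaningful.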
7
3-7 What is Data Mining ¨ Retrieve individual elements ¨Given the name of a product, find its price and producer ¨ Analysis ¨Average monthly sales amount and deviation
8
3-8 Advances Allow For ¨ Large amounts of Data to be Handled ¨ Aspect of Analysis ¨ “Data Rich” but “Knowledge Poor”
9
3-9 Discover Patterns ¨ Improve Business Performance ¨Exploit favorable patterns ¨Avoid problematic patterns ¨ Increase Understanding ¨ Predict Outcome
10
3-10 Answer the Key Business Questions ¨ Who will buy? What will they buy? How much? ¨Classification and Prediction ¨ What are the different types of Customers? ¨Segmentation of Customers
11
3-11 Answer the Key Business Questions ¨ What relationship exists between customers or Website visitors and the products? ¨Association ¨ What are the groupings hidden in the data? ¨Clustering Analysis
12
3-12 Data Mining Definition Non-trivial extraction of implicit, previously unknown, interesting, and potentially useful information from data
13
3-13 Different Types of Data Mining ¨ Business Data Mining ¨ Scientific Data Mining ¨ Internet Data Mining
14
3-14 Data Mining Applications ¨ Medical ¨ Control Theory ¨ Engineering ¨ Public Administration ¨ Marketing and Finance ¨ Data Mining on the Web ¨ Scientific Databases ¨ Fraud Detection
15
3-15 Data Mining Primitives ¨ Fundamental Elements Needed to Define a Data Mining Task ¨ Eight Elements (P, D, K, B, T, M, I, U) ¨8-Tuple
16
3-16 Elements ¨ P - Problem Specification ¨ D - Task Relevant Data ¨ K - Kind of Knowledge to be Mined ¨ B - Background Knowledge ¨ T - Specific algorithms or techniques ¨ M - Models developed or knowledge patterns extracted ¨ I - Interestingness ¨ U - User
17
3-17 Diagram
18
3-18 Relationship between Elements ¨ User (U) defines Problem (P) and specifies Interestingness (I) ¨ Data Miner has K and T as core elements, utilizes D and B, and incorporates I ¨ Data Miner produces M
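The relationship between the eight elements can be sketched as a small data structure. This is one possible reading of the slides, not a definitive design: the class name `MiningTask`, its field names, the `frequent_categories` technique and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class MiningTask:
    """Hypothetical container for the 8-tuple (P, D, K, B, T, M, I, U)."""
    problem: str                            # P: problem specification
    data: list                              # D: task-relevant data
    kind_of_knowledge: str                  # K: kind of knowledge to be mined
    background: dict                        # B: background knowledge
    technique: Callable[[list, dict], list]  # T: algorithm or technique
    interestingness: Callable[[Any], bool]  # I: filter chosen by the user
    user: str                               # U: who defines P and I
    models: list = field(default_factory=list)  # M: patterns produced

    def run(self):
        """Apply T to D using B, keep only patterns that satisfy I, store as M."""
        candidates = self.technique(self.data, self.background)
        self.models = [m for m in candidates if self.interestingness(m)]
        return self.models


def frequent_categories(data, background):
    """Count in how many baskets each item category appears."""
    counts = {}
    for basket in data:
        categories = {background.get(item, item) for item in basket}
        for category in categories:
            counts[category] = counts.get(category, 0) + 1
    return sorted(counts.items())


task = MiningTask(
    problem="Which product categories appear in most baskets?",
    data=[["chocolate", "ice cream"], ["chocolate", "candy bar"], ["jackets"]],
    kind_of_knowledge="frequent categories",
    background={  # made-up item-to-category mapping
        "chocolate": "sweets", "candy bar": "sweets",
        "ice cream": "sweets", "jackets": "clothing",
    },
    technique=frequent_categories,
    interestingness=lambda pattern: pattern[1] >= 2,
    user="analyst",
)
print(task.run())
```

Here the user supplies P and I, the technique T consumes D and B, and the filtered result is stored as M, mirroring the flow described on the slide.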
19
3-19 Data Mining Objectives ¨ Discovery ¨Finding human-interpretable patterns describing the data ¨ Prediction ¨Using some variables or fields in the database to predict unknown or future values of other variables of interest
20
3-20 Data Mining Objectives ¨ Knowledge Discovery ¨Stage somewhat prior to prediction where information is insufficient ¨Closer to decision support
21
3-21 Predictive Modeling ¨ Predict Values Based on Similar Groups of Data ¨ Submit records with some unknown fields and system will predict value
22
3-22 Predictive Modeling ¨ Pattern Recognition ¨Association of an observation to past experience or knowledge ¨Interchangeable with classification
23
3-23 Predictive Modeling ¨ Classification ¨Process of assigning one of a finite set of labels to an observation ¨ Estimation ¨Assigning one of a potentially infinite number of numeric labels to an observation
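The difference between classification and estimation can be illustrated with a small nearest-neighbour sketch, which also matches the earlier idea of predicting values from similar groups of data. The slides do not name a technique; k-nearest neighbours is swapped in here as an assumption, and the customer records are invented.

```python
from collections import Counter

# Invented records: ((age, income in thousands), bought a jacket?, yearly spend).
CUSTOMERS = [
    ((25, 30), "no", 10.0),
    ((30, 35), "no", 20.0),
    ((45, 80), "yes", 150.0),
    ((50, 90), "yes", 170.0),
    ((48, 85), "yes", 160.0),
]


def nearest(records, query, k=3):
    """Return the k records whose features are closest to the query."""
    def squared_distance(record):
        features = record[0]
        return sum((a - b) ** 2 for a, b in zip(features, query))
    return sorted(records, key=squared_distance)[:k]


def classify(records, query, k=3):
    """Classification: pick one label from a finite set (majority vote)."""
    labels = [label for _, label, _ in nearest(records, query, k)]
    return Counter(labels).most_common(1)[0][0]


def estimate(records, query, k=3):
    """Estimation: produce a numeric value (mean of the neighbours' values)."""
    values = [spend for _, _, spend in nearest(records, query, k)]
    return sum(values) / len(values)


query = (47, 82)  # a new record with unknown label and spend
print(classify(CUSTOMERS, query))
print(estimate(CUSTOMERS, query))
```

Both functions use the same similar records; only the kind of output differs: a label from a fixed set versus a number on a continuous scale.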
24
3-24 Knowledge Discovery ¨ Find Patterns in Database ¨If someone buys one thing, what else will they buy ¨ Interesting + Certain = Knowledge ¨Output called Discovered Knowledge ¨ KDD - Knowledge Discovery in Databases
25
3-25 Data Mining ¨ Is about why: about hidden regularities, an important aspect related to perception, learning and evolving ¨ Decision support process in which we search for patterns of information in data ¨Once found, display them in a suitable format
26
3-26 Four Points of KDD ¨ Discovered Knowledge Represented in High-Level Language ¨ Accurately Portray contents of Database ¨ Interesting to user ¨ Process is Efficient
27
3-27 Important Issues ¨ Human Centered ¨Under control of human user to meet human needs ¨ Incorporate Interestingness ¨ Provide Various Types ¨ Provide Visualization
28
3-28 Other Objectives ¨ Forensic analysis ¨Applying extracted patterns to find anomalous or unusual data elements; largely used in business applications ¨Find out what the norm is and find those that deviate from the norm
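A minimal sketch of "find the norm, then find what deviates from it". It assumes the norm is the mean and deviation is measured in standard deviations (a z-score); the slides do not prescribe this, and the threshold of 2.0 and the sales figures are invented for illustration.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    norm = mean(values)      # what is "normal" for this data
    spread = pstdev(values)  # population standard deviation
    if spread == 0:          # all values identical: nothing deviates
        return []
    return [v for v in values if abs(v - norm) / spread > threshold]


daily_sales = [100, 102, 98, 101, 99, 100, 250]  # invented figures
print(find_anomalies(daily_sales))
```

A single extreme value inflates both the mean and the standard deviation, so for real data a more robust norm (for example the median) may be a better choice.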
29
3-29 What Data Mining is Not ¨ Analysis vs Monitoring ¨Analysis - previously collected information ¨Monitoring ¨ Collect data as it comes in and compare it to a set of conditions ¨ Unexpected Discovery ¨Must have a general goal in mind
30
3-30 Other Factors in Categorization ¨ Data Retention ¨Data is retained for future pattern matching ¨ Pattern Distillation ¨Analyse data, extract pattern, leave data behind
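The contrast between data retention and pattern distillation can be sketched with two small classes. This is an interpretation under stated assumptions: the class names are hypothetical, the "pattern" is reduced to a simple price range and average, and the jacket prices are taken from Table 1.1.

```python
from statistics import mean


class RetainedModel:
    """Data retention: keep every record and match new values against them."""

    def __init__(self, prices):
        self.prices = list(prices)

    def closest(self, price):
        """Return the stored price most similar to the new one."""
        return min(self.prices, key=lambda p: abs(p - price))


class DistilledModel:
    """Pattern distillation: keep only a summary; the raw data is left behind."""

    def __init__(self, prices):
        prices = list(prices)
        self.low = min(prices)
        self.high = max(prices)
        self.average = mean(prices)

    def is_typical(self, price):
        """Check a new price against the extracted pattern only."""
        return self.low <= price <= self.high


jacket_prices = [120.39, 168.88, 149.49]  # jacket rows of Table 1.1
retained = RetainedModel(jacket_prices)
distilled = DistilledModel(jacket_prices)
print(retained.closest(150.0))
print(distilled.is_typical(150.0), distilled.is_typical(27.95))
```

The retained model grows with the data but can answer any later matching question; the distilled model stays small but can only answer questions its extracted pattern anticipates.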
31
3-31 Conclusion ¨ Transaction Database ¨ What is Data Mining ¨ Data Mining Primitives ¨ Data Mining Objectives ¨ Predictive Modeling ¨ Knowledge Discovery ¨ Other Objectives to Data Mining ¨ What Data Mining is Not ¨ Other Factors in Data Mining Categorization