1
Week 9: Data Mining System (Knowledge Discovery in Databases)
2
Case Scenario
ABC Enterprise is a multinational company that offers multimedia content services in several regions of Asia. It has more than 6 million content subscribers. For a company of this size, a major problem is maintaining a good relationship with its existing content subscribers. Every year, it has to offer content promotions that suit its customers' needs. However, this is a difficult task because the company holds a huge collection of data about subscribers who have different needs and lifestyles. Therefore, the CEO of the company, Mr. Ridzuan, wants a system that can analyze this enormous volume of subscriber data and suggest which content promotions are suitable for them.
3
Knowledge Discovery & Data Mining
Knowledge discovery in databases (KDD) is the process of extracting previously unknown, valid, and actionable (understandable) information from large databases.
Data mining is the step in the KDD process that applies data analysis and discovery algorithms.
It relates to machine learning, pattern recognition, statistics, data visualization, etc.
4
Knowledge discovery in databases (KDD) is the non-trivial process of identifying valid, potentially useful, and ultimately understandable patterns in data.
[Figure: KDD process diagram. Labels: Operational Databases; Clean, Collect, Summarize; Data Warehouse; Data Preparation; Training Data; Data Mining; Model, Patterns; Verification, Evaluation]
5
Why Mine Data?
Huge amounts of data are being collected and warehoused:
Walmart records 20 million transactions per day.
Health care transactions: multi-gigabyte databases.
Mobil Oil: geological data of over 100 terabytes.
Computing has become affordable.
Competitive pressure: companies gain an edge by providing improved, customized services.
Information is a product in its own right.
6
Data Mining Methods
Prediction methods: use some variables to predict unknown or future values of other variables.
Descriptive methods: find human-interpretable patterns that describe the data.
7
Data Mining Tasks
1. Classification
2. Clustering
3. Association Rule Discovery
4. Sequential Pattern Discovery
8
1. Classification
Data are defined in terms of attributes, one of which is the class. The goal is to find a model for the class attribute as a function of the values of the other (predictor) attributes, such that previously unseen records can be assigned a class as accurately as possible.
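The definition above can be illustrated with a minimal sketch. It uses a 1-nearest-neighbour rule, which is only one of many possible classifier models and is not named in the slides. The attributes (age, monthly spend), the records, and the function name `nearest_neighbor_classify` are hypothetical and chosen purely for illustration.

```python
import math

def nearest_neighbor_classify(training, record):
    """Assign `record` the class of the closest training record (1-NN)."""
    best_class, best_dist = None, math.inf
    for attributes, label in training:
        dist = math.dist(attributes, record)  # Euclidean distance
        if dist < best_dist:
            best_class, best_dist = label, dist
    return best_class

# Hypothetical training records: (age, monthly spend) plus the class attribute.
training = [
    ((25, 10.0), "not buy"),
    ((30, 12.0), "not buy"),
    ((45, 60.0), "buy"),
    ((50, 75.0), "buy"),
]

# A previously unseen record is assigned the class of its nearest neighbour.
print(nearest_neighbor_classify(training, (48, 70.0)))
```

Here the predictor attributes are the two numeric values and the class attribute is the label; a real system would also scale the attributes and measure accuracy on held-out records.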
9
Classification: Example
10
Classification: Direct Marketing
Goal: reduce the cost of soliciting (mailing) by targeting the set of consumers likely to buy a new product.
Use the data for a similar product introduced earlier.
We know which customers decided to buy and which did not; this {buy, not buy} decision forms the class attribute.
Collect various demographic, lifestyle, and company-related information about all such customers as possible predictor variables.
Learn a classifier model from this data.
11
2. Clustering
Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that data points in one cluster are more similar to one another, and data points in separate clusters are less similar to one another.
Similarity measures: Euclidean distance if the attributes are continuous, or other problem-specific measures.
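As a rough illustration of clustering with Euclidean distance, the sketch below implements a basic k-means loop. The slides do not name k-means, so treat it as one possible algorithm. The data points, the function name `k_means`, and the choice of the first k points as initial centroids are assumptions made to keep the example deterministic.

```python
import math

def k_means(points, k, iterations=10):
    """Basic k-means: alternate between assigning points and updating centroids."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical two-attribute data points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

Points in the same cluster end up closer to each other than to points in the other cluster, which is the property the definition asks for.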
12
Clustering: Market Segmentation
Goal: subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
Approach:
Collect different attributes of customers based on their geographical and lifestyle-related information.
Identify clusters of similar customers.
Measure the clustering quality by observing the buying patterns of customers in the same cluster versus those from different clusters.
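The quality check in the last step can be sketched as a comparison of buying patterns within and between clusters. The example below assumes each customer's buying pattern is a vector of purchase counts and uses mean pairwise Euclidean distance as the comparison. Both choices, the data, and the function name `mean_pairwise_distances` are illustrative assumptions rather than the method prescribed by the slides.

```python
import math
from itertools import combinations

def mean_pairwise_distances(clusters):
    """Return (within, between): mean distances over same- and cross-cluster pairs."""
    labelled = [(cid, v) for cid, members in enumerate(clusters) for v in members]
    within, between = [], []
    for (ca, va), (cb, vb) in combinations(labelled, 2):
        (within if ca == cb else between).append(math.dist(va, vb))
    # Assumes at least one same-cluster pair and one cross-cluster pair exist.
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical purchase counts per customer: (movie purchases, sports purchases).
segments = [
    [(5, 0), (6, 1)],   # cluster 0
    [(0, 4), (1, 5)],   # cluster 1
]
within, between = mean_pairwise_distances(segments)
print(within, between)
```

A segmentation looks useful when the within-cluster distance is clearly smaller than the between-cluster distance, meaning customers in the same segment buy alike.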
13
3. Association Rule Discovery
Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
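A dependency rule such as {diapers} -> {beer} is commonly scored by its support and confidence. These two measures are standard in the association rule literature but are not defined in the slides. The sketch below computes them over a small set of records; the transactions and function names are hypothetical.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def support(transactions, itemset):
    """Fraction of transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    n_antecedent = count(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0
    return count(transactions, set(antecedent) | set(consequent)) / n_antecedent

# Hypothetical records, each holding items from a given collection.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
print(rule_support, rule_confidence)
```

A rule discovery algorithm would enumerate candidate rules and keep those whose support and confidence exceed chosen thresholds; this sketch only scores a single rule.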
14
Association Rule Discovery: Marketing and Sales Promotion Application
15
4. Sequential Pattern Discovery
Given a set of objects, each associated with its own timeline of events, find rules that predict strong sequential dependencies among different events, of the form (A B) (C) (D E) --> (F).
16
Sequential Pattern Discovery: Examples
Sequences in which customers purchase goods or services.
Understanding long-term customer behavior enables timely promotions.
In point-of-sale transaction sequences, for an athletic apparel store: (Shoes) (Racket, Racketball) --> (Sports Jacket)
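The athletic apparel rule above can be checked against customer timelines with a short sketch. It treats each timeline as an ordered list of item sets and scores the rule by confidence, a measure assumed here and not stated in the slides. The customer sequences and the function names `contains` and `rule_confidence` are hypothetical.

```python
def contains(sequence, pattern):
    """True if the item sets in `pattern` occur, in order, within `sequence`."""
    pos = 0
    for event in sequence:
        # Each event can match at most one pattern element.
        if pos < len(pattern) and pattern[pos] <= event:
            pos += 1
    return pos == len(pattern)

def rule_confidence(sequences, antecedent, consequent):
    """Of the sequences containing the antecedent, the fraction that go on to contain the consequent."""
    matched = [s for s in sequences if contains(s, antecedent)]
    if not matched:
        return 0.0
    full = antecedent + consequent
    return sum(1 for s in matched if contains(s, full)) / len(matched)

# Hypothetical point-of-sale timelines, one per customer.
sequences = [
    [{"Shoes"}, {"Racket", "Racketball"}, {"Sports Jacket"}],
    [{"Shoes"}, {"Racket", "Racketball", "Socks"}, {"Sports Jacket", "Cap"}],
    [{"Shoes"}, {"Racket", "Racketball"}],
    [{"Racket", "Racketball"}, {"Shoes"}, {"Sports Jacket"}],
]

antecedent = [{"Shoes"}, {"Racket", "Racketball"}]
consequent = [{"Sports Jacket"}]
conf = rule_confidence(sequences, antecedent, consequent)
print(conf)
```

The fourth customer bought the same items in a different order, so that timeline does not match the antecedent; order is what distinguishes sequential patterns from plain association rules.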
17
Data Mining Systems
Clementine (SPSS): http://www.spss.com/spssbi/clementine/index.htm
Data Miner (Statistica): http://www.statsoft.com/dataminer.html
RuleQuest (C5.0): http://www.rulequest.com/
18
Limitations/Challenges
Large data: a large number of variables (features) and cases (examples); multi-gigabyte and terabyte databases; the need for efficient algorithms and parallel processing.
High dimensionality: a large number of features causes an exponential increase in the search space (with potential for spurious patterns).
Use of domain knowledge: utilizing knowledge about complex data relationships and known facts.
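The exponential growth mentioned under high dimensionality can be made concrete with a little arithmetic. The sketch below assumes the search space is the set of all non-empty feature subsets, which is one simple way to model it; the function name `candidate_subsets` is hypothetical.

```python
def candidate_subsets(n_features):
    """Number of non-empty subsets of n features: 2^n - 1."""
    return 2 ** n_features - 1

# Doubling the number of features squares (roughly) the number of candidates.
for n in (10, 20, 40):
    print(n, candidate_subsets(n))
```

With 40 features there are already more than a trillion candidate subsets, which is why exhaustive search is impractical and why some patterns found by chance will look significant.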
19
Intelligence Density Dimension
Accuracy
Explainability
Flexibility
Response speed