Published by Mervin Rogers. Modified over 9 years ago.
1
DATA MINING
2
Data Mining
Extracting or “mining” knowledge from large amounts of data.
Data mining is the process of automatically extracting useful information or knowledge from large data stores or sets.
It is a technique for searching large-scale databases for patterns, used mainly to find previously unknown correlations between variables.
3
Data Mining Motivation
- Changes in the business environment: customers are becoming more demanding; markets are saturated.
- Databases today are huge: more than 1,000,000 entities/records/rows; from 10 to 10,000 fields/attributes/variables; gigabytes and terabytes.
- Databases are growing at an unprecedented rate.
- Decisions must be made rapidly.
- Decisions must be made with maximum knowledge.
4
Data Mining: Confluence of Multiple Disciplines
Data mining draws on:
- Database technology
- Statistics
- Machine learning
- A.I.
- Algorithms
- Visualization
- Other disciplines
5
Visualization: the visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
Statistics: in data mining, used for classifying and grouping things.
Machine learning: the ability of a machine to improve its performance based on previous results.
Artificial intelligence: the branch of computer science that deals with writing computer programs that can solve problems creatively.
Algorithm: a precise rule (or set of rules) specifying how to solve some problem.
6
Why Not Traditional Data Analysis?
- Tremendous amount of data
- High complexity of data
7
Knowledge Discovery in Databases (KDD)
KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
Data mining is a step in the KDD process that consists of applying particular data mining algorithms.
8
Data Mining (cont.)
Data mining is one step of the Knowledge Discovery in Databases (KDD) process:
- Data warehousing
- Data selection
- Data preprocessing
- Data transformation
- Data mining
- Interpretation/evaluation
Data mining is sometimes referred to as KDD; the terms DM and KDD tend to be used as synonyms.
9
Steps of a KDD Process
- Learning the application domain: relevant prior knowledge and goals of the application.
- Creating a target data set: data selection.
- Data cleaning and preprocessing.
- Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation.
- Choosing the functions of data mining: summarization, classification, regression, association, clustering.
10
- Choosing the mining algorithm(s).
- Data mining: search for patterns of interest.
- Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.
- Use of discovered knowledge.
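The KDD steps listed on slides 9 and 10 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the record fields (`age`, `spend`, `city`), the cleaning rule (drop rows with missing values), the age bands, and the mined "pattern" (mean spend per band) are all invented for the example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"age": 25, "spend": 120.0, "city": "A"},
    {"age": 31, "spend": None, "city": "B"},
    {"age": 47, "spend": 300.0, "city": "A"},
    {"age": 52, "spend": 340.0, "city": "C"},
    {"age": 29, "spend": 100.0, "city": "B"},
]

def select(records, fields):
    """Data selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]

def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Data reduction/transformation: replace age by a coarse band."""
    return [
        {"band": "under40" if r["age"] < 40 else "40plus", "spend": r["spend"]}
        for r in records
    ]

def mine(records):
    """Data mining (summarization): mean spend per age band."""
    groups = defaultdict(list)
    for r in records:
        groups[r["band"]].append(r["spend"])
    return {band: sum(values) / len(values) for band, values in groups.items()}

def run_pipeline(raw):
    """Chain the steps in the order the slides list them."""
    return mine(transform(clean(select(raw, ["age", "spend"]))))

patterns = run_pipeline(RAW)
print(patterns)
```

Pattern evaluation and use of the discovered knowledge are left to the reader here; in practice they depend on the application goals identified in the first step.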
11
DATA MINING EVALUATION
(Diagram: Databases, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation)
12
Data Mining: On What Kind of Data?
- Relational databases
- Data warehouses
- Transactional databases
- Advanced DB and information repositories
- Object-oriented and object-relational databases
- Spatial databases
- Text databases and multimedia databases
- WWW
13
Data Mining Applications:
- Banking: loan/credit card approval; predict good customers based on old customers.
- Targeted marketing: identify likely responders to promotions.
- Fraud detection: telecommunications, financial transactions; from an online stream of events, identify fraudulent events.
14
Data Mining Applications:
- Medicine: disease outcome, effectiveness of treatments; analyze patient disease history to find relationships between diseases.
- Molecular/pharmaceutical: identify new drugs.
- Scientific data analysis: identify new galaxies by searching for sub-clusters.
- Web site/store design and promotion: find affinity of visitor to pages and
15
Financial Industry, Banks, Businesses, E-commerce
- Stock and investment analysis
- Identify loyal customers vs. risky customers
- Predict customer spending
- Risk management
- Sales forecasting
16
Data Mining in CRM: Customer Life Cycle
Customer life cycle: the stages in the relationship between a customer and a business.
Key stages in the customer life cycle:
- Prospects: people who are not yet customers but are in the target market.
- Responders: prospects who show an interest in a product or service.
- Active customers: people who are currently using the product or service.
- Former customers: may be “bad” customers who did not pay their bills or who incurred high costs.
17
Data Mining in CRM
DM helps to:
- Determine the behavior surrounding a particular life cycle event.
- Find other people in similar life stages and determine which customers are following similar behavior patterns.
18
Data Mining in CRM (cont.)
(Diagram: Data Warehouse, Data Mining, Campaign Management, Customer Profile, Customer Life Cycle Info.)
19
Data Mining Techniques
Descriptive:
- Clustering
- Association
- Sequential analysis
Predictive:
- Classification
- Decision tree
- Rule induction
- Neural networks
- Nearest neighbor classification
- Regression
20
Predictive DM
Predictive data mining produces a model of the system described by the given data set. Models are built in order to estimate unknown values of interest.
Example: given a customer’s characteristics, a model predicts how much the customer will spend on the next catalog order.
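As a rough illustration of building a model to estimate an unknown value, the sketch below fits a straight line by ordinary least squares. Linear regression is chosen here only as a simple stand-in; the slides do not say which model is used. The single customer characteristic (`past_orders`) and the spend figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: number of past orders vs. amount spent on the next order.
past_orders = [1, 2, 3, 4]
next_spend = [30.0, 50.0, 70.0, 90.0]

slope, intercept = fit_line(past_orders, next_spend)

# Estimate the spend of a customer with 5 past orders.
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```

With real customer data there would be several characteristics and noisy values, so a multivariate model and a held-out test set would be the usual next step.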
21
Descriptive DM
Descriptive data mining produces new, nontrivial information based on the available data set. It is used to learn about and understand the data.
Example: identify and describe groups of customers with common buying behavior.
22
Classification
Classification is the process of sub-dividing a data set with regard to a number of specific outcomes.
Example: given old data about customers and payments, predict a new applicant’s loan eligibility.
23
Decision Trees
hair    eyes    class
brown   blue    A
brown   brown   B
red     blue    A
dark    blue    B
brown   blue    A
dark    brown   B
brown   brown   B
A decision tree is a tree where internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels.
24
Decision Trees: Learned Predictive Rules
hair
  dark  -> B
  red   -> A
  brown -> eyes
             blue  -> A
             brown -> B
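The learned rules can be checked against the training table from slide 23. The sketch below hard-codes one reading of the flattened tree diagram (split on hair first, then on eyes for brown hair); that reading is an assumption, although it is the one that agrees with every row of the table.

```python
# Training table from slide 23: (hair, eyes, class).
ROWS = [
    ("brown", "blue", "A"),
    ("brown", "brown", "B"),
    ("red", "blue", "A"),
    ("dark", "blue", "B"),
    ("brown", "blue", "A"),
    ("dark", "brown", "B"),
    ("brown", "brown", "B"),
]

def classify(hair, eyes):
    """Walk the tree: internal nodes test an attribute, leaves return a class."""
    if hair == "dark":
        return "B"
    if hair == "red":
        return "A"
    # Remaining branch is hair == "brown": decide on eye color.
    return "A" if eyes == "blue" else "B"

# Fraction of training rows the tree labels correctly.
accuracy = sum(classify(h, e) == c for h, e, c in ROWS) / len(ROWS)
print(accuracy)
```

This only applies an already learned tree; learning the tree from the table (choosing which attribute to split on) is a separate step not shown here.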
25
Rule Induction
In rule induction, the observed actions are given and the rules behind them have to be discovered.
It is the extraction of useful if-then rules from data based on statistical significance.
Rule induction is an area of machine learning in which formal rules are extracted from a set of observations.
Examples:
- Do not give a discount on two items that are frequently bought together; discount one to pull the other.
- Send a camcorder offer to VCR purchasers 2-3 months after the VCR purchase.
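One common way to score a candidate if-then rule such as "VCR purchase implies camcorder purchase" is with support and confidence. These two measures are an assumption made for this sketch; the slides only say "statistical significance" and do not name a measure. The transactions below are hypothetical.

```python
# Hypothetical purchase histories, one set of items per customer.
TRANSACTIONS = [
    {"vcr", "camcorder"},
    {"vcr", "tape"},
    {"vcr", "camcorder", "tape"},
    {"tv"},
    {"vcr"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(
        antecedent, transactions
    )

rule_support = support({"vcr", "camcorder"}, TRANSACTIONS)
rule_confidence = confidence({"vcr"}, {"camcorder"}, TRANSACTIONS)
print(rule_support, rule_confidence)
```

A rule would typically be kept only if both values exceed thresholds chosen for the application.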
26
NEURAL NETWORK
A set of nodes connected by directed, weighted edges.
Useful for learning from complex data, as in handwriting, speech, and image recognition.
Neural networks have broad applicability to real-world business problems and have already been applied successfully in many industries. Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:
27
28
NEAREST NEIGHBOUR METHOD
The nearest neighbor algorithm in pattern recognition is a method for classifying phenomena based upon observable features.
Define proximity between instances, find the neighbors of a new instance, and assign the majority class.
The nearest neighbor algorithm is a heuristic: it is not guaranteed to produce the correct result in every case.
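The three steps (define proximity, find neighbors, assign the majority class) can be sketched as follows. Euclidean distance, `k=3`, and the two-feature training points are assumptions chosen for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label query by majority vote among its k nearest training instances.

    train is a list of (features, label) pairs; proximity is Euclidean distance.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training instances with two observable features each.
TRAIN = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
    ((4.8, 5.1), "B"),
]

print(knn_classify(TRAIN, (1.1, 1.0)))
print(knn_classify(TRAIN, (5.0, 4.8)))
```

The choice of distance measure and of `k` strongly affects the result, which is one reason the method offers no guarantee of a correct answer.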
29
Clustering
The art of finding groups in data.
Objective: gather items from a database into sets according to (unknown) common characteristics.
Example: group existing customers based on time series of payment history such that similar customers fall in the same cluster.
Key requirement: a good measure of similarity between instances.
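As one possible illustration, the sketch below groups points with k-means, using Euclidean distance as the similarity measure. K-means is a choice made for this example and is not named in the slides. The two features per customer (for instance average days late and number of missed payments) and the starting centroids are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # New centroid is the per-dimension mean; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers described by two payment-history features.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

# Starting centroids are picked by hand so the run is deterministic.
centroids, clusters = kmeans(POINTS, [(1, 1), (8, 8)])
print(clusters)
```

Real payment time series would first need to be turned into comparable feature vectors, and the number of clusters is itself a modeling decision.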
30
Major Issues in Data Mining
- Mining different kinds of knowledge in databases.
- Expression and visualization of data mining results.
- Handling noise and incomplete data.
- Pattern evaluation: the interestingness problem.
- Efficiency and scalability of data mining algorithms.
- Handling relational and complex types of data.