Presentation on theme: "DATA MINING 1. 2 Data Mining Extracting or “mining” knowledge from large amounts of data Data mining is the process of autonomously retrieving useful."— Presentation transcript:

1 DATA MINING

2 Data Mining
- Extracting or “mining” knowledge from large amounts of data.
- Data mining is the process of autonomously retrieving useful information or knowledge from large data stores or sets.
- Data mining is a technique for searching large-scale databases for patterns, used mainly to find previously unknown correlations between variables.

3 Data Mining Motivation
- Changes in the business environment: customers are becoming more demanding; markets are saturated.
- Databases today are huge: more than 1,000,000 entities/records/rows; from 10 to 10,000 fields/attributes/variables; gigabytes and terabytes.
- Databases are growing at an unprecedented rate.
- Decisions must be made rapidly.
- Decisions must be made with maximum knowledge.

4 Data Mining: Confluence of Multiple Disciplines
Diagram labels: Data Mining (center), Database Technology, Statistics, Machine Learning, A.I., Algorithm, Visualization, Other Disciplines

5 Visualization: the visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.
Statistics: in data mining, used for classifying and grouping things.
Machine learning: the ability of a machine to improve its performance based on previous results.
Artificial intelligence: the branch of computer science that deals with writing computer programs that can solve problems creatively.
Algorithm: a precise rule (or set of rules) specifying how to solve some problem.

6 Why Not Traditional Data Analysis?
- Tremendous amount of data
- High complexity of data

7 Knowledge Discovery in Databases (KDD)
- KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- Data mining is a step in the KDD process, consisting of particular data mining algorithms.

8 Data Mining (cont.)
- Data mining is a step of the Knowledge Discovery in Databases (KDD) process: Data Warehousing, Data Selection, Data Preprocessing, Data Transformation, Data Mining, Interpretation/Evaluation.
- Data mining is sometimes referred to as KDD; DM and KDD tend to be used as synonyms.

9 Steps of a KDD Process
- Learning the application domain: relevant prior knowledge and goals of the application.
- Creating a target data set: data selection.
- Data cleaning and preprocessing.
- Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation.
- Choosing functions of data mining: summarization, classification, regression, association, clustering.

10 Steps of a KDD Process (cont.)
- Choosing the mining algorithm(s).
- Data mining: search for patterns of interest.
- Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.
- Use of discovered knowledge.
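
The KDD steps listed on slides 9 and 10 can be sketched as a small pipeline. This is only an illustration under stated assumptions, not something from the presentation: the records, the field names (`age`, `spend`), the age bands, the threshold and every helper function are hypothetical, and the "mining" step is a plain summarization (mean spend per age band).

```python
# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "age": 25, "spend": 120.0, "city": "A"},
    {"id": 2, "age": 41, "spend": None, "city": "B"},
    {"id": 3, "age": 37, "spend": 300.0, "city": "A"},
    {"id": 4, "age": 29, "spend": 80.0, "city": "C"},
    {"id": 5, "age": 52, "spend": 500.0, "city": "B"},
]


def select(records, fields):
    """Data selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Data transformation: reduce age to a coarse band (assumed cut at 35)."""
    return [
        {"band": "under_35" if r["age"] < 35 else "35_plus", "spend": r["spend"]}
        for r in records
    ]


def mine(records):
    """Data mining (summarization): mean spend per band."""
    totals = {}
    for r in records:
        total, count = totals.get(r["band"], (0.0, 0))
        totals[r["band"]] = (total + r["spend"], count + 1)
    return {band: total / count for band, (total, count) in totals.items()}


def evaluate(patterns, min_mean):
    """Pattern evaluation: keep only bands whose mean spend reaches min_mean."""
    return {band: mean for band, mean in patterns.items() if mean >= min_mean}


def kdd(records):
    """Run the steps in order: select, clean, transform, mine, evaluate."""
    prepared = transform(clean(select(records, ["age", "spend"])))
    return evaluate(mine(prepared), min_mean=150.0)
```

Calling `kdd(RAW)` runs every stage in order; each stage is a separate function so that a different cleaning rule or mining function can be swapped in without touching the others.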

11 DATA MINING EVALUATION
Diagram labels: Data Integration, Databases, Data Warehouse, Task-relevant Data, Selection, Data Mining, Pattern Evaluation

12 Data Mining: On What Kind of Data?
- Relational databases
- Data warehouses
- Transactional databases
- Advanced DB and information repositories: object-oriented and object-relational databases; spatial databases; text databases and multimedia databases; WWW

13 Data Mining Applications
- Banking: loan/credit card approval; predict good customers based on old customers.
- Targeted marketing: identify likely responders to promotions.
- Fraud detection: telecommunications, financial transactions; from an online stream of events, identify fraudulent events.

14 Data Mining Applications
- Medicine: disease outcome, effectiveness of treatments; analyze patient disease history to find relationships between diseases.
- Molecular/Pharmaceutical: identify new drugs.
- Scientific data analysis: identify new galaxies by searching for sub-clusters.
- Web site/store design and promotion: find affinity of visitor to pages and

15 Financial Industry, Banks, Businesses, E-commerce
- Stock and investment analysis
- Identify loyal customers vs. risky customers
- Predict customer spending
- Risk management
- Sales forecasting

16 Data Mining in CRM: Customer Life Cycle
- Customer life cycle: the stages in the relationship between a customer and a business.
- Key stages in the customer life cycle:
  - Prospects: people who are not yet customers but are in the target market.
  - Responders: prospects who show an interest in a product or service.
  - Active customers: people who are currently using the product or service.
  - Former customers: may be “bad” customers who did not pay their bills or who incurred high costs.

17 Data Mining in CRM
DM helps to:
- Determine the behavior surrounding a particular life cycle event.
- Find other people in similar life stages and determine which customers are following similar behavior patterns.

18 Data Mining in CRM (cont.)
Diagram labels: Data Warehouse, Data Mining, Campaign Management, Customer Profile, Customer Life Cycle Info.

19 Data Mining Techniques
- Descriptive: Clustering, Association, Sequential Analysis
- Predictive: Classification, Decision Tree, Rule Induction, Neural Networks, Nearest Neighbor Classification, Regression

20 Predictive DM
Predictive data mining produces a model of the system described by the given data set; the model is built in order to estimate unknown values of interest.
Example: given a customer’s characteristics, a model predicts how much the customer will spend on the next catalog order.
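
A minimal sketch of the catalog-order example, assuming a single customer characteristic and a straight-line model fitted by ordinary least squares. The presentation does not say which model is used; the feature (number of past orders), the figures and the function names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate the unknown value for a new customer."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: number of past orders vs. spend on the next order.
past_orders = [1, 2, 3, 4, 5]
next_spend = [30.0, 50.0, 70.0, 90.0, 110.0]

model = fit_line(past_orders, next_spend)
estimate = predict(model, 6)  # estimated spend for a customer with 6 past orders
```

The model is built once from known cases and then applied to a customer whose next-order spend is still unknown, which is the "estimate unknown values of interest" step on the slide.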

21 Descriptive DM
Descriptive data mining produces new, nontrivial information based on the available data set. It is used to learn about and understand the data.
Example: identify and describe groups of customers with common buying behavior.

22 Classification
Classification is the process of sub-dividing a data set with regard to a number of specific outcomes.
Example: given old data about customers and payments, predict a new applicant’s loan eligibility.

23 Decision Trees
hair   eyes   class
brown  blue   A
brown  brown  B
red    blue   A
dark   blue   B
brown  blue   A
dark   brown  B
brown  brown  B
A decision tree is a tree where internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels.

24 Decision Trees: Learned Predictive Rules
hair = dark: B
hair = red: A
hair = brown: test eyes
  eyes = blue: A
  eyes = brown: B
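
The learned tree can be written out as nested rules and checked against the seven hair/eyes/class rows from slide 23. The tree structure below is a reading of the flattened slide (split on hair first, then on eyes for brown hair), so treat it as an assumption; the name `classify` is hypothetical.

```python
# The seven training rows from slide 23: (hair, eyes, class).
ROWS = [
    ("brown", "blue", "A"),
    ("brown", "brown", "B"),
    ("red", "blue", "A"),
    ("dark", "blue", "B"),
    ("brown", "blue", "A"),
    ("dark", "brown", "B"),
    ("brown", "brown", "B"),
]


def classify(hair, eyes):
    """Walk the tree: internal nodes test one attribute, leaves return a class."""
    if hair == "dark":
        return "B"
    if hair == "red":
        return "A"
    # Remaining branch (brown hair): decide on eye color.
    return "A" if eyes == "blue" else "B"


# Count how many training rows the tree labels correctly.
correct = sum(1 for hair, eyes, label in ROWS if classify(hair, eyes) == label)
```

With this reading of the tree, every training row ends at a leaf carrying its own class label.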

25 Rule Induction
- In rule induction, the actions are given and the rule has to be discovered.
- The extraction of useful if-then rules from data based on statistical significance.
- Rule induction is an area of machine learning in which formal rules are extracted from a set of observations.
Examples:
- Do not give a discount on two items that are frequently bought together; use the discount on one to pull the other.
- Send a camcorder offer to VCR purchasers 2-3 months after the VCR purchase.
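
One common way to score a candidate if-then rule such as "VCR purchasers also buy a camcorder" is by support and confidence. The sketch below uses those two measures in place of a formal significance test, which the slide mentions but does not specify. The baskets and the function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical purchase histories, one set of items per customer.
BASKETS = [
    {"vcr", "camcorder"},
    {"vcr", "tapes"},
    {"vcr", "camcorder", "tapes"},
    {"tv"},
    {"vcr", "camcorder"},
]

rule_support = support(BASKETS, {"vcr", "camcorder"})
rule_confidence = confidence(BASKETS, {"vcr"}, {"camcorder"})
```

A rule would typically be kept only if both values clear chosen thresholds; the thresholds themselves are a business decision and are not given in the presentation.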

26 NEURAL NETWORK
- Set of nodes connected by directed weighted edges.
- Useful for learning complex data like handwriting, speech and image recognition.
- Neural networks have broad applicability to real world business problems and have already been successfully applied in many industries.
- Since neural networks are best at identifying patterns or trends in data, they are well suited for prediction or forecasting needs including:

27

28 NEAREST NEIGHBOUR METHOD
- The nearest neighbor algorithm in pattern recognition is a method for classifying phenomena based upon observable features.
- Define proximity between instances, find the neighbors of a new instance, and assign the majority class.
- The nearest neighbor algorithm is a heuristic: it is not guaranteed to produce a correct result in every case.
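
The three steps on this slide (define proximity, find neighbors, assign the majority class) can be sketched as follows. Euclidean distance is assumed as the proximity measure, `k=3` is an arbitrary choice, and the training points and labels are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training instances.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Step 1: proximity is Euclidean distance (an assumption of this sketch).
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Step 2: take the k nearest neighbors of the new instance.
    neighbors = by_distance[:k]
    # Step 3: assign the majority class among those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labelled instances with two observable features each.
TRAIN = [
    ((1.0, 1.0), "good"),
    ((1.5, 2.0), "good"),
    ((2.0, 1.0), "good"),
    ((6.0, 6.0), "risky"),
    ((7.0, 6.5), "risky"),
    ((6.5, 7.0), "risky"),
]
```

For instance, `knn_classify(TRAIN, (1.2, 1.4))` votes among the three closest points. An odd `k` avoids ties when there are two classes.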

29 Clustering
- The art of finding groups in data.
- Objective: gather items from a database into sets according to (unknown) common characteristics.
- Example: group existing customers based on time series of payment history such that similar customers are in the same cluster.
- Key requirement: a good measure of similarity between instances.
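
A minimal clustering sketch, assuming k-means (Lloyd's algorithm) with Euclidean distance as the similarity measure. The slide does not name a specific algorithm, so this is one possible choice. The customer features (average days late, missed payments), the starting centroids and the function name `kmeans` are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical payment summaries: (average days late, missed payments).
CUSTOMERS = [(1, 0), (2, 0), (3, 1), (30, 4), (32, 5), (28, 3)]

centroids, clusters = kmeans(CUSTOMERS, centroids=[(1, 0), (30, 4)])
```

The result depends on the distance function and on the starting centroids, which is why the slide calls a good similarity measure the key requirement.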

30 Major Issues in Data Mining
- Mining different kinds of knowledge in databases.
- Expression and visualization of data mining results.
- Handling noise and incomplete data.
- Pattern evaluation: the interestingness problem.
- Efficiency and scalability of data mining algorithms.
- Handling relational and complex types of data.

