DATA MINING
Data mining is the process of extracting interesting (non-trivial, implicit, previously unknown and useful) information from any data repository. It is the process of analyzing data from different perspectives and summarizing it into useful information.
APPLICATIONS
Retail/Marketing: identifying customers' buying patterns; predicting responses to mailing campaigns; market basket analysis.
Banking: detecting patterns of fraudulent credit card use; identifying loyal customers.
APPLICATIONS
Insurance: claims analysis; predicting which customers will buy new policies.
Telecommunications: detecting phone-call fraud. Each call is modeled by its destination, its duration, and the time of day or week, and the analysis looks for patterns that deviate from an expected norm.
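Flagging calls that deviate from an expected norm can be sketched, as one simple assumed technique, with a z-score test on call duration plus a check for previously unseen destinations. The function `flag_unusual_calls`, the record fields, the threshold of 3 standard deviations and the sample history are all hypothetical; real fraud systems use richer call models.

```python
from statistics import mean, pstdev

def flag_unusual_calls(history, new_calls, threshold=3.0):
    """Flag calls that deviate from a customer's historical norm (hypothetical helper).

    A call is flagged if its duration is more than `threshold` standard
    deviations from the historical mean, or if it goes to a destination
    that never appears in the history.
    """
    durations = [c["duration"] for c in history]
    mu, sigma = mean(durations), pstdev(durations)
    known_destinations = {c["destination"] for c in history}
    flagged = []
    for call in new_calls:
        # Guard against a zero spread when all historical durations are equal.
        z = (call["duration"] - mu) / sigma if sigma > 0 else 0.0
        if abs(z) > threshold or call["destination"] not in known_destinations:
            flagged.append(call)
    return flagged

# Hypothetical call history for one customer (durations in minutes).
history = [{"destination": "local", "duration": d} for d in [3, 5, 4, 6, 5, 4, 3, 5]]
new_calls = [
    {"destination": "local", "duration": 5},           # typical call
    {"destination": "local", "duration": 95},          # unusually long
    {"destination": "international", "duration": 4},  # unseen destination
]
print(flag_unusual_calls(history, new_calls))
```

Here the norm is learned per customer, so the same 95-minute call might be ordinary for a different customer.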
APPLICATIONS
Web mining: the Web is a big information network. Examples range from PageRank to Google and, more generally, to the analysis of Web information networks.
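PageRank scores a page by the rank of the pages that link to it. It can be computed by power iteration, as in the minimal sketch below. The damping factor of 0.85 is the commonly cited default. The four-page graph, the function name `pagerank`, and the choice to spread the rank of pages without outgoing links evenly over all pages are assumptions made for illustration, not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Compute PageRank scores by power iteration (illustrative sketch).

    `links` maps each page to the list of pages it links to. Pages with
    no outgoing links (dangling pages) spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank

# A hypothetical four-page web graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Page C ranks highest because three pages link to it. Page D, which nothing links to, keeps only the baseline (1 - 0.85) / 4.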
ASSOCIATION RULE MINING
Association rule mining aims to establish links, called associations, between individual records, or sets of records, in a database. In data mining, association rules are useful for analyzing and predicting customer behavior.
EXAMPLE
"If a customer buys a dozen eggs, he is 80% likely to also purchase milk."
An association rule has two parts: an antecedent (if) and a consequent (then). An antecedent is an item (or set of items) found in the data; a consequent is an item (or set of items) found in combination with the antecedent.
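The "80% likely" in the example is what association rule mining calls the rule's confidence. The notation below is assumed here. Write σ(X) for the number of transactions that contain every item of itemset X, and N for the total number of transactions. The two standard measures of a rule X ⇒ Y are then:

```latex
\mathrm{support}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{N},
\qquad
\mathrm{confidence}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}
```

For the eggs example, a confidence of 0.8 means that 80% of the transactions containing eggs also contain milk.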
RULES FOR ASSOCIATION MINING
Given a data set, find the items in the data that are associated with each other.
Association is measured as the frequency with which items occur together in the same context (for example, in the same transaction).
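Measuring association as frequency of co-occurrence can be sketched by counting how often each pair of items appears in the same transaction. The helper name `cooccurrence_counts` is hypothetical. The data is the five transactions from the market basket table in this section.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(transactions):
    """Count how often each unordered pair of items appears in the same transaction."""
    counts = Counter()
    for items in transactions:
        # Sort so that each unordered pair has one canonical key, e.g. ("beer", "bread").
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"bread", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for pair, count in cooccurrence_counts(transactions).most_common(3):
    print(pair, count)
```

Counting pairs is the simplest case. Association rule algorithms generalize the same counting to larger itemsets.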
Market basket analysis
TID  Items
1    {bread, milk}
2    {bread, diapers, beer, eggs}
3    {bread, diapers, beer, cola}
4    {bread, milk, diapers, beer}
5    {bread, milk, diapers, cola}
How strong is the association {diapers, milk} -> {beer, cola}?
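One way to answer the question is to compute the rule's support and confidence over the five transactions in the table. The helper names `support_count` and `rule_metrics` are hypothetical. The definitions used are the usual ones: support is the fraction of all transactions containing both sides, and confidence is that count divided by the number of transactions containing the antecedent.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    both = support_count(antecedent | consequent, transactions)
    ante = support_count(antecedent, transactions)
    support = both / len(transactions)
    # Confidence is undefined when the antecedent never occurs; report 0.0 here.
    confidence = both / ante if ante else 0.0
    return support, confidence

# The five transactions from the market basket table.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"bread", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
s, c = rule_metrics({"diapers", "milk"}, {"beer", "cola"}, transactions)
print(f"support={s:.2f}, confidence={c:.2f}")  # support=0.00, confidence=0.00
```

With the table as given, {diapers, milk} appears in transactions 4 and 5, but neither of them also contains both beer and cola. The rule therefore has support 0 and confidence 0. For comparison, {diapers, milk} -> {beer} holds only in transaction 4, which gives support 1/5 = 0.2 and confidence 1/2 = 0.5.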
The CRISP-DM Model
CRISP-DM (Cross-Industry Standard Process for Data Mining) has six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
Business understanding
The tasks involved in this phase are: determine business objectives; determine data mining goals; and produce a project plan.
Data understanding
The tasks involved in this phase are: collect initial data; describe data; explore data; and verify data quality.
Data preparation
The tasks involved in this phase are: select data; clean data; construct data; integrate data; and format data.
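The "clean data" and "format data" tasks can be illustrated on raw basket records. The sketch assumes a hypothetical input format in which each row is a comma-separated string of item names. It normalizes case and whitespace, drops blank entries and duplicate items, and discards rows that end up empty. The function name `prepare_transactions` is illustrative, and selecting, constructing and integrating data are not covered.

```python
def prepare_transactions(raw_rows):
    """Clean and format raw basket rows into sets of normalized item names.

    Each raw row is assumed to be a comma-separated string of items.
    Item names are stripped and lower-cased, blank entries are dropped,
    duplicates collapse because items are stored in a set, and rows
    that end up empty are discarded.
    """
    prepared = []
    for row in raw_rows:
        items = {item.strip().lower() for item in row.split(",") if item.strip()}
        if items:
            prepared.append(items)
    return prepared

# Hypothetical raw rows with inconsistent case, spacing, blanks and duplicates.
raw_rows = ["Bread, Milk", " bread,diapers, Beer ,eggs", "", "BREAD, milk, milk"]
print(prepare_transactions(raw_rows))
```

The output is the set-of-items format that the market basket sketches in this section take as input.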
Modeling
The tasks in this phase are: select modeling technique; generate test design; build model; and assess model.
Evaluation
The tasks in this phase are: evaluate results; review process; and determine next steps.
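A common test design is a holdout split. The model is built on one part of the data and assessed on records it has not seen. The minimal sketch below is one possible choice, not the only test design CRISP-DM allows. The function name `holdout_split`, the 20% test fraction and the fixed seed are assumptions.

```python
import random

def holdout_split(records, test_fraction=0.2, seed=42):
    """Shuffle records and split them into (training, test) lists.

    A fixed seed keeps the split reproducible, so the same test design
    can be reused when the model is rebuilt and reassessed.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(10))
print(len(train), len(test))  # 8 2
```

Keeping the test records out of model building is what makes the later "assess model" step an honest estimate of performance on new data.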
Deployment
The tasks in this phase are: plan deployment; plan monitoring and maintenance; produce final report; and review project.