Overview of Data Mining Methods Data mining techniques What techniques do, examples, advantages & disadvantages.

1 Overview of Data Mining Methods
Data mining techniques: what the techniques do, examples, advantages & disadvantages

2 Contents
- Reviews data mining tools
- Compares data mining perspectives
- Discusses data mining functions
- Presents four sets of data used to demonstrate tools in subsequent chapters
- Shows the Enterprise Miner structure for data mining analysis in the appendix

3 Data mining applications
- Automobile insurance company: fraud detection
- Business applications: loan evaluation, customer segmentation, employee evaluation, ...
- Data mining tools are categorized by the tasks of classification, estimation, prediction, clustering, and summarization.
- Classification, estimation, and prediction are predictive, while clustering and summarization are descriptive.

4 History
- Statistics
- AI:
  - genetic algorithms, neural networks (analogies with biology)
  - memory-based reasoning
  - link analysis, from graph theory
- See Table 4.1

5 Data mining perspectives
Methods can be viewed from different perspectives. Data mining methods include:
- Cluster analysis (Chapter 5)
- Regression of various forms (best-fit methods, Chapter 6)
- Discriminant analysis (use of regression for classification, Chapter 6)
- Line fitting through the operations research tool of multiple objective linear programming (Chapter 9)
AI:
- ANN (Chapter 7)
- Rule induction (decision trees, Chapter 8)
- Genetic algorithms (supplement)
See page 55 for more descriptions

6 Techniques
Statistical
- Market-basket analysis: find groups of items
- Memory-based reasoning: case based
- Cluster detection: undirected (quantitative)
Artificial Intelligence
- Link analysis: MCI's Friends & Family
- Decision trees, rule induction: production rules
- Neural networks: automatic pattern detection
- Genetic algorithms: keep best parameters
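As a rough illustration of "find groups of items" in market-basket analysis, the sketch below counts how often pairs of items appear in the same basket and keeps the pairs whose support reaches a cutoff. The baskets, the item names, and the 0.5 cutoff are all invented for illustration; none of them come from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (not from the slides): each set is one basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

# Count every unordered pair of items that occurs together in a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of baskets that contain the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}

# Keep pairs that appear in at least half of the baskets (assumed cutoff).
MIN_SUPPORT = 0.5
frequent_pairs = {pair for pair, s in support.items() if s >= MIN_SUPPORT}
print(sorted(frequent_pairs))
```

Sorting each basket before forming pairs keeps ("bread", "milk") and ("milk", "bread") from being counted as two different pairs.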

7 Models
- Regression: Y = a + bX
- Classification: assign new record to class
- Predictive: assign value to new record
- Clustering: groups for data
- Time-series: assign future value
- Links: patterns in data
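The regression model Y = a + bX can be fitted by ordinary least squares. The following is a minimal sketch that assumes a single predictor; the function name `fit_line` and the sample points are illustrative and do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of a and b in Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Illustrative data lying exactly on Y = 1 + 2X.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)

# "Predictive: assign value to new record" then amounts to evaluating the line.
prediction = a + b * 4.0
print(a, b, prediction)
```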

8 Fitting
- Underfitting: not enough detail
  - leave out important variables
- Overfitting: too much detail
  - memorizes training set, but doesn't help with new data
  - data set too small
  - redundancy in data
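One way to see the difference is to compare three toy models on training data and on new data. In the sketch below, all data and model choices are assumptions made for illustration: a mean-only model stands in for underfitting (it leaves out X entirely), a nearest-neighbor lookup stands in for overfitting (it memorizes the training set), and a least-squares line sits in between.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: roughly y = 2x, with alternating +1/-1 noise in training.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
test_x = [0.6, 1.6, 2.6, 3.6, 4.6]
test_y = [2 * x for x in test_x]

# Underfit: ignore X and always predict the training mean.
mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)

def underfit(x):
    return mean_y

# Reasonable fit: least-squares line Y = a + bX.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
     / sum((x - mean_x) ** 2 for x in train_x))
a = mean_y - b * mean_x

def line(x):
    return a + b * x

# Overfit: memorize the training set and return the nearest stored answer.
def memorizer(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

models = {"underfit": underfit, "line": line, "memorizer": memorizer}
results = {name: (mse(m, train_x, train_y), mse(m, test_x, test_y))
           for name, m in models.items()}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

On this toy data the memorizer has zero training error yet does worse than the line on the new points, while the mean-only model is poor on both.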

9 Comparison of Features
Techniques compared: Rules, Neural Net, Case Base, Genetic
- Noisy data: Good, Very good, Good, Very good
- Missing data: Good, Very good, Good
- Large sets: Very good, Poor, Good
- Different types: Good, Numerical, Very good, Transform
- Accuracy: High, Very high, High
- Explanation: Very good, Poor, Very good, Good
- Integration: Good, Very good
- Ease: Easy, Difficult, Easy, Difficult

10 Data Mining Functions
- Classification: identify categories in data
- Prediction: formula to predict future observations
- Association: rules using relationships among entities
- Detection: anomalies (unusual) & irregularities (fraud detection)
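The detection function can be illustrated with a very simple rule: flag values that lie far from the mean. The z-score test below is only one possible approach, chosen here for brevity; the slides do not name a specific method, and the claim amounts and the cutoff of 2 standard deviations are invented.

```python
from statistics import mean, pstdev

# Hypothetical claim amounts (not from the slides); one is unusually large.
claims = [100, 110, 95, 105, 98, 102, 900]

center = mean(claims)
spread = pstdev(claims)  # population standard deviation

# Flag any value more than Z_CUTOFF standard deviations from the mean.
Z_CUTOFF = 2.0  # assumed cutoff
anomalies = [c for c in claims if abs(c - center) / spread > Z_CUTOFF]
print(anomalies)
```

A single extreme value inflates both the mean and the standard deviation, so this rule is crude; it is meant only to show what "anomaly" means operationally.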

11 Financial Applications
Technique | Application | Problem Type
Neural net | Forecast stock price | Prediction
NN, Rule | Forecast bankruptcy; Fraud detection | Prediction; Detection
NN, Case | Forecast interest rate | Prediction
NN, visual | Late loan detection | Detection
Rule | Credit assessment; Risk classification | Prediction; Classification
Rule, Case | Corporate bond rate | Prediction

12 Telecom Applications
Technique | Application | Problem Type
Neural net, Rule induction | Forecast network behavior | Prediction
Rule induction | Churn; Fraud detection | Classification; Detection
Case based | Call tracking | Classification

13 Marketing Applications
Technique | Application | Problem Type
Rule induction | Market segment; Cross-selling | Classification; Association
Rule induction, visual | Lifestyle analysis; Performance analysis | Classification; Association
Rule induction, genetic, visual | Reaction to promotion | Prediction
Case based | Online sales support | Classification

14 Web Applications
Technique | Application | Problem Type
Rule induction, Visualization | User browsing similarity analysis | Classification, Association
Rule-based heuristics | Web page content similarity | Association

15 Other Applications
Technique | Application | Problem Type
Neural net | Software cost | Detection
Neural net, rule induction | Litigation assessment | Prediction
Rule induction | Insurance fraud; Healthcare except. | Detection
Case based | Insurance claim; Software quality | Prediction; Classification
Genetic algorithm | Budget spending | Classification

16 Data Sets
- Loan Applications: classification
- Job Applications: classification
- Insurance Fraud: detection
- Expenditure Data: prediction

17 Loan Data
650 observations
OUTCOMES (binary):
- On-time (cost of error: $300)
- Late (default) (cost of error: $2,000)
Variables
- Age, Income, Assets, Debts, Want, Credit
  Credit is ordinal
- Transform: Assets, Debts, & Want → Risk
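Unequal error costs change how a classifier should be judged and how its output should be used. The sketch below assumes that "cost of error" means the cost of misclassifying a case whose true outcome is that class: $300 for calling an on-time loan late, $2,000 for calling a late loan on-time. The function names, the error counts, and the probabilities are hypothetical.

```python
COST_ON_TIME_ERROR = 300   # from the slide: error on a truly on-time loan
COST_LATE_ERROR = 2000     # from the slide: error on a truly late loan

def total_error_cost(on_time_misclassified, late_misclassified):
    """Total cost of a model's mistakes, weighted by the two error costs."""
    return (on_time_misclassified * COST_ON_TIME_ERROR
            + late_misclassified * COST_LATE_ERROR)

def predict_late(p_late):
    """Predict Late when that choice has the lower expected error cost.

    p_late is an assumed model estimate of the probability of default.
    """
    expected_cost_if_on_time = p_late * COST_LATE_ERROR
    expected_cost_if_late = (1 - p_late) * COST_ON_TIME_ERROR
    return expected_cost_if_on_time > expected_cost_if_late

# Break-even probability implied by the two costs: 300 / (300 + 2000).
threshold = COST_ON_TIME_ERROR / (COST_ON_TIME_ERROR + COST_LATE_ERROR)

print(total_error_cost(10, 5))   # hypothetical counts of each error type
print(round(threshold, 4))
```

Under these assumptions a loan is worth flagging as late once its estimated default probability exceeds about 13%, far below the 50% cutoff that equal costs would imply.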

18 Job Application Data
500 observations
OUTCOMES (ordinal):
- Unacceptable
- Minimal
- Acceptable
- Excellent
Variables
- Age, State, Degree, Major, Experience
  State is nominal; Degree and Major are ordinal
  State is superfluous

19 Insurance Claim Data
5000 observations
OUTCOMES (binary):
- OK (cost of error: $500)
- Fraudulent (cost of error: $2,500)
Variables
- Age, Gender, Claim, Tickets, Prior claims, Attorney
  Gender and Attorney are nominal; Tickets and Prior claims are categorical

20 Expenditure Data
10,000 observations
OUTCOMES:
- Could predict response in a number of categories
- Others
Variables:
- Age, Gender, Marital, Dependents, Income, Job years, Town years, Education years, Drivers license, Own home, Number of credit cards
- Churn, proportion of income spent on seven categories

