Overview of Data Mining Methods
Data mining techniques: what the techniques do, examples, advantages & disadvantages.

Contents
- Reviews data mining tools
- Compares data mining perspectives
- Discusses data mining functions
- Presents four sets of data used to demonstrate tools in subsequent chapters
- Shows the Enterprise Miner structure for data mining analysis in the appendix

Data Mining Applications
- Automobile insurance company: fraud detection
- Business applications: loan evaluation, customer segmentation, employee evaluation, ...
- Data mining tools are categorized by the tasks of classification, estimation, prediction, clustering, and summarization.
- Classification, estimation, and prediction are predictive, while clustering and summarization are descriptive.

History
- Statistics
- AI:
  - genetic algorithms, neural networks (analogies with biology)
  - memory-based reasoning
  - link analysis from graph theory
See Table 4.1.

Data Mining Perspectives
Methods can be viewed from different perspectives. Data mining methods include:
- Cluster analysis (Chapter 5)
- Regression of various forms (best-fit methods, Chapter 6)
- Discriminant analysis (use of regression for classification, Chapter 6)
- Line fitting through the operations research tool of multiple objective linear programming (Chapter 9)
AI:
- Artificial neural networks (ANN, Chapter 7)
- Rule induction (decision trees, Chapter 8)
- Genetic algorithms (supplement)
See page 55 for more descriptions.

Techniques
Statistical:
- Market-basket analysis: find groups of items
- Memory-based reasoning: case based
- Cluster detection: undirected (quantitative)
Artificial intelligence:
- Link analysis: MCI's Friends & Family
- Decision trees, rule induction: production rules
- Neural networks: automatic pattern detection
- Genetic algorithms: keep best parameters
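
As a rough illustration of what "find groups of items" can mean in market-basket analysis, the sketch below counts how often pairs of items share a transaction and then computes support and confidence for one rule. The transactions, the item names, and the helper names `pair_counts` and `rule_stats` are hypothetical and were made up for this example; none of them come from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    support = both / n
    confidence = both / with_antecedent if with_antecedent else 0.0
    return support, confidence


counts = pair_counts(transactions)
support, confidence = rule_stats(transactions, "bread", "milk")
print(counts.most_common(3))
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

Support is the share of all baskets that contain both items; confidence is the share of baskets with the first item that also contain the second.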

Models
- Regression: Y = a + bX
- Classification: assign new record to class
- Predictive: assign value to new record
- Clustering: groups for data
- Time-series: assign future value
- Links: patterns in data
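
The regression model Y = a + bX can be fitted by ordinary least squares. The following is a minimal sketch, assuming a single predictor; the function name `fit_line` and the sample data are hypothetical and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (a, b) for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X, and of cross-products of X and Y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical observations, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_line(xs, ys)
predicted = a + b * 6  # "Predictive: assign value to new record"
print(f"Y = {a:.2f} + {b:.2f} X; prediction at X=6: {predicted:.2f}")
```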

Fitting
- Underfitting: not enough detail
  - leaves out important variables
- Overfitting: too much detail
  - memorizes the training set, but doesn't help with new data
  - data set too small
  - redundancy in data
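
One way to see "memorizes the training set, but doesn't help with new data" is to compare a straight line with a polynomial that passes exactly through every training point. The sketch below assumes a true relationship y = 2x + 1 with an alternating +1/-1 disturbance on the training data; the data, the function names, and the choice of Lagrange interpolation as the "memorizing" model are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def lagrange_predict(xs, ys, x):
    """Evaluate the polynomial that passes through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical training data: y = 2x + 1 plus an alternating disturbance.
train_x = [0, 1, 2, 3, 4, 5]
noise = [1, -1, 1, -1, 1, -1]
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]

# Hypothetical new data, taken from the undisturbed relationship.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2 * x + 1 for x in test_x]

a, b = fit_line(train_x, train_y)
line_train_mse = mse([a + b * x for x in train_x], train_y)
line_test_mse = mse([a + b * x for x in test_x], test_y)

memo_train_mse = mse([lagrange_predict(train_x, train_y, x) for x in train_x], train_y)
memo_test_mse = mse([lagrange_predict(train_x, train_y, x) for x in test_x], test_y)

print(f"line:       train MSE={line_train_mse:.3f}, test MSE={line_test_mse:.3f}")
print(f"memorizing: train MSE={memo_train_mse:.3f}, test MSE={memo_test_mse:.3f}")
```

The interpolating polynomial reproduces the training set without error yet does worse than the simple line on the new points, which is the pattern the slide describes as overfitting.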

Comparison of Features

Feature | Rules | Neural net | Case base | Genetic
Noisy data | Good | Very good | Good | Very good
Missing data | Good | Very good | Good
Large sets | Very good | Poor | Good
Different types | Good | Numerical | Very good | Transform
Accuracy | High | Very high | High
Explanation | Very good | Poor | Very good | Good
Integration | Good | Very good
Ease | Easy | Difficult | Easy | Difficult

Data Mining Functions
- Classification: identify categories in data
- Prediction: formula to predict future observations
- Association: rules using relationships among entities
- Detection: anomalies (unusual) & irregularities (fraud detection)
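
The detection function can be illustrated with a very simple rule: flag values that lie far from the mean in standard-deviation units. This z-score rule is only one possible approach and is an assumption of this sketch, not a method named in the slides; the function name `flag_anomalies`, the threshold of 2.0, and the claim amounts are hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` sample standard
    deviations away from the mean (a basic z-score rule)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical claim amounts, invented for illustration only.
claims = [120, 135, 128, 140, 131, 125, 138, 129, 133, 900]

flagged = flag_anomalies(claims)
print("Flagged indices:", flagged)
```

With small samples a single extreme value inflates the standard deviation, so a lower threshold (or a more robust statistic) may be needed in practice.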

Financial Applications

Technique | Application | Problem type
Neural net | Forecast stock price | Prediction
NN, rule | Forecast bankruptcy; Fraud detection | Prediction; Detection
NN, case | Forecast interest rate | Prediction
NN, visual | Late loan detection | Detection
Rule | Credit assessment; Risk classification | Prediction; Classification
Rule, case | Corporate bond rate | Prediction

Telecom Applications

Technique | Application | Problem type
Neural net, rule induction | Forecast network behavior | Prediction
Rule induction | Churn; Fraud detection | Classification; Detection
Case based | Call tracking | Classification

Marketing Applications

Technique | Application | Problem type
Rule induction | Market segment; Cross-selling | Classification; Association
Rule induction, visual | Lifestyle analysis; Performance analysis | Classification; Association
Rule induction, genetic, visual | Reaction to promotion | Prediction
Case based | Online sales support | Classification

Web Applications

Technique | Application | Problem type
Rule induction, visualization | User browsing similarity analysis | Classification, association
Rule-based heuristics | Web page content similarity | Association

Other Applications

Technique | Application | Problem type
Neural net | Software cost | Detection
Neural net, rule induction | Litigation assessment | Prediction
Rule induction | Insurance fraud; Healthcare except. | Detection
Case based | Insurance claim; Software quality | Prediction; Classification
Genetic algorithm | Budget spending | Classification

Data Sets
- Loan applications: classification
- Job applications: classification
- Insurance fraud: detection
- Expenditure data: prediction

Loan Data
650 observations
Outcomes (binary):
- On-time (cost of error: $300)
- Late (default) (cost of error: $2,000)
Variables:
- Age, Income, Assets, Debts, Want, Credit
- Credit is ordinal
- Transform: Assets, Debts, & Want → Risk
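
Because the two kinds of error carry different costs, a classifier on this data can be judged by total error cost rather than by error count. The sketch below assumes that the $300 applies when an on-time applicant is misclassified and the $2,000 applies when a late (default) applicant is misclassified; that reading, the label strings, the function name `total_error_cost`, and the sample predictions are assumptions made for illustration.

```python
# Assumed reading of the slide: cost incurred when a case whose TRUE
# outcome is the key gets misclassified.
ERROR_COST = {"on-time": 300, "late": 2000}


def total_error_cost(actual, predicted, error_cost):
    """Sum the cost of every misclassified case, priced by its true outcome."""
    return sum(error_cost[a] for a, p in zip(actual, predicted) if a != p)


# Hypothetical true outcomes and model predictions for six applicants.
actual = ["on-time", "on-time", "late", "on-time", "late", "on-time"]
predicted = ["on-time", "late", "on-time", "on-time", "late", "late"]

errors = sum(1 for a, p in zip(actual, predicted) if a != p)
cost = total_error_cost(actual, predicted, ERROR_COST)
print(f"{errors} errors, total cost ${cost:,}")
```

Two models with the same number of errors can differ widely in cost, depending on how many of those errors are missed defaults.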

Job Application Data
500 observations
Outcomes (ordinal):
- Unacceptable
- Minimal
- Acceptable
- Excellent
Variables:
- Age, State, Degree, Major, Experience
- State is nominal; Degree and Major are ordinal
- State is superfluous
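
The distinction between ordinal and nominal variables matters when the data are encoded for a model. The sketch below maps the ordinal outcome to ranks and one-hot encodes a nominal variable. It assumes the outcomes are listed from worst to best, as the slide order suggests; the state codes and the helper names `encode_ordinal` and `one_hot` are hypothetical.

```python
# Assumed ordering: the slide's listing runs from worst to best.
OUTCOME_ORDER = ["Unacceptable", "Minimal", "Acceptable", "Excellent"]
OUTCOME_RANK = {label: rank for rank, label in enumerate(OUTCOME_ORDER)}


def encode_ordinal(labels, rank=OUTCOME_RANK):
    """Replace each ordinal label with its integer rank (order is meaningful)."""
    return [rank[label] for label in labels]


def one_hot(values):
    """Encode a nominal variable as 0/1 indicator columns (no order implied)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


outcomes = ["Minimal", "Excellent", "Unacceptable", "Acceptable"]
encoded = encode_ordinal(outcomes)

states = ["NE", "CA", "NE"]  # hypothetical state codes
categories, indicator_rows = one_hot(states)

print(encoded)
print(categories, indicator_rows)
```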

Insurance Claim Data
5,000 observations
Outcomes (binary):
- OK (cost of error: $500)
- Fraudulent (cost of error: $2,500)
Variables:
- Age, Gender, Claim, Tickets, Prior claims, Attorney
- Gender and Attorney are nominal; Tickets and Prior claims are categorical

Expenditure Data
10,000 observations
Outcomes:
- Could predict response in a number of categories
- Others
Variables:
- Age, Gender, Marital, Dependents, Income, Job years, Town years, Education years, Drivers license, Own home, Number of credit cards
- Churn, proportion of income spent on seven categories