Overview of Methods
Data mining techniques
What techniques do, examples, advantages & disadvantages

McGraw-Hill/Irwin ©2007 The McGraw-Hill Companies, Inc. All rights reserved

History
Statistics
AI:
– genetic algorithms, neural networks (analogies with biology)
– memory-based reasoning
– link analysis from graph theory

Techniques
Statistical
– Market-Basket Analysis - find groups of items
– Memory-Based Reasoning - case based
– Cluster Detection - undirected (quantitative MBA)
Artificial Intelligence
– Link Analysis - MCI’s Friends & Family
– Decision Trees, Rule Induction - production rule
– Neural Networks - automatic pattern detection
– Genetic Algorithms - keep best parameters
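Market-basket analysis, listed above as finding groups of items, can be illustrated by counting how often pairs of items occur in the same transaction. The following is only a minimal sketch: the baskets are invented for illustration and `pair_support` is a hypothetical helper name, not something taken from the slides.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Return {(item_a, item_b): fraction of baskets containing both items}."""
    counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so each pair is counted once per basket
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: count / n for pair, count in counts.items()}

# Invented transactions (assumption, not data from the slides)
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]

support = pair_support(baskets)
# Pairs with the highest support are the candidate "groups of items"
best_pair = max(support, key=support.get)
print(best_pair, support[best_pair])
```

Pairs whose support exceeds a chosen cutoff would then be candidates for association rules; the cutoff itself is a modeling choice.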

Models
Regression: Y = a + bX
Classification: assign new record to class
Predictive: assign value to new record
Clustering: groups for data
Time-series: assign future value
Links: patterns in data
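The regression model Y = a + bX can be fitted by ordinary least squares. The sketch below assumes that fitting method (the slide gives only the model form) and uses made-up points that lie exactly on Y = 1 + 2X; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by variance of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    a = mean_y - b * mean_x
    return a, b

# Invented data lying exactly on Y = 1 + 2X (assumption for illustration)
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

a, b = fit_line(xs, ys)
predicted = a + b * 5  # "Predictive: assign value to new record"
print(a, b, predicted)
```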

Fitting
Underfitting: not enough detail
– leave out important variables
Overfitting: too much detail
– memorizes training set, but doesn’t help with new data
  data set too small
  redundancy in data
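The difference between underfitting and overfitting shows up when training error is compared with error on new data. The sketch below is an illustration under stated assumptions, not the slides' own method: the data are invented (trend y = 2x with ±0.5 noise on the training values), and the three models (a constant mean, a least-squares line, and a nearest-point "memorizer") are stand-ins chosen to represent too little detail, a reasonable fit, and too much detail.

```python
def mse(model, xs, ys):
    """Mean squared error of model(x) against the known ys."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """Underfit: ignore x entirely and always predict the average y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_memorizer(xs, ys):
    """Overfit: return the y stored for the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Invented data (assumption): trend y = 2x, training ys carry +/-0.5 noise
train_x = [1, 2, 3, 4, 5]
train_y = [2.5, 3.5, 6.5, 7.5, 10.5]
test_x = [1.4, 2.4, 3.4, 4.4]
test_y = [2.8, 4.8, 6.8, 8.8]

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line),
                  ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y),
                     mse(model, test_x, test_y))
    print(f"{name:10s} train MSE={results[name][0]:.2f} "
          f"test MSE={results[name][1]:.2f}")
```

On this toy data the memorizer reproduces the training set exactly (zero training error) yet predicts the held-out points worse than the straight line, while the constant model does poorly on both sets.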

Comparison of Features
Columns: Rules, Neural Net, Case Base, Genetic
Noisy data: Good, Very good, Good, Very good
Missing data: Good, Very good, Good
Large sets: Very good, Poor, Good
Different types: Good, Numerical, Very good, Transform
Accuracy: High, Very high, High
Explanation: Very good, Poor, Very good, Good
Integration: Good, Very good
Ease: Easy, Difficult, Easy, Difficult

Data Mining Functions
Classification
– Identify categories in data
Prediction
– Formula to predict future observations
Association
– Rules using relationships among entities
Detection
– Anomalies & irregularities (fraud detection)

Financial Applications
Technique | Application | Problem Type
Neural net | Forecast stock price | Prediction
NN, Rule | Forecast bankruptcy; Fraud detection | Prediction; Detection
NN, Case | Forecast interest rate | Prediction
NN, visual | Late loan detection | Detection
Rule | Credit assessment; Risk classification | Prediction; Classification
Rule, Case | Corporate bond rate | Prediction

Telecom Applications
Technique | Application | Problem Type
Neural net, Rule induct | Forecast network behav. | Prediction
Rule induct | Churn; Fraud detection | Classification; Detection
Case based | Call tracking | Classification

Marketing Applications
Technique | Application | Problem Type
Rule induct | Market segment; Cross-selling | Classification; Association
Rule induct, visual | Lifestyle analysis; Performance analy. | Classification; Association
Rule induct, genetic, visual | Reaction to promotion | Prediction
Case based | Online sales support | Classification

Web Applications
Technique | Application | Problem Type
Rule induct, Visualization | User browsing similarity analy. | Classification, Association
Rule-based heuristics | Web page content similarity | Association

Other Applications
Technique | Application | Problem Type
Neural net | Software cost | Detection
Neural net, rule induct | Litigation assessment | Prediction
Rule induct | Insurance fraud; Healthcare except. | Detection
Case based | Insurance claim; Software quality | Prediction; Classification
Genetic algor. | Budget spending | Classification

Data Sets
Loan Applications
– classification
Job Applications
– classification
Insurance Fraud
– detection
Expenditure Data
– prediction

Loan Data
650 observations
OUTCOMES (binary):
– On-time: cost of error $300
– Late (default): cost of error $2,000
Variables
– Age, Income, Assets, Debts, Want, Credit
  Credit ordinal
– Transform: Assets, Debts, & Want → Risk
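Because the two kinds of error on the loan data carry different costs, a classifier can be scored by total cost instead of by error count. The sketch below assumes one reading of the slide: misclassifying an applicant who would pay on time costs $300, and misclassifying one who ends up late costs $2,000. The labels in the example and the function names are hypothetical.

```python
COST_ERROR_ON_TIME = 300   # applicant pays on time but is predicted late
COST_ERROR_LATE = 2000     # applicant is late (default) but predicted on-time

def total_error_cost(actual, predicted):
    """Sum the per-error costs over paired actual/predicted labels."""
    cost = 0
    for a, p in zip(actual, predicted):
        if a == "on-time" and p == "late":
            cost += COST_ERROR_ON_TIME
        elif a == "late" and p == "on-time":
            cost += COST_ERROR_LATE
    return cost

def break_even_late_probability():
    """P(late) above which predicting 'late' has the lower expected cost.

    Predicting on-time costs p * 2000 in expectation; predicting late
    costs (1 - p) * 300. The two are equal at p = 300 / (300 + 2000).
    """
    return COST_ERROR_ON_TIME / (COST_ERROR_ON_TIME + COST_ERROR_LATE)

# Invented labels (assumption): one error of each kind
actual    = ["on-time", "on-time", "late",    "late", "on-time"]
predicted = ["on-time", "late",    "on-time", "late", "on-time"]

cost = total_error_cost(actual, predicted)
threshold = break_even_late_probability()
print(cost, round(threshold, 4))
```

Under these assumptions the break-even probability is well below 0.5, so a cost-minimizing rule flags an applicant as late even when default is fairly unlikely.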

Job Application Data
500 observations
OUTCOMES (ordinal):
– Unacceptable
– Minimal
– Acceptable
– Excellent
Variables
– Age, State, Degree, Major, Experience
  State nominal; degree & major ordinal
  State is superfluous
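The distinction between nominal and ordinal variables affects how a record is coded before modeling. The sketch below shows one common convention, rank codes for ordinal values and one-hot indicators for nominal ones. The outcome order comes from the slide; the degree levels, the state list, and the sample record are hypothetical placeholders, since the slides do not list them.

```python
# Outcome levels in the order given on the slide (lowest to highest)
OUTCOME_ORDER = ["Unacceptable", "Minimal", "Acceptable", "Excellent"]

# Hypothetical levels (assumption: the slides do not enumerate these)
DEGREE_ORDER = ["None", "Bachelor", "Master", "Doctorate"]
STATES = ["CA", "NE", "TX"]

def encode_ordinal(value, ordered_levels):
    """Map an ordinal category to its rank (0 = lowest level)."""
    return ordered_levels.index(value)

def encode_nominal(value, levels):
    """One-hot encode a nominal category; no ordering is implied."""
    return [1 if value == level else 0 for level in levels]

# Hypothetical applicant record
record = {"Degree": "Master", "State": "NE", "Outcome": "Acceptable"}

encoded = {
    "Degree": encode_ordinal(record["Degree"], DEGREE_ORDER),
    "State": encode_nominal(record["State"], STATES),
    "Outcome": encode_ordinal(record["Outcome"], OUTCOME_ORDER),
}
print(encoded)
```

Since the slide calls State superfluous, its indicator columns could simply be dropped before fitting a model.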

Insurance Claim Data
5000 observations
OUTCOMES (binary):
– OK: cost of error $500
– Fraudulent: cost of error $2,500
Variables
– Age, Gender, Claim, Tickets, Prior claims, Attorney
  Gender & attorney nominal; tickets & prior claims categorical

Expenditure Data
10,000 observations
OUTCOMES:
– Could predict response in a number of categories
– Others
Variables:
– Age, Gender, Marital, Dependents, Income, Job years, Town years, Education years, Drivers license, Own home, Number of credit cards
– Churn, proportion of income spent on seven categories