1 Data Mining DT211 4 Refer to Connolly and Begg 4ed.



2 Data Mining
• The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions (Simoudis, 1996).
• Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.

3 Data Mining
• Reveals information that is hidden and unexpected, as there is little value in finding patterns and relationships that are already intuitive.
• Patterns and relationships are identified by examining the underlying rules and features in the data.

4 Data Mining
• Tends to work from the data up, and the most accurate results normally require large volumes of data to deliver reliable conclusions.
• Starts by developing an optimal representation of the structure of the sample data, during which time knowledge is acquired; this knowledge is then extended to larger sets of data.

5 Data Mining
• Data mining can provide huge paybacks for companies that have made a significant investment in data warehousing.
• A relatively new technology, but already used in a number of industries.

6 Examples of Applications of Data Mining
• Retail / Marketing
  – Identifying buying patterns of customers
  – Finding associations among customer demographic characteristics
  – Predicting response to mailing campaigns
  – Market basket analysis

7 Examples of Applications of Data Mining
• Banking
  – Detecting patterns of fraudulent credit card use
  – Identifying loyal customers
  – Predicting customers likely to change their credit card affiliation
  – Determining credit card spending by customer groups

8 Examples of Applications of Data Mining
• Insurance
  – Claims analysis
  – Predicting which customers will buy new policies
• Medicine
  – Characterizing patient behavior to predict surgery visits
  – Identifying successful medical therapies for different illnesses

9 Data Mining Operations
• The four main operations are:
  – Predictive modeling
  – Database segmentation
  – Link analysis
  – Deviation detection
• There are recognized associations between the applications and the corresponding operations.
  – e.g. Direct marketing strategies use database segmentation.

10 Data Mining Techniques
• Techniques are specific implementations of the data mining operations.
• Each operation has its own strengths and weaknesses.

11 Data Mining Techniques
• Data mining tools sometimes offer a choice of operations to implement a technique.
• Criteria for selecting a tool include:
  – Suitability for certain input data types
  – Transparency of the mining output
  – Tolerance of missing variable values
  – Level of accuracy possible
  – Ability to handle large volumes of data

12 Data Mining Operations and Associated Techniques

13 Predictive Modeling
• Similar to the human learning experience
  – Uses observations to form a model of the important characteristics of some phenomenon.
• Uses generalizations of the ‘real world’ and the ability to fit new data into a general framework.
• Can analyze a database to determine the essential characteristics (a model) of the data set.

14 Predictive Modeling
• The model is developed using a supervised learning approach, which has two phases: training and testing.
  – Training builds a model using a large sample of historical data called a training set.
  – Testing involves trying out the model on new, previously unseen data to determine its accuracy and physical performance characteristics.
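
The two phases can be sketched in a few lines of Python. This is only an illustration, not the textbook's method: the records, the single years-renting attribute, the 70/30 split, and the simple threshold rule used as the "model" are all assumptions.

```python
# Hypothetical labelled history: (years_renting, bought_property)
records = [
    (0.5, False), (1.0, False), (1.5, False), (2.5, True),
    (3.0, True), (4.0, True), (0.8, False), (3.5, True),
    (1.2, False), (2.8, True),
]

# Hold back the last 30% of the records as previously unseen test data.
split = int(len(records) * 0.7)
training_set, test_set = records[:split], records[split:]


def train(rows):
    """Training: pick the threshold that classifies the training set best."""
    def correct(t):
        # The model predicts "will buy" when years_renting exceeds t.
        return sum((x > t) == label for x, label in rows)

    candidates = sorted(x for x, _ in rows)
    return max(candidates, key=correct)


def evaluate(threshold, rows):
    """Testing: fraction of unseen records the model classifies correctly."""
    hits = sum((x > threshold) == label for x, label in rows)
    return hits / len(rows)


threshold = train(training_set)
accuracy = evaluate(threshold, test_set)
print(f"learned threshold: {threshold}, test accuracy: {accuracy:.2f}")
```

Keeping the test records out of `train` is the point of the split: accuracy measured on the training set itself would overstate how well the model generalizes.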

15 Predictive Modeling
• Applications of predictive modeling include customer retention management, credit approval, cross selling, and direct marketing.
• There are two techniques associated with predictive modeling: classification and value prediction, which are distinguished by the nature of the variable being predicted.

16 Predictive Modeling - Classification
• Used to establish a specific predetermined class for each record in a database from a finite set of possible class values.
• Two specializations of classification: tree induction and neural induction.

17 Example of Classification using Tree Induction
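
A minimal sketch of one step of tree induction, choosing the single test that best separates the classes, might look like this. The records, the attribute names, and the use of Gini impurity as the split criterion are assumptions made for illustration; the slides do not say which criterion is used.

```python
# Hypothetical records: (years_renting, age, class label)
records = [
    (1, 22, "rent"), (3, 24, "rent"), (1, 30, "rent"), (1.5, 40, "rent"),
    (3, 28, "buy"), (4, 35, "buy"), (2.5, 45, "buy"), (5, 23, "rent"),
    (0.5, 50, "rent"),
]
ATTRIBUTES = {"years_renting": 0, "age": 1}


def gini(rows):
    """Gini impurity of the class labels in rows (0.0 means a pure node)."""
    if not rows:
        return 0.0
    counts = {}
    for row in rows:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())


def best_split(rows):
    """Return (weighted impurity, attribute, threshold) of the best binary split."""
    best = None
    for name, idx in ATTRIBUTES.items():
        for threshold in sorted({row[idx] for row in rows}):
            left = [r for r in rows if r[idx] <= threshold]
            right = [r for r in rows if r[idx] > threshold]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best


score, attribute, threshold = best_split(records)
print(f"root test: {attribute} > {threshold} (weighted Gini {score:.3f})")
```

A full tree would be grown by applying `best_split` again to each side of the chosen test until the nodes are pure or too small to split.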

18 Predictive Modeling - Value Prediction
• Used to estimate a continuous numeric value that is associated with a database record.
• Uses the traditional statistical techniques of linear regression and nonlinear regression.
• Relatively easy to use and understand.

19 Predictive Modeling - Value Prediction
• Linear regression attempts to fit a straight line through a plot of the data, such that the line is the best representation of the average of all observations at that point in the plot.
• The problem is that the technique only works well with linear data and is sensitive to the presence of outliers (that is, data values that do not conform to the expected norm).
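
The sensitivity to outliers is easy to see in a small sketch. It assumes ordinary least squares as the fitting criterion, which the slide does not name, and the data points are invented.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical observations lying exactly on y = 2x + 1.
clean = [(x, 2 * x + 1) for x in range(6)]
# The same observations with a single outlier appended.
with_outlier = clean + [(6, 40)]

clean_slope, clean_intercept = fit_line(clean)
skewed_slope, skewed_intercept = fit_line(with_outlier)
print(f"clean:        y = {clean_slope:.2f}x + {clean_intercept:.2f}")
print(f"with outlier: y = {skewed_slope:.2f}x + {skewed_intercept:.2f}")
```

One stray point out of seven is enough to more than double the fitted slope, because squared errors give distant points a disproportionate pull on the line.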

20 Predictive Modeling - Value Prediction
• Although nonlinear regression avoids the main problems of linear regression, it is still not flexible enough to handle all possible shapes of the data plot.
• Statistical measurements are fine for building linear models that describe predictable data points; however, most data is not linear in nature.

21 Predictive Modeling - Value Prediction
• Data mining requires statistical methods that can accommodate non-linearity, outliers, and non-numeric data.
• Applications of value prediction include credit card fraud detection and target mailing list identification.

22 Database Segmentation
• The aim is to partition a database into an unknown number of segments, or clusters, of similar records.
• Uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles.

23 Database Segmentation
• Less precise than other operations and thus less sensitive to redundant and irrelevant features.
• Sensitivity can be reduced by ignoring a subset of the attributes that describe each instance or by assigning a weighting factor to each variable.
• Applications of database segmentation include customer profiling, direct marketing, and cross selling.
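
Both ways of reducing sensitivity can be expressed through one weighted distance function, where a weight of 0 ignores an attribute altogether. The sketch below assumes a weighted Euclidean distance; the records and attribute names are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with a weighting factor per attribute.

    A weight of 0 ignores the attribute entirely.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Hypothetical records: (age, annual_spend_in_thousands, customer_id)
r1 = (30, 12.0, 1001)
r2 = (32, 11.5, 9857)

# customer_id says nothing about similarity, so it gets weight 0.
d_weighted = weighted_distance(r1, r2, (1.0, 1.0, 0.0))
# Unweighted, the irrelevant attribute dominates the distance.
d_unweighted = weighted_distance(r1, r2, (1.0, 1.0, 1.0))
print(f"weighted: {d_weighted:.2f}, unweighted: {d_unweighted:.2f}")
```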

24 Example of Database Segmentation using a Scatterplot (see page 1237): 2 different sets of forgeries…

25 Database Segmentation
• Associated with demographic or neural clustering techniques, which are distinguished by:
  – Allowable data inputs
  – Methods used to calculate the distance between records
  – Presentation of the resulting segments for analysis
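
To show how a distance between records can drive segmentation without fixing the number of segments in advance, here is a sketch of single-pass "leader" clustering. It is neither demographic nor neural clustering, just a simpler stand-in; the records and the `radius` value are assumptions.

```python
import math


def segment(records, radius):
    """Single-pass 'leader' clustering.

    A record joins the first segment whose leader (its first member) lies
    within `radius`; otherwise it starts a new segment of its own.
    """
    segments = []  # each segment is a list of records; element 0 is the leader
    for record in records:
        for seg in segments:
            if math.dist(record, seg[0]) <= radius:
                seg.append(record)
                break
        else:
            segments.append([record])
    return segments


# Hypothetical two-attribute records forming three visible groups.
records = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.1), (0.9, 1.1),
           (8.2, 7.9), (4.0, 9.0), (4.1, 9.2)]

segments = segment(records, radius=1.0)
for i, seg in enumerate(segments, start=1):
    print(f"segment {i}: {seg}")
```

The number of segments falls out of the data and the chosen radius rather than being supplied up front, at the cost of results that depend on the order in which records arrive.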

26 Link Analysis
• Aims to establish links (associations) between records, or sets of records, in a database.
• There are three specializations:
  – Associations discovery
  – Sequential pattern discovery
  – Similar time sequence discovery
• Applications include product affinity analysis, direct marketing, and stock price movement.

27 Link Analysis - Associations Discovery
• Finds items that imply the presence of other items in the same event.
• Affinities between items are represented by association rules.
  – e.g. ‘When a customer rents property for more than 2 years and is more than 25 years old, in 40% of cases, the customer will buy a property. This association happens in 35% of all customers who rent properties’.
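
The two percentages in the example rule are usually read as the rule's confidence (40%) and support (35%). The sketch below computes both under that reading; the customer records are invented so that the numbers match the example.

```python
# Hypothetical renters: (years_renting, age, bought_property)
customers = (
    [(3, 30, True)] * 14     # antecedent holds and the customer bought
    + [(4, 40, False)] * 21  # antecedent holds but the customer did not buy
    + [(1, 22, False)] * 5   # antecedent does not hold
)


def antecedent(c):
    """Rents for more than 2 years and is more than 25 years old."""
    return c[0] > 2 and c[1] > 25


def consequent(c):
    """Buys a property."""
    return c[2]


matching = [c for c in customers if antecedent(c)]
both = [c for c in matching if consequent(c)]

confidence = len(both) / len(matching)  # how often the rule holds when it applies
support = len(both) / len(customers)    # share of all customers the rule covers
print(f"confidence = {confidence:.0%}, support = {support:.0%}")
```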

28 Link Analysis - Sequential Pattern Discovery
• Finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over a period of time.
  – e.g. Used to understand long term customer buying behavior.
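
What distinguishes a sequential pattern from a plain association is that order matters. A minimal sketch, using made-up purchase histories and single items rather than sets of items:

```python
# Hypothetical purchase histories, each ordered by time.
histories = {
    "c1": ["tent", "sleeping bag", "stove"],
    "c2": ["tent", "boots", "sleeping bag"],
    "c3": ["sleeping bag", "tent"],
    "c4": ["boots", "stove"],
}


def follows(history, first, later):
    """True if `later` occurs at some point after an occurrence of `first`."""
    if first not in history:
        return False
    return later in history[history.index(first) + 1:]


def sequence_support(first, later):
    """Fraction of customers whose history contains first followed by later."""
    hits = sum(follows(h, first, later) for h in histories.values())
    return hits / len(histories)


tent_then_bag = sequence_support("tent", "sleeping bag")
bag_then_tent = sequence_support("sleeping bag", "tent")
print(tent_then_bag, bag_then_tent)
```

Three of the four customers bought both items, yet the two orderings get different support, which an unordered association rule could not express.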

29 Link Analysis - Similar Time Sequence Discovery
• Finds links between two sets of data that are time-dependent, and is based on the degree of similarity between the patterns that both time series demonstrate.
  – e.g. Within three months of buying property, new home owners will purchase goods such as cookers, freezers, and washing machines.
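
One simple way to quantify the "degree of similarity" between two time series is their correlation at a given time lag. The sketch assumes Pearson correlation as the similarity measure, which the slide does not specify, and both monthly series are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lagged_similarity(xs, ys, lag):
    """Correlate xs with ys shifted `lag` periods later."""
    return pearson(xs[:len(xs) - lag], ys[lag:])


# Hypothetical monthly counts; appliance sales echo property sales a month later.
property_sales = [10, 12, 15, 11, 9, 14, 13, 10]
appliance_sales = [20, 31, 36, 45, 33, 27, 42, 39]

same_month = lagged_similarity(property_sales, appliance_sales, lag=0)
one_month_later = lagged_similarity(property_sales, appliance_sales, lag=1)
print(f"lag 0: {same_month:.2f}, lag 1: {one_month_later:.2f}")
```

The two series look unrelated when compared month for month and almost identical in shape once the second is shifted by one month, which is the kind of link this technique is meant to surface.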

30 Deviation Detection (do not include this method)
• A relatively new operation in terms of commercially available data mining tools.
• Often a source of true discovery because it identifies outliers, which express deviation from some previously known expectation and norm.

31 Deviation Detection
• Can be performed using statistics and visualization techniques or as a by-product of data mining.
• Applications include fraud detection in the use of credit cards and insurance claims, quality control, and defects tracing.
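
As one example of the statistical route, values can be flagged when they lie far from the mean in units of standard deviation. The transaction amounts and the threshold of 2.5 are assumptions chosen for illustration.

```python
import statistics

# Hypothetical card transaction amounts for one customer.
amounts = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.5, 37.0, 44.0, 950.0]


def outliers(values, threshold=2.5):
    """Values more than `threshold` standard deviations away from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * stdev]


flagged = outliers(amounts)
print(flagged)
```

Because the outlier itself inflates both the mean and the standard deviation, this simple rule can miss deviations in small samples; median-based measures are a common, more robust alternative.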

32 Data Mining and Data Warehousing
• A major challenge in exploiting data mining is identifying suitable data to mine.
• Data mining requires a single, separate, clean, integrated, and self-consistent source of data.

33 Data Mining and Data Warehousing
• A data warehouse is well equipped for providing data for mining.
• Data quality and consistency are prerequisites for mining, to ensure the accuracy of the predictive models. Data warehouses are populated with clean, consistent data.

34 Data Mining and Data Warehousing
• It is advantageous to mine data from multiple sources to discover as many interrelationships as possible. Data warehouses contain data from a number of sources.
• Selecting the relevant subsets of records and fields for data mining requires the query capabilities of the data warehouse.

35 Data Mining and Data Warehousing
• The results of a data mining study are useful if there is some way to further investigate the uncovered patterns. Data warehouses provide the capability to go back to the data source.

36 Sample question types
• “Data Mining is one of the most essential information technologies to aid strategic formulation.” Discuss the validity of this statement.
• Discuss how the different types of data mining operations can generate meaningful information for the enterprise.