Data Mining Concepts 1.1
COT5230 Data Mining, Week 1: Data Mining Concepts
MONASH, AUSTRALIA'S INTERNATIONAL UNIVERSITY

1.2 A Definition of Data Mining
The use of analytical tools to discover knowledge in a collection of data.
The knowledge takes the form of patterns, relationships and facts that would not otherwise be immediately apparent.
These analytical tools may be drawn from a number of disciplines, including:
» machine learning
» pattern recognition
» machine discovery
» statistics
» artificial intelligence
» human-computer interaction
» information visualization

1.3 Data Mining
Why has the area appeared?
– Organizations in a competitive environment store large volumes of data, and advances in technology now allow that data to be analyzed
Background and evolution
– The failure of traditional approaches
The need for data mining
– Niche marketing, customer retention, the Internet
The means to implement data mining
– The data warehouse, the available computing power, effective modeling approaches

1.4 A Case Study: Data Preparation (Cabena et al., page 106)
Health Insurance Commission, Australia
– 550 GB online; 1,300 GB in a 5-year history database
– Aim: to prevent fraud and inappropriate practice
– Considered 6.8 million visits, each requesting up to 20 pathology tests, and 17,000 doctors
– Descriptive variables were added to the GP records
– Records were pivoted to create a separate record for each pathology test
– Records were then aggregated by provider number (GP)
– An association discovery operation was carried out
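The pivot and aggregation steps above can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the case study: the record layout (provider number, visit id, list of tests), the function names and the sample values are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical visit records: (provider_number, visit_id, pathology tests requested).
# Layout and values are illustrative only, not taken from the case study.
visits = [
    ("GP001", "V1", ["A", "B"]),
    ("GP001", "V2", ["A"]),
    ("GP002", "V3", ["B", "C"]),
]


def pivot_tests(visits):
    """Pivot: emit a separate record for each (visit, pathology test) pair."""
    rows = []
    for provider, visit_id, tests in visits:
        for test in tests:
            rows.append({"provider": provider, "visit": visit_id, "test": test})
    return rows


def aggregate_by_provider(rows):
    """Aggregate: count how often each provider (GP) requested each test."""
    per_provider = defaultdict(Counter)
    for row in rows:
        per_provider[row["provider"]][row["test"]] += 1
    return per_provider


rows = pivot_tests(visits)
per_gp = aggregate_by_provider(rows)
print(len(rows))              # 5
print(dict(per_gp["GP001"]))  # {'A': 2, 'B': 1}
```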

1.5 An Association Rule
The rule
– When a customer buys a shirt, in 70% of cases he or she will also buy a tie
– The confidence factor is 70%
The support factor
– Shirts and ties are bought together in 13.5% of all purchases
– The support factor is 13.5%
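Both factors can be computed directly from a list of transactions. The Python sketch below is a minimal illustration over a small hypothetical data set (the function names `support` and `confidence` are illustrative): support is the fraction of all transactions that contain both items, and confidence is the fraction of transactions containing the shirt that also contain the tie.

```python
# Hypothetical transactions: each set holds the items bought in one purchase.
transactions = [
    {"shirt", "tie"},
    {"shirt", "tie", "socks"},
    {"shirt"},
    {"socks"},
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


print(support({"shirt", "tie"}, transactions))       # 0.5
print(confidence({"shirt"}, {"tie"}, transactions))  # about 0.667: 2 of the 3 shirt purchases include a tie
```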

1.6 A Case Study: Modeling and Analysis
– Rules with a confidence factor greater than 50% were considered
– IBM's Intelligent Miner software was used
– The minimum level of support was gradually reduced
» i.e. rules applying to progressively fewer records were admitted
– Rules considered to be noise were excluded
– Domain knowledge indicated that some tests should be excluded; excluding them revealed more useful rules
– GP profiling was carried out
– The new segments were related back to existing classifications of GPs
– Some rules corresponded to expensive tests that could be substituted
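The filtering described above (keep rules whose confidence exceeds 50%, then progressively lower the minimum support) can be sketched as follows. This is a simplified illustration, assuming single-item rules of the form "if test x then test y" over hypothetical data; the function `pair_rules` is illustrative and is not how Intelligent Miner works internally.

```python
from itertools import permutations

# Hypothetical episodes: each set holds the tests requested in one visit.
transactions = [{"A", "B"}, {"A", "B"}, {"A", "B"}, {"C", "D"}, {"A", "C"}]


def pair_rules(transactions, min_support, min_confidence):
    """Return single-item rules x -> y as (x, y, support, confidence) tuples."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        with_x = sum(1 for t in transactions if x in t)
        both = sum(1 for t in transactions if x in t and y in t)
        support, confidence = both / n, both / with_x
        # Confidence must be strictly greater than the cut-off, as on the slide.
        if support >= min_support and confidence > min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Gradually lower the support threshold; lower thresholds admit rules
# that apply to fewer records.
for min_support in (0.5, 0.2):
    found = pair_rules(transactions, min_support, 0.5)
    print(min_support, [(x, y) for x, y, _, _ in found])
```

With this data, lowering the threshold from 0.5 to 0.2 adds the rule D → C, which holds in only one of the five visits.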

1.7 [Figure: case-study flow. Data Preparation: merge the Episodes Database and the GP Database. Association Discovery: rules at 1% support, e.g. "If test A then test B will occur in 62% of cases". Database Segmentation: Segment 1 (97 GPs, score = 1.8) and Segment 2 (206 GPs, score = 2.7).]

1.8 Data Mining for Business Decision Support (from Berry & Linoff, 1997)
– Identify the business problem
– Use data mining techniques to transform the data into actionable information
– Act on the information
– Measure the results

1.9 The Process of Knowledge Discovery
Pre-processing
– data selection
– cleaning
– coding
Data mining
– select a model
– apply the model
Analysis of results and assimilation
– take action and measure the results

1.10 The Process of Knowledge Discovery
[Figure: the Knowledge Discovery in Databases (KDD) process (Adriaans/Zantinge). Operational data and external data pass through data selection → cleaning (domain consistency, de-duplication, disambiguation) → enrichment → coding → data mining (clustering, segmentation, prediction) → reporting. An information requirement drives the process; reporting leads to action, with feedback to earlier stages.]

1.11 Data Selection
– Identify the relevant data, both internal and external to the organization
– Select the subset of the data appropriate for the particular data mining application
– Store the data in a database separate from the operational systems

1.12 Data Preprocessing
Cleaning
– Domain consistency: replace values that are invalid for an attribute's domain with null
– De-duplication: remove duplicate records (customers are often re-added to the database on each purchase transaction)
– Disambiguation: highlight ambiguities for a decision by the user
» e.g. names differ slightly but the addresses are the same
Enrichment
– Additional fields from external sources are added to records; these may be vital in establishing relationships
Coding
» e.g. take addresses and replace them with regional codes
» e.g. transform birth dates into age ranges
– It is often necessary to convert continuous data into range data for categorization purposes
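The cleaning and coding steps above can be sketched in Python on a few customer records. This is a minimal sketch under stated assumptions: the field names, the valid domain for `sex`, the `REGION_CODES` lookup table and the fixed reference date are all hypothetical, and real disambiguation rules would be far richer.

```python
from datetime import date

# Hypothetical customer records; field names, values and code tables are illustrative.
records = [
    {"name": "J. Smith", "address": "1 High St, Clayton", "born": date(1970, 5, 1), "sex": "M"},
    {"name": "J. Smith", "address": "1 High St, Clayton", "born": date(1970, 5, 1), "sex": "M"},
    {"name": "John Smith", "address": "1 High St, Clayton", "born": date(1970, 5, 1), "sex": "?"},
]
VALID_SEX = {"M", "F"}                # assumed valid domain for the "sex" field
REGION_CODES = {"Clayton": "VIC-SE"}  # hypothetical suburb -> region lookup


def domain_consistency(rec):
    """Replace a value outside the attribute's valid domain with None (null)."""
    rec = dict(rec)
    if rec["sex"] not in VALID_SEX:
        rec["sex"] = None
    return rec


def deduplicate(recs):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for rec in recs:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def ambiguous_pairs(recs):
    """Flag record pairs with the same address but different names for a user decision."""
    return [
        (i, j)
        for i in range(len(recs))
        for j in range(i + 1, len(recs))
        if recs[i]["address"] == recs[j]["address"] and recs[i]["name"] != recs[j]["name"]
    ]


def code(rec, today=date(2024, 1, 1)):
    """Coding: birth date -> 10-year age range, address -> regional code.

    The fixed reference date is an assumption to keep the example reproducible.
    """
    born = rec["born"]
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    low = (age // 10) * 10
    suburb = rec["address"].split(",")[-1].strip()
    return {**rec, "age_range": f"{low}-{low + 9}", "region": REGION_CODES.get(suburb, "UNKNOWN")}


cleaned = deduplicate([domain_consistency(r) for r in records])
flags = ambiguous_pairs(cleaned)    # "J. Smith" vs "John Smith" at the same address
coded = [code(r) for r in cleaned]
print(len(cleaned), flags, coded[0]["age_range"], coded[0]["region"])
```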

1.13 Data Mining
Preliminary analysis
– Much interesting information can be found by querying the data set
– This may be supported by a visualization of the data set
Choose one or more modeling approaches
There are two styles of data mining:
– Hypothesis testing
– Knowledge discovery
The styles and approaches are not mutually exclusive

1.14 Data Mining Tasks
Various taxonomies exist. Berry & Linoff define six tasks:
» Classification
» Estimation
» Prediction
» Affinity Grouping
» Clustering
» Description
The tasks are also referred to as operations. Cabena et al. define four operations:
» Predictive Modeling
» Database Segmentation
» Link Analysis
» Deviation Detection

1.15 Classification
Classification involves considering the features of an object and then assigning it to one of a set of pre-defined classes, for example:
– Spotting fraudulent insurance claims
– Identifying which phone numbers are fax numbers
– Identifying which customers are high-value
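As one concrete illustration of assigning an object to a pre-defined class, the Python sketch below uses a 1-nearest-neighbour rule (a simple form of memory-based reasoning): a new customer receives the class of the most similar labelled example. The features, labels and the `classify` function are hypothetical; in practice features would normally be scaled so that one (here, spend) does not dominate the distance.

```python
import math

# Hypothetical labelled examples: ((annual_spend, visits_per_year), class).
training = [
    ((5200.0, 40), "high-value"),
    ((4800.0, 35), "high-value"),
    ((300.0, 4), "low-value"),
    ((650.0, 8), "low-value"),
]


def classify(features, training):
    """Assign the pre-defined class of the closest labelled example (1-NN, Euclidean distance)."""
    nearest = min(training, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


print(classify((4000.0, 30), training))  # high-value
print(classify((500.0, 6), training))    # low-value
```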

1.16 Estimation
Estimation deals with numerically valued outcomes rather than the discrete categories that occur in classification, for example:
– Estimating the number of children in a family
– Estimating family income
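To show the contrast with classification, the Python sketch below produces a number rather than a class label, using an ordinary least-squares straight-line fit (one possible estimation technique among many). The predictor (number of earners in a family), the income figures and the `fit_line` function are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical data: number of earners in a family vs. family income ($1000s per year).
earners = [1, 2, 3, 4]
income = [30, 50, 70, 90]

a, b = fit_line(earners, income)
estimate = a + b * 5  # estimated income for a family with five earners
print(a, b, estimate)  # 10.0 20.0 110.0
```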

1.17 Prediction
Prediction is essentially the same as classification and estimation, but involves future behaviour.
Historical data is used to build a model explaining behaviour (outputs) for known inputs.
The model is then applied to current inputs to predict future outputs, for example:
– Predicting which customers will respond to a promotion
– Classifying loan applications
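The two phases described above (build a model from historical inputs and outputs, then apply it to current inputs) can be sketched minimally in Python. The model here is deliberately simple and entirely hypothetical: it learns each customer segment's response rate from a past promotion and predicts a response when that rate exceeds an assumed threshold of 0.5.

```python
from collections import defaultdict

# Hypothetical history from a past promotion: (customer segment, responded?).
history = [
    ("young", True), ("young", True), ("young", False),
    ("retired", False), ("retired", False), ("retired", True),
]


def build_model(history):
    """Phase 1: learn the historical response rate for each known input (segment)."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [responses, total]
    for segment, responded in history:
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    return {segment: r / t for segment, (r, t) in counts.items()}


def predict(model, segment, threshold=0.5):
    """Phase 2: apply the model to a current input; unseen segments default to 'no response'."""
    return model.get(segment, 0.0) > threshold


model = build_model(history)
current_customers = ["young", "retired"]
print([predict(model, s) for s in current_customers])  # [True, False]
```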

1.18 Affinity Grouping
Affinity grouping is also referred to as market basket analysis.
A common example is determining which items are bought together at the supermarket. Once this is known, decisions can be made on, for example:
– how to arrange items on the shelves
– which items should be promoted together
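A first step in market basket analysis is simply counting which pairs of items appear in the same basket. The Python sketch below does this with `itertools.combinations` and `collections.Counter`; the baskets and the `pair_counts` function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["beer", "chips"],
    ["bread", "eggs"],
    ["beer", "chips", "milk"],
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
print(counts.most_common(3))  # the pairs bought together most often
```

Pairs with high counts are candidates for shelf placement or joint promotion; support and confidence can then be computed for them.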

1.19 Clustering
Clustering is also sometimes referred to as segmentation (though this term has other meanings in other fields).
In clustering there are no pre-defined classes; self-similarity is used to group records.
The user must attach meaning to the clusters formed.
Clustering often precedes some other data mining task, for example:
– once customers are separated into clusters, a promotion might be carried out based on a market basket analysis of the resulting cluster
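k-means clustering (listed among the techniques later in these slides) is one common way to group records by self-similarity without pre-defined classes. The Python sketch below is a plain, minimal k-means; the 2-D points, the naive "first k records" initialisation and the fixed iteration count are assumptions for illustration.

```python
import math

# Hypothetical 2-D customer records, e.g. (scaled spend, scaled visit frequency).
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]


def kmeans(points, k, iterations=10):
    """Plain k-means: group records by distance to k centroids; no pre-defined classes."""
    centroids = list(points[:k])  # naive initialisation: the first k records (an assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(9, 9), (8, 9), (9, 8)]]
```

The algorithm returns only groups; deciding what each group means (for example, "occasional" versus "frequent" customers) is left to the user.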

1.20 Description
A good description of the data can provide an understanding of behaviour.
The description of the behaviour can also suggest an explanation for it.
Statistical measures can be useful in describing data, as can techniques that generate rules.
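Simple statistical measures are a starting point for describing data. The Python sketch below uses the standard-library `statistics` module on a hypothetical list of purchase amounts.

```python
import statistics

# Hypothetical purchase amounts ($) for one customer segment.
amounts = [12.0, 15.5, 9.0, 14.0, 120.0, 11.5]

summary = {
    "count": len(amounts),
    "mean": statistics.mean(amounts),
    "median": statistics.median(amounts),
    "stdev": statistics.stdev(amounts),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the mean (about 30.33) sits well above the median (13.00), which suggests an explanation worth checking: a single unusually large purchase is pulling the average up.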

1.21 Deviation Detection
Records whose attributes deviate from the norm by significant amounts are called outliers.
Application areas include:
– fraud detection
– quality control
– tracing defects
Visualization techniques and statistical techniques are useful in finding outliers.
A cluster that contains only a few records may in fact represent outliers.
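One of the simplest statistical techniques for finding outliers is the z-score: how many standard deviations a value lies from the mean. The Python sketch below flags values whose z-score exceeds a threshold; the claim amounts and the threshold of 2.5 are assumptions, and robust alternatives (e.g. based on the median) are often preferred because a large outlier also inflates the standard deviation.

```python
import statistics


def outliers(values, threshold=2.5):
    """Return values lying more than `threshold` population standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical insurance claim amounts; one is far from the rest.
claims = [100, 105, 98, 102, 97, 101, 99, 103, 500]
print(outliers(claims))  # [500]
```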

1.22 Data Mining Techniques
– Query tools
– Decision Trees
– Memory-Based Reasoning
– Artificial Neural Networks
– Genetic Algorithms
– Association and sequence detection
– Statistical Techniques
– Visualization
– Others (Logistic regression, Generalized Additive Models (GAM), Multivariate Adaptive Regression Splines (MARS), K-Means Clustering, ...)

1.23 Data Mining and the Data Warehouse
Organizations realized that they had large amounts of data stored (especially transaction data) but that it was not easily accessible.
The data warehouse provides a convenient data source for data mining. Some data cleaning has usually already occurred.
It exists independently of the operational systems:
– Data is retrieved rather than updated
– It is indexed for efficient retrieval
– Data will often cover 5 to 10 years
A data warehouse is not a prerequisite for data mining.

1.24 Data Mining and OLAP
Online Analytical Processing (OLAP): tools that allow a powerful and efficient representation of the data
– Makes use of a representation known as a cube
– A cube can be sliced and diced
OLAP provides reporting with aggregation and summary information, but does not reveal patterns; revealing patterns is the purpose of data mining.
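The cube operations mentioned above can be illustrated on a tiny fact table in plain Python. This is a minimal sketch, not how an OLAP server stores or indexes a cube: the dimensions, the sales figures and the function names `roll_up`, `slice_` and `dice` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales fact table: (region, product, quarter, sales).
DIMS = ("region", "product", "quarter")
facts = [
    ("North", "Shirts", "Q1", 100),
    ("North", "Ties", "Q1", 40),
    ("South", "Shirts", "Q1", 80),
    ("North", "Shirts", "Q2", 120),
    ("South", "Ties", "Q2", 30),
]


def roll_up(rows, group_by):
    """Total sales for each combination of the chosen dimensions (an aggregated view of the cube)."""
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in rows if row[i] == value]


def dice(rows, criteria):
    """Dice: restrict several dimensions to chosen subsets of values."""
    return [
        row for row in rows
        if all(row[DIMS.index(d)] in allowed for d, allowed in criteria.items())
    ]


print(roll_up(facts, ["region"]))                            # {('North',): 260, ('South',): 110}
print(roll_up(slice_(facts, "quarter", "Q1"), ["product"]))  # {('Shirts',): 180, ('Ties',): 40}
print(len(dice(facts, {"region": {"North"}, "quarter": {"Q1", "Q2"}})))  # 3
```

Each call reports totals over chosen dimensions; none of them searches for patterns the analyst did not already ask about, which is the gap data mining fills.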