Dr. Awad Khalil Computer Science Department AUC

Presentation transcript:

Data Mining
Dr. Awad Khalil, Computer Science Department, AUC
Data Mining, by Dr. Khalil

Content
  What and Why Data Mining
  Data Mining Applications
  Data Mining Operations & Associated Techniques
    Predictive Modeling
    Database Segmentation
    Link Analysis
    Deviation Detection
  The Data Mining Process
  The CRISP-DM Model

What and Why Data Mining?
Data mining is the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.
Data mining is concerned with the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data. Its focus is to reveal information that is hidden and unexpected.
Data mining requires a single, separate, clean, integrated, and self-consistent source of data. A data warehouse is well equipped to provide data for data mining.
Data mining can provide huge paybacks for companies that have made a significant investment in data warehousing.

Data Mining Applications
Retail/Marketing:
  Identifying buying patterns of customers
  Finding associations among customer demographic characteristics
  Predicting response to mailing campaigns
  Market basket analysis
Banking:
  Detecting patterns of fraudulent credit card use
  Identifying loyal customers
  Predicting customers likely to change their credit card affiliation
  Determining credit card spending by customer groups
Insurance:
  Claims analysis
  Predicting which customers will buy new policies
Medicine:
  Characterizing patient behavior to predict surgery visits
  Identifying successful medical therapies for different illnesses

Data Mining Operations & Associated Techniques
Predictive Modeling: classification; value prediction
Database Segmentation: demographic clustering; neural clustering
Link Analysis: association discovery; sequential pattern discovery; similar time sequence discovery
Deviation Detection: statistics; visualization

Predictive Modeling
Predictive modeling is similar to the human learning experience in using observations to form a model of the important characteristics of some phenomenon. This approach uses generalization of the “real world” and the ability to fit new data into a general framework.
Predictive modeling can be used to analyze an existing database to determine some essential characteristics (a model) of the data set.
Applications of predictive modeling include customer retention management, credit approval, cross-selling, and direct marketing.
There are two techniques associated with predictive modeling: classification and value prediction.

Classification
Classification is used to establish a specific predetermined class for each record in a database from a finite set of possible class values.
There are two specializations of classification: tree induction and neural induction.

Classification: Tree Induction
In this example, we are interested in predicting whether a customer who is currently renting property is likely to be interested in buying property.
A predictive model has determined that only two variables are of interest: the length of time the customer has rented property and the age of the customer.
The decision tree presents the analysis in an intuitive way. The model predicts that customers who have rented for more than two years and are over 25 years old are the most likely to be interested in buying property.
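The tree described above can be read as two nested tests. The following is a minimal sketch in Python, assuming only the two thresholds given on the slide (rented for more than two years, over 25 years old). The function name `likely_to_buy` is hypothetical, and because the slide does not state what the tree predicts on its other branches, the sketch simply treats every other customer as not belonging to the "most likely" group.

```python
def likely_to_buy(years_rented: float, age: int) -> bool:
    """Hand-coded reading of the two-variable decision tree on the slide.

    Returns True only for the group the slide calls "most likely" to buy.
    All other leaves are an assumption: the slide does not describe them.
    """
    # First split: length of time the customer has rented property.
    if years_rented > 2:
        # Second split: age of the customer.
        return age > 25
    return False


if __name__ == "__main__":
    # Hypothetical customers as (years rented, age) pairs.
    for years, age in [(3, 30), (3, 22), (1, 40)]:
        print(years, age, likely_to_buy(years, age))
```

A tree induction algorithm would learn such splits from the data rather than having them written by hand; the sketch only shows how the finished tree is applied to a new record.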

Classification: Neural Network
A neural network contains collections of connected nodes with input, output, and processing at each node. Between the visible input and output layers may be a number of hidden processing layers.
Each processing unit (circle) in one layer is connected to each processing unit in the next layer by a weighted value, expressing the strength of the relationship.
The network attempts to mirror the way the human brain works in processing patterns by arithmetically combining all the variables associated with a given data point.
In this way, it is possible to develop nonlinear predictive models that “learn” by studying combinations of variables and how different combinations of variables affect different data sets.
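The layered, weighted combination described above can be sketched as a forward pass through one hidden layer. This is an illustration under stated assumptions, not the network from the slide: the sigmoid activation, the bias terms, and the names `layer` and `forward` are all choices made here, and no training step is shown.

```python
import math


def sigmoid(x: float) -> float:
    """Squashing function applied at each processing unit (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer.

    weights[j][i] is the weighted value linking input i to unit j, so every
    unit in this layer is connected to every unit in the previous layer.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Pass a record through the hidden layer, then the output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


if __name__ == "__main__":
    # Hypothetical weights: two inputs, two hidden units, one output unit.
    result = forward(
        [0.5, 1.0],
        hidden_w=[[0.4, -0.6], [0.3, 0.8]],
        hidden_b=[0.0, 0.1],
        output_w=[[1.2, -0.7]],
        output_b=[0.05],
    )
    print(result)
```

Learning consists of adjusting the weights so that the outputs match known class values; that step is omitted here.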

Value Prediction
Value prediction is used to estimate a continuous numeric value that is associated with a database record. This technique uses the traditional statistical techniques of linear regression and nonlinear regression.
Linear regression attempts to fit a straight line through a plot of the data, such that the line is the best representation of the average of all observations at that point in the plot.
Linear regression works well only with linear data and is sensitive to the presence of outliers (that is, data values that do not conform to the expected norm).
Although nonlinear regression avoids the main problems of linear regression, it is still not flexible enough to handle all possible shapes of the data plot.
Applications of value prediction include credit card fraud detection and target mailing list identification.
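To make the outlier sensitivity concrete, here is a minimal sketch of an ordinary least-squares line fit in plain Python. The function name `fit_line` and the sample data are hypothetical; the formula is the standard closed-form solution for a single predictor.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    clean = [2, 4, 6, 8, 10]          # perfectly linear data
    with_outlier = [2, 4, 6, 8, 30]   # last value replaced by an outlier
    print(fit_line(xs, clean))
    print(fit_line(xs, with_outlier))
```

On the clean data the fitted slope is 2 with intercept 0; replacing a single value with an outlier moves the slope to 6 and the intercept to -8, which illustrates why one non-conforming record can distort the whole model.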

Database Segmentation
The aim of database segmentation is to partition a database into an unknown number of segments, or clusters, of similar records, that is, records that share a number of properties and so are considered to be homogeneous.
This approach uses unsupervised learning to discover homogeneous sub-populations in a database to improve the accuracy of the profiles.
Database segmentation is less precise than other operations and is therefore less sensitive to redundant and irrelevant features.
Applications of database segmentation include customer profiling, direct marketing, and cross-selling.
Database segmentation is associated with demographic or neural clustering techniques, which are distinguished by the allowable data inputs, the methods used to calculate the distance between records, and the presentation of the resulting segments for analysis.
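As a rough illustration of distance-based segmentation, the sketch below uses k-means, which is not one of the two techniques named on the slide (demographic and neural clustering) and is substituted here only because it is short. Unlike the operation described above, k-means needs the number of segments up front, supplied through the starting centroids. The function name `kmeans` and the sample records are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of numbers, starting from given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its records;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical records with two numeric properties each.
    records = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    centers, segments = kmeans(records, centroids=[(0, 0), (10, 10)])
    print(centers)
    print(segments)
```

The choice of distance function is exactly the kind of detail that, according to the slide, distinguishes one clustering technique from another.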

Link Analysis
Link analysis aims to establish links, called associations, between the individual records, or sets of records, in a database. There are three specializations of link analysis:
Association discovery: finds items that imply the presence of other items in the same event. These affinities between items are represented by association rules. For example: “When a customer rents a property for more than two years and is more than 25 years old, in 40% of cases, the customer will buy a property. This association happens in 35% of all customers who rent properties.”
Sequential pattern discovery: finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over a period of time. For example, this approach can be used to understand long-term customer buying behavior.
Similar time sequence discovery: is used, for example, in the discovery of links between two sets of data that are time-dependent, and is based on the degree of similarity between the patterns that both time series demonstrate. For example, within three months of buying property, new home owners will purchase goods such as cookers, freezers, and washing machines.
Applications of link analysis include product affinity analysis, direct marketing, and stock price movement.
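The two percentages in the association rule example correspond to what are usually called confidence (how often the rule holds when its condition is met) and support (how often the whole pattern occurs among all records). A minimal sketch of both measures follows, assuming these standard definitions; the function name and the sample transactions are hypothetical.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Measure the rule 'antecedent implies consequent' over item sets.

    support    = share of all transactions containing both sides
    confidence = share of antecedent transactions that also contain
                 the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    both = antecedent | consequent
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


if __name__ == "__main__":
    # Hypothetical market baskets.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(support_and_confidence(baskets, {"bread"}, {"milk"}))
```

For the sample baskets, the rule "bread implies milk" has support 0.5 (two of four baskets) and confidence 2/3 (two of the three baskets that contain bread).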

Deviation Detection
Deviation detection is a relatively new technique in terms of commercially available data mining tools. It identifies outliers, which express deviation from some previously known expectation and norm.
This operation can be performed using statistics and visualization techniques. For example, linear regression facilitates the identification of outliers in data, while modern visualization techniques display summaries and graphical representations that make deviations easy to detect.
Applications of deviation detection include fraud detection in the use of credit cards and insurance claims, quality control, and defect tracing.
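One simple statistical way to flag deviations is to measure how many standard deviations each value lies from the mean. The sketch below assumes this z-score rule with a threshold of 2, which is a choice made here for illustration rather than something the slide prescribes; the function name `find_outliers` and the sample amounts are hypothetical.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations
    from the mean (population standard deviation)."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # All values are identical, so nothing deviates.
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]


if __name__ == "__main__":
    # Hypothetical transaction amounts with one unusual value.
    amounts = [10, 11, 9, 10, 12, 10, 11, 50]
    print(find_outliers(amounts))
```

With the sample amounts only the value 50 is flagged. Because the outlier itself inflates the mean and the standard deviation, this rule becomes unreliable on very small samples.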

The Data Mining Process
In 1996 a consortium of vendors and users developed a specification called the Cross Industry Standard Process for Data Mining (CRISP-DM).
CRISP-DM specifies a data mining process that is not specific to any particular industry or tool.
CRISP-DM has evolved from the knowledge discovery processes used widely in industry and in direct response to user requirements.
The major aims of CRISP-DM are to make large data mining projects run more efficiently and to make them cheaper, more reliable, and more manageable.

The CRISP-DM Model
The CRISP-DM methodology is a hierarchical process model.
At the top level, the process is divided into six different generic phases, ranging from business understanding to deployment of project results.
The next level elaborates each of these phases as comprising several generic tasks. At this level, the description is generic enough to cover all data mining (DM) scenarios.
The third level specializes these tasks for specific situations. For example, the generic task might be cleaning data, and the specialized task could be cleaning of numeric or categorical values.
The fourth level is the process instance, that is, a record of the actions, decisions, and results of an actual execution of a DM project.
The model also discusses relationships between different DM tasks.

The CRISP-DM Phases
Business understanding: determine business objectives, assess situation, determine data mining goals, and produce a project plan.
Data understanding: collect initial data, describe data, explore data, and verify data quality.
Data preparation: select data, clean data, construct data, integrate data, and format data.
Modeling: select modeling technique, generate test design, build model, and assess model.
Evaluation: evaluate results, review process, and determine next steps.
Deployment: plan deployment, plan monitoring and maintenance, produce final report, and review project.

Data Mining Tools
There is a growing number of commercial data mining tools in the marketplace. The important features of data mining tools include:
  Data preparation
  Selection of data mining operations (algorithms)
  Product scalability and performance
  Facilities for understanding results

Thank you
Finally, I would like to emphasize that our students receive an education that is very similar to that of students from comparable institutions in the US. They secure excellent jobs in competitive local, regional, and international markets, and many are able to pursue further graduate studies at leading institutions in the US and Europe.
Maintaining the high quality of our programs in the school will continue to be an objective, and the future for these programs looks bright and full of promise.
Then, “Let knowledge grow from more to more, but more of reverence in us dwell.”