Data Mining
Dr. Chang Liu


What is Data Mining
- Data mining has been known by many different terms
  - Knowledge Discovery in Databases (KDD)
  - Predictive Analytics
  - Machine Learning
  - Business Analytics
- It is the process of finding hidden patterns in data
  - For example, what is the profile of people who buy from us?
- Usage of data mining has become widespread recently for various reasons
- Typically, businesses find huge increases in profitability as a result of applying data mining

Some Common Problems
- Growing business by cross-selling
  - A retailer can use buying patterns of customers to generate recommendations for new customers
- Determine the risk of giving a loan to a particular customer
  - Profiles of customers who have defaulted in the past are learned and used with new customers
- Forecast the likely unemployment level based on its past trend
- Is a credit card transaction likely to be fraudulent?
- Is this tumor in a patient’s breast likely malignant?

Data Mining Tasks
- Data mining problems are solved by performing a specific task:
  - Given a problem, an analyst should first determine the data mining task that should be performed.
- Example: I need to determine whether a customer is likely to default on a loan. I can solve this by performing a classification task.
- There are a number of tasks:
  - Classification
  - Association or market basket analysis
  - Forecasting
  - Deviation analysis
  - Clustering or segmentation
  - Sequence analysis
  - Regression

Data Mining Tasks (cont.)
- Classification is used to predict which of a few known outcomes a case is likely to have
  - Is this customer likely to default? This has two known outcomes: “Yes” or “No”
- Association is used to analyze transaction tables and determine which items in the transaction table tend to go together. Example?
- Forecasting is used to generate new data points in a time series. Example?
- Deviation analysis is used to determine anomalous data points or outliers
  - Used by security experts to detect network intrusion attacks
  - Used by insurance companies and credit card companies to detect fraud
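As a rough illustration of the association task, the sketch below counts how often pairs of items appear in the same transaction. The basket contents and the helper name pair_support are made up for this example; a real association-rules algorithm also considers larger item sets and rule confidence.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Return the fraction of transactions that contain each item pair."""
    counts = Counter()
    for items in transactions:
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical shopping baskets, one list per transaction.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

support = pair_support(baskets)
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```

Pairs with high support are candidates for "items that tend to go together", which is the starting point for cross-selling recommendations.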

Data Mining Tasks (cont.)
- Clustering or segmentation is used to discover natural groupings in data
- Sequence analysis discovers sequence patterns in events
  - E.g., purchase of a computer is followed by purchase of a printer, then a webcam …
  - Used by marketing folks to understand and exploit buying habits
  - Used to analyze web clickstream data
- Regression is used to predict numerical values
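To make the regression task concrete, here is a minimal sketch of fitting a straight line by ordinary least squares and using it to predict a numerical value. The data points and the function name fit_line are assumptions for illustration only and are not taken from the slides.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical input attribute (xs) and numerical output attribute (ys).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted output for a new case, x = 5
print(slope, intercept, prediction)
```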

Data Mining Algorithms
- Microsoft SSAS provides the following data mining algorithms:
  - Microsoft Decision Trees
  - Microsoft Neural Network
  - Microsoft Naïve Bayes
  - Microsoft Association Rules
  - Microsoft Time Series
  - Microsoft Clustering
  - Microsoft Sequence Clustering
  - Microsoft Linear Regression
  - Microsoft Logistic Regression

Case
- The thing you are mining or asking questions about is called a case
  - The case is often a row in a table; e.g., when studying which customers are likely to default on a loan, each row in the customer table is a case
  - Transaction tables are an example of nested cases

Attributes / Case Key
- Attributes are the variables that are used in the data mining analysis.
  - Attributes are often columns in the case table
- An attribute can be an input or an output
  - At modeling time, both input and output attributes are provided
  - At prediction time, input attributes are used to predict output attributes
- The case key indicates the identity of the case
  - This is often the primary key or a row index
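The relationship between a case, its case key, and its input and output attributes can be sketched with plain Python dictionaries. The customer table, its column names, and the helper split_case below are hypothetical, chosen to match the loan-default example rather than any real schema.

```python
# Hypothetical customer table: each row (dict) is one case.
customers = [
    {"customer_id": 101, "income": 52000, "age": 34, "defaulted": "No"},
    {"customer_id": 102, "income": 18000, "age": 22, "defaulted": "Yes"},
]

CASE_KEY = "customer_id"            # identifies the case
INPUT_ATTRIBUTES = ["income", "age"]
OUTPUT_ATTRIBUTE = "defaulted"      # what we want to predict


def split_case(row):
    """Separate a case into its key, input attributes, and output attribute."""
    key = row[CASE_KEY]
    inputs = {name: row[name] for name in INPUT_ATTRIBUTES}
    # At prediction time the output is unknown, so it may be missing.
    output = row.get(OUTPUT_ATTRIBUTE)
    return key, inputs, output


# Modeling time: inputs and output are both available.
print(split_case(customers[0]))

# Prediction time: a new case has inputs only.
new_case = {"customer_id": 103, "income": 40000, "age": 45}
print(split_case(new_case))
```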

Mining Structure / Mining Model
- A mining structure is a table that contains the columns to be analyzed. It also contains the data mining models used to analyze the data.
- A mining model defines how the problem is to be modeled:
  - Specify which columns are to be included in the model
  - Specify the algorithm to be used
  - Define which columns are input and which are output
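One way to picture how a mining structure and its mining models relate is the small sketch below. The classes MiningStructure and MiningModel are illustrative Python stand-ins, not the SSAS API, and the structure name and column names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MiningModel:
    """How the problem is modeled: algorithm plus input/output columns."""
    name: str
    algorithm: str
    input_columns: list
    output_columns: list


@dataclass
class MiningStructure:
    """The columns to be analyzed, plus the models built on them."""
    name: str
    columns: list
    models: list = field(default_factory=list)

    def add_model(self, model):
        # A model may only use columns that exist in the structure.
        used = set(model.input_columns) | set(model.output_columns)
        missing = used - set(self.columns)
        if missing:
            raise ValueError(f"columns not in structure: {sorted(missing)}")
        self.models.append(model)


# Hypothetical structure for the loan-default example.
structure = MiningStructure(
    name="LoanRisk",
    columns=["customer_id", "income", "age", "defaulted"],
)
structure.add_model(
    MiningModel(
        name="LoanRisk_DT",
        algorithm="Microsoft Decision Trees",
        input_columns=["income", "age"],
        output_columns=["defaulted"],
    )
)
print([m.name for m in structure.models])
```

Several models with different algorithms or column choices can be added to the same structure, which makes it easy to compare them on the same data.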

Training Models
- Many data mining algorithms require historical data to learn patterns from
- Training the model is also known as processing the model
- Typically, not all available historical data is used to train the model
  - A percentage is left for testing purposes. This set is called the testing set
  - The data used to train the model is called the training set
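Holding out part of the historical data can be sketched as a random split. The 30% test fraction, the fixed seed, and the function name split_cases are arbitrary choices for this example, not values prescribed by the slides.

```python
import random


def split_cases(cases, test_fraction=0.3, seed=42):
    """Shuffle the cases and hold out a fraction of them for testing."""
    shuffled = list(cases)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    testing_set = shuffled[:n_test]
    training_set = shuffled[n_test:]
    return training_set, testing_set


# Hypothetical historical cases, represented here by their case keys.
cases = list(range(10))
training_set, testing_set = split_cases(cases)
print(len(training_set), len(testing_set))
```

The model is trained only on the training set; the testing set is then used to check how well its predictions hold up on cases it has not seen.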

Class Activity_1
- High school student historical data – CollegePlan table from DB661
- You are asked to find out what factors influence a high school student to go to college (or not)
- What data mining task would you perform?
- What is the case in this case?
- What is the case key?
- What algorithm(s) is/are applicable for this task?
- Which attribute(s) is/are input?
- Which attribute(s) is/are output?

Class Activity_2
- Explore vmMSFTYear2008 data in DB661
- Predict Microsoft stock values in the first week of 2009 (the real data is available at vmMSFTFirstWeek2009)
- Can you make money from MSFT based on your data mining knowledge?

QUESTIONS??

With a new student table

Results