Data Mining.


Data Mining

What Is Data Mining? Data mining is the process of extracting information from large amounts of data. In other words, data mining (knowledge discovery from data) is the extraction of interesting patterns or knowledge from huge amounts of data. Other names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, etc.

Data Mining Data mining is the process of extracting information from large amounts of data. In other words, data mining is mining knowledge from data.

Need for Data Mining A huge amount of data is available in the information industry. This data has to be analysed and turned into useful information before it is of any value.

Knowledge Discovery/Data Mining Process Here is the list of steps involved in the knowledge discovery process: Data Cleaning - In this step noise and inconsistent data are removed. Data Integration - In this step multiple data sources are combined. (It merges the data from multiple heterogeneous data sources into a coherent data store.) Data Selection - In this step data relevant to the analysis task are retrieved from the database.

Data Transformation - In this step data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations. Data Mining - In this step intelligent methods are applied in order to extract data patterns. Pattern Evaluation - In this step the extracted data patterns are evaluated. Knowledge Presentation - In this step the discovered knowledge is presented to the user.
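The steps listed above can be sketched as a chain of small functions. This is only a toy illustration: the store records, the function names, and the spending threshold are all made up for the example and do not come from the slides.

```python
from collections import defaultdict

# Two hypothetical data sources holding (customer, item, amount) records.
store_a = [("ann", "computer", 900.0), ("bob", "printer", None)]
store_b = [("ann", "software", 150.0), ("cy", "computer", 1100.0)]

def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r[2] is not None]

def integrate(*sources):
    """Data integration: merge several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the fields the analysis needs."""
    return [(customer, amount) for customer, _item, amount in records]

def transform(records):
    """Data transformation: aggregate spending per customer."""
    totals = defaultdict(float)
    for customer, amount in records:
        totals[customer] += amount
    return dict(totals)

def mine(totals, threshold=1000.0):
    """Data mining: a trivial 'pattern', customers above an assumed threshold."""
    return sorted(c for c, total in totals.items() if total > threshold)

totals = transform(select(integrate(clean(store_a), clean(store_b))))
big_spenders = mine(totals)
print(big_spenders)  # pattern evaluation and presentation would follow here
```

Real mining steps use far richer methods than a threshold; the point is only the order in which the data flows through the stages.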

Knowledge Discovery (KDD) Process Data mining: core of the knowledge discovery process. [Figure: KDD process diagram. Stage labels: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation.]

Introduction Data mining is the process of analyzing large databases to find useful patterns (data or info.)

Data Mining Tasks (Techniques) Data mining tasks specify what kind of patterns are to be mined. There are two kinds of functions involved in data mining: Descriptive, and Classification and Prediction.

Data Mining Models and Tasks

Descriptive The descriptive function deals with general properties of data in the database. Here is the list of descriptive functions: Class/Concept Description, Mining of Frequent Patterns, Mining of Associations, Mining of Correlations, Mining of Clusters.

Class/Concept Description Data can be associated with classes or concepts. For example, in a company, classes of items for sale include computers and printers, and concepts of customers include big spenders and budget spenders. Descriptions of such a class or concept are called class/concept descriptions.

Ways of Class/Concept Description Characterization: provides a concise summary of the given collection of data. Example: the characteristics of customers who spend more than $1000 a year at All Electronics. The result can be a general profile of these customers, such as age, employment status or credit rating.

Ways of Class/Concept Description Characterization: provides a concise summary of the given collection of data. Comparison: provides descriptions comparing two or more collections of data. Example: the user may like to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by about 30% over the same period.
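Characterization and comparison can be illustrated with a few lines of Python. The customer records and the `characterize` helper below are hypothetical; only the $1000 spending cut-off is taken from the slide.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records, invented for this example.
customers = [
    {"age": 34, "employment": "employed", "annual_spend": 1500},
    {"age": 45, "employment": "employed", "annual_spend": 2200},
    {"age": 23, "employment": "student", "annual_spend": 300},
    {"age": 41, "employment": "self-employed", "annual_spend": 1200},
]

def characterize(rows):
    """Summarize a collection of customers as a small general profile."""
    employment = Counter(r["employment"] for r in rows)
    return {
        "count": len(rows),
        "mean_age": mean([r["age"] for r in rows]),
        "most_common_employment": employment.most_common(1)[0][0],
    }

# Characterization: profile of customers who spend more than $1000 a year.
big_spenders = [c for c in customers if c["annual_spend"] > 1000]
profile_big = characterize(big_spenders)

# Comparison: the same profile for the remaining customers, side by side.
others = [c for c in customers if c["annual_spend"] <= 1000]
profile_others = characterize(others)

print(profile_big)
print(profile_others)
```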

Mining of Frequent Patterns As the name suggests, frequent patterns are patterns that occur frequently in data. Mining them identifies the specific patterns that recur within the data.

Mining of Associations/Correlations Association analysis: from a marketing perspective, determining which items are frequently purchased together within the same transaction. Example: a rule mined from the transactional database of a store, All Electronics: buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 90%], where X represents a customer. Confidence = 90% means that if a customer buys a computer, there is a 90% chance that he/she will buy software as well. Support = 1% means that 1% of all the transactions under analysis showed that computer and software were purchased together.

Cont.. Confidence indicates how often the if/then statement has been found to be true: the fraction of transactions containing the "if" part that also contain the "then" part. Support indicates how frequently the items appear together in the database.
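Support and confidence are simple ratios, so they are easy to compute by hand. The sketch below uses five invented transactions (not the All Electronics data) and two hypothetical helper functions.

```python
# Five made-up transactions; each is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"software"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

# Rule: buys(X, "computer") => buys(X, "software")
rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(rule_support, rule_confidence)
```

Here two of the five transactions contain both items, and two of the three computer purchases also include software, so the rule has support 2/5 and confidence 2/3.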

Association rules Association is a data mining function that discovers the probability of the co-occurrence of items in a collection. The relationships between co-occurring items are expressed as association rules or correlations. Note: in data mining, association rules are useful for analysing and predicting customer behavior.

Mining of Clusters A cluster is a group of similar objects. Cluster analysis forms groups of objects that are very similar to each other but highly different from the objects in other clusters. The goal is to place records into groups where the records in a group are highly similar to each other and dissimilar to records in other groups.

Cont.. Example: A bank may cluster its customers into several groups based on the similarities of their age, income, residence, etc., and the common characteristics of the customers in a group can be used to describe that group of customers. The clusters will help the bank to understand its customers better and thus provide more suitable products and customized services.
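As one possible way to form such groups, here is a minimal k-means sketch in plain Python. The slides do not name a clustering algorithm, so k-means is an assumption, as are the (age, income in thousands) values and the naive choice of the first k points as starting centroids.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Very small k-means: returns a list of k clusters (lists of points)."""
    centroids = list(points[:k])  # naive, deterministic starting centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

# Hypothetical bank customers as (age, income in thousands).
customers = [(25, 30), (27, 32), (26, 29), (55, 90), (58, 95), (60, 88)]
clusters = kmeans(customers, k=2)
print(clusters)
```

In practice the attributes would be scaled first so that income does not dominate age in the distance, and a library implementation would normally be used instead.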

Cluster Analysis

Predictive functions: Classification, Regression, Outlier Analysis (Deviation Detection), Evolution Analysis.

Classification Classification is the process of learning a model that describes and distinguishes different classes of data. A classification model can be represented in various forms, such as a decision tree.

Tree Structure:

Cont.. [Figure: decision tree. Test nodes: "Customer renting property > 2 years?" and "Customer age > 25 years?". Leaf outcomes: Rent property, Rent property, Buy property.]
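A decision tree like the one on this slide reads naturally as nested if/else tests. The slide text gives only the two tests and the three outcomes; which branch leads to which outcome is an assumption in the sketch below, and the function name is made up.

```python
def recommend(years_renting: float, age: int) -> str:
    """Classify a customer with a two-level decision tree.

    Assumed branch layout (not stated on the slide): a customer must pass
    both tests to be classified as "Buy property".
    """
    if years_renting > 2:      # test 1: renting property for more than 2 years?
        if age > 25:           # test 2: older than 25 years?
            return "Buy property"
        return "Rent property"
    return "Rent property"

print(recommend(years_renting=3, age=30))
print(recommend(years_renting=3, age=22))
print(recommend(years_renting=1, age=40))
```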

[Figure: decision tree. Test nodes: "Age?" and "Income?". Leaf classes: Class A, Class B, Class C.]

Regression (Prediction) Regression is a data mining function that predicts a number. Age, weight, distance, temperature, income, or sales could all be predicted using regression techniques. For example, a regression model could be used to predict children's height, given their age, weight, and other factors.

Cont.. Regression modeling has many applications in business planning, financial forecasting, time series prediction, and environmental modeling.
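The simplest regression model is a straight line fitted by least squares. The sketch below predicts a child's height from age alone; the data points are invented, and a single predictor is used only to keep the example short (the slide also mentions weight and other factors).

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical ages (years) and heights (cm).
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 139]

slope, intercept = fit_line(ages, heights)
predicted_height = slope * 12 + intercept  # predicted height at age 12
print(slope, intercept, predicted_height)
```

For this data the fitted line is roughly height = 6.6 × age + 74.4, which gives about 153.6 cm at age 12.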

Outlier Analysis Outliers may be defined as data objects that do not comply with the general behaviour or model of the data available. Such data objects, which are grossly different from or inconsistent with the remaining set of data, are called outliers. Outliers may be of particular interest, such as in fraud detection, where they may indicate fraudulent activity. Thus, outlier detection and analysis is an interesting data mining task, referred to as outlier mining or outlier analysis.

Deviation detection Discovering the most significant changes in data. A deviation (outlier) is a data object that deviates significantly from the normal objects, as if it were generated by a different mechanism. Deviations are different from noise: noise is random error or variance in a measured variable, and it should be removed before outlier detection. Applications: credit card fraud detection.
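One common and very simple way to flag such objects is the z-score: how many standard deviations a value lies from the mean. The slides do not prescribe a method, so the z-score rule, the threshold of 2.0, and the card amounts below are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the (assumed) threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []           # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical credit card purchase amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
suspicious = find_outliers(amounts)
print(suspicious)
```

Because a large outlier inflates both the mean and the standard deviation, robust variants (for example, based on the median) are often preferred on small samples.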

Data visualization Data visualization: using graphical methods to show patterns in data. Data visualization systems help users examine large volumes of data and detect patterns visually. They can visually encode large amounts of information on a single screen.

Evolution Analysis Evolution analysis describes and models regularities or trends for objects whose behaviour changes over time. Example: stock market prediction, i.e. predicting future stock prices.

Cont.. Example: Time-series data. Suppose the stock market data (a time series) of the last several years is available from the New York Stock Exchange, and one would like to invest in shares of high-tech industrial companies. A data mining study of stock exchange data may identify stock evolution regularities for overall stocks and for the stocks of particular companies. Such regularities may help predict future trends in stock market prices, contributing to one’s decision making regarding stock investments.

Ex: Time Series Analysis Example: stock market. Predict future values. Determine similar patterns over time. Classify behavior. © Prentice Hall
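A moving average is one of the simplest ways to expose a trend in a time series. The prices below are invented, and this smoothing is only a sketch of the idea of finding regularities over time, not a real method for predicting stock prices.

```python
def moving_average(prices, window):
    """Average of each run of `window` consecutive prices."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

# Hypothetical daily closing prices.
prices = [10, 11, 12, 13, 15, 18]
smoothed = moving_average(prices, window=3)

# A crude trend label: compare the latest smoothed value with the earliest.
trend = "rising" if smoothed[-1] > smoothed[0] else "flat or falling"
print(smoothed, trend)
```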

Applications The information obtained from data mining can be used for various applications such as market analysis, fraud detection, customer retention, production control, science exploration, etc.

Market Analysis and Management Customer Profiling - Data mining helps to determine what kind of people buy what kind of products. Identifying Customer Requirements - Data mining helps in identifying the best products for different customers. It uses prediction to find the factors that may attract new customers. Cross-Market Analysis - Data mining finds associations/correlations between product sales.

Target Marketing - Data mining helps to find clusters of model customers who share the same characteristics, such as interests, spending habits, income, etc. Determining Customer Purchasing Patterns - Data mining helps in determining customers' purchasing patterns. Providing Summary Information - Data mining provides various multidimensional summary reports.

Fraud Detection Data mining is also used in credit card services and telecommunications to detect fraud. For fraudulent telephone calls, it helps to analyse the destination of the call, the duration of the call, and the time of day or week.

Corporate Analysis & Risk Management Finance Planning and Asset Evaluation - It involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets. Resource Planning - It involves summarizing and comparing resources and spending. Competition - It involves monitoring competitors and market directions.

ADVANTAGES OF DATA MINING Marketing/Retailing: Data mining can aid direct marketers by providing them with useful and accurate trends about their customers’ purchasing behavior. Banking/Crediting: Data mining can assist financial institutions in areas such as credit reporting and loan information. Researchers: Data mining can assist researchers by speeding up their data analysis, leaving them more time to work on other projects.

DISADVANTAGES OF DATA MINING Security issues: Although companies hold a lot of personal information about us online, they often do not have sufficient security systems in place to protect that information. Misuse of information: Some companies decide how quickly to answer your phone call based on your purchase history. If you have spent a lot of money or bought a lot of products from a company, your call will be answered very soon. So you should not assume that your call is being answered in the order in which it was received.

Thanks……